Mixedbread JSON Format
This format is for specialized use cases where you need complete control over chunking. For most workflows, use the standard file upload — it handles chunking, metadata generation, and indexing automatically.
The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.
Schema Checker
Validate your mxjson files against the current schema before uploading. The interactive schema checker on this page accepts .mxjson and .mxjsonl files up to 10MB, and you can also download the JSON Schema to validate locally.
File Formats
| Format | Extension | MIME Type | Structure |
|---|---|---|---|
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
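Both formats carry the same chunk objects and differ only in framing: one JSON array versus one compact object per line. A minimal sketch using only the Python standard library; the helper names (to_mxjson, to_mxjsonl, parse_mxjsonl) are hypothetical and not part of any Mixedbread SDK, and skipping blank lines when parsing is a lenient assumption rather than documented behavior:

```python
import json


def to_mxjson(chunks: list[dict]) -> str:
    """Serialize chunks as a single JSON array (.mxjson)."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)


def to_mxjsonl(chunks: list[dict]) -> str:
    """Serialize chunks as JSON Lines (.mxjsonl): one compact object per line."""
    return "\n".join(json.dumps(c, ensure_ascii=False) for c in chunks) + "\n"


def parse_mxjsonl(text: str) -> list[dict]:
    """Parse .mxjsonl content back into chunk dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


chunks = [
    {"type": "text", "text": "Baguette Shaping Guide", "chunk_index": 0},
    {
        "type": "image_url",
        "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
        "chunk_index": 1,
    },
]

mxjson = to_mxjson(chunks)
mxjsonl = to_mxjsonl(chunks)
```

Because the chunk objects are identical, converting between the two formats is a lossless re-serialization.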
Chunk Structure
Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:
- text: Text content
- image_url: Image reference
- audio_url: Audio reference
- video_url: Video reference
Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
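A document that interleaves text and images therefore has to be split into one chunk per part. A minimal Python sketch, assuming a hypothetical intermediate representation of ("text", str) and ("image", url) tuples produced by your own parser; the function name split_by_modality is illustrative only:

```python
def split_by_modality(parts: list[tuple[str, str]]) -> list[dict]:
    """Turn an ordered list of (kind, value) parts into one-modality chunks.

    Each part becomes its own chunk, and chunk_index follows document order.
    """
    chunks = []
    for index, (kind, value) in enumerate(parts):
        if kind == "text":
            chunk = {"type": "text", "text": value}
        elif kind == "image":
            chunk = {"type": "image_url", "image_url": {"url": value}}
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
        chunk["chunk_index"] = index
        chunks.append(chunk)
    return chunks


doc = [
    ("text", "Pre-shape the dough into a loose rectangle."),
    ("image", "https://bakery.example.com/images/baguette-preshape.jpg"),
    ("text", "Let it rest for 15-20 minutes."),
]
chunks = split_by_modality(doc)
```

Setting chunk_index explicitly keeps the original reading order even if chunks are later filtered or reordered.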
Input Properties
When creating chunks for mxjson files, these properties control ingestion:
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Chunk type: text, image_url, audio_url, or video_url |
| mime_type | string | No | Content MIME type (defaults per chunk type) |
| chunk_index | integer | No | Position in the file; auto-generated sequentially if omitted |
| generated_metadata | object | No | Typed metadata structure; see Generated Metadata |
Text Chunks
| Property | Type | Required |
|---|---|---|
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |
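The length limit can be enforced before a chunk is ever written. A minimal Python sketch; make_text_chunk is a hypothetical helper, and two details are assumptions rather than documented behavior: that len() (Unicode code points) matches the service's notion of "characters", and that word_count is a plain whitespace split. The language value is supplied by the caller, not detected:

```python
MAX_TEXT_CHARS = 65536  # upper limit from the text chunk table


def make_text_chunk(text: str, file_size: int, language: str = "en", offset: int = 0) -> dict:
    """Build a text chunk, enforcing the documented 1-65536 character range."""
    # len() counts Unicode code points (assumed to match the service's rule).
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be 1-65536 characters")
    return {
        "type": "text",
        "text": text,
        "offset": offset,
        "mime_type": "text/plain",
        "generated_metadata": {
            "type": "text",
            "file_type": "text/plain",
            "language": language,  # caller-supplied, not detected
            "word_count": len(text.split()),  # whitespace split (assumption)
            "file_size": file_size,
        },
    }


chunk = make_text_chunk(
    "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
    file_size=512,
)

# Empty text falls outside the 1-65536 range and is rejected locally.
try:
    make_text_chunk("", file_size=0)
    empty_rejected = False
except ValueError:
    empty_rejected = True
```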
```json
{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 512
  }
}
```

Image Chunks
| Property | Type | Required |
|---|---|---|
| image_url.url | string | Yes (HTTP URL or data URI) |
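When an image is not hosted at an HTTP URL, it can be embedded as a data URI. A minimal Python sketch using base64 and mimetypes from the standard library; image_chunk_from_bytes is a hypothetical helper, and any size limit the service places on data URIs is not covered here:

```python
import base64
import mimetypes


def image_chunk_from_bytes(data: bytes, filename: str) -> dict:
    """Build an image_url chunk that embeds the image as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        "mime_type": mime_type,
        "generated_metadata": {
            "type": "image",
            "file_type": mime_type,
            "file_size": len(data),
        },
    }


# Placeholder bytes: only the 8-byte PNG signature, not a complete image.
png_bytes = b"\x89PNG\r\n\x1a\n"
chunk = image_chunk_from_bytes(png_bytes, "crumb-structure.png")
```

Width and height are omitted because reading them requires decoding the image; add them if your pipeline already knows them.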
```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "type": "image",
    "file_type": "image/png",
    "file_size": 204800,
    "width": 1200,
    "height": 800
  }
}
```

Audio Chunks
| Property | Type | Required |
|---|---|---|
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |
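Audio is the only chunk type with an extra required field, sampling_rate. If your source audio is uncompressed PCM WAV, Python's standard wave module can read the rate from the file header. A sketch under that assumption: audio_chunk_from_wav and silent_wav are hypothetical helpers, "audio/wav" is an illustrative MIME type, and compressed formats such as the MP3 in the example need a different decoder:

```python
import io
import wave


def audio_chunk_from_wav(data: bytes, url: str) -> dict:
    """Build an audio_url chunk, reading sampling_rate from a PCM WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "type": "audio_url",
        "audio_url": {"url": url},
        "mime_type": "audio/wav",
        "sampling_rate": rate,  # required for audio chunks
        "generated_metadata": {
            "type": "audio",
            "file_type": "audio/wav",
            "file_size": len(data),
            "total_duration_seconds": frames / rate,
            "sample_rate": rate,
            "channels": channels,
        },
    }


def silent_wav(seconds: int, rate: int, channels: int) -> bytes:
    """Generate a silent 16-bit PCM WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)
    return buf.getvalue()


demo = silent_wav(seconds=1, rate=44100, channels=2)
chunk = audio_chunk_from_wav(demo, "https://bakery.example.com/audio/kneading-tutorial.wav")
```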
```json
{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "type": "audio",
    "file_type": "audio/mpeg",
    "file_size": 5242880,
    "total_duration_seconds": 180.5,
    "sample_rate": 44100,
    "channels": 2,
    "audio_format": 1
  }
}
```

Video Chunks
| Property | Type | Required |
|---|---|---|
| video_url.url | string | Yes (HTTP URL or data URI) |
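Image, audio, and video chunks all require a nested url that is either an HTTP URL or a data URI. A rough local check in Python, using urllib.parse from the standard library; media_url_ok is a hypothetical helper and is looser than the service's own validation (for example, it does not verify the data URI payload):

```python
from urllib.parse import urlparse

MEDIA_TYPES = ("image_url", "audio_url", "video_url")


def media_url_ok(chunk: dict) -> bool:
    """Rough check that a media chunk's nested url is an HTTP(S) URL or a data URI."""
    kind = chunk.get("type")
    nested = chunk.get(kind) if kind in MEDIA_TYPES else None
    url = nested.get("url") if isinstance(nested, dict) else None
    if not isinstance(url, str):
        return False
    if url.startswith("data:"):
        # data:[<mediatype>][;base64],<data> needs a comma before the payload
        return "," in url
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


video = {
    "type": "video_url",
    "video_url": {"url": "https://bakery.example.com/video/shaping-boule.mp4"},
    "mime_type": "video/mp4",
}
```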
```json
{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "type": "video",
    "file_type": "video/mp4",
    "file_size": 10485760,
    "total_duration_seconds": 120.0,
    "fps": 30.0,
    "width": 1920,
    "height": 1080,
    "frame_count": 3600,
    "has_audio_stream": true
  }
}
```

Chunk Metadata
Each chunk can optionally include generated_metadata, a typed structure with a fixed format discriminated by its type field. When the system processes uploaded files, it generates this metadata automatically; with mxjson, you can supply it yourself using the same structure. Only the type field is required within the metadata object; all other fields are optional. You can also add custom fields beyond the typed ones: any additional key-value pairs are preserved alongside the standard fields.
See Generated Metadata for the full reference of all metadata types and their fields.
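Since only the metadata type is mandatory, a small helper can derive it from the chunk type and pass everything else through, typed or custom. A minimal Python sketch; attach_metadata is hypothetical, the chunk-to-metadata type pairing is taken from the examples on this page, and recipe_step is an invented custom field:

```python
METADATA_TYPE = {
    "text": "text",
    "image_url": "image",
    "audio_url": "audio",
    "video_url": "video",
}


def attach_metadata(chunk: dict, **fields) -> dict:
    """Return a copy of chunk with generated_metadata set.

    The metadata type is derived from the chunk type; all keyword
    arguments (standard or custom) are stored alongside it.
    """
    if "type" in fields:
        raise ValueError("metadata type is derived from the chunk type")
    metadata = {"type": METADATA_TYPE[chunk["type"]], **fields}
    return {**chunk, "generated_metadata": metadata}


base = {"type": "text", "text": "Autolyse is a rest period after mixing flour and water."}
chunk = attach_metadata(base, language="en", word_count=10, recipe_step="mixing")
```

Returning a copy leaves the input chunk untouched, which keeps the helper safe to reuse across pipeline stages.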
```json
{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 2048
  }
}
```

File-level metadata (set during upload) applies to all chunks and participates in contextualization if enabled on the Store.
Complete Example
JSON Format (.mxjson)
```json
[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 3,
      "file_size": 1024
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 20,
      "file_size": 1024
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "type": "image",
      "file_type": "image/jpeg",
      "file_size": 153600,
      "width": 800,
      "height": 600
    }
  }
]
```

JSON Lines Format (.mxjsonl)
```json
{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 3, "file_size": 1024}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 7, "file_size": 1024}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"type": "image", "file_type": "image/jpeg", "file_size": 153600, "width": 800, "height": 600}}
```

Schema Endpoint
Retrieve the JSON Schema programmatically:
```bash
curl https://api.mixedbread.com/v1/schemas/mxjson
```

Use Cases
- Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).
- Pre-processed pipelines: When existing ETL pipelines produce chunked content.
- Multimodal collections: When combining text, images, audio, and video from different sources.
- Metadata preservation: When chunks carry structured metadata from source systems.
- Migration: When importing pre-chunked data from other vector databases.
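For migration and metadata preservation, exported records can be mapped onto text chunks, carrying source fields over as custom metadata. A minimal Python sketch; the export shape {"id", "text", "metadata"} is a hypothetical stand-in for whatever your source system emits, and source_id is an invented custom field used for traceability:

```python
import json


def records_to_mxjsonl(records: list[dict]) -> str:
    """Convert exported text records into .mxjsonl content.

    Assumes a hypothetical export shape {"id": str, "text": str, "metadata": dict};
    adapt the field names to your source system.
    """
    lines = []
    for index, record in enumerate(records):
        metadata = {
            **record.get("metadata", {}),  # source fields kept as custom metadata
            "type": "text",  # required discriminator overrides any source "type"
            "source_id": record["id"],  # hypothetical traceability field
        }
        chunk = {
            "type": "text",
            "text": record["text"],
            "chunk_index": index,
            "generated_metadata": metadata,
        }
        lines.append(json.dumps(chunk, ensure_ascii=False))
    return "\n".join(lines) + "\n"


exported = [
    {"id": "guide-0", "text": "Baguette Shaping Guide", "metadata": {"section": "title"}},
    {"id": "guide-1", "text": "Pre-shape the dough into a loose rectangle.", "metadata": {}},
]
mxjsonl = records_to_mxjsonl(exported)
```

Existing embeddings from the source database are not carried over in this sketch; the mxjson chunk structure described above has no field for them.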
Validation Errors
| Error | Cause |
|---|---|
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |
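These errors can be approximated locally before uploading, which makes it easy to report problems by line number in a large .mxjsonl file. A minimal Python sketch; precheck and precheck_mxjsonl are hypothetical helpers that reproduce only the four messages in the table, using a simple prefix test for URLs, and the published JSON Schema remains authoritative:

```python
import json

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")


def precheck(chunk: dict) -> list[str]:
    """Approximate the documented validation errors for one chunk."""
    if "type" not in chunk:
        return ["type is required"]
    kind = chunk["type"]
    if kind not in CHUNK_TYPES:
        return ["Unknown chunk type"]
    if kind == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= 65536:
            return ["text must be 1-65536 characters"]
        return []
    nested = chunk.get(kind)
    url = nested.get("url") if isinstance(nested, dict) else None
    if not isinstance(url, str) or not url.startswith(("http://", "https://", "data:")):
        return ["Invalid URL format"]
    return []


def precheck_mxjsonl(content: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to error messages for .mxjsonl content."""
    report = {}
    for lineno, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue
        errors = precheck(json.loads(line))
        if errors:
            report[lineno] = errors
    return report


sample = (
    '{"type": "text", "text": ""}\n'
    '{"type": "pdf_url"}\n'
    '{"text": "missing type"}\n'
    '{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/crumb-structure.png"}}\n'
)
report = precheck_mxjsonl(sample)
```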