Mixedbread JSON Format
The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.
File Formats
| Format | Extension | MIME Type | Structure |
|---|---|---|---|
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
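The two layouts in the table differ only in framing, which a short sketch can make concrete. The helper names to_mxjson and to_mxjsonl below are hypothetical and not part of any Mixedbread SDK; they only serialize a list of chunk dictionaries with the Python standard library.

```python
import json


def to_mxjson(chunks: list[dict]) -> str:
    """Serialize chunks as one JSON array, the .mxjson layout."""
    return json.dumps(chunks, indent=2)


def to_mxjsonl(chunks: list[dict]) -> str:
    """Serialize chunks as one compact JSON object per line, the .mxjsonl layout."""
    return "".join(json.dumps(chunk) + "\n" for chunk in chunks)


chunks = [
    {"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain"},
    {"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain"},
]

array_payload = to_mxjson(chunks)   # save with a .mxjson extension
lines_payload = to_mxjsonl(chunks)  # save with a .mxjsonl extension
```

The JSON Lines form can be written and read one chunk at a time, which tends to suit large exports better than a single array.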
Chunk Structure
Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:
text - Text content
image_url - Image reference
audio_url - Audio reference
video_url - Video reference
Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
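As a minimal sketch of the one-modality rule, the function below turns an ordered list of document parts into separate chunks. The input shape (tuples tagged "text" or "image") and the function name parts_to_chunks are assumptions made for illustration only.

```python
def parts_to_chunks(parts: list[tuple]) -> list[dict]:
    """Emit one chunk per part so that no chunk mixes modalities.

    Each part is ("text", content) or ("image", url, mime_type).
    This input shape is hypothetical, chosen only for the example.
    """
    chunks = []
    for part in parts:
        kind = part[0]
        if kind == "text":
            chunks.append({"type": "text", "text": part[1], "mime_type": "text/plain"})
        elif kind == "image":
            chunks.append(
                {"type": "image_url", "image_url": {"url": part[1]}, "mime_type": part[2]}
            )
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
    return chunks


document = [
    ("text", "Pre-shape the dough into a loose rectangle."),
    ("image", "https://bakery.example.com/images/baguette-preshape.jpg", "image/jpeg"),
]
chunks = parts_to_chunks(document)
```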
Input Properties
When creating chunks for mxjson files, these properties control ingestion:
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Chunk type: text, image_url, audio_url, video_url |
| mime_type | string | Yes | Content MIME type |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Arbitrary key-value metadata preserved on the chunk |
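If you would rather set chunk_index explicitly than rely on auto-generation, a client-side helper along these lines may be enough. The name with_explicit_indices is hypothetical, and the sketch simply numbers chunks by list position, overwriting any index already present.

```python
def with_explicit_indices(chunks: list[dict]) -> list[dict]:
    """Return copies of the chunks numbered 0, 1, 2, ... in list order.

    Any existing chunk_index is overwritten; the input list is left unchanged.
    """
    return [{**chunk, "chunk_index": index} for index, chunk in enumerate(chunks)]


chunks = [
    {"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain"},
    {"type": "text", "text": "Pre-shape the dough.", "mime_type": "text/plain", "chunk_index": 7},
]
indexed = with_explicit_indices(chunks)
```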
Text Chunks
| Property | Type | Required |
|---|---|---|
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |
{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain"
}
Image Chunks
| Property | Type | Required |
|---|---|---|
| image_url.url | string | Yes (HTTP URL or data URI) |
| ocr_text | string | No |
| summary | string | No |
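Since image_url.url may be a data URI as well as an HTTP URL, local image bytes can be embedded directly in the chunk. The following is a minimal sketch using base64 encoding; the helper name image_chunk_from_bytes is hypothetical, and any size limit on embedded data is not stated here, so check it before relying on this for large images.

```python
import base64


def image_chunk_from_bytes(data: bytes, mime_type: str) -> dict:
    """Build an image_url chunk whose URL is a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        "mime_type": mime_type,
    }


# Placeholder bytes stand in for real image content read from disk.
chunk = image_chunk_from_bytes(b"\x89PNG\r\n\x1a\n", "image/png")
```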
{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "ocr_text": "Figure 3: Open crumb structure",
  "summary": "Cross-section of sourdough loaf showing irregular hole distribution"
}
Audio Chunks
| Property | Type | Required |
|---|---|---|
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |
| transcription | string | No |
| summary | string | No |
{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "transcription": "Fold the dough over itself, rotate ninety degrees, and repeat."
}
Video Chunks
| Property | Type | Required |
|---|---|---|
| video_url.url | string | Yes (HTTP URL or data URI) |
| transcription | string | No |
| summary | string | No |
{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "transcription": "Pre-shape into a round, let it rest, then do the final shaping."
}
Chunk Metadata
Each chunk can include generated_metadata with arbitrary key-value pairs. This metadata is preserved on the chunk and returned in search results.
{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "technique": "autolyse",
    "difficulty": "beginner",
    "duration_minutes": 30
  }
}
File-level metadata (set during upload) applies to all chunks and participates in contextualization if enabled on the Store.
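Per-chunk metadata is usually attached at the point where chunks are built. The sketch below assumes a hypothetical helper, text_chunk, that accepts metadata as keyword arguments and checks that it can be serialized to JSON before adding it under generated_metadata.

```python
import json


def text_chunk(text: str, **metadata) -> dict:
    """Build a text chunk, attaching generated_metadata only when provided."""
    chunk = {"type": "text", "text": text, "mime_type": "text/plain"}
    if metadata:
        # json.dumps raises TypeError for values that JSON cannot represent.
        json.dumps(metadata)
        chunk["generated_metadata"] = metadata
    return chunk


chunk = text_chunk(
    "Autolyse is a rest period after mixing flour and water.",
    technique="autolyse",
    difficulty="beginner",
    duration_minutes=30,
)
```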
Complete Example
JSON Format (.mxjson)
[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {"section": "title"}
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {"step": 1}
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "summary": "Dough pre-shaped into loose rectangle on floured surface",
    "chunk_index": 2,
    "generated_metadata": {"step": 1}
  }
]
JSON Lines Format (.mxjsonl)
{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2}
Schema Endpoint
Retrieve the JSON Schema programmatically:
curl https://api.mixedbread.com/v1/schemas/mxjson
Use Cases
Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).
Pre-processed pipelines: When existing ETL pipelines produce chunked content.
Multimodal collections: When combining text, images, audio, and video from different sources.
Metadata preservation: When chunks carry structured metadata from source systems.
Migration: When importing pre-chunked data from other vector databases.
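To illustrate the custom chunking case, here is a rough sketch that splits a plain-text document on blank lines and emits one text chunk per paragraph. The function name paragraphs_to_chunks and the blank-line boundary rule are assumptions; substitute whatever boundaries your domain needs.

```python
import re


def paragraphs_to_chunks(document: str) -> list[dict]:
    """Split on blank lines and emit one text chunk per non-empty paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document)]
    paragraphs = [p for p in paragraphs if p]
    return [
        {"type": "text", "text": paragraph, "mime_type": "text/plain", "chunk_index": index}
        for index, paragraph in enumerate(paragraphs)
    ]


recipe = "Baguette Shaping Guide\n\nPre-shape the dough.\n\n\nLet it rest for 15-20 minutes.\n"
chunks = paragraphs_to_chunks(recipe)
```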
Validation Errors
| Error | Cause |
|---|---|
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |
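Checking chunks locally before upload can catch these errors early. The following is a minimal sketch that mirrors the table above; it is not the service's actual validation logic, the function name validate_chunk is hypothetical, and counting characters with len is an assumption about how the limit is measured.

```python
from urllib.parse import urlparse

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")
MAX_TEXT_LENGTH = 65536


def validate_chunk(chunk: dict) -> list[str]:
    """Return messages from the table above for the problems found in one chunk."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]

    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_LENGTH:
            return ["text must be 1-65536 characters"]
        return []

    # Media chunks keep their URL under a key named after the chunk type.
    reference = chunk.get(chunk_type)
    url = reference.get("url") if isinstance(reference, dict) else None
    if not isinstance(url, str):
        return ["Invalid URL format"]
    parsed = urlparse(url)
    is_http = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    is_data_uri = parsed.scheme == "data" and "," in url
    return [] if is_http or is_data_uri else ["Invalid URL format"]
```

Running validate_chunk over every chunk and collecting the non-empty results gives a quick report before the file is uploaded.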