Mixedbread JSON Format
This format is for specialized use cases where you need complete control over chunking. For most workflows, use the standard file upload — it handles chunking, metadata generation, and indexing automatically.
The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.
Schema Checker
Validate your mxjson files before uploading. Drop a .mxjson or .mxjsonl file below to check it against the current schema. You can also download the JSON Schema.
File Formats
| Format | Extension | MIME Type | Structure |
|---|---|---|---|
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
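The two layouts in the table can be produced from the same list of chunk objects. The following is a minimal sketch using only the Python standard library; the helper names (`write_mxjson`, `write_mxjsonl`, `read_chunks`) are hypothetical and not part of any Mixedbread SDK.

```python
import json
from pathlib import Path


def write_mxjson(chunks: list[dict], path: Path) -> None:
    """Write all chunks as a single JSON array (.mxjson)."""
    path.write_text(json.dumps(chunks, indent=2), encoding="utf-8")


def write_mxjsonl(chunks: list[dict], path: Path) -> None:
    """Write one compact JSON object per line (.mxjsonl)."""
    with path.open("w", encoding="utf-8") as f:
        for chunk in chunks:
            # json.dumps without indent escapes newlines inside strings,
            # so every chunk is guaranteed to stay on a single line.
            f.write(json.dumps(chunk) + "\n")


def read_chunks(path: Path) -> list[dict]:
    """Load chunks from either layout, chosen by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".mxjsonl":
        return [json.loads(line) for line in text.split("\n") if line.strip()]
    return json.loads(text)
```

The JSON Lines layout can be appended to and streamed chunk by chunk, which tends to suit large exports; the array layout is easier to inspect by hand.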
Chunk Structure
Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:
text - Text content
image_url - Image reference
audio_url - Audio reference
video_url - Video reference
Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
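As an illustration of the one-modality rule, the sketch below turns an ordered list of mixed blocks into separate text and image chunks. The `(kind, value)` input shape and the `to_chunks` name are assumptions made for this example; only the chunk fields themselves come from the format.

```python
def to_chunks(blocks: list[tuple[str, str]]) -> list[dict]:
    """Turn ordered (kind, value) blocks into single-modality chunks.

    `kind` is "text" or "image"; `value` is the text or the image URL.
    """
    chunks = []
    for index, (kind, value) in enumerate(blocks):
        if kind == "text":
            chunk = {"type": "text", "text": value}
        elif kind == "image":
            chunk = {"type": "image_url", "image_url": {"url": value}}
        else:
            raise ValueError(f"unsupported block kind: {kind!r}")
        # Setting chunk_index explicitly keeps the original document order.
        chunk["chunk_index"] = index
        chunks.append(chunk)
    return chunks
```

A document with a paragraph, a photo, and another paragraph therefore becomes three chunks with indices 0, 1, and 2.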
Input Properties
When creating chunks for mxjson files, these properties control ingestion:
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Chunk type: text, image_url, audio_url, video_url |
| mime_type | string | No | Content MIME type (defaults per chunk type) |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Typed metadata structure; see Generated Metadata |
Text Chunks
| Property | Type | Required |
|---|---|---|
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |
{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 512
  }
}
Image Chunks
| Property | Type | Required |
|---|---|---|
| image_url.url | string | Yes (HTTP URL or data URI) |
{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "type": "image",
    "file_type": "image/png",
    "file_size": 204800,
    "width": 1200,
    "height": 800
  }
}
Audio Chunks
| Property | Type | Required |
|---|---|---|
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |
{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "type": "audio",
    "file_type": "audio/mpeg",
    "file_size": 5242880,
    "total_duration_seconds": 180.5,
    "sample_rate": 44100,
    "channels": 2,
    "audio_format": 1
  }
}
Video Chunks
| Property | Type | Required |
|---|---|---|
| video_url.url | string | Yes (HTTP URL or data URI) |
{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "type": "video",
    "file_type": "video/mp4",
    "file_size": 10485760,
    "total_duration_seconds": 120.0,
    "fps": 30.0,
    "width": 1920,
    "height": 1080,
    "frame_count": 3600,
    "has_audio_stream": true
  }
}
Chunk Metadata
Each chunk can optionally include generated_metadata — a typed structure that follows a fixed format discriminated by a type field. When files are processed by the system, this metadata is automatically generated. When using mxjson, you can provide it yourself to match the same structure. Only the type field is required within the metadata object; all other fields are optional. You can also add custom fields beyond the typed ones — any additional key-value pairs are preserved alongside the standard fields.
See Generated Metadata for the full reference of all metadata types and their fields.
{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 2048
  }
}
File-level metadata (set during upload) applies to all chunks and participates in contextualization if enabled on the Store.
Complete Example
JSON Format (.mxjson)
[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 3,
      "file_size": 1024
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 20,
      "file_size": 1024
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "type": "image",
      "file_type": "image/jpeg",
      "file_size": 153600,
      "width": 800,
      "height": 600
    }
  }
]
JSON Lines Format (.mxjsonl)
{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 3, "file_size": 1024}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 7, "file_size": 1024}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"type": "image", "file_type": "image/jpeg", "file_size": 153600, "width": 800, "height": 600}}
Schema Endpoint
Retrieve the JSON Schema programmatically:
curl https://api.mixedbread.com/v1/schemas/mxjson
Use Cases
Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).
Pre-processed pipelines: When existing ETL pipelines produce chunked content.
Multimodal collections: When combining text, images, audio, and video from different sources.
Metadata preservation: When chunks carry structured metadata from source systems.
Migration: When importing pre-chunked data from other vector databases.
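For the custom chunking case, a paragraph-based splitter might look like the sketch below. The function name and the blank-line paragraph rule are illustrative choices; the 65536 limit comes from the Text Chunks table, and treating Python's `len` as the character count the service uses is an assumption.

```python
import re

MAX_TEXT_CHARS = 65536  # per-chunk text limit from the Text Chunks table


def chunk_paragraphs(document: str) -> list[dict]:
    """Split a document on blank lines into one text chunk per paragraph."""
    chunks: list[dict] = []
    for paragraph in re.split(r"\n\s*\n", document):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Slice oversized paragraphs so no chunk exceeds the text limit.
        for start in range(0, len(paragraph), MAX_TEXT_CHARS):
            chunks.append(
                {
                    "type": "text",
                    "text": paragraph[start:start + MAX_TEXT_CHARS],
                    "chunk_index": len(chunks),
                }
            )
    return chunks
```

Swapping the regular expression for a domain-specific rule (section headings, recipe steps) changes the boundaries without touching the chunk structure.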
Validation Errors
| Error | Cause |
|---|---|
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |
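These checks can be approximated locally before uploading. The sketch below mirrors only the four errors in the table and is not the service's validator; `validate_chunk` is a hypothetical helper, and the URL test (an http/https URL with a host, or a `data:` URI containing a comma) is a simplifying assumption about what the service accepts.

```python
from urllib.parse import urlparse

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")
MAX_TEXT_CHARS = 65536


def _looks_like_url(value) -> bool:
    """Rough check for an HTTP(S) URL or a data URI."""
    if not isinstance(value, str):
        return False
    if value.startswith("data:"):
        return "," in value  # data URIs separate the header from the payload with a comma
    try:
        parsed = urlparse(value)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_chunk(chunk: dict) -> list[str]:
    """Return the validation errors from the table that apply to one chunk."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]

    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_CHARS:
            errors.append("text must be 1-65536 characters")
    else:
        # URL chunks nest the reference under a key named after the type.
        reference = chunk.get(chunk_type)
        url = reference.get("url") if isinstance(reference, dict) else None
        if not _looks_like_url(url):
            errors.append("Invalid URL format")
    return errors
```

An empty list means the chunk passed these local checks; the JSON Schema remains the authoritative definition.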