Mixedbread JSON Format

This format is for specialized use cases where you need complete control over chunking. For most workflows, use the standard file upload — it handles chunking, metadata generation, and indexing automatically.

The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.

Schema Checker

Validate your mxjson files before uploading. The schema checker on this page accepts a .mxjson or .mxjsonl file of up to 10MB and checks it against the current schema. You can also download the JSON Schema.

File Formats

Format	Extension	MIME Type	Structure
JSON	`.mxjson`	`application/vnd-mxbai.chunks-json`	Array of chunk objects
JSON Lines	`.mxjsonl`	`application/vnd-mxbai.chunks-jsonl`	One chunk object per line
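
As a rough illustration of how the two layouts differ, the sketch below serializes the same list of chunk dictionaries both ways using only Python's standard library. The helper names `to_mxjson` and `to_mxjsonl` are hypothetical and not part of any Mixedbread SDK.

```python
import json

def to_mxjson(chunks):
    """Serialize chunks as a single JSON array (.mxjson layout)."""
    return json.dumps(chunks, indent=2)

def to_mxjsonl(chunks):
    """Serialize chunks as one compact JSON object per line (.mxjsonl layout)."""
    # json.dumps without indent never emits raw newlines, so each chunk stays on one line.
    return "\n".join(json.dumps(chunk) for chunk in chunks) + "\n"

chunks = [
    {"type": "text", "text": "Baguette Shaping Guide"},
    {"type": "text", "text": "Pre-shape the dough into a loose rectangle."},
]

mxjson_text = to_mxjson(chunks)    # write this to a file ending in .mxjson
mxjsonl_text = to_mxjsonl(chunks)  # write this to a file ending in .mxjsonl
```

Both strings carry the same chunks; only the outer layout changes, so the choice mostly depends on whether your pipeline produces chunks all at once or streams them.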

Chunk Structure

Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:

text - Text content
image_url - Image reference
audio_url - Audio reference
video_url - Video reference

Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
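
A minimal sketch of that rule, assuming a hypothetical in-memory document made of ordered `(kind, value)` parts. The input shape and the `to_chunks` helper are illustrative only and are not defined by the format.

```python
def to_chunks(parts):
    """Turn ordered ("text" | "image", value) parts into single-modality chunks."""
    chunks = []
    for kind, value in parts:
        if kind == "text":
            chunks.append({"type": "text", "text": value})
        elif kind == "image":
            chunks.append({"type": "image_url", "image_url": {"url": value}})
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
    return chunks

# Hypothetical source document: text interleaved with one image.
document = [
    ("text", "Pre-shape the dough into a loose rectangle."),
    ("image", "https://bakery.example.com/images/baguette-preshape.jpg"),
    ("text", "Let it rest for 15-20 minutes."),
]
chunks = to_chunks(document)
```

The image becomes its own chunk between the two text chunks, so the original reading order is kept by position in the list.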

Input Properties

When creating chunks for mxjson files, these properties control ingestion:

Property	Type	Required	Description
`type`	string	Yes	Chunk type: `text`, `image_url`, `audio_url`, `video_url`
`mime_type`	string	No	Content MIME type (defaults per chunk type)
`chunk_index`	integer	No	Position in file. Auto-generated sequentially if omitted
`generated_metadata`	object	No	Typed metadata structure — see Generated Metadata

Text Chunks

Property	Type	Required
`text`	string	Yes (1-65536 characters)
`offset`	integer	No (default: 0)

{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 512
  }
}

Image Chunks

Property	Type	Required
`image_url.url`	string	Yes (HTTP URL or data URI)

{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "type": "image",
    "file_type": "image/png",
    "file_size": 204800,
    "width": 1200,
    "height": 800
  }
}
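
When an image is not reachable over HTTP, the URL can be a data URI instead. The sketch below builds one with the standard `data:<mime type>;base64,<payload>` layout. The helper name `image_chunk_from_bytes` is hypothetical, and any size limit the service applies to data URIs is not stated here.

```python
import base64

def image_chunk_from_bytes(data, mime_type="image/png"):
    """Build an image_url chunk whose URL is a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        "mime_type": mime_type,
    }

# Placeholder bytes (the 8-byte PNG signature) standing in for a real image file.
chunk = image_chunk_from_bytes(b"\x89PNG\r\n\x1a\n")
```

In practice you would pass the full contents of an image file, for example the result of `Path("crumb.png").read_bytes()`.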

Audio Chunks

Property	Type	Required
`audio_url.url`	string	Yes (HTTP URL or data URI)
`sampling_rate`	integer	Yes

{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "type": "audio",
    "file_type": "audio/mpeg",
    "file_size": 5242880,
    "total_duration_seconds": 180.5,
    "sample_rate": 44100,
    "channels": 2,
    "audio_format": 1
  }
}

Video Chunks

Property	Type	Required
`video_url.url`	string	Yes (HTTP URL or data URI)

{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "type": "video",
    "file_type": "video/mp4",
    "file_size": 10485760,
    "total_duration_seconds": 120.0,
    "fps": 30.0,
    "width": 1920,
    "height": 1080,
    "frame_count": 3600,
    "has_audio_stream": true
  }
}

Chunk Metadata

Each chunk can optionally include generated_metadata — a typed structure that follows a fixed format discriminated by a type field. When files are processed by the system, this metadata is automatically generated. When using mxjson, you can provide it yourself to match the same structure. Only the type field is required within the metadata object; all other fields are optional. You can also add custom fields beyond the typed ones — any additional key-value pairs are preserved alongside the standard fields.

See Generated Metadata for the full reference of all metadata types and their fields.

{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 2048
  }
}
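
Custom fields can be merged into `generated_metadata` before the chunk is written out. The following is a sketch only: the helper `with_custom_metadata` and the field names `source_system` and `recipe_id` are hypothetical examples of custom keys.

```python
def with_custom_metadata(chunk, **custom_fields):
    """Return a copy of chunk with extra keys merged into generated_metadata."""
    metadata = dict(chunk.get("generated_metadata") or {})
    if "type" not in metadata:
        # "type" is the only required field inside the metadata object.
        raise ValueError("generated_metadata needs a 'type' field")
    if "type" in custom_fields:
        raise ValueError("custom fields must not replace the 'type' discriminator")
    metadata.update(custom_fields)
    return {**chunk, "generated_metadata": metadata}

chunk = {
    "type": "text",
    "text": "Autolyse is a rest period after mixing flour and water.",
    "generated_metadata": {"type": "text", "language": "en"},
}
enriched = with_custom_metadata(chunk, source_system="recipe-cms", recipe_id=42)
```

The helper copies the metadata rather than mutating it, so the original chunk can still be reused unchanged.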

File-level metadata (set during upload) applies to all chunks and participates in contextualization if enabled on the Store.

Complete Example

JSON Format (.mxjson)

[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 3,
      "file_size": 1024
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 20,
      "file_size": 1024
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "type": "image",
      "file_type": "image/jpeg",
      "file_size": 153600,
      "width": 800,
      "height": 600
    }
  }
]

JSON Lines Format (.mxjsonl)

{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 3, "file_size": 1024}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 7, "file_size": 1024}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"type": "image", "file_type": "image/jpeg", "file_size": 153600, "width": 800, "height": 600}}
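
To show how the line-oriented layout can be read back, here is a small sketch that parses .mxjsonl text into a list of chunks and re-serializes it as an .mxjson array. The `parse_mxjsonl` helper is hypothetical, and skipping blank lines is a choice made in this sketch, not documented behavior of the service.

```python
import json

def parse_mxjsonl(text):
    """Parse .mxjsonl text into a list of chunk dictionaries."""
    chunks = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines such as the one after a trailing newline
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(chunk, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        chunks.append(chunk)
    return chunks

sample = (
    '{"type": "text", "text": "Baguette Shaping Guide", "chunk_index": 0}\n'
    '{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "chunk_index": 1}\n'
)
chunks = parse_mxjsonl(sample)
mxjson_text = json.dumps(chunks, indent=2)  # same chunks in the .mxjson array layout
```

Reporting the line number on failure makes it easier to locate a malformed chunk in a large file.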

Schema Endpoint

Retrieve the JSON Schema programmatically:

curl https://api.mixedbread.com/v1/schemas/mxjson
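
The same request can be sketched in Python with the standard library. This assumes, as the curl command suggests, that the endpoint needs no authentication and returns the schema as a JSON body. The `Accept` header and the function names are additions made for this example.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.mixedbread.com/v1/schemas/mxjson"

def build_schema_request(url=SCHEMA_URL):
    """Build the GET request for the schema endpoint."""
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )

def fetch_schema(url=SCHEMA_URL, timeout=10.0):
    """Download the schema and parse it as JSON (performs a network call)."""
    with urllib.request.urlopen(build_schema_request(url), timeout=timeout) as response:
        return json.load(response)

# Usage (not executed here because it needs network access):
# schema = fetch_schema()
```

The returned dictionary could then be handed to a JSON Schema validator of your choice to check chunks locally before uploading.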

Use Cases

Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).

Pre-processed pipelines: When existing ETL pipelines produce chunked content.

Multimodal collections: When combining text, images, audio, and video from different sources.

Metadata preservation: When chunks carry structured metadata from source systems.

Migration: When importing pre-chunked data from other vector databases.
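
For the custom chunking case, one possible approach is to split source text on blank lines and emit one text chunk per paragraph. The `paragraphs_to_chunks` helper is hypothetical, and the length check uses Python's `len`, which may count characters differently from the service.

```python
import re

MAX_TEXT_LENGTH = 65536  # upper bound for `text` given in the Text Chunks table

def paragraphs_to_chunks(text):
    """Split text on blank lines and return one text chunk per paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for index, paragraph in enumerate(paragraphs):
        if len(paragraph) > MAX_TEXT_LENGTH:
            raise ValueError(f"paragraph {index} exceeds {MAX_TEXT_LENGTH} characters")
        chunks.append({"type": "text", "text": paragraph, "chunk_index": index})
    return chunks

recipe = "Mix flour and water.\n\nRest for 30 minutes.\n\n\nAdd salt and starter."
chunks = paragraphs_to_chunks(recipe)
```

Setting `chunk_index` explicitly keeps the chunk order tied to the paragraph order in the source text.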

Validation Errors

Error	Cause
`type` is required	Missing `type` field
`text` must be 1-65536 characters	Text empty or exceeds limit
Invalid URL format	Malformed URL or data URI
Unknown chunk type	Type not one of: `text`, `image_url`, `audio_url`, `video_url`
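
Some of these errors can be caught locally before uploading. The sketch below mirrors only the four rows in the table. The `check_chunk` helper and its URL heuristic are assumptions made for illustration, and the service's own validation may be stricter, for example about `sampling_rate` on audio chunks.

```python
from urllib.parse import urlparse

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")
MAX_TEXT_LENGTH = 65536

def _looks_like_url(value):
    """Heuristic check for an HTTP(S) URL or a data URI."""
    if not isinstance(value, str):
        return False
    if value.startswith("data:"):
        return "," in value  # a data URI separates its header and payload with a comma
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def check_chunk(chunk):
    """Return a list of error messages; an empty list means the pre-check passed."""
    if "type" not in chunk:
        return ["`type` is required"]
    chunk_type = chunk["type"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]
    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_LENGTH:
            errors.append("`text` must be 1-65536 characters")
    else:
        # The URL lives under a key named after the chunk type, e.g. image_url.url.
        ref = chunk.get(chunk_type)
        url = ref.get("url") if isinstance(ref, dict) else None
        if not _looks_like_url(url):
            errors.append("Invalid URL format")
    return errors
```

For example, `check_chunk({"type": "text", "text": ""})` reports the text length error, while a well-formed image chunk returns an empty list.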
