Mixedbread JSON Format

The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.

Schema Checker

Validate your mxjson files against the current schema before uploading. The JSON Schema is also available for download.

File Formats

| Format | Extension | MIME Type | Structure |
| --- | --- | --- | --- |
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
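
The difference between the two layouts can be sketched in Python. This is a minimal illustration using only the standard library; the helper names `write_mxjson` and `write_mxjsonl` are hypothetical and not part of any Mixedbread SDK.

```python
import json
from pathlib import Path


def write_mxjson(chunks, path):
    """Write all chunks as a single JSON array (.mxjson layout)."""
    Path(path).write_text(
        json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def write_mxjsonl(chunks, path):
    """Write one compact JSON object per line (.mxjsonl layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for chunk in chunks:
            # json.dumps without indent never emits a raw newline,
            # so each chunk stays on exactly one line.
            f.write(json.dumps(chunk, ensure_ascii=False) + "\n")
```

The JSON Lines layout can be appended to incrementally, which tends to suit streaming pipelines; the array layout has to be written as a whole.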

Chunk Structure

Each chunk in an mxjson file follows the same structure as . The type field determines which properties are required:

  • text - Text content
  • image_url - Image reference
  • audio_url - Audio reference
  • video_url - Video reference

Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
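
As a rough sketch of the one-modality-per-chunk rule, the Python helper below turns an ordered list of text and image parts into separate chunks. The function name `chunks_from_parts` and the `(kind, value)` input shape are assumptions made for this illustration.

```python
def chunks_from_parts(parts):
    """Convert ordered (kind, value) pairs into single-modality chunks.

    kind is "text" (value is the text) or "image" (value is a URL).
    The input shape is a hypothetical convention for this sketch.
    """
    chunks = []
    for kind, value in parts:
        if kind == "text":
            chunks.append({"type": "text", "text": value})
        elif kind == "image":
            chunks.append({"type": "image_url", "image_url": {"url": value}})
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
    return chunks


document = [
    ("text", "Pre-shape the dough into a loose rectangle."),
    ("image", "https://bakery.example.com/images/baguette-preshape.jpg"),
]
chunks = chunks_from_parts(document)  # two chunks, one per modality
```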

Input Properties

When creating chunks for mxjson files, these properties control ingestion:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Chunk type: text, image_url, audio_url, or video_url |
| mime_type | string | No | Content MIME type (defaults per chunk type) |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Typed metadata structure; see Chunk Metadata |
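
If you prefer to set chunk_index explicitly rather than rely on auto-generation, a small Python sketch such as the following can number chunks by position. The helper name `with_chunk_index` is hypothetical, and starting at 0 is an assumption based on the examples in this document.

```python
def with_chunk_index(chunks, start=0):
    """Return copies of the chunks with chunk_index set to their position.

    Any existing chunk_index is overwritten; the input list is not modified.
    """
    return [
        {**chunk, "chunk_index": index}
        for index, chunk in enumerate(chunks, start=start)
    ]
```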

Text Chunks

| Property | Type | Required |
| --- | --- | --- |
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |

{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 512
  }
}
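
Text longer than the 65536-character limit has to be split across several chunks. The Python sketch below does a naive fixed-width split. It assumes that offset means the character position of the chunk within the source text and that the limit counts Unicode code points; neither is spelled out above, so treat both as assumptions. The name `split_text` is hypothetical.

```python
MAX_TEXT_CHARS = 65536  # upper bound from the Text Chunks table


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into text chunks of at most `limit` characters.

    Returns an empty list for empty input, since a text chunk
    must contain at least one character.
    """
    chunks = []
    for start in range(0, len(text), limit):
        chunks.append({
            "type": "text",
            "text": text[start:start + limit],
            "offset": start,  # assumed meaning: position in the source text
        })
    return chunks
```

A real pipeline would normally split on sentence or paragraph boundaries instead of a fixed width; the fixed width here only demonstrates the limit.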

Image Chunks

| Property | Type | Required |
| --- | --- | --- |
| image_url.url | string | Yes (HTTP URL or data URI) |

{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "type": "image",
    "file_type": "image/png",
    "file_size": 204800,
    "width": 1200,
    "height": 800
  }
}
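
When the image is not hosted anywhere, the data URI option can be used instead of an HTTP URL. The following Python sketch builds a base64 data URI in the standard `data:<mime>;base64,<payload>` form; the helper name `image_chunk_from_bytes` is hypothetical, and any size limits the service may apply to data URIs are not covered here.

```python
import base64


def image_chunk_from_bytes(data: bytes, mime_type: str) -> dict:
    """Build an image chunk whose URL is a base64-encoded data URI."""
    payload = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{payload}"},
        "mime_type": mime_type,
    }
```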

Audio Chunks

| Property | Type | Required |
| --- | --- | --- |
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |

{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "type": "audio",
    "file_type": "audio/mpeg",
    "file_size": 5242880,
    "total_duration_seconds": 180.5,
    "sample_rate": 44100,
    "channels": 2,
    "audio_format": 1
  }
}

Video Chunks

| Property | Type | Required |
| --- | --- | --- |
| video_url.url | string | Yes (HTTP URL or data URI) |

{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "type": "video",
    "file_type": "video/mp4",
    "file_size": 10485760,
    "total_duration_seconds": 120.0,
    "fps": 30.0,
    "width": 1920,
    "height": 1080,
    "frame_count": 3600,
    "has_audio_stream": true
  }
}

Chunk Metadata

Each chunk can optionally include generated_metadata — a typed structure that follows a fixed format discriminated by a type field. When files are processed by the system, this metadata is automatically generated. When using mxjson, you can provide it yourself to match the same structure. Only the type field is required within the metadata object; all other fields are optional. You can also add custom fields beyond the typed ones — any additional key-value pairs are preserved alongside the standard fields.

See for the full reference of all metadata types and their fields.

{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "type": "text",
    "file_type": "text/plain",
    "language": "en",
    "word_count": 10,
    "file_size": 2048
  }
}
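
A possible way to assemble generated_metadata for a text chunk, including custom fields, is sketched below in Python. The helper name `text_metadata` is hypothetical, and counting words by splitting on whitespace is an assumption that happens to match the word_count values in the examples here. file_size is left out because its exact meaning is not defined in this section.

```python
def text_metadata(text, language="en", file_type="text/plain", **custom):
    """Build a generated_metadata object for a text chunk.

    Extra keyword arguments are kept as custom fields; the standard
    fields are written last so custom keys cannot overwrite them.
    """
    return {
        **custom,
        "type": "text",
        "file_type": file_type,
        "language": language,
        "word_count": len(text.split()),  # whitespace-delimited words
    }


metadata = text_metadata(
    "Autolyse is a rest period after mixing flour and water.",
    source="recipe-book",  # hypothetical custom field
)
```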

File-level metadata (set during upload) applies to all chunks and participates in if enabled on the Store.

Complete Example

JSON Format (.mxjson)

[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 3,
      "file_size": 1024
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "type": "text",
      "file_type": "text/plain",
      "language": "en",
      "word_count": 20,
      "file_size": 1024
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "type": "image",
      "file_type": "image/jpeg",
      "file_size": 153600,
      "width": 800,
      "height": 600
    }
  }
]

JSON Lines Format (.mxjsonl)

{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 3, "file_size": 1024}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"type": "text", "file_type": "text/plain", "language": "en", "word_count": 7, "file_size": 1024}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"type": "image", "file_type": "image/jpeg", "file_size": 153600, "width": 800, "height": 600}}
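
For reading either layout back, for instance in a local test, a loader along these lines may help. It is a sketch with a hypothetical name, `load_chunks`, that picks the parser from the file extension and does no schema validation.

```python
import json
from pathlib import Path


def load_chunks(path):
    """Load chunks from a .mxjson (array) or .mxjsonl (one per line) file."""
    p = Path(path)
    content = p.read_text(encoding="utf-8")
    if p.suffix == ".mxjson":
        chunks = json.loads(content)
        if not isinstance(chunks, list):
            raise ValueError(".mxjson must contain a JSON array of chunks")
        return chunks
    if p.suffix == ".mxjsonl":
        # Split on "\n" only: str.splitlines() would also break on
        # characters such as U+2028 that may appear inside JSON strings.
        return [json.loads(line) for line in content.split("\n") if line.strip()]
    raise ValueError(f"unsupported extension: {p.suffix!r}")
```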

Schema Endpoint

Retrieve the JSON Schema programmatically:

curl https://api.mixedbread.com/v1/schemas/mxjson
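
The same request can be made from Python with the standard library. This sketch assumes, as the curl command suggests, that the endpoint needs no authentication header and returns a JSON body; the function name `fetch_schema` and its `opener` parameter (which exists only to make the function testable offline) are hypothetical.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.mixedbread.com/v1/schemas/mxjson"


def fetch_schema(url=SCHEMA_URL, opener=urllib.request.urlopen, timeout=10):
    """Download the mxjson JSON Schema and return it as a Python object."""
    with opener(url, timeout=timeout) as response:
        return json.load(response)
```

The returned dictionary can then be passed to a JSON Schema validator of your choice.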

Use Cases

Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).

Pre-processed pipelines: When existing ETL pipelines produce chunked content.

Multimodal collections: When combining text, images, audio, and video from different sources.

Metadata preservation: When chunks carry structured metadata from source systems.

Migration: When importing pre-chunked data from other vector databases.

Validation Errors

| Error | Cause |
| --- | --- |
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |

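
To catch these errors before uploading, a local pre-check along the following lines can be run over each chunk. It is a rough Python approximation, not the service's validator: the function name `check_chunk` is hypothetical, and the URL test only looks at the scheme prefix (treating both http and https as valid is an assumption), so the official JSON Schema remains the authoritative check.

```python
CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")
MAX_TEXT_CHARS = 65536


def check_chunk(chunk):
    """Return a list of error messages for one chunk (empty if none found)."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]

    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_CHARS:
            errors.append("text must be 1-65536 characters")
    else:
        # Media chunks nest the URL under a key named after the type.
        ref = chunk.get(chunk_type)
        url = ref.get("url") if isinstance(ref, dict) else None
        if not isinstance(url, str) or not url.startswith(
            ("http://", "https://", "data:")
        ):
            errors.append("Invalid URL format")
    return errors
```
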
Last updated: March 17, 2026