Data Models

Mixedbread Stores are built on three core data structures: Stores, Store Files, and Store Chunks. Knowing how they relate helps you work effectively with the API and see how your content is organized and retrieved.

Store

A Store is the primary container for your searchable content. It holds your files, manages access permissions, and provides the foundation for semantic search operations.

Store Properties

Property	Type	Description
`id`	string	Unique identifier for the Store
`name`	string	User-defined name that serves as an identifier
`description`	string	Optional description of the Store's purpose
`is_public`	boolean	Whether the Store is publicly accessible
`metadata`	object	Additional metadata associated with the Store
`file_counts`	object	Counts of files in different processing states
`expires_after`	object	Expiration configuration based on activity
`status`	enum	Current status: `expired`, `in_progress`, `completed`
`created_at`	string	ISO timestamp when the Store was created
`updated_at`	string	ISO timestamp when the Store was last updated
`last_active_at`	string	ISO timestamp of the last activity
`usage_bytes`	integer	Total storage space used by indexed content
`expires_at`	string	Computed expiration timestamp (if `expires_after` is set)
`object`	string	Always "store"

File Counts Object

The `file_counts` object provides a detailed breakdown of file processing states:

Property	Type	Description
`pending`	integer	Number of files waiting to be processed
`in_progress`	integer	Number of files currently being processed
`cancelled`	integer	Number of files whose processing was cancelled
`completed`	integer	Number of successfully processed files
`failed`	integer	Number of files that failed processing
`total`	integer	Total number of files
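
As a rough illustration of how these counts can be used, the sketch below derives a progress fraction from a `file_counts` object. The helper names are hypothetical, and two things are assumptions rather than documented behavior: that `cancelled` files count as finished, and that `total` equals the sum of the per-state counts.

```python
from typing import Mapping

# States in which a file is no longer waiting on the processing pipeline.
# Treating "cancelled" as finished is an assumption, not documented behavior.
FINISHED_STATES = ("completed", "failed", "cancelled")
ALL_STATES = ("pending", "in_progress", "cancelled", "completed", "failed")


def processing_progress(file_counts: Mapping[str, int]) -> float:
    """Return the fraction of files that have finished processing."""
    total = file_counts.get("total", 0)
    if total == 0:
        return 1.0  # an empty Store has nothing left to process
    finished = sum(file_counts.get(state, 0) for state in FINISHED_STATES)
    return finished / total


def counts_are_consistent(file_counts: Mapping[str, int]) -> bool:
    """Check that the per-state counts add up to `total` (assumed invariant)."""
    summed = sum(file_counts.get(state, 0) for state in ALL_STATES)
    return summed == file_counts.get("total", 0)


counts = {"pending": 2, "in_progress": 1, "cancelled": 0,
          "completed": 10, "failed": 0, "total": 13}
print(f"{processing_progress(counts):.0%} of files finished")
print("consistent:", counts_are_consistent(counts))
```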

For detailed configuration options including expiration policies and public access, see Store Configuration.

Store Example

{
  "id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "name": "product-documentation",
  "description": "Complete product documentation and API reference",
  "is_public": false,
  "metadata": {
    "category": "documentation",
    "language": "en"
  },
  "file_counts": {
    "pending": 2,
    "in_progress": 1,
    "cancelled": 0,
    "completed": 10,
    "failed": 0,
    "total": 13
  },
  "expires_after": {
    "anchor": "last_active_at",
    "days": 30
  },
  "status": "in_progress",
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-20T14:30:00Z",
  "last_active_at": "2024-01-20T14:30:00Z",
  "usage_bytes": 1048576,
  "expires_at": "2024-02-19T14:30:00Z",
  "object": "store"
}
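
In the example above, `expires_at` falls exactly 30 days after `last_active_at`, which matches the `expires_after` policy. The sketch below reproduces that arithmetic on the client side. It assumes that `expires_at` is simply the anchor timestamp plus `days`; the function names are hypothetical, and the server-computed `expires_at` should be treated as authoritative.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO timestamp such as "2024-01-20T14:30:00Z"."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so it is rewritten as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expected_expiry(store: dict) -> Optional[datetime]:
    """Estimate when a Store expires, or None if no policy is set."""
    policy = store.get("expires_after")
    if not policy:
        return None
    # The anchor names another Store field, e.g. "last_active_at".
    anchor = parse_timestamp(store[policy["anchor"]])
    return anchor + timedelta(days=policy["days"])


store = {
    "last_active_at": "2024-01-20T14:30:00Z",
    "expires_after": {"anchor": "last_active_at", "days": 30},
    "expires_at": "2024-02-19T14:30:00Z",
}
print(expected_expiry(store) == parse_timestamp(store["expires_at"]))
```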

Store File

A Store File represents a complete file that you've uploaded to a Store. It tracks the file's processing status, metadata, and relationship to the searchable chunks created from its content.

File Properties

Property	Type	Description
`id`	string	Unique identifier for the file within the Store
`filename`	string	Original name of the uploaded file
`metadata`	object	Custom key-value pairs you've attached to the file
`status`	enum	Current processing status of the file
`last_error`	object	Details about any processing errors that occurred
`store_id`	string	ID of the Store containing this file
`created_at`	string	ISO timestamp when the file was added to the Store
`version`	integer	Version number of the file within the Store
`usage_bytes`	integer	Storage space used by the file's indexed data
`object`	string	Always "store.file"

For detailed information on file processing lifecycle and status meanings, see Store File Status.

For guidance on metadata structure and types, see Metadata Types.

File Example

{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "filename": "product-documentation.pdf",
  "metadata": {
    "category": "documentation",
    "department": "product",
    "version": "2.1",
    "last_updated": "2024-01-15"
  },
  "status": "completed",
  "last_error": null,
  "store_id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "created_at": "2024-01-15T10:30:00Z",
  "version": 1,
  "usage_bytes": 245760,
  "object": "store.file"
}
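
If you want typed access to these fields in client code, one option is a small dataclass that mirrors the properties listed above. This is a minimal sketch, not an official SDK type: the `StoreFile` class and its `is_searchable` helper are hypothetical, and the status value `completed` is taken from the example rather than from a full list of file statuses.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class StoreFile:
    id: str
    filename: str
    status: str
    store_id: str
    created_at: str
    version: int
    usage_bytes: int
    metadata: dict = field(default_factory=dict)
    last_error: Optional[dict] = None
    object: str = "store.file"

    @classmethod
    def from_dict(cls, data: dict) -> "StoreFile":
        # Drop unknown keys so that new response fields do not break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def is_searchable(self) -> bool:
        # Assumption: a "completed" file with no error has indexed chunks.
        return self.status == "completed" and self.last_error is None


sample = {
    "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "filename": "product-documentation.pdf",
    "metadata": {"category": "documentation"},
    "status": "completed",
    "last_error": None,
    "store_id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
    "created_at": "2024-01-15T10:30:00Z",
    "version": 1,
    "usage_bytes": 245760,
    "object": "store.file",
}
store_file = StoreFile.from_dict(sample)
print(store_file.filename, store_file.is_searchable())
```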

Store Chunk

A Store Chunk represents a searchable segment of content created from a Store File. When you search, you get back chunks that contain the most relevant portions of your files.

Chunk Properties

Property	Type	Description
`chunk_index`	integer	Position of this chunk within the source file
`mime_type`	string	Content type of the chunk (text/plain, image/png, etc.)
`model`	string	Model used to generate the chunk's vector
`score`	number	Relevance score for this chunk (in search results)
`file_id`	string	ID of the file this chunk came from
`filename`	string	Name of the source file
`store_id`	string	ID of the Store containing this chunk
`external_id`	string	Optional external identifier for the source file
`metadata`	object	User-defined metadata inherited from the source file
`generated_metadata`	object	Structured metadata generated at ingestion time (e.g., chunk size)
`type`	enum	Type of content: `text`, `image_url`, `audio_url`, `video_url`

Content-Specific Properties

Text Chunks

Property	Type	Description
`text`	string	Text content of the chunk
`summary`	string	AI-generated summary of the text chunk †
`offset`	integer	Character offset of this chunk relative to the start of the file

Image Chunks

Property	Type	Description
`image_url`	object	Image URL and format information
`ocr_text`	string	Text extracted from images via OCR
`summary`	string	AI-generated summary of the image content †

Audio Chunks

Property	Type	Description
`audio_url`	object	Audio URL and format information
`transcription`	string	Speech-to-text transcription of the audio †
`summary`	string	AI-generated summary of the audio content †
`sampling_rate`	integer	Audio sampling rate in Hz

Video Chunks

Property	Type	Description
`video_url`	object	Video URL and format information
`transcription`	string	Speech-to-text transcription of the video †
`summary`	string	AI-generated summary of the video clip †

† The `summary`, `ocr_text`, and `transcription` fields are only populated when the file was ingested with the `high_quality` parsing strategy.
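
Because the available text field depends on the chunk `type`, and some fields may be absent for files not ingested with `high_quality`, client code usually needs a fallback order. The sketch below shows one way to pick a text representation for any chunk. The `searchable_text` helper and its preference order are illustrative assumptions, not part of the API.

```python
from typing import Optional

# Preferred text fields per chunk type, in fallback order.
# The ordering is an illustrative choice, not documented behavior.
TEXT_FIELDS = {
    "text": ("text",),
    "image_url": ("ocr_text", "summary"),
    "audio_url": ("transcription", "summary"),
    "video_url": ("transcription", "summary"),
}


def searchable_text(chunk: dict) -> Optional[str]:
    """Return the best available text for a chunk, or None if there is none."""
    for key in TEXT_FIELDS.get(chunk.get("type"), ()):
        value = chunk.get(key)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None


image_chunk = {
    "type": "image_url",
    "ocr_text": None,  # e.g. not populated for this ingestion
    "summary": "A diagram showing the authentication flow process",
}
print(searchable_text(image_chunk))
```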

Chunk Types

Text Chunks

{
  "type": "text",
  "text": "User authentication in our API requires a valid API key...",
  "summary": "API authentication requires a valid API key and uses request headers to identify callers.",
  "chunk_index": 2,
  "offset": 1024,
  "mime_type": "text/plain",
  "score": 0.89
}

Image Chunks

{
  "type": "image_url",
  "image_url": {
    "url": "https://signed-url-to-image.com/chunk_img_123",
    "format": "png"
  },
  "ocr_text": "Figure 1: Authentication Flow Diagram",
  "summary": "A diagram showing the authentication flow process",
  "chunk_index": 5,
  "mime_type": "image/png",
  "score": 0.76
}

Audio Chunks

{
  "type": "audio_url",
  "audio_url": {
    "url": "https://signed-url-to-audio.com/chunk_audio_456"
  },
  "transcription": "Welcome to our product overview. In this section, we'll cover...",
  "sampling_rate": 44100,
  "chunk_index": 3,
  "mime_type": "audio/mpeg",
  "score": 0.82
}

Video Chunks

{
  "type": "video_url",
  "video_url": {
    "url": "https://signed-url-to-video.com/chunk_video_789"
  },
  "transcription": "Hello everyone, today we're going to demonstrate...",
  "chunk_index": 1,
  "mime_type": "video/mp4",
  "score": 0.88
}

Complete Chunk Example

{
  "chunk_index": 3,
  "mime_type": "text/plain",
  "model": "mixedbread-ai/mxbai-omni-v1",
  "score": 0.92,
  "file_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "filename": "product-documentation.pdf",
  "store_id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "external_id": "doc-auth-guide-v2",
  "metadata": {
    "category": "documentation",
    "department": "product"
  },
  "type": "text",
  "text": "To authenticate API requests, include your API key in the Authorization header: Authorization: Bearer YOUR_API_KEY. The API key identifies your account and provides access to your organization's resources.",
  "offset": 4096
}
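
Since every chunk carries its `file_id`, `chunk_index`, and `score`, search results can be regrouped by source file on the client. The following sketch groups chunks per file, orders each group by position in the file, and ranks files by their best-scoring chunk. It assumes that a higher `score` means higher relevance; `group_by_file` is a hypothetical helper, and the sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable


def group_by_file(chunks: Iterable[dict]) -> list:
    """Group chunks by file_id and rank files by their best chunk score."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["file_id"]].append(chunk)

    # Within a file, restore reading order using chunk_index.
    for file_chunks in grouped.values():
        file_chunks.sort(key=lambda c: c["chunk_index"])

    # Across files, put the file with the highest-scoring chunk first.
    return sorted(
        grouped.items(),
        key=lambda item: max(c["score"] for c in item[1]),
        reverse=True,
    )


results = [
    {"file_id": "file-a", "chunk_index": 3, "score": 0.92},
    {"file_id": "file-b", "chunk_index": 0, "score": 0.95},
    {"file_id": "file-a", "chunk_index": 1, "score": 0.80},
]
for file_id, file_chunks in group_by_file(results):
    print(file_id, [c["chunk_index"] for c in file_chunks])
```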
