Data Models
Understanding the core data structures in Mixedbread Stores helps you work effectively with the API and see how your content is organized and retrieved.
Store
A Store is the primary container for your searchable content. It holds your files, manages access permissions, and provides the foundation for semantic search operations.
Store Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the Store |
| name | string | User-defined name that serves as an identifier |
| description | string | Optional description of the Store's purpose |
| is_public | boolean | Whether the Store is publicly accessible |
| metadata | object | Additional metadata associated with the Store |
| file_counts | object | Counts of files in each processing state (see below) |
| expires_after | object | Activity-based expiration policy: an anchor field and a number of days |
| status | enum | Current status: expired, in_progress, or completed |
| created_at | string | ISO 8601 timestamp of when the Store was created |
| updated_at | string | ISO 8601 timestamp of when the Store was last updated |
| last_active_at | string | ISO 8601 timestamp of the Store's last activity |
| usage_bytes | integer | Total storage used by indexed content, in bytes |
| expires_at | string | Computed expiration timestamp (only present if expires_after is set) |
| object | string | Always "store" |
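In the Store Example further down, expires_after is {"anchor": "last_active_at", "days": 30} and expires_at lands exactly 30 days after last_active_at. The sketch below reproduces that arithmetic client-side, assuming expires_at is simply the timestamp named by expires_after.anchor plus expires_after.days. The helper compute_expires_at is hypothetical and not part of any Mixedbread SDK; treat the server's expires_at as authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_iso(ts: str) -> datetime:
    # Store timestamps look like "2024-01-20T14:30:00Z"; swap the "Z" for an
    # explicit offset so datetime.fromisoformat also accepts it on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def format_iso(dt: datetime) -> str:
    # Render a datetime back in the same UTC "...Z" form the API uses.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def compute_expires_at(store: dict) -> Optional[str]:
    """Hypothetical helper: anchor timestamp + N days, or None without a policy.

    Assumes expires_after looks like {"anchor": "last_active_at", "days": 30},
    where "anchor" names another timestamp field on the same Store object.
    """
    policy = store.get("expires_after")
    if not policy:
        return None
    anchor_time = parse_iso(store[policy["anchor"]])
    return format_iso(anchor_time + timedelta(days=policy["days"]))


store = {
    "expires_after": {"anchor": "last_active_at", "days": 30},
    "last_active_at": "2024-01-20T14:30:00Z",
    "expires_at": "2024-02-19T14:30:00Z",
}
print(compute_expires_at(store))  # 2024-02-19T14:30:00Z
```

Because the anchor is last_active_at, any activity on the Store pushes the computed expiration forward.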
File Counts Object
The file_counts object provides a detailed breakdown of the Store's files by processing state:
| Property | Type | Description |
|---|---|---|
| pending | integer | Number of files waiting to be processed |
| in_progress | integer | Number of files currently being processed |
| cancelled | integer | Number of files whose processing was cancelled |
| completed | integer | Number of successfully processed files |
| failed | integer | Number of files that failed processing |
| total | integer | Total number of files |
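A common use of file_counts is polling a Store until ingestion settles. The sketch below assumes total is the sum of the five per-state counts (true for the Store Example below: 2 + 1 + 0 + 10 + 0 = 13) and treats completed, failed, and cancelled as terminal states; both are assumptions, and the function names are hypothetical.

```python
from typing import TypedDict


class FileCounts(TypedDict):
    pending: int
    in_progress: int
    cancelled: int
    completed: int
    failed: int
    total: int


# Assumption: files in these states will not be processed further.
TERMINAL_STATES = ("completed", "failed", "cancelled")
ALL_STATES = ("pending", "in_progress") + TERMINAL_STATES


def counts_are_consistent(counts: FileCounts) -> bool:
    # Assumes total is the sum of the per-state counts.
    return sum(counts[s] for s in ALL_STATES) == counts["total"]


def processing_finished(counts: FileCounts) -> bool:
    # True once no file is waiting for or undergoing processing.
    return counts["pending"] == 0 and counts["in_progress"] == 0


def settled_fraction(counts: FileCounts) -> float:
    # Share of files in a terminal state; an empty Store counts as fully settled.
    if counts["total"] == 0:
        return 1.0
    return sum(counts[s] for s in TERMINAL_STATES) / counts["total"]


counts: FileCounts = {"pending": 2, "in_progress": 1, "cancelled": 0,
                      "completed": 10, "failed": 0, "total": 13}
print(counts_are_consistent(counts))       # True
print(processing_finished(counts))         # False
print(round(settled_fraction(counts), 2))  # 0.77
```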
For detailed configuration options, including expiration policies and public access, see Store Configuration.
Store Example
```json
{
  "id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "name": "product-documentation",
  "description": "Complete product documentation and API reference",
  "is_public": false,
  "metadata": {
    "category": "documentation",
    "language": "en"
  },
  "file_counts": {
    "pending": 2,
    "in_progress": 1,
    "cancelled": 0,
    "completed": 10,
    "failed": 0,
    "total": 13
  },
  "expires_after": {
    "anchor": "last_active_at",
    "days": 30
  },
  "status": "in_progress",
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-20T14:30:00Z",
  "last_active_at": "2024-01-20T14:30:00Z",
  "usage_bytes": 1048576,
  "expires_at": "2024-02-19T14:30:00Z",
  "object": "store"
}
```
Store File
A Store File represents a complete file that you've uploaded to a Store. It tracks the file's processing status, metadata, and relationship to the searchable chunks created from its content.
File Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the file within the Store |
| filename | string | Original name of the uploaded file |
| metadata | object | Custom key-value pairs you've attached to the file |
| status | enum | Current processing status of the file |
| last_error | object | Details of the last processing error, or null if none occurred |
| store_id | string | ID of the Store containing this file |
| created_at | string | ISO 8601 timestamp of when the file was added to the Store |
| version | integer | Version number of the file within the Store |
| usage_bytes | integer | Storage used by the file's indexed data, in bytes |
| object | string | Always "store.file" |
For details on the file processing lifecycle and what each status means, see Store File Status.
For guidance on metadata structure and types, see Metadata Types.
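After uploading a batch, you typically want to know which files are searchable and which need attention. The sketch below splits decoded Store File objects by status, assuming the status strings "completed" and "failed" match the states named in file_counts. The helper split_by_outcome and the sample data are hypothetical, and the last_error payload in the sample is invented because this page does not specify its shape.

```python
from typing import Any, Dict, List, Tuple

# A decoded Store File object, shaped like the File Example below.
StoreFile = Dict[str, Any]


def split_by_outcome(files: List[StoreFile]) -> Tuple[List[str], Dict[str, Any]]:
    """Hypothetical helper, not an SDK function.

    Returns the IDs of files whose status is "completed", plus a mapping of
    file ID -> last_error for files whose status is "failed". Files in any
    other state (for example, still pending) are skipped.
    """
    ready: List[str] = []
    errors: Dict[str, Any] = {}
    for f in files:
        if f["status"] == "completed":
            ready.append(f["id"])
        elif f["status"] == "failed":
            errors[f["id"]] = f.get("last_error")
    return ready, errors


files = [
    {"id": "f1", "filename": "product-documentation.pdf", "status": "completed", "last_error": None},
    # Illustrative error payload only; the real last_error shape is not documented here.
    {"id": "f2", "filename": "scan.pdf", "status": "failed", "last_error": {"message": "example"}},
    {"id": "f3", "filename": "notes.md", "status": "pending", "last_error": None},
]
ready, errors = split_by_outcome(files)
print(ready)   # ['f1']
print(errors)  # {'f2': {'message': 'example'}}
```

Keying by id rather than filename avoids collisions when two uploads share a name.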
File Example
```json
{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "filename": "product-documentation.pdf",
  "metadata": {
    "category": "documentation",
    "department": "product",
    "version": "2.1",
    "last_updated": "2024-01-15"
  },
  "status": "completed",
  "last_error": null,
  "store_id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "created_at": "2024-01-15T10:30:00Z",
  "version": 1,
  "usage_bytes": 245760,
  "object": "store.file"
}
```
Store Chunk
A Store Chunk represents a searchable segment of content created from a Store File. When you search, you get back chunks that contain the most relevant portions of your files.
Chunk Properties
| Property | Type | Description |
|---|---|---|
| chunk_index | integer | Position of this chunk within the source file |
| mime_type | string | MIME type of the chunk's content (text/plain, image/png, etc.) |
| model | string | Model used to generate the chunk's embedding vector |
| score | number | Relevance score of this chunk (present in search results) |
| file_id | string | ID of the file this chunk came from |
| filename | string | Name of the source file |
| store_id | string | ID of the Store containing this chunk |
| external_id | string | Optional external identifier for the source file |
| metadata | object | User-defined metadata inherited from the source file |
| generated_metadata | object | Structured metadata generated at ingestion time, e.g. chunk size |
| type | enum | Content type: text, image_url, audio_url, or video_url |
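Because every search result chunk carries file_id and score, results can be post-processed on the client. As one example, the sketch below keeps only the highest-scoring chunk per source file, which suits a UI that lists each document once. The function best_chunk_per_file and the sample results are hypothetical; this is client-side code, not an API feature.

```python
from typing import Any, Dict, List

Chunk = Dict[str, Any]


def best_chunk_per_file(chunks: List[Chunk]) -> List[Chunk]:
    """Collapse scored chunks to the best-scoring chunk of each source file.

    Uses only the file_id and score properties listed in the table above.
    """
    best: Dict[str, Chunk] = {}
    for chunk in chunks:
        current = best.get(chunk["file_id"])
        if current is None or chunk["score"] > current["score"]:
            best[chunk["file_id"]] = chunk
    # Order files by their best chunk, highest score first.
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)


results = [
    {"file_id": "a", "filename": "auth.pdf", "chunk_index": 3, "score": 0.92},
    {"file_id": "b", "filename": "billing.pdf", "chunk_index": 0, "score": 0.81},
    {"file_id": "a", "filename": "auth.pdf", "chunk_index": 7, "score": 0.64},
]
for c in best_chunk_per_file(results):
    print(c["filename"], c["chunk_index"], c["score"])
# auth.pdf 3 0.92
# billing.pdf 0 0.81
```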
Content-Specific Properties
Text Chunks
| Property | Type | Description |
|---|---|---|
| text | string | Text content of the chunk |
| offset | integer | Character offset of this chunk relative to the start of the file |
Image Chunks
| Property | Type | Description |
|---|---|---|
| image_url | object | Image URL and format information |
| ocr_text | string | Text extracted from the image via OCR |
| summary | string | AI-generated summary of the image content † |
Audio Chunks
| Property | Type | Description |
|---|---|---|
| audio_url | object | Audio URL and format information |
| transcription | string | Speech-to-text transcription of the audio † |
| sampling_rate | integer | Audio sampling rate in Hz |
Video Chunks
| Property | Type | Description |
|---|---|---|
| video_url | object | Video URL and format information |
| transcription | string | Speech-to-text transcription of the video † |
† The summary and transcription fields are only populated when the file was ingested with the high_quality parsing strategy.
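Since each chunk type stores its text and media under different keys, client code usually dispatches on type. The sketch below returns the best available text for any chunk and the media URL for non-text chunks. The helpers chunk_text and media_url are hypothetical; preferring ocr_text over summary for images is a design choice, not documented behavior. Missing summary or transcription fields are treated as absent, since they are only populated for high_quality ingestion.

```python
from typing import Any, Dict, Optional

Chunk = Dict[str, Any]


def chunk_text(chunk: Chunk) -> Optional[str]:
    """Hypothetical helper: best available text for a chunk, based on its type."""
    kind = chunk["type"]
    if kind == "text":
        return chunk.get("text")
    if kind == "image_url":
        # Prefer OCR text; fall back to the AI summary if OCR text is missing or empty.
        return chunk.get("ocr_text") or chunk.get("summary")
    if kind in ("audio_url", "video_url"):
        return chunk.get("transcription")
    raise ValueError(f"unknown chunk type: {kind!r}")


def media_url(chunk: Chunk) -> Optional[str]:
    # Non-text chunks keep their URL under a key named after the type,
    # e.g. chunk["image_url"]["url"]; text chunks have no media URL.
    if chunk["type"] == "text":
        return None
    return chunk[chunk["type"]]["url"]


image_chunk = {
    "type": "image_url",
    "image_url": {"url": "https://signed-url-to-image.com/chunk_img_123", "format": "png"},
    "ocr_text": "Figure 1: Authentication Flow Diagram",
    "summary": "A diagram showing the authentication flow process",
}
print(chunk_text(image_chunk))  # Figure 1: Authentication Flow Diagram
print(media_url(image_chunk))   # https://signed-url-to-image.com/chunk_img_123
```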
Chunk Types
Text Chunks
```json
{
  "type": "text",
  "text": "User authentication in our API requires a valid API key...",
  "chunk_index": 2,
  "offset": 1024,
  "mime_type": "text/plain",
  "score": 0.89
}
```
Image Chunks
```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://signed-url-to-image.com/chunk_img_123",
    "format": "png"
  },
  "ocr_text": "Figure 1: Authentication Flow Diagram",
  "summary": "A diagram showing the authentication flow process",
  "chunk_index": 5,
  "mime_type": "image/png",
  "score": 0.76
}
```
Audio Chunks
```json
{
  "type": "audio_url",
  "audio_url": {
    "url": "https://signed-url-to-audio.com/chunk_audio_456"
  },
  "transcription": "Welcome to our product overview. In this section, we'll cover...",
  "sampling_rate": 44100,
  "chunk_index": 3,
  "mime_type": "audio/mpeg",
  "score": 0.82
}
```
Video Chunks
```json
{
  "type": "video_url",
  "video_url": {
    "url": "https://signed-url-to-video.com/chunk_video_789"
  },
  "transcription": "Hello everyone, today we're going to demonstrate...",
  "chunk_index": 1,
  "mime_type": "video/mp4",
  "score": 0.88
}
```
Complete Chunk Example
```json
{
  "chunk_index": 3,
  "mime_type": "text/plain",
  "model": "mixedbread-ai/mxbai-omni-v1",
  "score": 0.92,
  "file_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "filename": "product-documentation.pdf",
  "store_id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "external_id": "doc-auth-guide-v2",
  "metadata": {
    "category": "documentation",
    "department": "product"
  },
  "type": "text",
  "text": "To authenticate API requests, include your API key in the Authorization header: Authorization: Bearer YOUR_API_KEY. The API key identifies your account and provides access to your organization's resources.",
  "offset": 4096
}
```