Metadata Filtering
Metadata filtering narrows results down to the files whose attached metadata matches the conditions you specify.
Need to understand metadata types? This page covers filtering syntax and operations. For supported metadata types and structure, see Metadata Types.
Quick Example
Here's a simple example of filtering files by category:
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
response = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=10,
    metadata_filter={"key": "category", "value": "documentation", "operator": "eq"},
)
for file in response.data:
    print(file)
Filter Structure
Filters can be structured in two ways depending on your needs:
Single Field Filter (Direct Condition)
For simple single-field filtering, you can use a direct condition:
{"key": "metadata_key", "operator": "comparison", "value": "target_value"}
Example:
{"key": "category", "operator": "eq", "value": "documentation"}
Multiple Field Filter (Logical Operators)
For complex filtering with multiple conditions, use logical operators:
{
"logical_operator": [
{"key": "metadata_key", "operator": "comparison", "value": "target_value"}
]
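}
Here, logical_operator stands for one of all, any, or none, and its value is a list of conditions or nested groups. Example filter structure: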
{
"all": [
{"key": "category", "operator": "eq", "value": "documentation"}
]
}
Generated Metadata Fields
You can target auto-generated chunk metadata by prefixing the key with generated_metadata followed by a dot (for example, generated_metadata.file_type).
This works the same way as regular metadata filters and is especially useful for filtering on the
values described in Generated Metadata.
{
"any": [
{"key": "generated_metadata.file_type", "operator": "eq", "value": "text/markdown"},
{"key": "generated_metadata.language", "operator": "eq", "value": "en"}
]
}
Use dot notation to drill into nested structures, for example generated_metadata.chunk_headings.level.
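To build intuition for how a dotted key maps onto nested metadata, the sketch below resolves such a key against a local Python dict. The helper resolve_key and the sample chunk are hypothetical and not part of the Mixedbread SDK; the helper only walks nested dictionaries and returns None when a segment is missing, so it does not model how the service treats arrays along the path (such as a list of headings).

```python
from typing import Any


def resolve_key(metadata: dict[str, Any], key: str) -> Any:
    """Hypothetical helper: follow a dotted key such as
    "generated_metadata.file_type" through nested dicts.

    Returns None if a segment is missing or a non-dict value is reached
    before the path ends. Arrays along the path are not handled.
    """
    current: Any = metadata
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Illustrative chunk metadata (assumed shape, for demonstration only).
chunk = {
    "category": "documentation",
    "generated_metadata": {"file_type": "text/markdown", "language": "en"},
}

print(resolve_key(chunk, "generated_metadata.file_type"))  # text/markdown
print(resolve_key(chunk, "generated_metadata.missing"))    # None
```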
Logical Operators
Combine multiple conditions with the all, any, and none logical operators:
All (AND Operation)
All conditions must be true:
{
"all": [
{"key": "category", "operator": "eq", "value": "documentation"},
{"key": "language", "operator": "eq", "value": "python"},
{"key": "status", "operator": "eq", "value": "published"}
]
}
Any (OR Operation)
At least one condition must be true:
{
"any": [
{"key": "language", "operator": "eq", "value": "python"},
{"key": "language", "operator": "eq", "value": "javascript"},
{"key": "language", "operator": "eq", "value": "typescript"}
]
}
None (NOT Operation)
None of the conditions may be true:
{
"none": [
{"key": "status", "operator": "eq", "value": "deprecated"},
{"key": "status", "operator": "eq", "value": "draft"}
]
}
Comparison Operators
Equality and Comparison Operators
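As a mental model for the operators listed below, the following sketch maps each operator name to a Python predicate that compares a stored metadata value with a filter value. It is a hypothetical local approximation, not the service's implementation. It assumes that in and not_in on an array-valued field test whether any element appears in the list, that regex uses search semantics, and it leaves out not_like because its pattern syntax is not specified on this page.

```python
import re
from typing import Any, Callable


def _in(stored: Any, values: list) -> bool:
    # Assumption: for array-valued fields, match if any element is in the list.
    if isinstance(stored, list):
        return any(item in values for item in stored)
    return stored in values


# Hypothetical operator table; mismatched types (e.g. None with "gt") raise TypeError.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda stored, value: stored == value,
    "not_eq": lambda stored, value: stored != value,
    "gt": lambda stored, value: stored > value,
    "gte": lambda stored, value: stored >= value,
    "lt": lambda stored, value: stored < value,
    "lte": lambda stored, value: stored <= value,
    "in": _in,
    "not_in": lambda stored, value: not _in(stored, value),
    # Both string operators are case-sensitive here.
    "regex": lambda stored, value: re.search(value, stored) is not None,
    "starts_with": lambda stored, value: stored.startswith(value),
}


def compare(stored: Any, operator: str, value: Any) -> bool:
    """Apply one comparison operator locally."""
    return OPERATORS[operator](stored, value)


print(compare(7, "gt", 5))                   # True
print(compare("Redis", "regex", "^red.*$"))  # False (case-sensitive)
```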
// Equal to
{"key": "status", "operator": "eq", "value": "published"}
// Not equal to
{"key": "status", "operator": "not_eq", "value": "draft"}
// Greater than
{"key": "priority", "operator": "gt", "value": 5}
// Greater than or equal to
{"key": "created_at", "operator": "gte", "value": "2024-01-01"}
// Less than
{"key": "rating", "operator": "lt", "value": 3.0}
// Less than or equal to
{"key": "rating", "operator": "lte", "value": 4.5}
// Value in list
{"key": "tags", "operator": "in", "value": ["tutorial", "guide"]}
// Value not in list
{"key": "status", "operator": "not_in", "value": ["deprecated", "legacy"]}
// Regex matching (case-sensitive)
{"key": "title", "operator": "regex", "value": "^red.*$"}
// String starts with (case-sensitive)
{"key": "path", "operator": "starts_with", "value": "/Users"}
// String does not start with
{"key": "path", "operator": "not_like", "value": "/tmp/*"}
Data Type Filtering
String Values
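Because string filters are case-sensitive and ISO 8601 is the recommended date format, it can help to normalize metadata before attaching it to files. The sketch below is a hypothetical preprocessing step (normalize_metadata is not an SDK function): it lowercases string values, including strings inside lists, only for the keys you name, and converts Python date and datetime objects to ISO 8601 strings. Numbers and booleans pass through unchanged.

```python
from datetime import date, datetime, timezone
from typing import Any


def normalize_metadata(metadata: dict[str, Any], lowercase_keys: set[str]) -> dict[str, Any]:
    """Hypothetical helper: return a normalized copy of `metadata`."""
    normalized: dict[str, Any] = {}
    for key, value in metadata.items():
        if isinstance(value, date):
            # datetime is a subclass of date, so both become ISO 8601 strings.
            normalized[key] = value.isoformat()
        elif key in lowercase_keys and isinstance(value, str):
            normalized[key] = value.lower()
        elif key in lowercase_keys and isinstance(value, list):
            normalized[key] = [v.lower() if isinstance(v, str) else v for v in value]
        else:
            normalized[key] = value
    return normalized


raw = {
    "category": "Documentation",
    "tags": ["Tutorial", "Python"],
    "title": "Getting Started",  # not in lowercase_keys, so casing is kept
    "created_at": date(2024, 3, 15),
    "last_updated": datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
    "priority": 5,
    "is_public": True,
}
clean = normalize_metadata(raw, {"category", "tags"})
print(clean)
```

Note that Python's isoformat() writes a UTC offset as +00:00 rather than Z, so pick one style and use it consistently in both metadata and filter values.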
String comparisons are case-sensitive by default, so keep casing consistent in your metadata:
// String Values (case-sensitive)
{"key": "category", "operator": "eq", "value": "Documentation"} // Won't match "documentation"
// Use consistent casing in metadata
{
"category": "documentation", // lowercase
"status": "published", // lowercase
"team": "engineering" // lowercase
}
Numeric Values
Integer and float values support numeric comparisons:
// Numeric Values
{"key": "priority", "operator": "gt", "value": 5}
{"key": "score", "operator": "gte", "value": 0.8}
Boolean Values
Boolean values can be matched against true or false:
// Boolean Values
{"key": "is_public", "operator": "eq", "value": true}
{"key": "deprecated", "operator": "eq", "value": false}
Date Values
ISO 8601 format is recommended for dates:
// Date Values (ISO 8601 format recommended)
{"key": "created_at", "operator": "gte", "value": "2024-01-01"}
{"key": "last_updated", "operator": "lt", "value": "2024-12-31T23:59:59Z"}
Array/List Values
Array values support membership filtering:
// Array/List Values
{
"tags": ["tutorial", "python", "web"],
"authors": ["alice", "bob"]
}
// Filter by array membership
{"key": "tags", "operator": "in", "value": ["tutorial", "guide"]}
Combined Logical Operations
Nested Conditions
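When filters are assembled in code, small builder functions can keep deep nesting readable. The helpers below (cond, all_of, any_of, none_of) are hypothetical conveniences, not part of the Mixedbread SDK; they only produce the plain dictionaries shown on this page, so the result could be passed as metadata_filter. Here they rebuild the nested example that follows, merging a top-level all group and a none group into one object.

```python
from typing import Any

Filter = dict[str, Any]


def cond(key: str, operator: str, value: Any) -> Filter:
    """A single direct condition."""
    return {"key": key, "operator": operator, "value": value}


def all_of(*filters: Filter) -> Filter:
    """AND group: every nested filter must match."""
    return {"all": list(filters)}


def any_of(*filters: Filter) -> Filter:
    """OR group: at least one nested filter must match."""
    return {"any": list(filters)}


def none_of(*filters: Filter) -> Filter:
    """NOT group: no nested filter may match."""
    return {"none": list(filters)}


# Merge the "all" and "none" groups into a single top-level object.
nested_filter = {
    **all_of(
        cond("category", "eq", "documentation"),
        any_of(
            cond("language", "eq", "python"),
            cond("language", "eq", "javascript"),
        ),
    ),
    **none_of(cond("status", "eq", "deprecated")),
}
print(nested_filter)
```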
Complex multi-level filtering example:
{
"all": [
{"key": "category", "operator": "eq", "value": "documentation"},
{
"any": [
{"key": "language", "operator": "eq", "value": "python"},
{"key": "language", "operator": "eq", "value": "javascript"}
]
}
],
"none": [
{"key": "status", "operator": "eq", "value": "deprecated"}
]
}
Advanced Filtering Example
Here's a practical example demonstrating complex nested filters:
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
metadata_filter = {
    "all": [
        {"key": "status", "value": "published", "operator": "eq"},
        {
            "any": [
                {"key": "priority", "value": 3, "operator": "gte"},
                {
                    "all": [
                        {"key": "category", "value": "important", "operator": "eq"},
                        {"key": "reviewed", "value": True, "operator": "eq"},
                    ]
                },
            ]
        },
    ]
}
response = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=10,
    metadata_filter=metadata_filter,
)
for file in response.data:
    print(file)
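Before sending a complex filter, you may want to check its logic against sample metadata offline. The sketch below is a hypothetical, simplified local evaluator (matches is not an SDK function and is not how the service evaluates filters). It supports the all, any, and none groups, assumes that several groups in one object combine with AND and that a missing key never matches, implements only the equality, ordering, and list-membership operators, and does not handle dotted keys or array-valued fields. It applies the advanced filter from this page to a few sample metadata dicts.

```python
from typing import Any

# Subset of comparison operators (assumed semantics, for local testing only).
_OPS = {
    "eq": lambda a, b: a == b,
    "not_eq": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
    "in": lambda a, b: a in b,
    "not_in": lambda a, b: a not in b,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical evaluator: does `metadata` satisfy filter `flt`?"""
    if "key" in flt:
        # Direct condition; a missing key never matches (assumption).
        if flt["key"] not in metadata:
            return False
        return _OPS[flt["operator"]](metadata[flt["key"]], flt["value"])
    results = []
    if "all" in flt:
        results.append(all(matches(metadata, f) for f in flt["all"]))
    if "any" in flt:
        results.append(any(matches(metadata, f) for f in flt["any"]))
    if "none" in flt:
        results.append(not any(matches(metadata, f) for f in flt["none"]))
    # Multiple groups in one object are combined with AND (assumption).
    return all(results)


metadata_filter = {
    "all": [
        {"key": "status", "value": "published", "operator": "eq"},
        {
            "any": [
                {"key": "priority", "value": 3, "operator": "gte"},
                {
                    "all": [
                        {"key": "category", "value": "important", "operator": "eq"},
                        {"key": "reviewed", "value": True, "operator": "eq"},
                    ]
                },
            ]
        },
    ]
}

reviewed = {"status": "published", "priority": 1, "category": "important", "reviewed": True}
unreviewed = {"status": "published", "priority": 1, "category": "important", "reviewed": False}
print(matches(reviewed, metadata_filter))    # True: the inner "all" branch matches
print(matches(unreviewed, metadata_filter))  # False: priority < 3 and not reviewed
```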