Overview

Introduction

The Mixedbread Parsing API transforms complex documents into clean text. It goes beyond simple text extraction: it understands the document layout and returns detailed information about each layout element. A single API call gives you the content, the element information, and the bounding boxes.

Typical Workflow: Parsing a Document

Initiate a parsing job by providing your document.

Retrieve the parsed results from the job once it's complete.
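
The two steps above amount to a create-then-poll pattern. The following is a minimal sketch of that pattern, not the Mixedbread SDK: `create_job` and `get_job` are hypothetical callables standing in for whatever client calls you use, and the `id`, `status`, and `result` fields, as well as the status values, are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict


def parse_document(
    create_job: Callable[[], Dict[str, Any]],
    get_job: Callable[[str], Dict[str, Any]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict[str, Any]:
    """Start a parsing job, then poll until it finishes or the timeout expires.

    The job dictionaries are assumed to carry "id", "status", and (once
    completed) "result" keys. These names are hypothetical.
    """
    job = create_job()  # step 1: initiate the parsing job
    deadline = time.monotonic() + timeout
    while job["status"] not in ("completed", "failed"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['id']} did not finish within {timeout}s")
        time.sleep(poll_interval)
        job = get_job(job["id"])  # step 2: check on the job
    if job["status"] == "failed":
        raise RuntimeError(f"job {job['id']} failed")
    return job["result"]


class FakeParsingService:
    """In-memory stand-in for a real parsing backend (illustration only)."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done

    def create(self) -> Dict[str, Any]:
        return {"id": "job-1", "status": "pending"}

    def get(self, job_id: str) -> Dict[str, Any]:
        self._remaining -= 1
        if self._remaining > 0:
            return {"id": job_id, "status": "in_progress"}
        return {"id": job_id, "status": "completed", "result": {"text": "Hello, world"}}


if __name__ == "__main__":
    service = FakeParsingService()
    print(parse_document(service.create, service.get, poll_interval=0.1))
```

Passing the two calls in as functions keeps the polling logic independent of any particular client, so the same helper can be exercised against the fake service here and wired to real calls later.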

Key Features

Multi-Format Support: Handles various document types including PDF, PPTX, HTML, and more.
Layout-Aware Extraction: Understands document structure beyond raw text.
Structured Output: Provides detailed information about content elements.
Multiple Output Formats: Choose from JSON, Markdown, or plain text based on your needs.
Asynchronous Processing: Handles large or complex documents efficiently.
Improves Downstream Quality: Creates better input for embedding, RAG, and indexing.
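
To illustrate why structured, layout-aware output helps downstream steps, here is a small sketch that turns a list of layout elements into plain text for embedding. The `Element` class and its fields (`type`, `content`, `bbox`, `page`) are hypothetical and do not reflect the actual response schema; the sketch also assumes bounding boxes are `(x0, y0, x1, y1)` with the origin at the top-left of the page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Element:
    """Hypothetical layout element; the field names are assumptions."""

    type: str                                # e.g. "title", "paragraph", "footer"
    content: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), top-left origin
    page: int


def to_plain_text(
    elements: List[Element],
    skip_types: Iterable[str] = ("header", "footer"),
) -> str:
    """Join element content in reading order, dropping page furniture."""
    skipped = set(skip_types)
    # Order by page, then top edge, then left edge of the bounding box.
    ordered = sorted(elements, key=lambda e: (e.page, e.bbox[1], e.bbox[0]))
    return "\n\n".join(e.content for e in ordered if e.type not in skipped)


if __name__ == "__main__":
    sample = [
        Element("paragraph", "Body text", (0, 100, 200, 150), 1),
        Element("title", "Report", (0, 10, 200, 40), 1),
        Element("footer", "Page 1", (0, 700, 200, 720), 1),
    ]
    print(to_plain_text(sample))
```

Element types and bounding boxes are what make this kind of filtering and ordering possible; with raw extracted text alone, headers and footers would end up mixed into the content sent to an embedding model.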

Check out the Parsing API for detailed endpoints and code examples.
