Rate Limiting
Documentation on API rate limits with tier-based quotas. Learn about request limits, how to handle rate limiting responses, and strategies for optimizing your API usage.
Quick Overview
- Each endpoint has its own rate limits
- Limits are based on requests per minute, tokens per minute, and requests per day
- Exceeding limits may result in request throttling or rejection
- Need higher limits? Contact us
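Here is a minimal client-side sketch in Python showing how per-minute request and token limits interact. It tracks both over a rolling 60-second window and refuses any request that would exceed either limit. The class name SlidingWindowLimiter is hypothetical. The rolling-window model is also an assumption, because this page does not say how the server measures its minute windows. Treat it as a way to stay under the limits, not as a replica of server behavior. Requests/day is not tracked here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter for requests/min and tokens/min.

    Assumes usage is counted over a rolling 60 s window; the API docs do not
    specify the server's windowing, so this only approximates it.
    """

    def __init__(self, requests_per_min, tokens_per_min, window=60.0, clock=time.monotonic):
        self.requests_per_min = requests_per_min
        self.tokens_per_min = tokens_per_min
        self.window = window
        self.clock = clock
        self._events = deque()  # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0

    def _evict(self, now):
        # Drop events that have fallen out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both limits, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) + 1 > self.requests_per_min:
            return False  # would exceed requests per minute
        if self._tokens_in_window + tokens > self.tokens_per_min:
            return False  # would exceed tokens per minute
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would typically check try_acquire(estimated_tokens) before each request and wait briefly whenever it returns False.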
Rate Limit Tiers
We offer five tiers with increasing limits. Here's a breakdown for the Embeddings & Reranking endpoint:
Tier | Requests/Min | Tokens/Min | Requests/Day | Burst |
---|---|---|---|---|
Home Baker (Free) | 100 | 250,000 | 5,000 | 10 |
Professional Baker | 300 | 500,000 | 10,000 | 20 |
Bakery Shop | 500 | 1,000,000 | 10,000 | 50 |
Bakery Chain | 1,000 | 10,000,000 | 50,000 | 100 |
Bakery Franchise | 2,000 | 10,000,000 | 100,000 | 100 |
Custom tiers are available upon request.
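If you know your expected load, you can turn the table above into a quick check. The following sketch returns the lowest listed tier that covers a given requests/min, tokens/min and requests/day for the Embeddings & Reranking endpoint. It returns None when only a custom tier would do. The function name smallest_tier is hypothetical. The sketch also ignores the Burst column, because this page does not define its exact semantics.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    requests_per_min: int
    tokens_per_min: int
    requests_per_day: int
    burst: int  # carried along but not used: its semantics are not defined here


# Limits for the Embeddings & Reranking endpoint, copied from the table above.
TIERS = [
    Tier("Home Baker (Free)", 100, 250_000, 5_000, 10),
    Tier("Professional Baker", 300, 500_000, 10_000, 20),
    Tier("Bakery Shop", 500, 1_000_000, 10_000, 50),
    Tier("Bakery Chain", 1_000, 10_000_000, 50_000, 100),
    Tier("Bakery Franchise", 2_000, 10_000_000, 100_000, 100),
]


def smallest_tier(requests_per_min: int, tokens_per_min: int, requests_per_day: int) -> Optional[Tier]:
    """Return the lowest tier whose limits cover the expected load, or None."""
    for tier in TIERS:
        if (tier.requests_per_min >= requests_per_min
                and tier.tokens_per_min >= tokens_per_min
                and tier.requests_per_day >= requests_per_day):
            return tier
    return None  # beyond the listed tiers: a custom tier would be needed
```

Note that several dimensions can drive the result. In the table, for example, Bakery Shop allows the same requests/day as Professional Baker, so a high daily volume alone can push you to Bakery Chain.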
Handling Rate Limits
When you hit a rate limit:
- You'll receive a 429 Too Many Requests response
- The response will include a Retry-After header
- Wait for the specified time before retrying
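Handling a 429 correctly depends on reading Retry-After. In HTTP, that header may carry either a number of seconds or an HTTP-date. This page does not say which form the API sends, so the sketch below accepts both, using only the Python standard library. The helper name retry_after_seconds is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Hypothetical helper: convert a Retry-After value into seconds to wait.

    Accepts both HTTP forms (delay in seconds, or an HTTP-date).
    Returns None if the header is missing or unparseable.
    """
    if header_value is None:
        return None
    value = header_value.strip()
    if value.isdecimal():
        return float(value)  # delay-seconds form, e.g. "30"
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (retry_at - now).total_seconds())
```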
Example error response:
Best Practices
- If you are not using an SDK, implement exponential backoff in your client code
- Cache results when possible to reduce API calls
- Optimize your requests to use fewer tokens
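Here is a minimal sketch of a retry wrapper that applies exponential backoff. It assumes a hypothetical send callable that performs one API request and returns a (status_code, headers, body) tuple. On a 429 it retries with exponentially growing, randomized delays ("full jitter"). When a numeric Retry-After header is present, it waits at least that long. The names call_with_backoff and RateLimitedError, and the default retry count and delays, are illustrative choices, not values from the API.

```python
import random
import time
from typing import Any, Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], Any]


class RateLimitedError(Exception):
    """Raised when every attempt was answered with 429 Too Many Requests."""


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on 429 with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        response = send()
        status, headers, _ = response
        if status != 429:
            return response
        if attempt == max_retries:
            break
        # Cap grows as base_delay * 2**attempt; pick a random delay below it.
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        # Honor the server's Retry-After (seconds form only here) if it is longer.
        retry_after = headers.get("Retry-After", "")
        if retry_after.strip().isdecimal():
            delay = max(delay, float(retry_after))
        sleep(delay)
    raise RateLimitedError(f"still rate limited after {max_retries} retries")
```

HTTP header names are case-insensitive. In real use, pass a case-insensitive mapping for headers, such as the headers object of a requests response, so the Retry-After lookup does not miss.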
Need Higher Limits?
If you need higher limits:
- Contact us or join our Discord community
- Provide details about your use case and expected request volume
- We'll review and adjust your limits if feasible
Remember, we're here to help you succeed. Don't hesitate to reach out if you have any questions or need assistance optimizing your API usage!