Mixtral 8x22B
Sparse mixture-of-experts model with frontier performance
Released April 17, 2024
- Context Window: 64K
- TTFT: N/A
- Speed: N/A
- Max Output: 4K
- Training Cutoff: 2024-03
Last tested: 2026-01-15
About
Mixtral 8x22B is Mistral's most capable open-weights model, using a sparse mixture-of-experts architecture to achieve near-frontier performance with remarkable efficiency. With 141B total parameters but only 39B active per token, it delivers strong reasoning and multilingual capabilities while remaining cost-effective to deploy.
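For hosted access, the model can be queried through Mistral's API. Below is a minimal sketch using the `mistralai` Python SDK; the `open-mixtral-8x22b` model ID is an assumption here, so verify it against Mistral's current documentation:

```python
import os
from mistralai import Mistral

# Assumes the mistralai v1 SDK and the "open-mixtral-8x22b" model ID;
# check Mistral's docs for the current client and model names.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="open-mixtral-8x22b",
    messages=[
        {"role": "user", "content": "Summarize sparse mixture-of-experts in two sentences."}
    ],
)
print(response.choices[0].message.content)
```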
Pricing
- Input: $1.00 per 1M tokens
- Output: $3.00 per 1M tokens
Static pricing
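At the listed rates, per-request cost is simple arithmetic. A quick sketch (rates taken from the table above; actual billing may differ by provider):

```python
INPUT_PER_M = 1.00   # USD per 1M input tokens (from the pricing table above)
OUTPUT_PER_M = 3.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a 50K-token prompt with a 2K-token completion.
print(f"${request_cost(50_000, 2_000):.4f}")  # $0.0560
```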
Details
Mixtral 8x22B represents a breakthrough in efficient AI architecture. By employing a sparse mixture-of-experts (SMoE) design, Mistral has created a model that rivals dense frontier models in capability while requiring significantly less compute per inference.
Mixture-of-Experts Architecture
The model's feed-forward layers each contain 8 expert networks of roughly 22 billion parameters; because attention and embedding parameters are shared across experts, the model totals 141 billion parameters. A learned router activates only 2 of the 8 experts per token, so just 39 billion parameters are used for each forward pass. This sparse activation delivers frontier-level intelligence at a fraction of the computational cost.
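The routing step is easiest to see in code. Here is a minimal, illustrative sketch of top-2 expert routing in PyTorch; the layer sizes and module structure are assumptions chosen for clarity, not Mistral's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-2 mixture-of-experts feed-forward layer (not Mistral's code)."""

    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward "expert" network per slot.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # keep only the top-2 experts
        weights = F.softmax(weights, dim=-1)           # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: this is the sparse activation.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Each token touches only 2 of the 8 expert networks, which is why active parameters per forward pass are far below the total parameter count.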
Benchmark Performance
Mixtral 8x22B competes with models like GPT-4 and Claude on key benchmarks:
- MMLU: Strong performance across 57 academic subjects
- HumanEval: Competitive code generation capabilities
- GSM8K: Solid mathematical reasoning
- Multilingual: Excellent performance in French, German, Spanish, Italian, and English
Open Weights Advantage
Unlike proprietary frontier models, Mixtral 8x22B is released under the Apache 2.0 license. This means:
- Self-hosting: Deploy on your own infrastructure for data sovereignty (see the sketch after this list)
- Fine-tuning: Customize for domain-specific applications
- Cost control: Predictable infrastructure costs without per-token API fees
- Privacy: Keep sensitive data entirely on-premises
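As a concrete example of self-hosting, here is a minimal sketch that loads the open weights with Hugging Face `transformers`, assuming the published `mistralai/Mixtral-8x22B-Instruct-v0.1` repository; note that at 16-bit precision the 141B parameters need roughly 280 GB of GPU memory, so the weights are sharded across available GPUs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the weights published at "mistralai/Mixtral-8x22B-Instruct-v0.1"
# on Hugging Face; device_map="auto" shards the ~280 GB of bf16 weights
# across all visible GPUs.
model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain data sovereignty in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```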
European AI Innovation
Mixtral 8x22B demonstrates that Europe can compete at the frontier of AI research. Mistral’s team has shown that architectural innovation can deliver capabilities previously thought to require massive dense models, opening new possibilities for efficient AI deployment.
Best Use Cases
- Organizations requiring data sovereignty
- High-volume inference with predictable costs
- Fine-tuning for specialized domains
- Research and experimentation
- Multilingual enterprise applications
Provider: Mistral AI
Other Mistral AI Models
- Mistral Large 2: Mistral's flagship model for enterprise reasoning (Context: 128K, Speed: N/A, TTFT: N/A)
- Mistral Small: Cost-efficient model for high-volume applications (Context: 128K, Speed: N/A, TTFT: N/A)
- Codestral: Purpose-built for code generation and software engineering (Context: 32K, Speed: N/A, TTFT: N/A)