Mistral AI

Mistral Small

Available standard

Cost-efficient model for high-volume applications

Released September 18, 2024

Context Window: 128K tokens
TTFT: N/A
Speed: N/A
Max Output: 8K tokens
Training Cutoff: 2024-06

Last tested: 2026-01-15

About

Mistral Small delivers exceptional value for production workloads. With a 128K context window and remarkably low latency, it handles translation, summarization, and classification tasks at a fraction of the cost of larger models. Ideal for teams optimizing their AI spend without sacrificing quality on routine tasks.

Capabilities

function-calling, streaming, json-mode, multilingual, low-latency
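
For illustration, the sketch below shows a minimal chat completion request exercising these capabilities. It assumes an OpenAI-style endpoint at api.mistral.ai, the model identifier mistral-small-latest, and an API key in the MISTRAL_API_KEY environment variable; none of these details are stated on this page, so verify them against Mistral's official documentation.

    import os
    import requests

    # Minimal chat completion sketch (assumed endpoint and model name;
    # confirm both against Mistral's official API documentation).
    API_URL = "https://api.mistral.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "mistral-small-latest",
        "messages": [
            {"role": "user", "content": "Summarize in one sentence: Mistral Small targets high-volume, low-cost workloads."}
        ],
        "max_tokens": 128,
    }
    resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])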

Pricing

Input: $0.20 per 1M tokens
Output: $0.60 per 1M tokens

Static pricing
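
As a rough, illustrative cost model based only on the listed rates (token counts in the example are made up):

    # Illustrative cost estimate using the listed rates:
    # $0.20 per 1M input tokens, $0.60 per 1M output tokens.
    INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
    OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

    # Example: 100,000 requests of ~1,500 input / ~300 output tokens each.
    per_request = request_cost(1_500, 300)                     # $0.00048
    print(f"per request: ${per_request:.5f}")
    print(f"per 100k requests: ${per_request * 100_000:.2f}")  # ~$48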

Details

Mistral Small

Mistral Small exemplifies the efficiency-first philosophy that has made Mistral AI a standout in the European AI landscape. This model is purpose-built for production environments where cost efficiency and speed matter as much as capability.

Efficiency by Design

At $0.20 per million input tokens, Mistral Small is roughly an order of magnitude cheaper than frontier models while delivering strong performance across a wide range of tasks. With time-to-first-token reported below 50 ms and throughput above 150 tokens per second, it is well suited to real-time applications.
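
To make those figures concrete, here is a back-of-the-envelope latency estimate built from the numbers quoted above; they are page-level claims rather than provider guarantees, so treat the results as indicative only.

    # Rough end-to-end latency estimate from the figures quoted above
    # (~50 ms time-to-first-token, ~150 tokens/second). Illustrative only.
    TTFT_S = 0.050
    TOKENS_PER_S = 150.0

    def estimated_latency(output_tokens: int) -> float:
        return TTFT_S + output_tokens / TOKENS_PER_S

    for n in (50, 300, 1_000):
        print(f"{n:>5} output tokens -> ~{estimated_latency(n):.2f} s")
    # 50 -> ~0.38 s, 300 -> ~2.05 s, 1000 -> ~6.72 s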

Core Capabilities

Translation Excellence: Mistral Small handles translation between major languages with high fidelity, supporting European languages, Asian languages, and more. Its training on diverse multilingual data makes it particularly strong for businesses operating across borders.
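
A sketch of a translation request, reusing the requests import, API_URL, and headers from the first example (endpoint and model name remain assumptions, not details confirmed by this page):

    # Translation sketch (reuses API_URL and headers from the earlier example).
    payload = {
        "model": "mistral-small-latest",
        "messages": [
            {"role": "system", "content": "Translate the user's text into French. Return only the translation."},
            {"role": "user", "content": "Our invoice system will be offline for maintenance on Saturday."},
        ],
        "temperature": 0.2,
    }
    resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    print(resp.json()["choices"][0]["message"]["content"])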

Summarization & Classification: The model excels at document summarization, sentiment analysis, and content classification, tasks that account for a large share of enterprise AI workloads.
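
A classification sketch that also uses JSON mode; it assumes the API accepts an OpenAI-style response_format parameter and reuses the setup from the first example:

    # Sentiment classification sketch with JSON output.
    # Assumes an OpenAI-style response_format parameter is supported;
    # reuses API_URL and headers from the first example.
    payload = {
        "model": "mistral-small-latest",
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": 'Classify the review sentiment. Reply as JSON: {"sentiment": "positive|neutral|negative", "confidence": 0-1}.'},
            {"role": "user", "content": "Delivery was late, but support resolved it quickly."},
        ],
    }
    resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    print(resp.json()["choices"][0]["message"]["content"])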

Function Calling: Full support for structured tool use enables integration into automated workflows, with reliable JSON output for downstream processing.
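
A function-calling sketch using an OpenAI-style tools schema with a hypothetical get_order_status tool, again reusing the earlier setup:

    # Function-calling sketch (hypothetical get_order_status tool;
    # reuses API_URL and headers from the first example).
    tools = [{
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }]
    payload = {
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "Where is order A-1042?"}],
        "tools": tools,
        "tool_choice": "auto",
    }
    resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    message = resp.json()["choices"][0]["message"]
    # If the model chose to call the tool, arguments arrive as a JSON string.
    for call in (message.get("tool_calls") or []):
        print(call["function"]["name"], call["function"]["arguments"])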

When to Choose Mistral Small

Mistral Small is the right choice when:

  • You need to process high volumes of text economically
  • Latency requirements demand sub-100ms responses
  • Tasks are well-defined (translation, classification, summarization)
  • Budget optimization is a priority

For complex reasoning or nuanced generation, consider Mistral Large 2 instead.

Production Ready

Mistral Small has been battle-tested in production environments, serving millions of requests daily. Its consistent performance and predictable pricing make it a reliable foundation for AI-powered products.

© bots.so — The AI Inference Model Index

bots.so aggregates publicly available model deployment information from official provider sources. We are not affiliated with any model provider. Model availability changes rapidly; always verify on official sites.