DeepSeek V3
Frontier 671B MoE model with groundbreaking cost efficiency
Released December 26, 2024
Last tested: 2026-01-15
About
DeepSeek V3 is a Mixture-of-Experts model with 671B total parameters but only 37B active per token. It achieves GPT-4 level performance at a fraction of the cost, representing a major breakthrough in efficient AI architecture.
Capabilities
Pricing
- Input: $1.25 per 1M tokens
- Output: $1.25 per 1M tokens
Static pricing
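At these rates, per-request cost is simple arithmetic. A minimal Python sketch using the listed $1.25 per 1M tokens for both input and output (illustrative; check the current rate card before budgeting):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 1.25, output_price: float = 1.25) -> float:
    """Estimate request cost in USD, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# 100K input tokens + 10K output tokens
print(round(estimate_cost(100_000, 10_000), 4))  # → 0.1375
```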
Details
DeepSeek V3: The MoE Revolution
DeepSeek V3 represents a paradigm shift in large language model efficiency. With its innovative Mixture-of-Experts architecture, the model contains 671 billion total parameters while activating only 37 billion per token, delivering frontier-level performance at dramatically reduced computational costs.
Architecture Breakthrough
The MoE (Mixture-of-Experts) design routes each token through specialized expert networks, achieving:
- Efficient Inference: Only 37B of 671B parameters active per forward pass
- Lower Costs: 10-50x more cost-effective than comparable dense models
- Competitive Performance: Matches GPT-4 class models on most benchmarks
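The routing step above can be sketched in a few lines of NumPy. This is a generic top-k gating illustration, not DeepSeek's exact router (the real model uses far more experts, shared experts, and its own load-balancing scheme):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector x through the top-k experts by gate score.
    Minimal sketch: production routers add load balancing and shared experts."""
    logits = x @ gate_w                                 # one gate score per expert
    topk = np.argsort(logits)[-k:]                      # indices of the k highest scores
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                            # softmax over selected experts only
    # Only the chosen experts execute, so most parameters stay inactive.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))
```

Because only k expert networks run per token, compute scales with the active parameter count (37B) rather than the total (671B).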
Key Capabilities
- Reasoning: Strong performance on math, coding, and logic tasks
- Multilingual: Excellent Chinese and English support with broad language coverage
- Long Context: Full 128K context window for document analysis
- Code Generation: Competitive with specialized code models
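The model is typically served behind an OpenAI-compatible chat endpoint. A hedged sketch of assembling a request payload; the model identifier `deepseek-chat` and field names follow the OpenAI chat-completions convention, but verify both against your provider's documentation:

```python
import json

def build_chat_request(prompt: str, model: str = "deepseek-chat",
                       max_tokens: int = 1024, temperature: float = 0.7) -> dict:
    """Assemble an OpenAI-style chat-completions payload (illustrative only)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Explain mixture-of-experts routing in two sentences.")
print(json.dumps(payload, indent=2))
```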
Cost Efficiency
DeepSeek V3 disrupted the industry with its training efficiency: it was reportedly trained for under $6M, compared to the hundreds of millions of dollars spent on comparable models. That efficiency carries over to inference, making it one of the most cost-effective frontier models available.
Best Use Cases
- Production applications requiring GPT-4 level quality at lower cost
- Multilingual applications with Chinese language requirements
- Code generation and technical documentation
- Long-context document analysis and summarization
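For long-context document analysis, the input still has to fit the 128K window alongside the instruction prompt and the model's reply. A simple chunking sketch (token counts are placeholders; use your tokenizer's actual counts):

```python
def chunk_by_tokens(tokens, window=128_000, reserve=4_000):
    """Split a token list into chunks that fit the context window,
    reserving room for the instruction prompt and the model's reply."""
    size = window - reserve
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

# A 300K-token document needs three passes under a 128K window.
doc = list(range(300_000))
print(len(chunk_by_tokens(doc)))  # → 3
```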