Anthropic

Claude 3.5 Haiku

Available standard

Fast and affordable model for high-volume production workloads

Released October 22, 2024

Context Window
200K
TTFT
N/A
Speed
N/A
Max Output
8,192
Training Cutoff
Apr 2024

Last tested: 2026-01-15

About

Claude 3.5 Haiku delivers impressive speed and capability at Anthropic's most affordable price tier. Ideal for real-time applications, chatbots, and high-throughput processing with 200K context support.

Capabilities

Vision · Function calling · Streaming

Pricing

Input
$0.80 per 1M tokens
Output
$4.00 per 1M tokens

Static pricing

Batch
$0.40 per 1M input tokens, $2.00 per 1M output tokens
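The standard and batch rates above translate into per-request cost as follows; a minimal sketch using the listed rates (the helper and workload figures are illustrative, and current pricing should be verified on Anthropic's site):

```python
# Rates from the pricing table above, in USD per million tokens.
# Illustrative only; check Anthropic's pricing page for current values.
RATES = {
    "standard": {"input": 0.80, "output": 4.00},
    "batch": {"input": 0.40, "output": 2.00},
}

def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimated USD cost for a workload at the given pricing tier."""
    r = RATES[tier]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# 10,000 requests of ~2K input / 500 output tokens each:
standard = estimate_cost(2_000 * 10_000, 500 * 10_000, "standard")  # $36.00
batch = estimate_cost(2_000 * 10_000, 500 * 10_000, "batch")        # $18.00
```

At the 50% batch discount, the same workload costs exactly half, which is why high-volume, latency-tolerant jobs are usually routed through batch processing.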

Details

Overview

Claude 3.5 Haiku is Anthropic’s fastest model, optimized for production workloads requiring low latency and high throughput. Despite its speed, it maintains strong reasoning capabilities that rival previous-generation models like Claude 3 Opus on many benchmarks.

Key Strengths

Exceptional Speed: With response times under 70ms and throughput exceeding 180 tokens per second, Claude 3.5 Haiku is ideal for real-time applications where latency matters.

Cost Efficiency: At $0.80/M input tokens, it provides enterprise-grade AI capabilities at a fraction of frontier model pricing, making it viable for high-volume applications.

Capable Reasoning: Despite its speed focus, Claude 3.5 Haiku delivers surprisingly strong performance on coding, analysis, and general reasoning tasks.
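The speed figures above can be turned into a rough end-to-end latency estimate; a back-of-envelope sketch, assuming the quoted ~70 ms TTFT and ~180 tokens/s hold steady (actual numbers vary with region, load, and prompt size):

```python
def estimated_latency_s(output_tokens: int, ttft_s: float = 0.070, tps: float = 180.0) -> float:
    """Rough end-to-end latency: time-to-first-token plus generation
    time at a steady throughput. Defaults use the figures quoted above;
    they are illustrative, not guaranteed service levels."""
    return ttft_s + output_tokens / tps

# A typical 256-token chatbot reply completes in roughly 1.5 seconds:
latency = estimated_latency_s(256)
```

For short replies the TTFT term dominates, which is why the model suits interactive use even when individual responses are small.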

Use Cases

  • Customer support chatbots and virtual assistants
  • Real-time content moderation
  • High-volume document classification
  • Interactive coding assistants
  • Data extraction and processing pipelines
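For the classification-style use cases above, the request is often just a tightly constrained prompt; a minimal sketch of building one (the helper and category names are hypothetical, not part of the Anthropic SDK):

```python
# Hypothetical helper for a support-ticket classification use case.
# The category names are illustrative, not an Anthropic API feature.
CATEGORIES = ["billing", "technical", "account", "other"]

def classification_messages(inquiry: str) -> list[dict]:
    """Build a Messages-API `messages` list that asks for exactly one category."""
    prompt = (
        "Classify the customer inquiry into exactly one of: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\n"
        + "Inquiry: " + inquiry
    )
    return [{"role": "user", "content": prompt}]

msgs = classification_messages("I was charged twice this month.")
```

The resulting list can be passed as the `messages` argument of `client.messages.create`, as in the API example below; constraining the reply to a single category name also keeps output token costs minimal.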

Performance Comparison

Claude 3.5 Haiku approaches Claude 3 Opus on several benchmarks, effectively matching it on HumanEval, while being significantly faster and more affordable:

Benchmark     Claude 3.5 Haiku   Claude 3 Opus
MMLU          74.9%              86.8%
GSM8K         88.2%              95.0%
HumanEval     84.8%              84.9%

API Integration

import anthropic

# The client reads the ANTHROPIC_API_KEY environment variable by default.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-haiku-20241022",  # pinned model snapshot
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Classify this customer inquiry."}
    ],
)

# The response text lives in the first content block.
print(message.content[0].text)
© bots.so — The AI Inference Model Index

bots.so aggregates publicly available model deployment information from official provider sources. We are not affiliated with any model provider. Model availability changes rapidly; always verify on official sites.