GPT-4o
Available · Frontier
OpenAI's flagship multimodal model
Released May 13, 2024
Last tested: 2026-01-15
About
GPT-4o is OpenAI's most advanced multimodal model, capable of natively processing text, images, and audio. The 'o' stands for 'omni', reflecting its ability to handle multiple modalities seamlessly. It delivers GPT-4 Turbo-level intelligence while being faster and more cost-effective.
Pricing
- Input: $2.50 per 1M tokens
- Output: $10.00 per 1M tokens
Static pricing
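As a quick sanity check on these rates, the sketch below computes the cost of a single request. The token counts are made-up example values, not measurements.

```python
# Per-request cost from the listed GPT-4o rates.
INPUT_PER_M = 2.50    # USD per 1M input tokens
OUTPUT_PER_M = 10.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the rates above."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a 3,000-token prompt with a 500-token reply
print(f"${request_cost(3_000, 500):.4f}")  # $0.0125
```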
Details
Overview
GPT-4o represents OpenAI’s flagship multimodal AI model, combining state-of-the-art language understanding with native vision and audio capabilities. Released in May 2024, it delivers the intelligence of GPT-4 Turbo while achieving 2x faster response times and 50% lower costs.
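As an illustration of the multimodal interface, the sketch below sends a mixed text-and-image request through the OpenAI Python SDK's Chat Completions API. The prompt and image URL are placeholders, and an OPENAI_API_KEY environment variable is assumed.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and an image (URL is a placeholder).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)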
Key Features
- Native Multimodality: Process text, images, and audio in a single model without separate pipelines
- Enhanced Vision: Analyze charts, diagrams, screenshots, and complex visual content with high accuracy
- Structured Outputs: Generate JSON that is guaranteed to match a schema you specify (see the sketch after this list)
- Function Calling: Seamlessly integrate with external tools and APIs
- Real-time Streaming: Receive responses token-by-token for responsive applications (sketched after the Use Cases section)
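A minimal sketch of Structured Outputs using the json_schema response format in the OpenAI Python SDK; the schema and prompt here are invented for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Constrain the reply to an example calendar-event schema.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Extract the event: dinner with Ana on Friday at 7pm"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["title", "day", "time"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # valid JSON matching the schema
```

With strict mode, the model's output is constrained to the schema rather than merely prompted toward it, which is why every property is listed as required and additionalProperties is disabled.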
Use Cases
GPT-4o excels in applications requiring sophisticated reasoning combined with visual understanding, including document analysis, code review with screenshots, creative content generation, and complex conversational AI systems.
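For the conversational systems mentioned above, responses are usually streamed token-by-token (the Real-time Streaming feature listed earlier) so users see output immediately. A minimal sketch, again assuming OPENAI_API_KEY is set:

```python
from openai import OpenAI

client = OpenAI()

# Stream the reply and print tokens as they arrive.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Briefly explain multimodal models."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g., the initial role-only delta).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```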
Performance
With a 128K token context window, GPT-4o can process lengthy documents, extensive codebases, and long conversation histories while maintaining coherent understanding across the full context.
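To check whether a document fits in that window before sending it, one option is OpenAI's tiktoken tokenizer. A sketch, assuming a recent tiktoken release that knows the gpt-4o encoding mapping and a placeholder `document` string:

```python
import tiktoken

CONTEXT_WINDOW = 128_000  # GPT-4o's context length, in tokens

# tiktoken maps gpt-4o to its o200k_base encoding.
encoding = tiktoken.encoding_for_model("gpt-4o")

document = "..."  # placeholder for the text you plan to send
n_tokens = len(encoding.encode(document))
print(f"{n_tokens} tokens; fits: {n_tokens < CONTEXT_WINDOW}")
```

Note that the window must also hold the model's reply, so in practice you would budget input tokens below the full 128K.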