kwj.ai · acquisition inquiries from >$999view prospectus →
The Domesday Book ofKWJ · AI

OpenAI · The Tool-Use Inflection

GPT-4o

The omni model. Text, image, audio natively in one system. Speed doubled vs. GPT-4.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

GPT-4o ('o' for omni) was the first OpenAI model to handle text, image, and audio natively in one system without a modality handoff. The voice mode update enabled sub-300ms conversational response, making it feel qualitatively different from prior voice AI. On text benchmarks it matched GPT-4 Turbo at roughly half the latency.

Field signature

Voice mode responds in under 300ms with emotion and tone tracking.

Specifications

Released2024-05-13
Context window128,000 tokens
Pricing$2.50 / $10 per million tokens
Modalitiestext · image · audio
LicenseCommercial API only
EraThe Tool-Use Inflection

Strengths

  • Native multimodal
  • Speed
  • Voice quality

Weaknesses

  • Reasoning depth behind o-series
  • Context window shorter than Gemini

Authentication markers

The fingerprints by which GPT-4o can be identified from its output alone.

TellMeaning
Native voice mode without a TTS post-processing step.GPT-4o or GPT-4o-mini.

Notable works

  • First sub-300ms conversational voice AI at frontier quality

Market position

$2.50-$10 per million tokens

Partner offer

OpenAI's API surface remains the broadest commercial offering.

Try OpenAI →

Affiliate link — see disclosure.

Primary sources

  1. [1] OpenAI: GPT-4o

From the Almanac shop

The Operator's Compendium

Every agent harness, every routing pattern, every cost trick. 90-page PDF.

$29Coming soon

Back to the directory