The Domesday Book of KWJ · AI

OpenAI · The Tool-Use Inflection

GPT-4o

The omni model: text, image, and audio handled natively in one system, at roughly twice the speed of GPT-4 Turbo.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

GPT-4o ('o' for omni) was the first OpenAI model to handle text, image, and audio natively in one system, with no handoff between separate per-modality models. Its voice mode can respond in under 300 ms (OpenAI reported as little as 232 ms, with an average of 320 ms), which made it feel qualitatively different from earlier voice AI. On text benchmarks it matched GPT-4 Turbo at roughly half the latency.

Field signature

Voice mode can respond in under 300 ms while tracking emotion and tone.

Specifications

Released	2024-05-13
Context window	128,000 tokens
Pricing	$2.50 input / $10 output per million tokens
Modalities	text · image · audio
License	Commercial API only
Era	The Tool-Use Inflection
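
The pricing and context-window rows above lend themselves to a quick back-of-envelope check. The sketch below is a minimal illustration, not an official calculator: it reads the two prices as input and output rates, assumes the prompt and the completion share the 128,000-token window, and ignores any discounts. The helper names estimate_cost_usd and fits_context are hypothetical.

```python
from decimal import Decimal

# Figures copied from the specifications table (USD per million tokens).
INPUT_USD_PER_MTOK = Decimal("2.50")
OUTPUT_USD_PER_MTOK = Decimal("10")
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: list-price cost of one request, no discounts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / Decimal(1_000_000)


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and completion together must fit in the window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply.
    print(estimate_cost_usd(100_000, 2_000))
    print(fits_context(100_000, 2_000))
```

Decimal is used rather than float so that small per-request amounts add up without rounding drift when summed over many calls.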

Strengths

Native multimodal
Speed
Voice quality

Weaknesses

Reasoning depth trails the o-series
Context window shorter than Gemini's

Authentication markers

The fingerprints by which GPT-4o can be identified from its output alone.

Tell	Meaning
Native voice mode without a TTS post-processing step.	GPT-4o or GPT-4o-mini.

Notable works

First sub-300ms conversational voice AI at frontier quality

Market position

$2.50 input / $10 output per million tokens

Primary sources

[1] OpenAI: GPT-4o
