The Reasoning-Model Era

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

OpenAI's o-series, released in late 2024, made reasoning explicit. The model spent invisible tokens thinking before answering, billed them separately, and improved on every reasoning benchmark by an order of magnitude. DeepSeek's R1, released January 2025, demonstrated the same pattern at a fraction of the training cost and triggered the open-weights reasoning wave. Cost-per-correct-answer replaced cost-per-token as the operative metric.

o1 and o-series

Released September 2024 as a preview, full models through Q4 2024 and Q1 2025. Reasoning tokens were a separate billable line item. Math and competitive-programming benchmarks shifted decisively.

DeepSeek R1

January 2025. Open weights. Visible <think> blocks. Competitive with o-series on reasoning benchmarks. Triggered a six-week sprint across the open-weights ecosystem to reproduce the recipe.

Extended thinking

Anthropic shipped extended thinking on Claude 3.7 in early 2025 as an opt-in. Reasoning trees visible. By GPT-5 and Claude 4.7, reasoning effort was a tuneable parameter, not a model variant.

Signature models of the era

o1, o3, o4-mini
DeepSeek R1
Claude 3.7 / 4.x with extended thinking
Gemini 2.5 Pro

Technical shifts

Reasoning tokens become a billable category
Math and coding benchmarks shift by an order of magnitude
Open-weights reasoning becomes viable in months, not years

Market shifts

DeepSeek R1's release briefly affected US tech equities
Reasoning-effort tiers introduced across providers

Authentication — is the document from this era?

Tell	Meaning
Visible <think> block before answer	DeepSeek R1 lineage or a derivative.
Separate reasoning_tokens line in API billing	OpenAI o-series or GPT-5.

Agents catalogued in this era

GPT-5 — OpenAI's reasoning-first flagship. Native chain-of-thought, three reasoning-effort tiers, the highest published benchmark scores at release.
Gemini 2.5 Pro — Google's reasoning flagship. Two-million-token context, native multimodal, the only frontier model that reads PDFs without an extraction pre-pass.
DeepSeek R1 — The open-weights reasoning model that printed an industry shockwave. Trained at a fraction of frontier-lab costs.
Grok 4 — Elon's reasoning flagship. Native Twitter/X integration, willing to discuss what other models won't.
OpenAI o3 — OpenAI's peak reasoning model before GPT-5. AIME, ARC-AGI, and SWE-bench records at release.
Phi-4 — Microsoft's small-but-capable reasoning model. Punches above its 14B parameter count.
Together AI — Inference-as-a-service for open-weights models. Fastest Llama, DeepSeek, and Mixtral access.

Primary sources

[1] OpenAI: o1 — 2024-09-12
[2] DeepSeek: R1 — 2025-01-22

From the Almanac shop

The AI Eras — Pocket Field Guide

Ten eras of AI on a single foldable. The Almanac in your pocket.

$19 — Coming soon

← Back to the timeline