Era IX · 2025–2026
The Reasoning-Model Era
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
OpenAI's o-series, first released in late 2024, made reasoning explicit. The model spent hidden tokens thinking before answering, reported them as a separate count in API usage while billing them as output tokens, and posted large gains on math and competitive-programming benchmarks. DeepSeek's R1, released in January 2025, demonstrated the same pattern at a reported fraction of the training cost and triggered the open-weights reasoning wave. Cost-per-correct-answer replaced cost-per-token as the operative metric.
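To make that metric concrete, here is a minimal Python sketch of cost-per-correct-answer. It assumes hidden reasoning tokens are billed at the output-token rate; the `RunStats` fields, the prices, and the token counts are illustrative placeholders, not figures from any provider's price list or benchmark run.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate token usage and accuracy for one evaluation run (hypothetical shape)."""
    input_tokens: int
    output_tokens: int      # visible answer tokens
    reasoning_tokens: int   # hidden thinking tokens (assumed billed at the output rate)
    correct: int            # number of questions answered correctly
    total: int              # number of questions attempted


def total_cost_usd(stats: RunStats, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Total spend for the run, with prices quoted per million tokens."""
    billed_output = stats.output_tokens + stats.reasoning_tokens
    return (stats.input_tokens * usd_per_m_input
            + billed_output * usd_per_m_output) / 1_000_000


def cost_per_correct_answer(stats: RunStats, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Spend divided by correct answers; infinite if nothing was answered correctly."""
    if stats.correct == 0:
        return float("inf")
    return total_cost_usd(stats, usd_per_m_input, usd_per_m_output) / stats.correct


# Made-up numbers: same questions, same prices, with and without hidden reasoning.
baseline = RunStats(input_tokens=1_000_000, output_tokens=200_000,
                    reasoning_tokens=0, correct=20, total=100)
reasoner = RunStats(input_tokens=1_000_000, output_tokens=200_000,
                    reasoning_tokens=800_000, correct=80, total=100)

for name, run in (("baseline", baseline), ("reasoner", reasoner)):
    print(name,
          round(total_cost_usd(run, 2.0, 8.0), 2),
          round(cost_per_correct_answer(run, 2.0, 8.0), 3))
```

With these made-up numbers the reasoning run costs nearly three times as much in total (10.00 USD against 3.60 USD) yet is cheaper per correct answer (0.125 USD against 0.18 USD), which is the trade-off the metric is meant to expose.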
o1 and o-series
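Released as a preview in September 2024; the full o1 followed in December 2024, o3-mini in January 2025, and o3 and o4-mini in April 2025. Reasoning tokens were billed as output tokens and reported as a separate count in API usage. Math and competitive-programming benchmarks shifted decisively.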
DeepSeek R1
January 2025. Open weights. Visible <think> blocks. Competitive with o-series on reasoning benchmarks. Triggered a six-week sprint across the open-weights ecosystem to reproduce the recipe.
Extended thinking
Anthropic shipped extended thinking on Claude 3.7 Sonnet in February 2025 as an opt-in mode with a visible reasoning trace. By GPT-5 and Claude 4.7, reasoning effort was a tuneable parameter, not a model variant.
Signature models of the era
- o1, o3, o4-mini
- DeepSeek R1
- Claude 3.7 / 4.x with extended thinking
- Gemini 2.5 Pro
Technical shifts
- Reasoning tokens become a billable category
- Math and coding benchmark scores jump sharply (OpenAI reported o1 at 74% on AIME 2024 against 12% for GPT-4o)
- Open-weights reasoning becomes viable in months, not years
Market shifts
- DeepSeek R1's release triggered a one-day sell-off in US tech equities on 27 January 2025, with Nvidia falling about 17%
- Reasoning-effort tiers introduced across providers
Authentication — is the document from this era?
| Tell | Meaning |
|---|---|
| Visible <think> block before answer | DeepSeek R1 lineage or a derivative. |
| Separate reasoning_tokens count in the API usage object | OpenAI o-series or GPT-5. |
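
The two tells in the table can be checked mechanically. The sketch below is a rough heuristic, not a reliable classifier: the function names are hypothetical, and the nested usage dictionary in the example is an assumed shape for illustration, so the code simply searches for a `reasoning_tokens` key at any depth.

```python
import re

# A <think>...</think> block at the very start of the response text.
THINK_BLOCK = re.compile(r"^\s*<think>.*?</think>", re.DOTALL)


def find_key(obj, key):
    """Return True if `key` appears anywhere in a nested dict/list structure."""
    if isinstance(obj, dict):
        return key in obj or any(find_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(find_key(v, key) for v in obj)
    return False


def era_tells(response_text, usage=None):
    """List which of the two tells are present in a response and its usage metadata."""
    tells = []
    if THINK_BLOCK.search(response_text):
        tells.append("think_block")
    if usage is not None and find_key(usage, "reasoning_tokens"):
        tells.append("reasoning_tokens")
    return tells


# Example inputs (invented for illustration).
r1_style = "<think>2 + 2 is 4.</think>\nThe answer is 4."
usage_example = {"completion_tokens": 140,
                 "completion_tokens_details": {"reasoning_tokens": 128}}

print(era_tells(r1_style))
print(era_tells("The answer is 4.", usage_example))
```

A hit on either tell only suggests the lineage. Derivative models and API proxies can add or strip both signals, so absence proves nothing.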
Agents catalogued in this era
- GPT-5 — OpenAI's reasoning-first flagship. Native chain-of-thought, selectable reasoning-effort levels, and state-of-the-art scores on several published benchmarks at release.
- Gemini 2.5 Pro — Google's reasoning flagship. One-million-token context, native multimodal input, and direct PDF ingestion without a separate extraction pre-pass.
- DeepSeek R1 — The open-weights reasoning model that sent a shockwave through the industry. Trained at a reported fraction of frontier-lab costs.
- Grok 4 — xAI's reasoning flagship. Native X integration and fewer topic refusals than most competing models.
- OpenAI o3 — OpenAI's peak reasoning model before GPT-5. AIME, ARC-AGI, and SWE-bench records at release.
- Phi-4 — Microsoft's small-but-capable reasoning model. Punches above its 14B parameter count.
- Together AI — Inference-as-a-service for open-weights models. Fast hosted access to Llama, DeepSeek, and Mixtral.
Primary sources
- [1] OpenAI: o1 — 2024-09-12
- [2] DeepSeek: R1 — 2025-01-22