The Domesday Book of KWJ · AI

Era IX · 2025–2026

The Reasoning-Model Era

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

OpenAI's o-series, which debuted with o1 in late 2024, made reasoning explicit. The model spent hidden tokens thinking before answering, reported them as a separate count in API usage (billed at the output-token rate), and improved sharply on math and competitive-programming benchmarks. DeepSeek's R1, released in January 2025, demonstrated the same pattern at a reported fraction of the training cost and triggered the open-weights reasoning wave. Cost per correct answer replaced cost per token as the operative metric.
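The shift from cost per token to cost per correct answer can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the prices, the token counts, and the accuracy figures are all hypothetical, and it assumes reasoning tokens are charged at the output-token rate and that attempts are independent.

```python
def cost_per_correct_answer(
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
    accuracy: float,
) -> float:
    """Expected spend to obtain one correct answer.

    Assumptions (not from the article): reasoning tokens are charged at the
    output rate, and attempts are independent, so on average 1 / accuracy
    attempts are needed per correct answer.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    billed_output = output_tokens + reasoning_tokens
    cost_per_attempt = (
        input_tokens * price_in_per_million
        + billed_output * price_out_per_million
    ) / 1_000_000
    return cost_per_attempt / accuracy


# Hypothetical comparison at identical per-token prices.
plain = cost_per_correct_answer(1000, 500, 0, 1.0, 4.0, accuracy=0.1)
reasoning = cost_per_correct_answer(1000, 500, 4000, 1.0, 4.0, accuracy=0.9)
print(f"plain: ${plain:.4f}  reasoning: ${reasoning:.4f}")
```

With these made-up numbers the reasoning model costs more than six times as much per attempt, yet less per correct answer (about $0.021 against about $0.030), which is the sense in which the metric changed.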

o1 and o-series

Released in September 2024 as a preview, with full models following through Q4 2024 and Q1 2025. Reasoning tokens were reported as a separate count in API usage and billed as output tokens. Math and competitive-programming benchmarks shifted decisively.

DeepSeek R1

January 2025. Open weights. Visible <think> blocks. Competitive with o-series on reasoning benchmarks. Triggered a six-week sprint across the open-weights ecosystem to reproduce the recipe.

Extended thinking

Anthropic shipped extended thinking on Claude 3.7 Sonnet in early 2025 as an opt-in mode, with the reasoning trace visible to the user. By GPT-5 and Claude 4.7, reasoning effort was a tuneable parameter, not a model variant.

Signature models of the era

  • o1, o3, o4-mini
  • DeepSeek R1
  • Claude 3.7 / 4.x with extended thinking
  • Gemini 2.5 Pro

Technical shifts

  • Reasoning tokens become a billable category
  • Math and coding benchmark scores jump sharply
  • Open-weights reasoning becomes viable in months, not years

Market shifts

  • DeepSeek R1's release coincided with a sharp, short-lived selloff in US tech equities in late January 2025
  • Reasoning-effort tiers introduced across providers

Authentication — is the document from this era?

Tell | Meaning
Visible <think> block before answer | DeepSeek R1 lineage or a derivative.
Separate reasoning_tokens line in API billing | OpenAI o-series or GPT-5.
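The two tells above can be checked mechanically. The following is a minimal sketch, not a definitive detector: the function names are hypothetical, and the `usage` layout (a `reasoning_tokens` count nested under `completion_tokens_details`, or at the top level) is an assumption about how a provider might report it, so adjust the lookup to whatever record you actually hold.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# A <think>...</think> block at the very start of the output.
THINK_BLOCK = re.compile(r"^\s*<think>(.*?)</think>", re.DOTALL)


def split_think_block(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None if no leading block."""
    match = THINK_BLOCK.match(text)
    if match is None:
        return None, text
    return match.group(1).strip(), text[match.end():].lstrip()


def reasoning_token_count(usage: Dict[str, Any]) -> Optional[int]:
    """Find a reasoning-token count in a usage record, if one is present.

    Assumed layouts: usage["completion_tokens_details"]["reasoning_tokens"]
    or a top-level usage["reasoning_tokens"].
    """
    details = usage.get("completion_tokens_details")
    if isinstance(details, dict) and "reasoning_tokens" in details:
        return int(details["reasoning_tokens"])
    if "reasoning_tokens" in usage:
        return int(usage["reasoning_tokens"])
    return None


def era_tells(text: str, usage: Optional[Dict[str, Any]] = None) -> List[str]:
    """List which of the two tells are present in a captured response."""
    tells = []
    reasoning, _ = split_think_block(text)
    if reasoning is not None:
        tells.append("visible <think> block")
    if usage is not None and reasoning_token_count(usage) is not None:
        tells.append("separate reasoning_tokens count")
    return tells


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
print(era_tells(sample, {"completion_tokens_details": {"reasoning_tokens": 640}}))
```

A match on either tell is suggestive rather than conclusive: any model can be prompted to emit `<think>` tags, and a usage record can be edited after the fact.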

Agents catalogued in this era

  • GPT-5: OpenAI's reasoning-first flagship. Native chain-of-thought, selectable reasoning-effort tiers, and among the highest published benchmark scores at release.
  • Gemini 2.5 Pro: Google's reasoning flagship. A context window of one million tokens at launch and native multimodal input, including PDFs read without an extraction pre-pass.
  • DeepSeek R1: The open-weights reasoning model that sent a shockwave through the industry. Trained at a reported fraction of frontier-lab costs.
  • Grok 4: xAI's reasoning flagship. Native X (formerly Twitter) integration, marketed as willing to discuss what other models won't.
  • OpenAI o3: OpenAI's peak reasoning model before GPT-5. Reported AIME, ARC-AGI, and SWE-bench records at release.
  • Phi-4: Microsoft's small-but-capable reasoning model. Punches above its 14B parameter count.
  • Together AI: Inference-as-a-service for open-weights models. Fast hosted access to Llama, DeepSeek, and Mixtral.

Primary sources

  1. OpenAI: o1, 2024-09-12
  2. DeepSeek: R1, 2025-01-22
