kwj.ai · acquisition inquiries from >$999view prospectus →
The Domesday Book ofKWJ · AI

Era VII · 2024–2025

The Long-Context Stretch

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Context windows kept stretching. Claude 2's 100K, then Gemini's 1M, then Gemini's 2M, then Claude Opus's 1M opt-in. Operators rebuilt retrieval pipelines around the new capability; some of them shrank to nothing, others stayed for cost reasons. The effective ceiling, as opposed to the advertised one, lagged the marketing throughout.

Beyond 100K

Claude 2's 100K context in 2023 was an outlier. By mid-2024 it was a baseline. Gemini 1.5 announced 1M tokens in February 2024 and 2M soon after. Claude Opus added 1M as an opt-in in 2026.

Needle-in-a-haystack

The benchmark of the era. Inject a fact at a depth, ask for it, score recall. Frontier models scored 100% within a few months of any new context length. The benchmark stopped distinguishing models long before the underlying capability differences disappeared.

The cost ceiling

Operators discovered that the published price for long-context calls doubled past a depth threshold. The 2M context Gemini cost more than five times its 200K rate for the same input length. Few production workloads actually filled the window.

Signature models of the era

  • Claude 2 (100K)
  • Gemini 1.5 Pro (1M, then 2M)
  • Claude Opus 4 (1M opt-in)

Technical shifts

  • Quadratic attention pushed past 200K becomes economically painful
  • Retrieval pipelines simplified
  • Multi-document workflows in single calls

Market shifts

  • RAG-as-a-service market matures and partially consolidates
  • Vector database market plateaus

Authentication — is the document from this era?

TellMeaning
Mentions 'needle in a haystack' as a recent benchmarkDocument dates from 2024.

Agents catalogued in this era

  • GPT-4.5The last non-reasoning OpenAI flagship. Better at vibes than reasoning. Quietly deprecated in favor of GPT-5.
  • Claude 3.5 SonnetThe model that made Claude the coding default. Coding benchmarks beat GPT-4o at launch.
  • Command R+Enterprise RAG specialist. Built for retrieval pipelines, not chat.
  • Llama 3.1 405BThe model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.
  • Gemini 1.5 ProThe model that proved 1M-token context was practical. Video and audio native.
  • Claude 3 OpusThe model that finally made the 'Claude is better than GPT-4' argument defensible.
  • Gemini Ultra 1.0Google's first frontier-class model, released to counter GPT-4 and Claude 3.
  • GPT-4 Turbo128K context, knowledge cutoff updated, cost dropped 3×. The model that made GPT-4 accessible.

Primary sources

  1. [1] Google: Gemini 1.52024-02-15

From the Almanac shop

The AI Eras — Pocket Field Guide

Ten eras of AI on a single foldable. The Almanac in your pocket.

$19Coming soon

Back to the timeline