Era VII · 2024–2025
The Long-Context Stretch
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Context windows kept stretching: Claude 2's 100K, then Gemini's 1M, then Gemini's 2M, then Claude Opus's 1M opt-in. Operators rebuilt retrieval pipelines around the new capability; some pipelines shrank to nothing, while others stayed because retrieval remained cheaper than filling the window. The effective ceiling, as opposed to the advertised one, lagged the marketing throughout.
Beyond 100K
Claude 2's 100K context in 2023 was an outlier. By mid-2024 it was a baseline. Gemini 1.5 announced 1M tokens in February 2024 and 2M soon after. Claude Opus added 1M as an opt-in in 2026.
Needle-in-a-haystack
The benchmark of the era. Inject a fact at a depth, ask for it, score recall. Frontier models scored 100% within a few months of any new context length. The benchmark stopped distinguishing models long before the underlying capability differences disappeared.
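The inject, ask, score loop described above can be sketched in a few lines. This is a minimal illustration, not any lab's actual harness: `build_haystack`, `recall_at_depths`, `recall_score`, and the `stub_model` stand-in are hypothetical names, and a real run would replace `stub_model` with a call to the model under test.

```python
from typing import Callable, Dict, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = int(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def recall_at_depths(
    ask_model: Callable[[str, str], str],
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: List[float],
) -> Dict[float, bool]:
    """For each depth, build a context, ask the question, and record a hit or miss."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = ask_model(context, question)
        # Simple substring match; real harnesses may use stricter scoring.
        results[depth] = expected.lower() in answer.lower()
    return results


def recall_score(results: Dict[float, bool]) -> float:
    """Fraction of depths at which the fact was recalled."""
    return sum(results.values()) / len(results)


def stub_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a model: returns the sentence mentioning 'passcode'."""
    for sentence in context.split(". "):
        if "passcode" in sentence:
            return sentence
    return ""


if __name__ == "__main__":
    filler = ["The sky was grey."] * 1000
    hits = recall_at_depths(
        stub_model, filler, "The passcode is 7421.",
        "What is the passcode?", "7421", [0.0, 0.25, 0.5, 0.75, 1.0],
    )
    print(hits, recall_score(hits))
```

Because the task is a single lookup, a model that handles it perfectly saturates the score, which is one way to see why the benchmark stopped separating models.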
The cost ceiling
Operators discovered that the published per-token price for long-context calls doubled once a prompt crossed a length threshold. Gemini's 2M-context tier cost more than five times its 200K rate for the same input length. Few production workloads actually filled the window.
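To make the threshold effect concrete, here is a small sketch of tiered prompt pricing. The numbers are placeholders, not any provider's price sheet: the base rate, the 128,000-token threshold, and the assumption that the higher rate applies to the whole prompt once the threshold is crossed are all illustrative. Only the doubling comes from the text above.

```python
def prompt_cost(
    tokens: int,
    base_rate_per_mtok: float = 1.00,  # hypothetical price per million input tokens
    threshold: int = 128_000,          # hypothetical length at which the tier changes
    multiplier: float = 2.0,           # "doubled", per the paragraph above
) -> float:
    """Cost of one prompt when the per-token rate steps up past a length threshold.

    Assumption: the higher rate applies to every token in the prompt,
    not only to the tokens beyond the threshold.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    rate = base_rate_per_mtok * (multiplier if tokens > threshold else 1.0)
    return tokens / 1_000_000 * rate


if __name__ == "__main__":
    for n in (100_000, 128_000, 200_000, 2_000_000):
        print(f"{n:>9} tokens -> {prompt_cost(n):.3f}")
```

Under these assumptions, a prompt just over the threshold costs about twice as much as one just under it, which gives operators a reason to trim or retrieve rather than send everything.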
Signature models of the era
- Claude 2 (100K)
- Gemini 1.5 Pro (1M, then 2M)
- Claude Opus 4 (1M opt-in)
Technical shifts
- Quadratic attention becomes economically painful past 200K tokens
- Retrieval pipelines simplified
- Multi-document workflows in single calls
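The first bullet above turns on how standard self-attention scales with sequence length. The sketch below assumes plain dense attention, where both the score matrix and the weighted sum over values cost on the order of n² · d operations per head; production long-context models may well use optimizations this does not capture. The function names and the `head_dim` default are illustrative.

```python
def attention_flops(seq_len: int, head_dim: int = 128, n_heads: int = 1) -> int:
    """Approximate FLOPs for dense self-attention in one layer.

    Two matrix products (Q @ K^T and scores @ V), each roughly
    2 * seq_len * seq_len * head_dim floating-point operations per head.
    """
    return 4 * n_heads * seq_len * seq_len * head_dim


def relative_cost(seq_len: int, baseline: int = 200_000) -> float:
    """Attention cost at `seq_len` as a multiple of the cost at `baseline`."""
    return attention_flops(seq_len) / attention_flops(baseline)


if __name__ == "__main__":
    for n in (200_000, 400_000, 1_000_000, 2_000_000):
        print(f"{n:>9} tokens -> {relative_cost(n):.0f}x the 200K attention cost")
```

In this simplified model, doubling the context quadruples the attention cost, so a 1M-token prompt costs 25 times the attention compute of a 200K prompt and a 2M-token prompt costs 100 times.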
Market shifts
- RAG-as-a-service market matures and partially consolidates
- Vector database market plateaus
Authentication — is the document from this era?
| Tell | Meaning |
|---|---|
| Mentions 'needle in a haystack' as a recent benchmark | Document dates from 2024. |
Agents catalogued in this era
- GPT-4.5 — The last non-reasoning OpenAI flagship. Better at vibes than reasoning. Quietly deprecated in favor of GPT-5.
- Claude 3.5 Sonnet — The model that made Claude the coding default. Beat GPT-4o on coding benchmarks at launch.
- Command R+ — Enterprise RAG specialist. Built for retrieval pipelines, not chat.
- Llama 3.1 405B — The model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.
- Gemini 1.5 Pro — The model that proved 1M-token context was practical. Video and audio native.
- Claude 3 Opus — The model that finally made the 'Claude is better than GPT-4' argument defensible.
- Gemini Ultra 1.0 — Google's first frontier-class model, released to counter GPT-4.
- GPT-4 Turbo — 128K context, knowledge cutoff updated, cost dropped 3×. The model that made GPT-4 accessible.
Primary sources
- [1] Google: Gemini 1.5 — 2024-02-15