The Long-Context Stretch

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Context windows kept stretching: Claude 2's 100K, then Gemini's 1M, then Gemini's 2M, then Claude Opus's 1M opt-in. Operators rebuilt retrieval pipelines around the new capability; some pipelines shrank to nothing, others stayed for cost reasons. Throughout, the effective ceiling lagged the advertised one.

Beyond 100K

Claude 2's 100K context in 2023 was an outlier. By mid-2024 it was a baseline. Google announced Gemini 1.5 with a 1M-token window in February 2024 and extended it to 2M soon after. Claude Opus added 1M as an opt-in in 2026.

Needle-in-a-haystack

The benchmark of the era. Inject a fact at a chosen depth in a long filler document, ask for it, score recall. Frontier models posted near-perfect scores within a few months of each new context length, so the benchmark stopped distinguishing models long before the underlying capability differences disappeared.
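The inject, ask, score loop is simple enough to sketch. The version below is a minimal illustration, not any lab's actual harness: `build_haystack`, `score_recall`, and `toy_model` are hypothetical names, and `toy_model` is a stand-in that only "reads" the first 2,000 characters of its prompt so the depth effect is visible without calling a real API.

```python
from typing import Callable, Iterable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def score_recall(
    ask_model: Callable[[str], str],
    filler: list[str],
    needle: str,
    question: str,
    expected: str,
    depths: Iterable[float],
) -> dict[float, bool]:
    """Return, for each depth, whether the model's answer contains `expected`."""
    results: dict[float, bool] = {}
    for depth in depths:
        prompt = build_haystack(filler, needle, depth) + "\n\n" + question
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: it only sees the first 2,000 characters."""
    visible = prompt[:2000]
    return "The code is 7421." if "7421" in visible else "I don't know."


if __name__ == "__main__":
    filler = ["The sky was grey that day."] * 200
    results = score_recall(
        toy_model,
        filler,
        needle="The secret code is 7421.",
        question="What is the secret code?",
        expected="7421",
        depths=[0.0, 0.5, 1.0],
    )
    for depth, recalled in results.items():
        print(f"depth={depth:.1f} recalled={recalled}")
```

Swapping `toy_model` for a real client call is the only change needed to run this against an actual model. A substring check is a crude scorer, which is part of why a single-fact retrieval test saturates so quickly.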

The cost ceiling

Operators discovered that the published per-token price for long-context calls doubled once a prompt crossed a length threshold. Filling Gemini's 2M window cost more than five times as much as a 200K call, because every one of the extra tokens was billed at the higher rate. Few production workloads actually filled the window.
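The arithmetic behind a price that doubles past a threshold can be sketched as follows. Everything numeric here is an assumption for illustration: the base rate of 1.00 per million tokens, the 200K threshold, and the rule that the whole prompt is billed at the higher rate once it crosses the threshold are placeholders, not any provider's published price sheet.

```python
def prompt_cost(
    tokens: int,
    base_rate_per_million: float,
    threshold: int,
    multiplier: float = 2.0,
) -> float:
    """Cost of one prompt under a two-tier scheme.

    Assumption: once the prompt exceeds `threshold` tokens, every token in it
    is billed at `multiplier` times the base rate.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    rate = base_rate_per_million * (multiplier if tokens > threshold else 1.0)
    return tokens / 1_000_000 * rate


if __name__ == "__main__":
    BASE_RATE = 1.00      # hypothetical price per million input tokens
    THRESHOLD = 200_000   # hypothetical tier boundary

    short_call = prompt_cost(200_000, BASE_RATE, THRESHOLD)
    full_window = prompt_cost(2_000_000, BASE_RATE, THRESHOLD)
    print(f"200K prompt: {short_call:.2f}")
    print(f"2M prompt:   {full_window:.2f}")
    print(f"ratio:       {full_window / short_call:.1f}x")
```

Under these assumed numbers the full-window call carries ten times the tokens at twice the rate, so the gap between a 200K call and a 2M call is much wider than the doubling alone suggests.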

Signature models of the era

Claude 2 (100K)
Gemini 1.5 Pro (1M, then 2M)
Claude Opus 4 (1M opt-in)

Technical shifts

Quadratic attention pushed past 200K becomes economically painful
Retrieval pipelines simplified
Multi-document workflows in single calls
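To see why quadratic attention becomes painful past 200K, a back-of-the-envelope estimate helps. The sketch below assumes plain dense self-attention and counts only the two sequence-length-squared matrix products (scores and weighted values); `attention_flops` and `relative_cost` are illustrative names, the model dimensions in the usage are made up, and real systems may use sparse or otherwise optimized attention that changes the picture.

```python
def attention_flops(seq_len: int, d_model: int, n_layers: int) -> int:
    """Approximate FLOPs for the quadratic part of dense self-attention.

    Per layer: Q @ K^T costs about 2 * n^2 * d, and weights @ V costs
    about 2 * n^2 * d, giving 4 * n^2 * d. Projections and MLPs, which
    scale linearly in n, are deliberately left out.
    """
    return 4 * seq_len**2 * d_model * n_layers


def relative_cost(seq_len: int, baseline_len: int) -> float:
    """How many times more quadratic-attention work than the baseline length."""
    return (seq_len / baseline_len) ** 2


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to make the numbers concrete.
    D_MODEL, N_LAYERS = 8192, 80
    for n in (100_000, 200_000, 1_000_000, 2_000_000):
        flops = attention_flops(n, D_MODEL, N_LAYERS)
        print(f"{n:>9,} tokens: {flops:.2e} FLOPs, "
              f"{relative_cost(n, 200_000):.2f}x the 200K cost")
```

The point is the shape of the curve rather than the absolute numbers: going from 200K to 2M multiplies the token count by ten and the quadratic term by one hundred.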

Market shifts

RAG-as-a-service market matures and partially consolidates
Vector database market plateaus

Authentication — is the document from this era?

Tell	Meaning
Mentions 'needle in a haystack' as a recent benchmark	Document dates from 2024.

Agents catalogued in this era

GPT-4.5 — The last non-reasoning OpenAI flagship. Better at vibes than reasoning. Quietly deprecated in favor of GPT-5.
Claude 3.5 Sonnet — The model that made Claude the coding default. Coding benchmarks beat GPT-4o at launch.
Command R+ — Enterprise RAG specialist. Built for retrieval pipelines, not chat.
Llama 3.1 405B — The model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.
Gemini 1.5 Pro — The model that proved 1M-token context was practical. Video and audio native.
Claude 3 Opus — The model that finally made the 'Claude is better than GPT-4' argument defensible.
Gemini Ultra 1.0 — Google's first frontier-class model, released to counter GPT-4.
GPT-4 Turbo — 128K context, knowledge cutoff updated, cost dropped 3×. The model that made GPT-4 accessible.

Primary sources

[1] Google: Gemini 1.5 — 2024-02-15
