Research · 8 min
Benchmark Inflation: Why MMLU Scores Stopped Mattering
By C.W. Jameson · Published 20 February 2026 · Last reviewed 20 March 2026
Top MMLU scores hit 90%, and the frontier moved to harder tests. Here is a guide to which benchmarks still separate good models from great ones.
How AI benchmarks become saturated, and what to use instead when evaluating models for your own tasks.