
SWE-bench: The Benchmark That Redefined AI Coding

By C.W. Jameson · Published 15 October 2025 · Last reviewed 15 November 2025

SWE-bench tasks AI agents with fixing real GitHub issues. Top scores climbed from 5% to over 50% in 18 months, an extraordinary rate of progress.

How SWE-bench was created, why it matters, and what the score progression from 5% to 50%+ reveals about AI coding ability.
