Claude Opus 4.5 achieved 80.9% on SWE-bench Verified, becoming the first model to exceed the 80% threshold. That is roughly a 21-fold increase over GPT-4's initial 3.8% score in October 2023, demonstrating rapid progress in AI coding capabilities.
Claude Opus 4.5 achieved 80.9% on SWE-bench Verified, first to break 80%
Jan 11, 2026

GPT-4o + Agentless at 76.2%, Claude Sonnet 4 at 72.4%
Jan 11, 2026

Progress from 3.8% (GPT-4, Oct 2023) to 80.9% in 26 months
Jan 11, 2026
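
The headline ratio can be checked with a few lines of arithmetic. The sketch below uses only the three figures quoted above (3.8%, 80.9%, 26 months). The function names are illustrative, and the per-month figure is a simple linear average, which is an assumption made here for illustration rather than a claim about how progress actually unfolded.

```python
# Figures quoted in the text above.
FIRST_SCORE = 3.8     # GPT-4, October 2023 (percent of tasks resolved)
LATEST_SCORE = 80.9   # Claude Opus 4.5 (percent of tasks resolved)
MONTHS_ELAPSED = 26   # span stated in the text


def fold_increase(first: float, latest: float) -> float:
    """Ratio of the latest score to the first score."""
    return latest / first


def point_gain(first: float, latest: float) -> float:
    """Absolute gain in percentage points."""
    return latest - first


def average_monthly_gain(first: float, latest: float, months: int) -> float:
    """Average percentage-point gain per month (linear average, an assumption)."""
    return point_gain(first, latest) / months


if __name__ == "__main__":
    print(f"{fold_increase(FIRST_SCORE, LATEST_SCORE):.1f}x")  # 21.3x
    print(f"{point_gain(FIRST_SCORE, LATEST_SCORE):.1f} points")  # 77.1 points
    monthly = average_monthly_gain(FIRST_SCORE, LATEST_SCORE, MONTHS_ELAPSED)
    print(f"{monthly:.2f} points/month")  # 2.97 points/month
```

The ratio comes out at about 21.3, consistent with the rounded "21x" figure, and the absolute gain is 77.1 percentage points.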