SWE-bench Verified
GitHub issue resolution benchmark. It tests coding agents on resolving real-world software issues, using 500 human-verified instances.
Category: agents
Methodology
Automated test execution against resolved GitHub issues
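As a rough illustration of what automated test execution can mean for scoring, the sketch below marks an instance as resolved only when every required test passes after the agent's patch is applied. The `Instance` class, its `fail_to_pass` and `pass_to_pass` fields, and the `resolve_rate` helper are hypothetical names chosen for this example; they are assumptions about the scoring rule, not the benchmark's actual harness.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One benchmark task (hypothetical shape, assumed for illustration)."""
    instance_id: str
    fail_to_pass: list[str]  # tests expected to go from failing to passing
    pass_to_pass: list[str]  # tests that must keep passing after the patch


def is_resolved(instance: Instance, test_results: dict[str, bool]) -> bool:
    """Return True only if every required test passed after patching.

    A test that is missing from test_results is treated as failed.
    """
    required = instance.fail_to_pass + instance.pass_to_pass
    return all(test_results.get(name, False) for name in required)


def resolve_rate(
    instances: list[Instance],
    results_by_id: dict[str, dict[str, bool]],
) -> float:
    """Fraction of instances resolved; 0.0 when there are no instances."""
    if not instances:
        return 0.0
    resolved = sum(
        is_resolved(inst, results_by_id.get(inst.instance_id, {}))
        for inst in instances
    )
    return resolved / len(instances)
```

Under this assumed rule, a patch that fixes the issue but breaks a previously passing test does not count as resolved, so the score reflects complete fixes only.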
Leaderboard
(0 models) No models have been evaluated on this benchmark yet.
Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai
Licensed under CC-BY 4.0