
SWE-bench Verified

A GitHub issue resolution benchmark that tests coding agents on resolving real-world software issues, using 500 human-verified instances.

Category: agents

Methodology

Automated test execution against resolved GitHub issues
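
As a rough illustration of what test-based scoring of this kind can look like, the sketch below counts an instance as resolved only when the tests that target the issue pass and the previously passing tests still pass, then reports the fraction of resolved instances. The `InstanceResult` class, its field names, and the `is_resolved` and `resolved_rate` helpers are hypothetical names chosen for this example; this is a minimal sketch under those assumptions, not the benchmark's actual harness.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceResult:
    """Hypothetical record of test outcomes for one benchmark instance."""
    instance_id: str
    # Assumed: tests expected to go from failing to passing once the issue is fixed.
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    # Assumed: tests that passed before the fix and must keep passing.
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(result: InstanceResult) -> bool:
    """An instance counts as resolved only if it has target tests and every tracked test passed."""
    if not result.fail_to_pass:
        return False
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolved_rate(results: list[InstanceResult]) -> float:
    """Fraction of instances resolved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Illustrative data only; these are not real benchmark instances or results.
demo = [
    InstanceResult("demo-1", {"test_fix": True}, {"test_existing": True}),
    InstanceResult("demo-2", {"test_fix": True}, {"test_existing": False}),
]
print(resolved_rate(demo))
```

In this made-up example the second instance fixes the issue but breaks an existing test, so it is not counted as resolved.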

Leaderboard

(0 models)
No models have been evaluated on this benchmark yet.

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0
