
Cybench

Cybersecurity CTF challenges testing security analysis and exploitation

Frontier
Category: agents
EDI: 147.4
Slope: 3.31

Leaderboard (11 models)
Rank  Model                        Score (%)  Stderr
1     Claude Opus 4.5              55.00
2     o3                           22.50
3     Claude 3.7 Sonnet            20.00
4     GPT-4.1                      17.50
5     GPT-4o                       12.50
6     o1                           10.00
7     Claude 3 Opus                10.00
8     Mixtral-8x7B-v0.1            7.50
9     Llama 3.1 405B               7.50
10    Gemini 1.5 Flash             7.50
11    Meta-Llama-3-8B-Instruct     5.00

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0
