
DeepResearch Bench

Multi-document synthesis and research tasks that test deep research capabilities.

Frontier
Category: specialized
EDI: 148.8
Slope: 0.80

Leaderboard

(7 models)
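| Rank | Model | Score | Stderr |
|---|---|---|---|
| 1 | Claude Opus 4.5 | 52.60 | |
| 2 | GPT-5.2 | 51.00 | |
| 3 | Grok 4 | 47.90 | |
| 4 | o3 | 46.60 | |
| 5 | Claude 3.7 Sonnet | 43.60 | |
| 6 | Gemini 2.5 Pro (Jun 2025) | 42.80 | |
| 7 | DeepSeek V3 | 35.10 | |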

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0