
FrontierMath-2025-02-28-Private

Research-level mathematics problems at the frontier of current AI capabilities

Frontier
Category: mathematics
EDI: 155.9
Slope: 3.72

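Leaderboard (25 models)

| Rank | Model | Score | Stderr |
|---|---|---|---|
| 1 | GPT-5.2 | 40.70 | ±0.03 |
| 2 | Gemini 3 Pro | 37.60 | ±0.03 |
| 3 | Gemini 2.5 Pro (Jun 2025) | 29.00 | ±0.03 |
| 4 | o4-mini (high) | 24.83 | ±0.03 |
| 5 | DeepSeek V3 | 22.10 | ±0.04 |
| 6 | Claude Opus 4.5 | 20.69 | ±0.02 |
| 7 | Grok 4 | 19.66 | ±0.02 |
| 8 | o4-mini-2025-04-16 medium | 18.97 | ±0.02 |
| 9 | o3 | 18.69 | ±0.02 |
| 10 | Claude Sonnet 4.5 | 13.49 | ±0.02 |
| 11 | o1 | 9.31 | ±0.02 |
| 12 | Qwen 3 235B | 8.48 | ±0.02 |
| 13 | Claude Haiku 4.5 | 5.90 | ±0.01 |
| 14 | Grok-3 mini | 5.86 | ±0.01 |
| 15 | GPT-4.1 | 5.52 | ±0.01 |
| 16 | GPT-4.1 mini | 4.48 | ±0.01 |
| 17 | Claude 3.7 Sonnet | 4.14 | ±0.01 |
| 18 | Qwen Plus | 1.72 | ±0.01 |
| 19 | Qwen2.5-Max | 1.03 | ±0.01 |
| 20 | Llama 4 Maverick (FP8) | 0.69 | ±0.00 |
| 21 | Mistral Large | 0.35 | ±0.00 |
| 22 | GPT-4o | 0.34 | ±0.00 |
| 23 | Claude 3.5 Haiku | 0.34 | ±0.00 |
| 24 | Llama 4 Scout | 0.00 | |
| 25 | Gemini 1.5 Flash | 0.00 | |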
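The Score and Stderr columns can be combined into rough uncertainty estimates. The sketch below is a minimal illustration, not Epoch AI's method. It assumes that Score is a percentage and that Stderr is a standard error on a 0 to 1 proportion scale (so ±0.03 is about 3 percentage points); the page does not state either unit. It also assumes a normal approximation and independent errors between models. The helper names `to_points`, `ci95`, and `gap_z` are hypothetical.

```python
import math

# Assumption (not stated on the page): Score is a percentage (0-100) and
# Stderr is on a 0-1 proportion scale, so 0.03 means ~3 percentage points.


def to_points(stderr: float) -> float:
    """Convert a 0-1 scale standard error to percentage points."""
    return stderr * 100.0


def ci95(score_pct: float, stderr: float) -> tuple[float, float]:
    """Normal-approximation 95% interval for one score, in percentage points."""
    half_width = 1.96 * to_points(stderr)
    return (score_pct - half_width, score_pct + half_width)


def gap_z(score_a: float, se_a: float, score_b: float, se_b: float) -> float:
    """z-statistic for the gap between two scores, assuming independent errors."""
    diff = score_a - score_b
    combined_se = math.hypot(to_points(se_a), to_points(se_b))
    return diff / combined_se


# Values copied from ranks 1 and 2 of the leaderboard.
low, high = ci95(40.70, 0.03)
print(f"GPT-5.2 95% CI: {low:.2f} to {high:.2f}")  # 34.82 to 46.58
z = gap_z(40.70, 0.03, 37.60, 0.03)
print(f"GPT-5.2 vs Gemini 3 Pro: z = {z:.2f}")  # z = 0.73
```

Under these assumptions, the 3.1-point gap between the top two entries is about 0.73 combined standard errors, below the conventional 1.96 threshold, so that particular ordering would not be significant at the 5% level.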

Data source: Epoch AI, “Data on AI Benchmarking,” published at epoch.ai.

Licensed under CC BY 4.0.