FrontierMath-2025-02-28-Private
Research-level mathematics problems at the frontier of current AI capabilities
Leaderboard (25 models)

| Rank | Model | Score (%) | Stderr |
|---|---|---|---|
| 1 | GPT-5.2 | 40.70 | ±0.03 |
| 2 | Gemini 3 Pro | 37.60 | ±0.03 |
| 3 | Gemini 2.5 Pro (Jun 2025) | 29.00 | ±0.03 |
| 4 | o4-mini (high) | 24.83 | ±0.03 |
| 5 | DeepSeek V3 | 22.10 | ±0.04 |
| 6 | Claude Opus 4.5 | 20.69 | ±0.02 |
| 7 | Grok 4 | 19.66 | ±0.02 |
| 8 | o4-mini-2025-04-16 medium | 18.97 | ±0.02 |
| 9 | o3 | 18.69 | ±0.02 |
| 10 | Claude Sonnet 4.5 | 13.49 | ±0.02 |
| 11 | o1 | 9.31 | ±0.02 |
| 12 | Qwen 3 235B | 8.48 | ±0.02 |
| 13 | Claude Haiku 4.5 | 5.90 | ±0.01 |
| 14 | Grok-3 mini | 5.86 | ±0.01 |
| 15 | GPT-4.1 | 5.52 | ±0.01 |
| 16 | GPT-4.1 mini | 4.48 | ±0.01 |
| 17 | Claude 3.7 Sonnet | 4.14 | ±0.01 |
| 18 | Qwen Plus | 1.72 | ±0.01 |
| 19 | Qwen2.5-Max | 1.03 | ±0.01 |
| 20 | Llama 4 Maverick (FP8) | 0.69 | ±0.00 |
| 21 | Mistral Large | 0.35 | ±0.00 |
| 22 | GPT-4o | 0.34 | ±0.00 |
| 23 | Claude 3.5 Haiku | 0.34 | ±0.00 |
| 24 | Llama 4 Scout | 0.00 | — |
| 25 | Gemini 1.5 Flash | 0.00 | — |
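Because each score is reported with a standard error, a quick way to sanity-check whether two adjacent models are meaningfully separated is to compare their score gap against the sum of their reported errors. The sketch below is a rough heuristic on a hand-copied subset of the table above, not Epoch AI's official statistical methodology; the function name and dictionary layout are illustrative assumptions.

```python
# Minimal sketch: check whether the score gap between two leaderboard
# entries exceeds their combined reported standard errors.
# (Heuristic only -- not Epoch AI's official significance test.)
leaderboard = {
    "GPT-5.2": (40.70, 0.03),
    "Gemini 3 Pro": (37.60, 0.03),
    "Claude Opus 4.5": (20.69, 0.02),
    "Grok 4": (19.66, 0.02),
}

def gap_exceeds_stderr(model_a, model_b, table=leaderboard):
    """Return True if the score gap between the two models is larger
    than the sum of their reported standard errors."""
    (score_a, err_a) = table[model_a]
    (score_b, err_b) = table[model_b]
    return abs(score_a - score_b) > err_a + err_b

# GPT-5.2 vs Gemini 3 Pro: gap of 3.10 points vs combined error of 0.06
print(gap_exceeds_stderr("GPT-5.2", "Gemini 3 Pro"))  # True
```

By this crude measure, every adjacent pair in the top of the table is separated by far more than the combined error, so the ranking order is not an artifact of sampling noise.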
Data source: Epoch AI, “Data on AI Benchmarking”, published at epoch.ai. Licensed under CC-BY 4.0.