FrontierMath-Tier-4-2025-07-01-Private
Tier 4 (hardest) FrontierMath problems: research-level difficulty.
Leaderboard (14 models)

| Rank | Model | Score | Stderr |
|---|---|---|---|
| 1 | GPT-5.2 | 29.20 | ±0.07 |
| 2 | Gemini 3 Pro | 18.75 | ±0.06 |
| 3 | Gemini 2.5 Pro (Jun 2025) | 10.40 | ±0.04 |
| 4 | o4-mini (high) | 6.25 | ±0.04 |
| 5 | o3 | 4.17 | ±0.03 |
| 6 | Claude Opus 4.5 | 4.17 | ±0.03 |
| 7 | DeepSeek V3 | 2.10 | ±0.02 |
| 8 | Grok 4 | 2.08 | ±0.02 |
| 9 | Claude Haiku 4.5 | 2.08 | ±0.02 |
| 10 | o4-mini-2025-04-16 medium | 2.08 | ±0.02 |
| 11 | GPT-4.1 | 0.00 | — |
| 12 | Claude 3.7 Sonnet | 0.00 | — |
| 13 | Qwen 3 235B | 0.00 | — |
| 14 | Grok-3 mini | 0.00 | — |
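The Score column appears to be accuracy in percent, while the Stderr column appears to be in fractional units. One plausible reading, sketched below, is that Stderr is the binomial standard error sqrt(p(1-p)/n) of the accuracy; the problem count n is not stated in this table, and n = 48 is an assumption, under which most (though not all) rows round to the listed values.

```python
import math

def binomial_stderr(score_pct: float, n: int) -> float:
    """Standard error of a binomial accuracy, returned as a fraction."""
    p = score_pct / 100.0
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical check against a few leaderboard rows, assuming n = 48
# problems (an assumption; the count is not given in the source table).
for score in (29.20, 18.75, 4.17, 2.08):
    print(f"{score:5.2f}% -> +/-{binomial_stderr(score, 48):.2f}")
```

Under this assumption the top row gives sqrt(0.292 * 0.708 / 48) ≈ 0.066, which rounds to the listed ±0.07; the formula is only a consistency check, not a claim about Epoch AI's actual methodology.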
Data source: Epoch AI, “Data on AI Benchmarking”, published at epoch.ai. Licensed under CC BY 4.0.