
FrontierMath-Tier-4-2025-07-01-Private

Tier 4 (hardest) FrontierMath problems - research-level difficulty

Frontier
Category: mathematics
EDI: 165.4
Slope: 3.51

Leaderboard (14 models)

Rank  Model                      Score  Stderr
1     GPT-5.2                    29.20  ±0.07
2     Gemini 3 Pro               18.75  ±0.06
3     Gemini 2.5 Pro (Jun 2025)  10.40  ±0.04
4     o4-mini (high)              6.25  ±0.04
5     o3                          4.17  ±0.03
6     Claude Opus 4.5             4.17  ±0.03
7     DeepSeek V3                 2.10  ±0.02
8     Grok 4                      2.08  ±0.02
9     Claude Haiku 4.5            2.08  ±0.02
10    o4-mini-2025-04-16 medium   2.08  ±0.02
11    GPT-4.1                     0.00
12    Claude 3.7 Sonnet           0.00
13    Qwen 3 235B                 0.00
14    Grok-3 mini                 0.00

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0

FrontierMath-Tier-4-2025-07-01-Private: Top Score 29.2% - AI Benchmark | NeoSignal