CadEval
CAD/technical code evaluation tasks
Leaderboard (10 models)

| Rank | Model | Score | Stderr |
|---|---|---|---|
| 1 | o3 | 74.00 | — |
| 2 | Gemini 2.5 Pro (Jun 2025) | 64.00 | — |
| 3 | o4-mini-2025-04-16 medium | 62.00 | — |
| 4 | o1 | 56.00 | — |
| 5 | Claude 3.7 Sonnet | 54.00 | — |
| 6 | GPT-4.1 | 42.00 | — |
| 7 | Gemini 1.5 Flash | 34.00 | — |
| 8 | Claude 3.5 Haiku | 32.00 | — |
| 9 | GPT-4o | 26.00 | — |
| 10 | GPT-4.1 mini | 16.00 | — |
Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai
Licensed under CC-BY 4.0