CadEval

CAD/technical code evaluation tasks

Frontier
Category: coding
EDI: 139.8
Slope: 2.28
Leaderboard

(10 models)
Rank  Model                          Score  Stderr
1     o3                             74.00
2     Gemini 2.5 Pro (Jun 2025)      64.00
3     o4-mini-2025-04-16 medium      62.00
4     o1                             56.00
5     Claude 3.7 Sonnet              54.00
6     GPT-4.1                        42.00
7     Gemini 1.5 Flash               34.00
8     Claude 3.5 Haiku               32.00
9     GPT-4o                         26.00
10    GPT-4.1 mini                   16.00

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0