ARC-AGI

Abstraction and Reasoning Corpus - novel visual reasoning tasks testing general intelligence

Frontier
Category: specialized
EDI: 146.0
Slope: 4.87

Leaderboard

(17 models)
Rank  Model                               Score  Stderr
1     GPT-5.2                             86.20
2     Claude Opus 4.5                     80.00
3     Gemini 3 Pro                        75.00
4     o3                                  60.80
5     o4-mini (high)                      58.70
6     o4-mini-2025-04-16 medium           41.80
7     Gemini 2.5 Pro (Jun 2025)           33.30
8     o1                                  30.70
9     Claude 3.7 Sonnet                   28.60
10    DeepSeek V3                         21.20
11    Grok-3 mini                         16.50
12    DeepSeek R1                         15.80
13    GPT-4.1                             10.30
14    GPT-4o                               4.50
15    Llama-4-Maverick-17B-128E-Instruct   4.40
16    GPT-4.1 mini                         3.50
17    Llama 4 Scout                        0.50

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0