GSO-Bench

Global Software Optimization benchmark

Frontier
Category: specialized
EDI: 165.3
Slope: 2.78
Leaderboard

(10 models)
Rank  Model                        Score (%)  Stderr
1     GPT-5.2                      27.4       0
2     Claude Opus 4.5              26.5       0
3     Gemini 3 Pro                 18.6       0
4     o3                           8.8        0
5     kimi-k2-thinking (official)  4.9        0
6     Qwen3-Max-Instruct           4.9        0
7     Claude 3.7 Sonnet            4.6        0
8     Gemini 2.5 Pro (Jun 2025)    3.9        0
9     o4-mini (high)               3.6        0
10    GPT-4o                       0.0        0

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0
