ScienceQA

Science question answering across multiple domains

Frontier
Category: language
EDI: 109.6
Slope: 1.90
Leaderboard

(8 models)
Rank  Model                   Score  Stderr
1     Phi-3-mini-4k-instruct  91.30
2     GPT-4o                  88.50
3     Gemini 1.5 Flash        79.70
4     falcon-180B             74.90
5     Llama 3.1 405B          73.70
6     Claude 3.7 Sonnet       72.00
7     Qwen Plus               68.20
8     Llama-2-7b              55.78

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0