
PIQA

Physical intuition question answering

Difficulty: Very Hard
Category: language
EDI: 74.0
Slope: 0.37

Leaderboard (16 models)

Rank  Model                   Score  Stderr
1     gpt-4o-mini-2024-07-18  88.70
2     Phi-3-mini-4k-instruct  88.60
3     Gemini 1.5 Flash        87.50
4     Llama 3.1 405B          85.90
5     falcon-180B             84.90
6     DeepSeek V3             84.70
7     Gemma 3 27B             83.70
8     Mixtral-8x7B-v0.1       83.60
9     Mistral Large           83.50
10    Mistral-7B-v0.1         83.00
11    Llama-2-70b-hf          82.80
12    Qwen 2.5 72B            82.60
13    Llama-2-7b              81.90
14    gemma-7b                81.20
15    Qwen 3 235B             79.90
16    GPT-OSS 120B            76.70
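The leaderboard includes a Stderr column, and several adjacent ranks differ by only 0.1 to 0.2 points. The sketch below shows one way to judge whether such gaps are meaningful. It assumes each score is plain accuracy over n independent right/wrong questions, so the binomial standard error 100 * sqrt(p(1 - p)/n) applies. The helper names and N_QUESTIONS = 1838 are hypothetical: 1838 is roughly the size of PIQA's commonly used validation split, but this page does not say which split or how many questions each score used.

```python
import math

# Scores (percent accuracy on PIQA) copied from the leaderboard above.
LEADERBOARD = [
    ("gpt-4o-mini-2024-07-18", 88.7),
    ("Phi-3-mini-4k-instruct", 88.6),
    ("Gemini 1.5 Flash", 87.5),
    ("Llama 3.1 405B", 85.9),
    ("falcon-180B", 84.9),
    ("DeepSeek V3", 84.7),
    ("Gemma 3 27B", 83.7),
    ("Mixtral-8x7B-v0.1", 83.6),
    ("Mistral Large", 83.5),
    ("Mistral-7B-v0.1", 83.0),
    ("Llama-2-70b-hf", 82.8),
    ("Qwen 2.5 72B", 82.6),
    ("Llama-2-7b", 81.9),
    ("gemma-7b", 81.2),
    ("Qwen 3 235B", 79.9),
    ("GPT-OSS 120B", 76.7),
]

# ASSUMPTION (hypothetical): number of scored questions behind each score.
# 1838 is an illustrative value, not a figure taken from this leaderboard.
N_QUESTIONS = 1838


def binomial_stderr(accuracy_pct: float, n: int) -> float:
    """Standard error, in percentage points, of an accuracy estimated from
    n independent right/wrong questions: 100 * sqrt(p * (1 - p) / n)."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def gap_within_noise(a_pct: float, b_pct: float, n: int, z: float = 1.96) -> bool:
    """True if two scores differ by less than z combined standard errors.
    Treats the scores as independent estimates, a simplification: models
    evaluated on the same questions are correlated."""
    combined = math.hypot(binomial_stderr(a_pct, n), binomial_stderr(b_pct, n))
    return abs(a_pct - b_pct) < z * combined


if __name__ == "__main__":
    # Print each score with its estimated standard error.
    for rank, (model, score) in enumerate(LEADERBOARD, start=1):
        se = binomial_stderr(score, N_QUESTIONS)
        print(f"{rank:>2}  {model:<24} {score:5.1f} +/- {se:.2f}")
    top, second, last = LEADERBOARD[0][1], LEADERBOARD[1][1], LEADERBOARD[-1][1]
    print("rank 1 vs rank 2 within noise:", gap_within_noise(top, second, N_QUESTIONS))
    print("rank 1 vs rank 16 within noise:", gap_within_noise(top, last, N_QUESTIONS))
```

Under these assumptions the estimated standard errors fall between roughly 0.7 and 1.0 percentage points. The 0.1-point gap between ranks 1 and 2 is then well within noise, while the 12-point gap between ranks 1 and 16 is not.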

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0

PIQA: Top Score 88.7% - AI Benchmark | NeoSignal