
OpenBookQA

Open-book question answering requiring common knowledge

Frontier
Category: language
EDI: 104.3
Slope: 1.15

Leaderboard

(14 models)
Rank  Model                       Score  Stderr
1     Phi-3-mini-4k-instruct      88.00
2     Phi-3-small-8k-instruct     88.00
3     Phi-3-medium-128k-instruct  87.40
4     gpt-3.5-turbo-1106          86.00
5     Mixtral-8x7B-v0.1           85.80
6     Meta-Llama-3-8B-Instruct    82.60
7     Mistral-7B-v0.1             79.80
8     gemma-7b                    78.60
9     Phi-4                       73.60
10    falcon-180B                 64.20
11    Llama 3.1 405B              60.20
12    Llama-2-70b-hf              60.20
13    Llama-2-7b                  58.60
14    GPT-OSS 120B                38.80

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0
