
TriviaQA

Trivia questions requiring broad knowledge

Difficulty: Medium
Category: language
EDI: 48.3
Slope: 0.40

Leaderboard (18 models)
Rank  Model                        Score  Stderr
1     Llama-2-70b-hf               87.6   0
2     Claude 3.7 Sonnet            87.5   0
3     Llama 3.1 405B               86.0   0
4     gpt-3.5-turbo-1106           85.8   0
5     GPT-4.1                      84.8   0
6     Llama-2-7b                   84.6   0
7     DeepSeek V3                  82.9   0
8     Mixtral-8x7B-v0.1            82.2   0
9     falcon-180B                  79.9   0
10    Claude Opus 4.5              78.9   0
11    Mistral-7B-v0.1              75.2   0
12    Phi-3-medium-128k-instruct   73.9   0
13    gemma-7b                     72.3   0
14    Qwen 2.5 72B                 71.9   0
15    Meta-Llama-3-8B-Instruct     67.7   0
16    Phi-3-mini-4k-instruct       64.0   0
17    Phi-3-small-8k-instruct      58.1   0
18    Phi-4                        45.2   0
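
For working with these numbers programmatically, here is a minimal Python sketch that holds leaderboard rows in a small data structure and re-derives the ranking by sorting on score. The `Entry` dataclass and the hard-coded subset of rows are illustrative assumptions for this sketch, not the format of the Epoch AI data release.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    model: str
    score: float   # TriviaQA score as reported in the table above
    stderr: float  # standard error (0 in this snapshot)

# A few rows transcribed from the leaderboard above (illustrative subset).
ENTRIES = [
    Entry("Llama-2-70b-hf", 87.6, 0.0),
    Entry("Claude 3.7 Sonnet", 87.5, 0.0),
    Entry("Phi-4", 45.2, 0.0),
]

# Rank by descending score, matching the leaderboard ordering.
for rank, e in enumerate(sorted(ENTRIES, key=lambda e: e.score, reverse=True), start=1):
    print(f"{rank:>2}  {e.model:<30} {e.score:>5.1f}  {e.stderr:.1f}")
```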

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC BY 4.0