
Fiction.LiveBench

Fiction comprehension and reasoning tasks

Frontier
Category: specialized
EDI: 135.3
Slope: 2.78

Leaderboard

(22 models)
Rank  Model                                Score  Stderr
1     o3                                  100.00
2     Grok 4                               96.90
3     GPT-5.2                              96.90
4     Gemini 2.5 Pro (Jun 2025)            90.60
5     Qwen 3 235B                          68.80
6     chatgpt-4o-01-29-2025                65.60
7     Grok-3 mini                          65.60
8     GPT-4.1                              63.90
9     o4-mini-2025-04-16 medium            62.50
10    Minimax M2                           59.40
11    o1                                   53.10
12    DeepSeek V3                          53.10
13    Claude 3.7 Sonnet                    53.10
14    GPT-4.1 mini                         46.90
15    Qwen3-235B-A22B                      44.40
16    Kimi K2 Instruct                     40.60
17    Claude Opus 4.5                      37.50
18    Gemini 2.0 Pro Exp (Feb 2025)        37.50
19    Gemini 2.0 Flash Thinking Exp        37.50
20    Llama-4-Maverick-17B-128E-Instruct   36.40
21    DeepSeek R1                          33.30
22    Llama 4 Scout                        27.30

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0

Fiction.LiveBench: Top Score 100.0% - AI Benchmark | NeoSignal