Lech Mazur Writing

Writing quality and style evaluation

Frontier
Category: language
EDI: 108.4
Slope: 0.92
Leaderboard

(27 models)
Rank  Model                                Score  Stderr
1     GPT-5.2                              8.72
2     Qwen3-Max-Instruct                   8.71
3     kimi-k2-thinking (official)          8.69
4     o3                                   8.63
5     Gemini 2.5 Pro (Jun 2025)            8.60
6     Claude Opus 4.5                      8.54
7     DeepSeek V3                          8.52
8     Qwen 3 235B                          8.49
9     Qwen3-235B-A22B                      8.30
10    DeepSeek R1                          8.30
11    GPT-4o                               8.18
12    Grok 4                               8.11
13    Claude 3.7 Sonnet                    8.11
14    QwQ-32B                              8.07
15    Gemma 3 27B                          7.99
16    GPT-OSS 120B                         7.73
17    Grok-3 mini                          7.64
18    GPT-4.1                              7.56
19    o4-mini-2025-04-16 medium            7.50
20    Gemini 2.0 Flash Thinking Exp        7.38
21    Claude 3.5 Haiku                     7.35
22    Qwen2.5-Max                          7.29
23    o1                                   7.02
24    Mistral Large                        6.90
25    gpt-4o-mini-2024-07-18               6.72
26    Llama-4-Maverick-17B-128E-Instruct   6.37
27    Phi-4                                6.26

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0