Aider polyglot

Code-editing tasks across multiple programming languages, run using the Aider framework

Frontier
Category: coding
EDI: 138.7
Slope: 3.97

Leaderboard

(35 models)
Rank  Model                               Score  Stderr
1     GPT-5.2                             88.00
2     o3                                  84.90
3     Gemini 2.5 Pro (Jun 2025)           83.10
4     Grok 4                              79.60
5     DeepSeek V3                         74.20
6     o4-mini (high)                      72.00
7     claude-opus-4-20250514 32K          72.00
8     Claude Opus 4.5                     70.70
9     Claude 3.7 Sonnet                   64.90
10    o1                                  61.70
11    Qwen3-235B-A22B                     59.60
12    kimi-k2-thinking (official)         59.10
13    DeepSeek R1                         56.90
14    Grok-3 mini                         53.30
15    GPT-4.1                             52.40
16    chatgpt-4o-03-27-2025               45.30
17    GPT-OSS 120B                        41.80
18    Qwen3-32B                           40.00
19    Gemini 3 Pro                        38.20
20    Gemini 2.0 Pro Exp (Feb 2025)       35.60
21    GPT-4.1 mini                        32.40
22    Claude 3.5 Haiku                    28.00
23    chatgpt-4o-01-29-2025               27.10
24    GPT-4o                              23.10
25    Qwen2.5-Max                         21.80
26    QwQ-32B                             20.90
27    Gemini 2.0 Flash Thinking Exp       18.20
28    DeepSeek-V2.5                       17.80
29    Qwen2.5-Coder-32B-Instruct          16.40
30    Llama-4-Maverick-17B-128E-Instruct  15.60
31    yi-lightning                        12.90
32    c4ai-command-a-03-2025              12.00
33    Codestral                           11.10
34    Gemma 3 27B                          4.90
35    gpt-4o-mini-2024-07-18               3.60

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0