The Agent Company

Multi-step workplace automation tasks testing autonomous agent capabilities

Frontier
Category: agents
EDI: 146.9
Slope: 3.21

Leaderboard

(8 models)
Rank  Model                      Score (%)  Stderr
1     Claude 3.7 Sonnet          52.73
2     Claude Opus 4.5            46.45
3     Gemini 2.5 Pro (Jun 2025)  39.85
4     DeepSeek V3                29.91
5     Qwen2.5-Max                23.99
6     Llama 3.1 405B             22.90
7     Gemini 1.5 Flash           22.10
8     GPT-4o                     14.55

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0

The Agent Company: Top Score 52.7% - AI Benchmark | NeoSignal