Claude 3 Opus Agent

Agents

Strong reasoning, but slower than Sonnet for agents

An autonomous agent built on Anthropic's Claude 3 Opus. It scores 35.4% overall on AgentBench, with its best results on household tasks (55%), knowledge graphs (45%), and OS tasks (33%).

Metrics

AgentBench

AgentBench Overall: 35.4

Composite score across all evaluation environments

AgentBench OS: 33

Operating system task completion accuracy

AgentBench Database: 26.2

Database query and manipulation performance

AgentBench Knowledge Graph: 45

Knowledge graph reasoning and retrieval

AgentBench WebShop: 28

E-commerce navigation and purchasing

AgentBench ALFWorld: 55

Simulated household task completion

AgentBench Mind2Web: 18.5

Real-world web browsing task success
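
To make the per-environment comparison easier to check, here is a minimal Python sketch that ranks the six scores listed above. The dictionary name, keys, and variable names are illustrative assumptions and are not part of AgentBench or any Anthropic API. The listing does not say how the overall score is computed, so the unweighted mean is shown only for comparison.

```python
# Scores copied from the listing above; names and keys are illustrative only.
agentbench_scores = {
    "OS": 33.0,
    "Database": 26.2,
    "Knowledge Graph": 45.0,
    "WebShop": 28.0,
    "ALFWorld": 55.0,
    "Mind2Web": 18.5,
}
LISTED_OVERALL = 35.4  # composite score as reported on this page

# Rank environments from strongest to weakest.
ranked = sorted(agentbench_scores.items(), key=lambda kv: kv[1], reverse=True)
top_three = [name for name, _ in ranked[:3]]

# Plain average of the six listed scores (an assumption for comparison,
# not the method used to produce the listed composite).
unweighted_mean = sum(agentbench_scores.values()) / len(agentbench_scores)

for name, score in ranked:
    print(f"{name:<16} {score:5.1f}")
print(f"Unweighted mean: {unweighted_mean:.2f} (listed overall: {LISTED_OVERALL})")
```

The ranking matches the summary: ALFWorld, Knowledge Graph, and OS are the three highest scores. The unweighted mean of the six listed scores differs from the listed overall score, so the composite presumably uses a different weighting or includes environments not shown here.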

Info

Base Model: claude-3-opus

Underlying foundation model powering the agent

Provider: Anthropic

Organization that created the model

Score Breakdown

adoption

15% 70

Community usage, market traction, and ecosystem maturity

tool use

memory context

self reflection

planning reasoning

Sources

github.com

Developed by

Anthropic