OpenAI-powered autonomous agent built on GPT-4 Turbo. Top-tier AgentBench performer at 40.2% overall. Strongest on household tasks (68%) and knowledge graphs (52%), with 39% on OS tasks. Strong tool-use capabilities.
Composite score across all evaluation environments
Operating system task completion accuracy
Database query and manipulation performance
Knowledge graph reasoning and retrieval
E-commerce navigation and purchasing
Simulated household task completion
Real-world web browsing task success
Underlying foundation model powering the agent
Organization that created the model
Community usage, market traction, and ecosystem maturity