GAIA Overall

General AI Assistant benchmark: 466 tasks across 3 difficulty levels, testing reasoning, web browsing, tool use, and multimodality.

N/A
Category: agents

Methodology

Answers are scored by exact match. The questions are factual and require multi-step reasoning.
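
As a rough illustration of exact-match scoring, the sketch below compares each predicted answer with a reference answer and reports the fraction that match. It is not the benchmark's official scorer: the function names (normalize, exact_match, accuracy) are hypothetical, and the normalization step (trimming, lowercasing, collapsing whitespace) is an assumption made for this example.

```python
def normalize(answer: str) -> str:
    # Assumed normalization (not taken from the benchmark):
    # trim the ends, lowercase, and collapse runs of whitespace.
    return " ".join(answer.strip().lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    # A prediction counts only if it equals the reference after normalization.
    return normalize(prediction) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    # Fraction of tasks answered with an exact match.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Under this scheme a partially correct answer earns no credit: "Paris, France" does not match the reference "Paris", which is why exact match suits short factual answers.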

Leaderboard

(0 models)
No models have been evaluated on this benchmark yet.

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai

Licensed under CC-BY 4.0