VideoMME
Video understanding and multimodal evaluation tasks
Leaderboard (8 models)

| Rank | Model | Score | Stderr |
|---|---|---|---|
| 1 | Gemini 1.5 Flash | 75.00 | — |
| 2 | Qwen2.5-Max | 73.50 | — |
| 3 | GPT-4o | 71.90 | — |
| 4 | gpt-4o-mini-2024-07-18 | 64.80 | — |
| 5 | Claude 3.7 Sonnet | 60.00 | — |
| 6 | GPT-4.1 | 59.90 | — |
| 7 | Kimi K2 0905 (Novita) | 55.80 | — |
| 8 | Qwen Plus | 51.30 | — |

Data source: Epoch AI, “Data on AI Benchmarking”. Published at epoch.ai. Licensed under CC-BY 4.0.