Component Browser

Browse and compare AI components

Components (153 of 153)

models

Gemini 3 Pro

Google's frontier model, scoring ECI 154. Exceptional across reasoning, coding, and math benchmarks. Exceptional intelligence. API access.

ELO: 1K | country: United Kin | provider: Google

models

GPT-5.2

OpenAI's frontier model, scoring ECI 152. Exceptional across reasoning, coding, and math benchmarks. Exceptional math.

ELO: 1K | country: United Sta | provider: OpenAI

models

Claude Opus 4.5

Anthropic flagship model with 200K context. Record ARC-AGI performance. Exceptional reasoning (98/100) and intelligence (97/100). Best for research and complex tasks.

country: United Sta | provider: Anthropic | eci score: 149.86957925424835

accelerators

NVIDIA B200

Blackwell GPU with massive 192GB HBM. 4.5K TFLOPS FP16 for next-generation performance. 52K tok/s MLPerf.

tdp: 1000W | Memory: 192 | tflops fp16: 5K

models

o3

OpenAI next-gen reasoning model with 200K context. Record ARC-AGI scores. Exceptional reasoning (99/100) and math (98/100). Best for research and complex problem-solving.

country: United Sta | provider: OpenAI | eci score: 148.78913668659862

frameworks

PyTorch

Training optimization framework. ThoughtWorks Radar: Adopt. 85K GitHub stars, 5.5M weekly downloads.

Stars: 85K | radar status: adopt | framework type: training

cloud

CoreWeave

Regional GPU cloud. Excellent GPU availability. 4 regions, offering H100/H200 and 3 more. H100 at $6.16/hr. Spot instances available.

regions: 4 | pricing tier: competitiv | price updated: 2025-12-28

models

Grok 4

xAI latest-gen reasoning model with 128K context. Exceptional reasoning (95/100) and intelligence (94/100). Record agentic benchmark performance.

country: United Sta | provider: xAI | eci score: 147.41867399925533

accelerators

NVIDIA H200

Hopper GPU with extended 141GB HBM. 2K TFLOPS FP16 for high-performance compute. 87K tok/s MLPerf. Available from 5 providers starting at $4.50/hr.

tdp: 700W | Memory: 141 | tflops fp16: 2K

models

Claude Sonnet 4.5

Anthropic's frontier model, scoring ECI 146. Strong performance across major benchmarks. Excellent intelligence. API access.

ELO: 1K | country: United Sta | provider: Anthropic

frameworks

vLLM

Inference serving framework. ThoughtWorks Radar: Adopt. 38K GitHub stars, 850K weekly downloads.

Stars: 38K | radar status: adopt | framework type: inference

models

Gemini 2.5 Pro (Jun 2025)

Google's frontier model, scoring ECI 146. Strong performance across major benchmarks. Excellent intelligence. Released Jun 2025. API access.

country: United Kin | provider: Google | eci score: 146.2161106724776

models

DeepSeek V3

DeepSeek efficient MoE with 128K context. 671B parameters. Exceptional code (94/100) and math (93/100). State-of-the-art performance at low inference cost.

country: China | provider: DeepSeek | eci score: 145.11291603938145

models

kimi-k2-thinking (official)

Moonshot's frontier model, scoring ECI 145. Strong performance across major benchmarks. Excellent intelligence. Released Nov 2025. Open weights available.

country: China | provider: Moonshot | eci score: 145.09977073063365

frameworks

HuggingFace Transformers

Model hub and library. ThoughtWorks Radar: Adopt. 135K GitHub stars, 12M weekly downloads.

Stars: 135K | radar status: adopt | framework type: model_hub

frameworks

MCP Filesystem Server

Official MCP server for filesystem operations. Enables AI agents to read, write, search, and manage files. Sandbox-safe with configurable permissions.

provider: Anthropic | capabilities: read,write | radar status: adopt

models

Qwen 3 235B

Alibaba flagship MoE model with 128K context. 235B parameters (22B active). Exceptional math (94/100) and reasoning (94/100). Competes with closed-source frontier models.

country: China | provider: Alibaba | eci score: 145.28065460309054

accelerators

NVIDIA H100

Hopper GPU with large 80GB HBM. 2K TFLOPS FP16 for high-performance compute. 125K tok/s MLPerf. Available from 12 providers starting at $1.49/hr.

tdp: 700W | Memory: 80 | tflops fp16: 2K

agents

Claude Code

Anthropic-powered autonomous coding agent. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Anthropic | run rate: 1000.0M

models

kimi-k2-thinking (turbo official)

Moonshot's frontier model, scoring ECI 145. Strong performance across major benchmarks. Excellent intelligence. Released Nov 2025. Open weights available.

country: China | provider: Moonshot | eci score: 145.09977073063365

models

Qwen3-Max-Instruct

Alibaba's frontier model, scoring ECI 145. Strong performance across major benchmarks. Excellent intelligence. Released Sep 2025. API access.

country: China | provider: Alibaba | eci score: 145.2734899249661

models

o4-mini (high)

OpenAI's top-tier model, scoring ECI 145. Competitive on reasoning and coding tasks. Excellent intelligence. Released Apr 2025. API access.

country: United Sta | provider: OpenAI | eci score: 144.9937806547

models

Llama 3.1 405B

Meta largest open-weights model with 128K context. 405B parameters. Excellent instruction following (92/100) and code (90/100). Industry benchmark for open models.

provider: Meta | Params: 405 | open source: Yes

models

Grok Code Fast 1

xAI's coding-optimized model with 128K context. Exceptional code (96/100). Best for software development and code generation.

provider: xAI | model type: coding | price input: $0.30

agents

OpenAI CUA

OpenAI Computer-Using Agent. SOTA benchmark results across computer control tasks. OSWorld 38.1% (human: 72.4%), WebArena 58.1%, WebVoyager 87%.

license: commercial | provider: OpenAI | agent type: computer_u

models

o1

OpenAI reasoning-first model with 200K context. Exceptional reasoning (98/100) and math (96/100). Uses extended thinking for complex multi-step problems.

country: United Sta | provider: OpenAI | eci score: 142.37147467669178

agents

Cursor

Anysphere-powered autonomous coding agent. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Anysphere | agent type: coding

models

Qwen3-Coder-480B-A35B

Alibaba's top-tier model, scoring ECI 143. Competitive on reasoning and coding tasks. Excellent intelligence. Released Jul 2025. Open weights available.

country: China | provider: Alibaba | eci score: 142.96146590080863

models

Grok 4.1

xAI's general-purpose model with 256K context. Excellent intelligence (91/100). Best for general knowledge tasks.

ELO: 1K | provider: xAI | price input: $2.00

frameworks

Model Context Protocol (MCP)

Open protocol for connecting AI agents to external tools, data sources, and resources. Enables standardized tool use across Claude Code, Cline, and other MCP-compatible agents.

Stars: 35K | organization: Anthropic | radar status: adopt

agents

Devin

Cognition Labs' AI software engineer. SWE-bench Verified 53.8%. First autonomous coding agent to demonstrate end-to-end software development.

license: commercial | provider: Cognition | valuation: 2000.0M

agents

Writer Action Agent

Writer's enterprise AI agent. GAIA Level 3 leader (61%), highest difficulty multi-step reasoning. Surpassed OpenAI Deep Research (~47.6%).

license: commercial | provider: Writer | agent type: general_as

models

Grok-3 mini

xAI's top-tier model, scoring ECI 141. Competitive on reasoning and coding tasks. Strong intelligence. Released Apr 2025. API access.

country: United Sta | provider: xAI | eci score: 140.80473869724742

cloud

Google Cloud

Global hyperscaler. Strong GPU availability. 35 regions, offering H100/A100 and 2 more. H100 at $3/hr. Spot instances available.

regions: 35 | pricing tier: premium | price updated: 2025-12-28

models

Claude 3.7 Sonnet

Anthropic hybrid reasoning model with 200K context. First Claude with extended thinking. Exceptional code (95/100) and reasoning (96/100). Best for agentic tasks.

country: United Sta | provider: Anthropic | eci score: 141.67851910714836

agents

Perplexity

AI-native search agent from Perplexity AI. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Perplexity | run rate: 100.0M

accelerators

Google TPU v5p

TPU accelerator with large 95GB HBM.

tdp: 450W | Memory: 95 | tflops bf16: 459

agents

Manus AI

General AI assistant acquired by Meta (Dec 2025). GAIA Level 3 score 57.7%. Now integrated into Meta's AI platform.

license: commercial | provider: Meta (acqu | agent type: general_as

models

Claude Haiku 4.5

Anthropic's top-tier model, scoring ECI 141. Competitive on reasoning and coding tasks. Strong intelligence. Released Oct 2025. API access.

country: United Sta | provider: Anthropic | eci score: 140.55933670602477

models

Kimi K2 0905 (Novita)

Moonshot's top-tier model, scoring ECI 140. Competitive on reasoning and coding tasks. Strong intelligence. Released Sep 2025. Open weights available.

country: China | provider: Moonshot | eci score: 140.48407017520103

models

Kimi K2 Instruct

Moonshot's top-tier model, scoring ECI 140. Competitive on reasoning and coding tasks. Strong intelligence. Released Jul 2025. Open weights available.

country: China | provider: Moonshot | eci score: 140.48407017520103

models

DeepSeek R1

DeepSeek reasoning model with 128K context. Uses chain-of-thought. Exceptional math (96/100) and reasoning (96/100). Open-source competitor to o1.

ELO: 1K | country: China | provider: DeepSeek

models

GPT-OSS 120B

OpenAI's top-tier model, scoring ECI 140. Solid benchmark performance. Strong intelligence. Open weights available.

country: United Sta | provider: OpenAI | eci score: 139.76129702790502

models

Mistral Large

Mistral flagship model with 128K context. Strong instruction following (90/100) and code (88/100). European leader in frontier AI.

provider: Mistral AI | open source: No | Open: No

frameworks

OpenAI Function Calling

OpenAI's structured output format for model-tool interaction. JSON schema-based function definitions enable reliable tool use. Adopted by most LLM API providers.

organization: OpenAI | radar status: adopt | framework type: protocol
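
To make the "JSON schema-based function definitions" in the OpenAI Function Calling entry concrete, here is a minimal sketch of one tool definition and a small argument check. The layout follows the shape commonly used for chat-completion tool definitions. The tool name `get_gpu_price`, its parameters, and the `parse_arguments` helper are hypothetical and only for illustration. No API call is made.

```python
import json

# Hypothetical tool definition; the name and parameters are illustrative only.
GET_GPU_PRICE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_gpu_price",
        "description": "Look up the hourly price of a GPU at a cloud provider.",
        "parameters": {  # JSON Schema describing the arguments
            "type": "object",
            "properties": {
                "provider": {"type": "string"},
                "gpu": {"type": "string", "enum": ["H100", "H200", "A100"]},
            },
            "required": ["provider", "gpu"],
        },
    },
}


def parse_arguments(tool, raw_arguments):
    """Parse model-produced JSON arguments and run a minimal schema check.

    Covers only required keys, unknown keys, string types, and enums;
    a full JSON Schema validator would check much more.
    """
    schema = tool["function"]["parameters"]
    args = json.loads(raw_arguments)
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if spec["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"argument {key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"argument {key} must be one of {spec['enum']}")
    return args


# Arguments arrive from the model as a JSON string, so they are parsed first.
print(parse_arguments(GET_GPU_PRICE_TOOL, '{"provider": "Lambda Labs", "gpu": "H100"}'))
```

Checking the arguments before dispatching to real code is one way to get the reliability the entry mentions, since a model can still return malformed or out-of-range values.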

agents

Cline

Autonomous coding agent. Strong tool use, planning, memory, and self-correction capabilities. 28K GitHub stars.

license: Apache-2.0 | provider: Open Sourc | agent type: coding

frameworks

MCP Git Server

Official MCP server for Git operations. Clone, commit, branch, merge, and push without leaving the agent context. Full Git workflow support.

provider: Anthropic | capabilities: clone,comm | radar status: adopt

frameworks

Ollama

Local LLM inference with one-line setup. CPU-optimized, edge deployment focus. 148K GitHub stars, 2.5M weekly downloads.

Stars: 148K | radar status: adopt | framework type: inference

frameworks

LangChain

LLM orchestration framework. ThoughtWorks Radar: Adopt. 95K GitHub stars, 2.1M weekly downloads.

Stars: 95K | radar status: adopt | framework type: orchestrat

frameworks

TensorRT-LLM

Inference serving framework. ThoughtWorks Radar: Trial. 9.5K GitHub stars, 180K weekly downloads.

Stars: 10K | radar status: trial | framework type: inference

agents

Abridge

Abridge-powered healthcare automation agent. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Abridge | use case: clinical-n

agents

IBM CUGA

IBM Computer Use General Agent. WebArena SOTA (61.7%). Uses modular Planner-Executor-Memory architecture for web automation.

license: research | provider: IBM | agent type: browser

agents

Claude Cowork

Claude Code for non-technical work. Cowork lets you complete tasks like file organization, document creation, and data compilation using natural language. Features parallel task queuing, sandboxed folder access, and connector integrations. Built entirely with Claude Code itself.

license: commercial | pricing: $100-200/m | platform: macOS Desk

cloud

AWS

Global hyperscaler. Moderate GPU availability. 32 regions, offering H100/A100 and 2 more. H100 at $3.90/hr. Spot instances available.

regions: 32 | pricing tier: premium | price updated: 2025-12-28

agents

Replit Agent

Replit's AI coding agent. Top-3 revenue in coding agents. 35M MAUs with integrated cloud development environment.

license: commercial | provider: Replit | agent type: coding

agents

Windsurf

Codeium's AI-native code editor. 2M MAUs. Cascade model for agentic coding with Flows feature for multi-file edits.

license: freemium | provider: Codeium | agent type: coding

frameworks

LangSmith

LLM observability platform by LangChain. Native integration for tracing, debugging, and monitoring LLM applications. 5K free traces, $39/user/month cloud.

radar status: adopt | framework type: observabil | weekly downloads: 450K

models

GPT-4.1

Top-tier model from OpenAI (ECI 138). Solid benchmark performance. Strong intelligence. Released Apr 2025. API access.

country: United Sta | provider: OpenAI | eci score: 137.5095823764539

models

Codestral

Mistral code-specialized model with 32K context. Exceptional code generation (94/100). Optimized for software engineering and code completion.

provider: Mistral AI | open source: No | Open: No

models

Devstral 2512

Mistral AI's coding-optimized model with 128K context. Excellent code (94/100). Best for software development and code generation. Open weights.

provider: Mistral AI | model type: coding | price input: $0.10

models

Gemini 1.5 Flash

Google fast model with 1M context. Good instruction following (88/100) and reasoning (85/100). Cost-effective for production workloads.

provider: Google | open source: No | Open: No

cloud

Azure

Global hyperscaler. Moderate GPU availability. 60 regions, offering H100/A100/ND H100 v5. H100 at $6.98/hr. Spot instances available.

regions: 60 | pricing tier: premium | price updated: 2025-12-28

frameworks

LangGraph

LLM orchestration framework. ThoughtWorks Radar: Adopt. 8.5K GitHub stars, 450K weekly downloads.

Stars: 9K | radar status: adopt | framework type: orchestrat

frameworks

SGLang

Inference serving framework with RadixAttention for KV-cache reuse. Stable latency (4-21ms), optimized for multi-turn chat and RAG.

Stars: 12K | radar status: trial | framework type: inference
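
The KV-cache reuse mentioned in the SGLang entry relies on finding how much of a new request shares a prefix with earlier ones. The toy sketch below illustrates only that prefix-matching idea with a plain trie over tokens. It is not SGLang's implementation, stores no actual KV tensors, and the `PrefixCache` class and sample tokens are hypothetical.

```python
class PrefixCache:
    """Toy token-prefix index: a plain trie with one node per token."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Record a processed token sequence so later requests can match it."""
        node = self.root
        for token in tokens:
            node = node.setdefault(token, {})

    def longest_cached_prefix(self, tokens):
        """Return how many leading tokens are already present in the cache."""
        node = self.root
        matched = 0
        for token in tokens:
            if token not in node:
                break
            node = node[token]
            matched += 1
        return matched


cache = PrefixCache()

# First chat turn: system prompt plus the first user message.
turn_1 = ["<sys>", "You", "are", "helpful", "<user>", "Hi"]
cache.insert(turn_1)

# Second turn repeats the whole first turn and appends new tokens.
turn_2 = turn_1 + ["<assistant>", "Hello", "<user>", "Thanks"]
reused = cache.longest_cached_prefix(turn_2)
print(f"reused {reused} of {len(turn_2)} tokens")
```

In multi-turn chat and RAG, successive requests tend to repeat a long shared prefix, which is why prefix reuse suits those workloads.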

frameworks

Weaviate

AI-native vector database with HNSW indexing. Open-source, supports hybrid search, product quantization, and multi-tenancy. Cloud and self-hosted options. 13K GitHub stars.

indexing: HNSW | deployment: self-hoste | Stars: 13K

agents

CrewAI

CrewAI Inc-powered multi-agent orchestration framework. Strong tool use, planning, memory, and self-correction capabilities. 24K GitHub stars.

license: MIT | provider: CrewAI Inc | agent type: multi_agen

agents

Lovable

Lovable-powered AI app building platform. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Lovable | agent type: app-builde

accelerators

AMD MI300X

CDNA3 GPU with massive 192GB HBM. 1.3K TFLOPS FP16 for production-grade compute. 169K tok/s MLPerf.

tdp: 750W | Memory: 192 | tflops fp16: 1K

models

Command R+

Cohere enterprise model with 128K context. Strong instruction following (88/100). Optimized for RAG and tool use in enterprise deployments.

provider: Cohere | open source: No | Open: No

models

kat-coder-pro-v1

Kwaipilot's coding-optimized model with 64K context. Exceptional code (95/100). Best for software development and code generation.

provider: Kwaipilot | model type: coding | price input: $0.20

models

Claude 3.5 Haiku

Anthropic fast model with 200K context. Strong instruction following (88/100) and code (85/100). Best for high-throughput production workloads.

provider: Anthropic | open source: No | Open: No

agents

NotebookLM

Google-powered research and analysis agent. Strong tool use, planning, memory, and self-correction capabilities.

license: commercial | provider: Google | agent type: research

agents

AutoGen

Microsoft-powered multi-agent orchestration framework. Strong tool use, planning, memory, and self-correction capabilities. 35K GitHub stars.

license: MIT | provider: Microsoft | agent type: multi_agen

agents

Aider

Autonomous coding agent. Strong tool use, planning, memory, and self-correction capabilities. 22K GitHub stars.

license: Apache-2.0 | provider: Open Sourc | agent type: coding

frameworks

CrewAI Framework

Role-playing autonomous agent orchestration. Cutting-edge multi-agent collaboration. 42.5K GitHub stars, 350K weekly downloads.

Stars: 43K | radar status: trial | framework type: multi_agen

frameworks

MCP Brave Search

Official MCP server for Brave Search API. Enables web, news, and image search with privacy-respecting results. Requires Brave API key.

provider: Anthropic | capabilities: web_search | radar status: trial

frameworks

LlamaIndex

Data pipeline framework. ThoughtWorks Radar: Trial. 38K GitHub stars, 680K weekly downloads.

Stars: 38K | radar status: trial | framework type: data

cloud

Together AI

GPU cloud provider. Strong GPU availability. 2 regions, offering H100/A100.

regions: 2 | pricing tier: value | gpus available: H100,A100

models

Gemini 2.0 Flash Thinking Exp

Top-tier model from Google DeepMind and Google (ECI 136). Solid benchmark performance. Strong intelligence. Released Jan 2025. API access.

country: United Kin | provider: Google Dee | eci score: 136.09543782218225

models

ERNIE 5.0 Preview

Baidu's reasoning-capable model with 128K context. Strong reasoning (85/100). Best for complex multi-step problems.

provider: Baidu | Open: No | Context: 128K

frameworks

DeepSpeed

Training optimization framework. ThoughtWorks Radar: Assess. 35K GitHub stars, 320K weekly downloads.

Stars: 35K | radar status: assess | framework type: training

frameworks

AutoGen Framework

Microsoft multi-agent conversational framework. Customizable agent behaviors, enterprise-ready. 35K GitHub stars.

Stars: 35K | radar status: trial | framework type: multi_agen

frameworks

Haystack

Deepset RAG framework. Lowest token usage in benchmarks, production-ready pipelines. 17.5K GitHub stars.

Stars: 18K | radar status: trial | framework type: data

models

Gemini 2.0 Pro Exp (Feb 2025)

Top-tier model from Google (ECI 136). Solid benchmark performance. Solid intelligence. Released Feb 2025. Hosted access (no API).

country: United Kin | provider: Google | eci score: 135.51597856798938

agents

OpenHands

All Hands AI-powered autonomous coding agent. Strong tool use, planning, memory, and self-correction capabilities. 42K GitHub stars.

license: MIT | provider: All Hands | agent type: coding

frameworks

Arize Phoenix

Open-source LLM observability platform. OpenTelemetry-based tracing, evaluation, and monitoring. Free self-hosted, $50/mo managed cloud.

Stars: 12K | pricing free: Yes | radar status: trial

models

GPT-4.1 mini

Top-tier model from OpenAI (ECI 135). Solid benchmark performance. Solid intelligence. Released Apr 2025. API access.

country: United Sta | provider: OpenAI | eci score: 135.4435967525866

models

Qwen2.5-Max

Competitive model from Alibaba (ECI 133). Solid benchmark performance. Solid intelligence. Released Jan 2025. API access.

country: China | provider: Alibaba | eci score: 133.17425841782563

frameworks

DSPy

Stanford programmatic prompting framework. Lowest framework overhead, algorithmic optimization. 18K GitHub stars.

Stars: 18K | radar status: trial | framework type: orchestrat

cloud

Lambda Labs

Regional GPU cloud. Strong GPU availability. 3 regions, offering H100/A100/A10. H100 at $2.49/hr.

regions: 3 | pricing tier: value | price updated: 2025-12-28

frameworks

Langfuse

Open-source LLM engineering platform. Tracing, evaluation, prompt management, and metrics. 50K events free, self-host available.

Stars: 8K | radar status: trial | framework type: observabil

accelerators

AWS Trainium2

Trainium accelerator with large 96GB HBM.

tdp: 400W | Memory: 96 | tflops bf16: 380

frameworks

MetaGPT

Multi-agent framework simulating a software company. Agents take roles: PM, architect, engineer. 45K GitHub stars.

Stars: 45K | radar status: trial | framework type: multi_agen

agents

GPT Researcher

Tavily-powered research and analysis agent. Strong tool use, planning, memory, and self-correction capabilities. 16K GitHub stars.

license: MIT | provider: Tavily | agent type: research

frameworks

Context7

Community MCP server for documentation lookup. Retrieves up-to-date docs and code examples for any library. Resolves library IDs and queries documentation.

provider: Community | capabilities: documentat | radar status: trial

models

Minimax M2

Minimax's code-capable model with 128K context. Solid code (84/100). Best for software development and code generation.

provider: Minimax | price input: $0.15 | Open: No

agents

Kortix Suna

Kortix open-source general agent framework. Self-hosted deployment for enterprise privacy. Flexible LLM backend support.

license: open-sourc | provider: Kortix | agent type: general_as

models

Llama 4 Maverick (FP8)

Competitive model from Meta (ECI 133). Solid benchmark performance. Solid intelligence. Released Apr 2025. Open weights available.

country: United Sta | provider: Meta | eci score: 132.8850524536027

frameworks

Promptfoo

Open-source LLM evaluation framework with YAML configuration. Supports prompt testing, red-teaming, and CI/CD integration. Lightweight alternative to heavier eval platforms.

Stars: 6K | radar status: trial | config format: yaml

frameworks

Helicone

Open-source AI gateway with observability. Proxy-based setup for instant LLM monitoring. 100K requests free, $20/seat/month paid.

Stars: 3K | radar status: trial | framework type: observabil

models

Gemma 3 27B

Competitive model from Google (ECI 131). Solid benchmark performance. Solid intelligence. Released Mar 2025. Open weights available.

country: United Kin | provider: Google | eci score: 130.9457032062782

models

Phi-4

Competitive model from Microsoft Research (ECI 131). Solid benchmark performance. Solid intelligence. Released Dec 2024. Open weights available.

country: United Sta | provider: Microsoft | eci score: 130.98080665701002

models

Qwen Plus

Competitive model from Alibaba (ECI 131). Solid benchmark performance. Solid intelligence. Released Apr 2025. API access.

country: China | provider: Alibaba | eci score: 131.0447378036277

models

mimo-v2-flash

Xiaomi's code-capable model with 32K context. Solid code (82/100). Best for software development and code generation. Open weights.

provider: Xiaomi | price input: $0.07 | Open: Yes

models

Qwen 2.5 72B

Alibaba open-weights model with 128K context. 72B parameters. Strong math (90/100) and instruction following (90/100). Top performer in open-weights category.

country: China | provider: Alibaba | eci score: 129.43221393758552

models

Llama 4 Scout

Competitive model from Meta (ECI 130). Solid benchmark performance. Solid intelligence. Released Apr 2025. Open weights available.

country: United Sta | provider: Meta | eci score: 130.0579178010959

models

GPT-4o

OpenAI flagship multimodal model with 128K context. Strong code (92/100) and instruction following (94/100). Handles vision, audio, and text in unified architecture.

country: United Sta | provider: OpenAI | eci score: 130.02827257640743

frameworks

Braintrust

End-to-end AI evaluation and observability platform. Combines eval frameworks with production logging, experiments, and model comparison. Enterprise-focused alternative to LangSmith.

capabilities: evals,prod | radar status: trial | framework type: evaluation

frameworks

Playwright MCP

Community MCP server for browser automation via Playwright. Navigate pages, fill forms, take screenshots, and interact with web applications from AI agents.

provider: Community | capabilities: browser_au | Stars: 3K

agents

Browser Use

Browser automation agent. Strong tool use, planning, memory, and self-correction capabilities. 8.5K GitHub stars.

license: MIT | provider: Open Sourc | agent type: browser

models

GPT-4 Turbo

OpenAI GPT-4 Turbo with 128K context window. Balanced intelligence (88/100) and code generation (90/100). Being superseded by GPT-4o and o-series models.

country: United Sta | provider: OpenAI | eci score: 127.51320951492868

models

Llama 3.3 70B

Meta latest generation open-weights model with 128K context. 70B parameters. Improved instruction following (90/100) and reasoning (87/100). Best Llama 70B variant.

country: United Sta | provider: Meta | eci score: 127.28520683172442

frameworks

VoltAgent

TypeScript agent framework with built-in observability. Multi-provider support, workflow orchestration. 8.5K GitHub stars.

Stars: 9K | radar status: assess | framework type: orchestrat

frameworks

Harbor

Containerized agent evaluation platform with task registry. Enables pre/post execution checks, sandbox environments, and reproducible agent testing. Used by Anthropic for internal evals.

capabilities: containeri | radar status: trial | framework type: evaluation

models

Claude 3 Opus

Anthropic flagship Claude 3 model with 200K context. Excellent reasoning (94/100) and instruction following (95/100). Premium tier for complex analysis tasks.

country: United Sta | provider: Anthropic | eci score: 126.85188232987468

frameworks

Agent Protocol

Open standard for agent-to-agent communication. REST API spec enabling agents to list tasks, execute steps, and share artifacts. Supported by AutoGPT and agent frameworks.

Stars: 2K | organization: AI Foundat | radar status: assess

agents

Blink

AI-powered app builder for non-coders. Build websites, SaaS, and mobile apps by chatting with AI. Includes database, auth, hosting, and payment integrations.

license: freemium | features: database,a | provider: Blink

models

Qwen3-32B

Emerging model from Alibaba. Released Apr 2025. Benchmark data.

country: China | provider: Alibaba | open source: No

models

gpt-4o-mini-2024-07-18

Emerging model from OpenAI. Released Jul 2024. Benchmark data.

country: United Sta | provider: OpenAI | open source: No

models

c4ai-command-a-03-2025

Emerging model from Cohere. Released Mar 2025. Benchmark data.

country: Canada | provider: Cohere | open source: No

frameworks

Agent2Agent (A2A)

Google's emerging protocol for cross-platform agent interoperability. Enables agents from different platforms to discover, negotiate, and collaborate on tasks.

organization: Google | radar status: assess | framework type: protocol

models

yi-lightning

Emerging model from 01.AI. Released Dec 2024. Benchmark data.

country: China | provider: 01.AI | open source: No

models

Qwen3-235B-A22B

Emerging model from Alibaba. Released Apr 2025. Benchmark data.

country: China | provider: Alibaba | open source: No

models

Llama-4-Maverick-17B-128E-Instruct

Emerging model from Meta AI. Released Apr 2025. Benchmark data.

country: United Sta | provider: Meta AI | open source: No

models

Phi-3-medium-128k-instruct

Emerging model from Microsoft. Released Apr 2024. Benchmark data.

country: United Sta | provider: Microsoft | open source: No

models

Qwen2.5-Coder-32B-Instruct

Emerging model from Alibaba. Released Nov 2024. Benchmark data.

country: China | provider: Alibaba | open source: No

models

Mistral-7B-v0.1

Emerging model from Mistral AI. Released Sep 2023. Benchmark data.

country: France | provider: Mistral AI | open source: No

models

gpt-3.5-turbo-1106

Emerging model from OpenAI. Released Nov 2023. Benchmark data.

country: United Sta | provider: OpenAI | open source: No

models

chatgpt-4o-01-29-2025

Emerging model from OpenAI. Released Jan 2025. Benchmark data.

country: United Sta | provider: OpenAI | open source: No

models

o4-mini-2025-04-16 medium

Emerging model from OpenAI. Released Apr 2025. Benchmark data.

country: United Sta | provider: OpenAI | open source: No

models

chatgpt-4o-03-27-2025

Emerging model from OpenAI. Released Mar 2025. Benchmark data.

country: United Sta | provider: OpenAI | open source: No

models

Yi-6B

Emerging model from 01.AI. Released Nov 2023. Benchmark data.

country: China | provider: 01.AI | open source: No

models

falcon-180B

Emerging model from Technology Innovation Institute. Released Sep 2023. Benchmark data.

country: United Ara | provider: Technology | open source: No

models

Llama-2-7b

Emerging model from Meta AI. Released Jul 2023. Benchmark data.

country: United Sta | provider: Meta AI | open source: No

models

Llama-2-70b-hf

Emerging model from Meta AI. Released Jul 2023. Benchmark data.

country: United Sta | provider: Meta AI | open source: No

models

Phi-3-small-8k-instruct

Emerging model from Microsoft. Released Apr 2024. Benchmark data.

country: United Sta | provider: Microsoft | open source: No

models

Phi-3-mini-4k-instruct

Emerging model from Microsoft. Released Apr 2024. Benchmark data.

country: United Sta | provider: Microsoft | open source: No

models

claude-opus-4-20250514 32K

Emerging model from Anthropic. Released May 2025. Benchmark data.

country: United Sta | provider: Anthropic | open source: No

models

gemma-7b

Emerging model from Google DeepMind. Released Feb 2024. Benchmark data.

country: United Kin | provider: Google Dee | open source: No

models

DeepSeek-V2.5

Emerging model from DeepSeek. Released Sep 2024. Benchmark data.

country: China | provider: DeepSeek | open source: No

models

Meta-Llama-3-8B-Instruct

Emerging model from Meta AI. Released Apr 2024. Benchmark data.

country: United Sta | provider: Meta AI | open source: No

models

Mixtral-8x7B-v0.1

Emerging model from Mistral AI. Released Dec 2023. Benchmark data.

country: France | provider: Mistral AI | open source: No

models

QwQ-32B

Emerging model from Alibaba. Released Mar 2025. Benchmark data.

country: China | provider: Alibaba | open source: No

agents

GPT-4 Agent

OpenAI-powered autonomous agent. Top-tier AgentBench performer at 44.1% overall. Excels at household tasks (78%), knowledge graphs (58%), OS tasks (42%). Strong tool use, planning, and self-correction capabilities. Powered by GPT-4.

provider: OpenAI | base model: gpt-4 | bfcl simple: 92.8

agents

Claude 3.5 Sonnet Agent

Anthropic-powered autonomous agent. Top-tier AgentBench performer at 42.3% overall. Excels at household tasks (72%), knowledge graphs (55%), OS tasks (40%). Strong tool use and self-correction capabilities. Powered by Claude 3.5 Sonnet.

provider: Anthropic | base model: claude-3.5 | bfcl simple: 94.5

agents

GPT-4-Turbo Agent

OpenAI-powered autonomous agent. Top-tier AgentBench performer at 40.2% overall. Excels at household tasks (68%), knowledge graphs (52%), OS tasks (39%). Strong tool use capabilities. Powered by GPT-4 Turbo.

provider: OpenAI | base model: gpt-4-turb | agentbench db: 30

agents

Gemini Pro Agent

Google-powered autonomous agent. Strong AgentBench performer at 38.1% overall. Excels at household tasks (62%), knowledge graphs (48%), OS tasks (35%). Powered by Gemini Pro.

provider: Google | base model: gemini-pro | bfcl simple: 93.8

agents

Claude 3 Opus Agent

Anthropic-powered autonomous agent. Strong AgentBench performer at 35.4% overall. Excels at household tasks (55%), knowledge graphs (45%), OS tasks (33%). Powered by Claude 3 Opus.

provider: Anthropic | base model: claude-3-o | agentbench db: 26.2

agents

Llama 3 70B Agent

Meta-powered autonomous agent. Capable AgentBench performer at 30.2% overall. Excels at household tasks (45%), knowledge graphs (38%). Powered by llama-3-70b.

provider: Meta | base model: llama-3-70 | bfcl simple: 88.5

agents

Mistral Large Agent

Mistral-powered autonomous agent. Capable AgentBench performer at 28.5% overall. Excels at household tasks (42%), knowledge graphs (35%). Powered by mistral-large.

provider: Mistral | base model: mistral-la | bfcl simple: 90.2

agents

Qwen-72B Agent

Alibaba-powered autonomous agent. Capable AgentBench performer at 25.1% overall. Excels at household tasks (38%), knowledge graphs (32%). Powered by qwen-72b.

provider: Alibaba | base model: qwen-72b | agentbench db: 18.5

agents

DeepSeek-67B Agent

DeepSeek-powered autonomous agent. Evaluated AgentBench performer at 23.2% overall. Excels at household tasks (35%), knowledge graphs (30%). Powered by deepseek-67b.

provider: DeepSeek | base model: deepseek-6 | agentbench db: 16.5

agents

Llama 3 8B Agent

Meta-powered autonomous agent. Evaluated AgentBench performer at 18.0% overall. Powered by llama-3-8b.

provider: Meta | base model: llama-3-8b | agentbench db: 12