Text embedding models have emerged as a distinct category in the AI stack, with MTEB standardizing evaluation across eight task types. The top performers (gte-Qwen2, NV-Embed-v2, voyage-3-large) reach overall scores of roughly 70%, with vector dimensions ranging from 768 to 4096, enabling specialized retrieval and semantic search applications.
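How these models are applied to semantic search can be sketched in a few lines. The snippet below is a minimal illustration, assuming the sentence-transformers library and the small 384-dimensional all-MiniLM-L6-v2 model as a stand-in for the larger models named above; the corpus and query are placeholders.

```python
# Minimal semantic-search sketch; model, corpus, and query are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim embeddings

corpus = [
    "MTEB standardizes evaluation of text embedding models.",
    "Vector databases store embeddings for nearest-neighbor search.",
    "Classification heads can be trained on top of frozen embeddings.",
]
query = "How are embedding models benchmarked?"

# L2-normalized vectors make the dot product equal to cosine similarity.
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = corpus_emb @ query_emb        # cosine similarity per corpus entry
best = int(np.argmax(scores))
print(f"Best match ({scores[best]:.3f}): {corpus[best]}")
```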
MTEB evaluates embeddings across 56+ datasets spanning retrieval, STS, classification, clustering, and reranking (see the evaluation sketch below)
No single model dominates all tasks; retrieval specialists differ from classification specialists
Vector dimensions range from 384 (edge deployment) to 4096 (maximum performance) with distinct cost-quality tradeoffs (see the storage estimate below)
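For reference, evaluating a model on a small MTEB subset looks roughly like the following. This is a sketch assuming the open-source mteb Python package together with sentence-transformers; the exact API surface varies across mteb versions, and the two task names are illustrative picks rather than a recommended subset.

```python
# Sketch: run two MTEB tasks locally; API details may differ by mteb version.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# One classification task and one STS task instead of the full 56+ datasets.
tasks = mteb.get_tasks(tasks=["Banking77Classification", "STSBenchmark"])
evaluation = mteb.MTEB(tasks=tasks)

# Per-task scores are also written as JSON under the output folder.
results = evaluation.run(model, output_folder="mteb_results")
print(results)
```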
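The cost side of that tradeoff is easy to estimate: an uncompressed float32 index grows linearly with dimension. The back-of-the-envelope calculation below assumes a hypothetical corpus of 10 million vectors and no quantization or compression, which real deployments would typically apply.

```python
# Raw float32 index size by embedding dimension (no quantization/compression).
NUM_VECTORS = 10_000_000   # illustrative corpus size
BYTES_PER_FLOAT32 = 4

for dim in (384, 768, 1024, 3072, 4096):
    size_gib = NUM_VECTORS * dim * BYTES_PER_FLOAT32 / 2**30
    print(f"{dim:>5}-dim: {size_gib:6.1f} GiB")
```

At 4096 dimensions the same corpus takes roughly ten times the memory of a 384-dimensional index, which is the gap the "edge deployment" versus "maximum performance" framing refers to.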