The Embedding Model Comparison lets you compare 20+ text embedding models by dimensions, cost, MTEB benchmark score, and use case. Filter by provider or open-source, sort by any column, and find the right embedding model for your RAG or semantic search project.
| Model | Provider | Dims | Max Tokens | Cost/1M | MTEB | Best For | Multi |
|---|---|---|---|---|---|---|---|
How to Choose an Embedding Model
Choosing the right embedding model for your application significantly affects both the quality of results and your operating costs. For most RAG (Retrieval-Augmented Generation) pipelines, the embedding model is the biggest factor in retrieval accuracy.
Understanding the Key Metrics
Dimensions: The size of the embedding vector. More dimensions capture more semantic nuance but increase storage and compute costs. 384–768 dimensions cover most use cases; 1536+ is for maximum quality on complex tasks.
Max Tokens: The maximum input length the model can process in one call. For longer documents, you'll need to chunk at this boundary. Most models support 512 tokens; newer models such as BAAI/bge-m3 support up to 8,192.
MTEB Score: The Massive Text Embedding Benchmark measures performance across retrieval, clustering, reranking, and classification tasks. Higher is better; competitive models score 60–70.
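Chunking at the Max Tokens boundary described above can be sketched in a few lines. This is an illustrative helper (the name `chunk_text` and the overlap value are our choices, not from any library), and it approximates token counts by splitting on whitespace; in production you'd count tokens with the model's own tokenizer instead.

```python
def chunk_text(text, max_tokens=512, overlap=50):
    """Split text into chunks of at most max_tokens "tokens",
    overlapping chunks slightly so context isn't cut mid-thought.

    Whitespace splitting stands in for a real tokenizer here; swap in
    the embedding model's tokenizer for accurate counts.
    """
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

# A 1,200-"token" document at a 512-token limit yields 3 chunks.
doc = ("word " * 1200).strip()
chunks = chunk_text(doc, max_tokens=512, overlap=50)
```

Each chunk is then embedded separately and stored as its own vector.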
Choosing by Use Case
General-purpose RAG: OpenAI text-embedding-3-small ($0.02/1M tokens) or all-MiniLM-L6-v2 (free, self-hosted). Both offer excellent retrieval quality with minimal setup.
Code search: Voyage AI voyage-code-3 or OpenAI text-embedding-3-large. Purpose-built for code retrieval with strong performance on mixed code/comment content.
Multilingual applications: BAAI/bge-m3 (free, 100 languages) or Cohere embed-multilingual-v3.0. Both handle cross-lingual semantic similarity well.
High-volume self-hosted: all-MiniLM-L6-v2 or GTE-large. Run them locally with the Sentence Transformers library for zero per-token cost at any scale.
FAQ
What are text embeddings and why do they matter?
Text embeddings convert words, sentences, or documents into numerical vectors that capture semantic meaning. Two pieces of similar text have vectors that are close together in the vector space. Embeddings power semantic search, RAG pipelines, recommendation systems, and clustering — they're the core of most modern AI applications.
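"Close together in the vector space" is usually measured with cosine similarity. Here is a minimal pure-Python sketch using made-up 3-dimensional toy vectors (real models produce 384+ dimensions); the variable names are illustrative only:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    near 1.0 = similar meaning, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": cat and kitten point the same way, invoice doesn't.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

assert cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice)
```

Semantic search is just this comparison run between a query vector and every stored document vector (or an approximate-nearest-neighbor index of them).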
What is MTEB and how do I read the scores?
MTEB (Massive Text Embedding Benchmark) is the standard benchmark for embedding models, covering 58 tasks across retrieval, clustering, classification, and more. Higher scores indicate better semantic understanding: most competitive models score in the 60–70 range, and top models like BGE-large and text-embedding-3-large reach 62–65 on the English leaderboard.
OpenAI vs open-source embeddings — which should I choose?
OpenAI text-embedding-3-small offers excellent quality at $0.02/1M tokens with no infrastructure setup — ideal for getting started and for most RAG applications. Open-source models like all-MiniLM-L6-v2 (free, self-hosted) have lower MTEB scores but zero API cost. For high-volume applications (>100M tokens/month), self-hosted open-source models become significantly cheaper.
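The API-vs-self-hosted trade-off above comes down to simple arithmetic. A sketch, using the $0.02/1M price quoted for text-embedding-3-small (the function name is ours, not an API):

```python
def api_cost_per_month(tokens_per_month, price_per_million=0.02):
    """Monthly embedding bill at a per-million-token API price
    (default: the $0.02/1M quoted for text-embedding-3-small)."""
    return tokens_per_month / 1_000_000 * price_per_million

# 10M tokens/month costs cents; even 1B tokens/month is $20.
light = api_cost_per_month(10_000_000)         # $0.20
heavy = api_cost_per_month(1_000_000_000)      # $20.00
```

Compare that figure against your GPU/CPU hosting and maintenance cost for an open-source model to find your own break-even point.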
What embedding dimension should I use?
Higher dimensions generally capture more semantic nuance but consume more vector DB storage and slow down similarity search. 384 dimensions (all-MiniLM-L6-v2) is sufficient for most applications; 768 balances quality and efficiency; 1536+ (OpenAI ada-002, text-embedding-3-large) provides maximum quality. Choose based on your quality requirements and index size.
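The storage side of that trade-off is easy to estimate. A sketch assuming a flat float32 index at 4 bytes per value (real vector DBs add metadata and graph overhead on top; the helper name is illustrative):

```python
def index_size_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for a flat float32 vector index:
    vectors x dimensions x 4 bytes per float32 value."""
    return num_vectors * dims * bytes_per_value

# 1M chunks at 384 dims (all-MiniLM-L6-v2): ~1.4 GiB of raw vectors.
small = index_size_bytes(1_000_000, 384)
# The same corpus at 1536 dims (e.g. OpenAI ada-002): exactly 4x that.
large = index_size_bytes(1_000_000, 1536)
assert large == 4 * small
```

Quadrupling dimensions quadruples both storage and the per-query distance computation, which is why 384–768 dims is the sweet spot for large indexes.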
Which embedding model is best for code search?
For code search, use models specifically trained on code: Voyage AI's voyage-code-3 achieves top performance on code retrieval. OpenAI's text-embedding-3-large also handles code reasonably well due to diverse training data. BAAI/bge-m3 is a strong free option for multilingual code.
Is this embedding model comparison tool free?
Yes, completely free with no signup required. All data is static and runs in your browser.