Embeddings are dense numerical representations of data — text, images, audio — in a continuous vector space. They're the hidden backbone of modern AI search, recommendations, and retrieval-augmented generation (RAG).
## From Words to Numbers
Traditional approaches represented text as sparse vectors (bag-of-words, TF-IDF). A vocabulary of 50,000 words meant 50,000-dimensional vectors that were mostly zeros. Embeddings compress meaning into dense vectors of 256-3072 dimensions where every dimension carries signal.
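The contrast is easy to see in code. This is a minimal sketch using NumPy, with made-up word indices and a random vector standing in for a real embedding:

```python
import numpy as np

# A bag-of-words vector over a 50,000-word vocabulary is almost entirely
# zeros, while a dense embedding carries signal in every dimension.
vocab_size = 50_000
word_ids = [17, 942, 30_511]  # hypothetical vocabulary indices for a 3-word sentence

sparse = np.zeros(vocab_size)
sparse[word_ids] = 1.0
print(f"sparse: {sparse.size} dims, {np.count_nonzero(sparse)} nonzero")
# → sparse: 50000 dims, 3 nonzero

dense = np.random.default_rng(0).normal(size=1536)  # stand-in for a model output
print(f"dense:  {dense.size} dims, {np.count_nonzero(dense)} nonzero")
```

In the sparse vector, 49,997 of the 50,000 entries are dead weight; the dense vector packs comparable information into roughly 1/30th of the dimensions.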
## Semantic Similarity
The magic of embeddings is that semantically similar items end up close together in vector space:
- "How do I reset my password?" and "I forgot my login credentials" → high similarity
- "How do I reset my password?" and "What's the weather today?" → low similarity
This enables semantic search — finding results based on meaning, not just keyword overlap.
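In practice, "close together" is measured with cosine similarity. Below is a toy sketch: the three vectors are hand-picked stand-ins for real sentence embeddings, chosen so the two password questions point in nearly the same direction while the weather question points elsewhere:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-picked toy vectors standing in for real sentence embeddings.
reset_pw  = np.array([0.9, 0.1, 0.0])  # "How do I reset my password?"
forgot_pw = np.array([0.8, 0.3, 0.1])  # "I forgot my login credentials"
weather   = np.array([0.0, 0.2, 0.9])  # "What's the weather today?"

print(cosine_similarity(reset_pw, forgot_pw))  # high (~0.96)
print(cosine_similarity(reset_pw, weather))    # low (~0.02)
```

A semantic search system does exactly this at scale: embed the query, then rank stored documents by their cosine similarity to it.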
## Popular Embedding Models
| Model | Dimensions | Speed | Quality | Provider |
|-------|------------|-------|---------|----------|
| text-embedding-3-small | 1536 | Fast | Good | OpenAI |
| text-embedding-3-large | 3072 | Medium | Excellent | OpenAI |
| voyage-3 | 1024 | Fast | Excellent | Voyage AI |
| BGE-large-en-v1.5 | 1024 | Fast | Very good | Open source |
| Nomic Embed v1.5 | 768 | Fast | Good | Open source |
| Cohere embed-v3 | 1024 | Fast | Excellent | Cohere |
## Beyond Text
Embeddings aren't limited to text. Multimodal embedding models (CLIP, SigLIP) create shared vector spaces where text and images can be compared directly. This powers visual search, image captioning, and cross-modal retrieval.
## Key Concepts
- Cosine similarity: The standard metric for comparing embeddings. Ranges from -1 (opposite directions) to 1 (same direction).
- Dimensionality: Higher dimensions capture more nuance but require more storage and compute.
- Normalization: Most embedding models output unit-normalized vectors, making cosine similarity equivalent to dot product.
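That last equivalence is worth seeing directly. A small sketch with random unit-normalized vectors (the normalization step mimics what most embedding APIs already do for you):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=768)
b = rng.normal(size=768)

# Unit-normalize, as most embedding models do before returning vectors.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)
print(np.isclose(cosine, dot))  # → True
```

Since both norms are 1, the denominator of the cosine formula vanishes, and the plain dot product suffices. This matters at scale: vector databases can skip the norm computation entirely when vectors are pre-normalized.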