# Understanding Embeddings

Embeddings are dense numerical vectors that represent text so that semantically similar inputs map to nearby points in vector space. This property enables similarity search, clustering, and retrieval-augmented generation (RAG).
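The core idea can be sketched with toy vectors. This is a minimal, self-contained example: the 4-dimensional "embeddings" and their values are made up for illustration (real models output hundreds to thousands of dimensions), but the similarity computation is the standard cosine formula.

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings"; a real model would produce these vectors
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.85, 0.82, 0.15, 0.1]
banana = [0.1, 0.05, 0.9, 0.8]

print(cosine_similarity(king, queen))   # high: related concepts
print(cosine_similarity(king, banana))  # low: unrelated concepts
```

Semantically related texts score close to 1.0; unrelated texts score much lower. Everything downstream (search, RAG, clustering) builds on this comparison.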
## Popular Embedding Models
| Model | Dimensions | Best For |
|---|---|---|
| OpenAI text-embedding-3-large | 3072 | General purpose |
| BGE-Large | 1024 | Open source |
| E5-Large | 1024 | Multilingual |
| all-MiniLM-L6 | 384 | Speed/efficiency |
## Use Cases
- Semantic Search - Find similar documents
- RAG - Retrieve relevant context for LLMs
- Clustering - Group similar content
- Classification - Categorize text
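The first two use cases share the same retrieval step: embed a query, then rank documents by similarity. Here is a minimal sketch with a tiny corpus of precomputed toy vectors; the filenames and 3-dimensional embedding values are hypothetical placeholders for real model output.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical precomputed embeddings for a tiny corpus
corpus = {
    "intro_to_cooking.txt": [0.1, 0.9, 0.2],
    "ml_basics.txt":        [0.8, 0.1, 0.6],
    "neural_networks.txt":  [0.9, 0.2, 0.5],
}

def search(query_vec, corpus, top_k=2):
    # Rank all documents by cosine similarity to the query embedding
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

query = [0.85, 0.15, 0.55]  # stand-in for the embedding of an ML-related question
print(search(query, corpus))
```

In a RAG pipeline, the top-ranked documents returned here would be inserted into the LLM prompt as context.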
## Vector Databases
| Database | Type | Best For |
|---|---|---|
| Pinecone | Cloud | Managed, scalable |
| Weaviate | Open source | Self-hosted |
| Chroma | Lightweight | Local development |
| Milvus | Enterprise | Large scale |
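All of these databases expose roughly the same core interface: add vectors under an ID, then query for nearest neighbors. The class below is a toy in-memory illustration of that pattern, not any specific database's API; real systems replace the brute-force scan with approximate indexes (e.g. HNSW or IVF) to scale.

```python
import math

class InMemoryVectorStore:
    """Toy store illustrating the add/query pattern the databases above share."""

    def __init__(self):
        self._vectors = {}  # doc_id -> embedding

    def add(self, doc_id, embedding):
        self._vectors[doc_id] = embedding

    def query(self, embedding, top_k=3):
        # Exact (brute-force) nearest neighbors by cosine similarity
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

        ranked = sorted(self._vectors.items(), key=lambda kv: cos(embedding, kv[1]), reverse=True)
        return ranked[:top_k]

store = InMemoryVectorStore()
store.add("doc1", [1.0, 0.0])
store.add("doc2", [0.0, 1.0])
store.add("doc3", [0.9, 0.1])
print(store.query([1.0, 0.1], top_k=2))
```

"Local development" options like Chroma behave much like this under the hood for small collections, which is why they are convenient for prototyping before moving to a managed or distributed system.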
## Similarity Metrics
| Metric | Use Case |
|---|---|
| Cosine | Normalized text (most common) |
| Euclidean | When vector magnitude matters (e.g. clustering) |
| Dot Product | Equivalent to cosine for unit-normalized vectors |
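The table's last row is easy to verify directly: once vectors are normalized to unit length, the denominator of the cosine formula becomes 1, so dot product and cosine similarity coincide. A minimal check with arbitrary example vectors:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a, b = [3.0, 4.0], [1.0, 2.0]
ua, ub = normalize(a), normalize(b)

# For unit-length vectors, dot product equals cosine similarity
print(round(dot(ua, ub), 6), round(cosine(a, b), 6))
```

This is why many systems pre-normalize embeddings at ingestion time: the cheaper dot product can then be used at query time with no change in ranking.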