Intermediate ⏱️ 6 min

🎓 What is Retrieval Augmented Generation?

Enhancing LLM responses by retrieving relevant context from external knowledge sources

RAG (Retrieval Augmented Generation) is a technique that enhances language model responses by first retrieving relevant information from external knowledge sources, then using that context to generate more accurate and up-to-date answers.

Why RAG?

| Problem | How RAG Helps |
| --- | --- |
| Knowledge cutoff | Retrieves current information |
| Hallucinations | Grounds responses in real data |
| Domain expertise | Accesses specialized knowledge |
| Source attribution | Can cite retrieved documents |

RAG Architecture

User Query
    ↓
[Embedding Model] → Query Vector
    ↓
[Vector Database] ← Search → Top-K Documents
    ↓
[LLM] ← Context + Query → Response

Core Components

1. Document Processing

  • Split documents into chunks
  • Generate embeddings for each chunk
  • Store in vector database
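
A toy sketch of this indexing step; the bag-of-words `embed` function and the plain list are illustrative stand-ins for a real embedding model and vector database:

```python
from collections import Counter

def embed(text):
    # Toy embedding: lowercase bag-of-words counts.
    # A real system would call an embedding model here instead.
    return Counter(text.lower().split())

def index_documents(docs, chunk_size=50):
    """Split each doc into word chunks, embed them, and store them."""
    index = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            index.append({"text": chunk, "vector": embed(chunk)})
    return index

index = index_documents(["RAG retrieves context before generating."], chunk_size=3)
print(len(index))  # two chunks of up to 3 words each
```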

2. Retrieval

  • Convert query to embedding
  • Find similar chunks via vector search
  • Return top-K relevant documents
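
The retrieval step can be sketched with cosine similarity over sparse toy embeddings (stand-ins for the dense vectors a real embedding model would produce):

```python
import math

def embed(text):
    # Toy bag-of-words embedding; stands in for a real embedding model.
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def cosine(a, b):
    # a, b: sparse vectors as dicts mapping term -> weight.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    """Return the top-k chunks by cosine similarity to the query vector."""
    ranked = sorted(index, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]

index = [{"text": t, "vector": embed(t)} for t in
         ["cats purr loudly", "dogs bark at night", "the stock market fell"]]
top = retrieve(embed("why do cats purr"), index, k=1)
print(top[0]["text"])  # -> "cats purr loudly"
```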

3. Generation

  • Combine retrieved context with user query
  • Generate response grounded in context
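
A sketch of assembling the grounded prompt; the resulting string would then be sent to the LLM (the exact call depends on your provider, so it is left out here):

```python
def build_prompt(query, chunks):
    """Combine retrieved chunks and the user query into one grounded prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "Why do cats purr?",
    ["Cats purr to self-soothe.", "Purring starts in kittenhood."],
)
print(prompt)
```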

Vector Databases

| Database | Type | Best For |
| --- | --- | --- |
| Pinecone | Managed | Production scale |
| Chroma | Open source | Quick prototypes |
| Weaviate | Open source | Hybrid search |
| Qdrant | Open source | Performance |
| Milvus | Open source | Enterprise |

Embedding Models

| Model | Dimensions | Quality |
| --- | --- | --- |
| text-embedding-3-large | 3072 | Excellent |
| text-embedding-3-small | 1536 | Good |
| BGE-large | 1024 | Very Good |
| E5-large | 1024 | Very Good |

Advanced RAG Techniques

Hybrid Search

Combine vector search with keyword search (BM25), so results capture both semantic similarity and exact term matches.
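
A common way to fuse the two ranked result lists is reciprocal rank fusion (RRF); a minimal sketch, with hardcoded hit lists standing in for real retriever output:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]  # ranked IDs from vector search
bm25_hits = ["d1", "d5", "d3"]    # ranked IDs from keyword (BM25) search
print(rrf([vector_hits, bm25_hits]))  # d1 and d3 appear in both lists, so they rank first
```

The constant `k` (60 is the value commonly used) dampens the influence of any single list's top ranks.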

Reranking

Score retrieved documents with a cross-encoder for better relevance.
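
A runnable sketch of the reranking step; the word-overlap scorer below is only a stand-in for a real cross-encoder, which would score each (query, document) pair with a trained model:

```python
def overlap_score(query, doc):
    # Stand-in for a cross-encoder: fraction of query words found in the doc.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query, docs, top_n=3):
    """Re-score retrieved docs against the query and keep the best top_n."""
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)[:top_n]

retrieved = ["dogs bark at night", "cats purr when content", "weekly weather report"]
print(rerank("why do cats purr", retrieved, top_n=1))  # -> ["cats purr when content"]
```

In practice the first-stage retriever returns a generous candidate set (say, top 20) and the reranker trims it to the few chunks that actually enter the prompt.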

Query Transformation

  • HyDE (Hypothetical Document Embeddings): generate a hypothetical answer, then embed it and search for similar documents
  • Multi-query: generate several rephrasings of the query and merge their results
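
A sketch of the multi-query flow; the variations are hardcoded here (an LLM would normally generate them from the original query) and the retriever is a simple keyword-overlap stub:

```python
def retrieve(query, corpus, k=2):
    # Stub retriever: rank docs by number of words shared with the query.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def multi_query_retrieve(variations, corpus, k=2):
    """Run retrieval for each query variation, merging results without duplicates."""
    seen, merged = set(), []
    for q in variations:
        for doc in retrieve(q, corpus, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

corpus = ["cats purr to relax", "purring soothes cats", "dogs bark loudly"]
variations = ["why do cats purr", "what makes a cat purr", "purring in cats"]
results = multi_query_retrieve(variations, corpus, k=1)
print(results)
```

Different phrasings surface different chunks, so the merged set covers more of the relevant material than any single query.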

Chunking Strategies

  • Fixed size chunks
  • Semantic chunking
  • Hierarchical (parent-child)
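
A minimal fixed-size chunker with overlap, the simplest of the three strategies; sizes here are in words for clarity, though production chunkers often count tokens instead:

```python
def chunk_fixed(text, size=100, overlap=20):
    """Fixed-size chunking by words, with overlap so context spans boundaries."""
    words = text.split()
    step = size - overlap  # assumes overlap < size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already covers the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(10))
print(chunk_fixed(text, size=4, overlap=2))
# 4 chunks; each shares its last 2 words with the start of the next chunk
```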

Challenges

  • ❌ Retrieved context may be irrelevant
  • ❌ Long context can confuse the model
  • ❌ Chunking can break important context
  • ❌ Embedding quality varies by domain
