RAG: Retrieval-Augmented Generation

Status: 🚧 Coming soon. Dedicated chapters are being written.

RAG is the technique of grounding an LLM's response in retrieved, up-to-date documents instead of relying on what's baked into the model's weights. It's how production LLM apps stay accurate, current, and citable.

What this section will cover

  • Why RAG: hallucination, freshness, citations, cost
  • The RAG loop: chunk → embed → store → retrieve → rerank → generate
  • Chunking strategies (fixed, semantic, recursive, document-aware)
  • Embedding models (OpenAI, Cohere, HuggingFace, local)
  • Vector stores (FAISS, Chroma, pgvector, Pinecone, Weaviate)
  • Retrievers (similarity, MMR, hybrid BM25 + dense, multi-query)
  • Reranking (cross-encoders, Cohere Rerank)
  • Evaluation (faithfulness, context relevance, answer relevance)
  • Production patterns: caching, observability, fallback
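
Until the dedicated chapters land, the RAG loop listed above can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not a production pipeline: `embed` is a bag-of-words stand-in for a real embedding model, the "vector store" is a plain list, the rerank step is skipped, and the generate step stops at building the grounded prompt (no LLM is called). All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Fixed-size chunking: split text into windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(docs: list[str]) -> list[tuple[str, Counter]]:
    """Chunk and embed every document; the 'store' is an in-memory list."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(store: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble a grounded prompt with numbered sources for citation."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{numbered}\n\nQuestion: {query}"
    )


docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "BM25 is a sparse ranking function based on term frequency.",
]
store = build_store(docs)
question = "What is FAISS used for?"
top = retrieve(store, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline, `embed` would call an embedding model, `build_store` would write to a vector store such as the ones listed above, and `prompt` would be sent to an LLM. The numbered sources in the prompt are what make the answer citable.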

RAG components are already covered in the LangChain section.

A dedicated RAG track (end-to-end pipelines, evaluation, production tips) is in progress.