RAG: Retrieval-Augmented Generation

Status: 🚧 Coming soon — dedicated chapters are being written.

RAG grounds an LLM's response in documents retrieved at query time rather than relying only on what is baked into the model's weights. This helps production LLM apps keep answers accurate and current, and lets them cite the sources behind each answer.
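As a rough illustration of what grounding looks like in practice, the sketch below numbers a few retrieved passages and places them in the prompt so the model can cite them. The `Doc` class, the `build_grounded_prompt` helper, the prompt wording, and the sample passages are all hypothetical and chosen for illustration only; the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A retrieved passage; `source` is any identifier the app wants to cite."""
    source: str
    text: str


def build_grounded_prompt(question: str, docs: list[Doc]) -> str:
    """Number each passage so the answer can refer to it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({d.source}) {d.text}" for i, d in enumerate(docs, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages; in a real app these would come from a retriever.
docs = [
    Doc("handbook.md", "Refunds are accepted within 30 days of purchase."),
    Doc("faq.md", "Refunds are issued to the original payment method."),
]
prompt = build_grounded_prompt("How long do I have to request a refund?", docs)
print(prompt)
```

The resulting string would be sent to whichever model the app uses. Because the sources are numbered, citations in the answer can be mapped back to `handbook.md` or `faq.md`.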

What this section will cover

Why RAG: hallucination, freshness, citations, cost
The RAG loop: chunk → embed → store → retrieve → rerank → generate
Chunking strategies (fixed, semantic, recursive, document-aware)
Embedding models (OpenAI, Cohere, HuggingFace, local)
Vector stores (FAISS, Chroma, pgvector, Pinecone, Weaviate)
Retrievers (similarity, MMR, hybrid BM25 + dense, multi-query)
Reranking (cross-encoders, Cohere Rerank)
Evaluation (faithfulness, context relevance, answer relevance)
Production patterns: caching, observability, fallback
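Until the dedicated chapters land, the loop listed above (chunk → embed → store → retrieve → rerank → generate) can be sketched end to end in plain Python. This is a toy under stated assumptions, not a production pipeline: the embedding is a bag-of-words count standing in for a real embedding model, the store is an in-memory list standing in for a vector database, the reranker is a term-overlap score standing in for a cross-encoder, and `generate` only builds the prompt instead of calling an LLM. All function names and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Fixed-size chunking: consecutive windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (assumption: stands in for a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 1) -> list[str]:
    """Toy reranker: fraction of query terms that appear in each candidate."""
    q_terms = set(embed(query))

    def score(candidate: str) -> float:
        return len(q_terms & set(embed(candidate))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


def generate(query: str, context: list[str]) -> str:
    """Placeholder for the LLM call: returns the prompt that would be sent."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Hypothetical corpus: chunk, embed, and store once, ahead of query time.
corpus = [
    "FAISS is a library for similarity search over dense vectors.",
    "BM25 is a sparse ranking function based on term frequency.",
    "Cross-encoders score a query and a passage together for reranking.",
]
store = [(c, embed(c)) for doc in corpus for c in chunk(doc)]

# Query time: retrieve a few candidates, rerank them, then build the prompt.
query = "Which library does similarity search over dense vectors?"
context = rerank(query, retrieve(query, store, k=2), top_n=1)
prompt = generate(query, context)
print(prompt)
```

Splitting retrieval into a cheap first pass (`retrieve`) and a more selective second pass (`rerank`) mirrors the usual reason for reranking: the first stage favors recall over many chunks, and the second spends more effort on a short candidate list.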

Currently available: related material

RAG components are already covered in the LangChain section.

A dedicated RAG track (end-to-end pipelines, evaluation, production tips) is in progress.
