RAG: Retrieval-Augmented Generation
Status: 🚧 Coming soon. Dedicated chapters are being written.
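RAG is the technique of grounding an LLM's response in retrieved, up-to-date documents instead of relying on what's baked into the model's weights. It's one of the main ways production LLM apps stay accurate, current, and citable: the documents can be updated without retraining the model, and the passages an answer was based on can be shown to the user.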
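To make "grounding" concrete, here is a minimal sketch of the generation-side half of RAG, assuming the relevant documents have already been retrieved: each one is numbered and placed in the prompt so the model can cite it. The `Doc` class, the `build_grounded_prompt` function, the prompt wording and the "Acme" documents are all hypothetical illustrations, not a LangChain API; retrieval and the actual LLM call are left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical container for a retrieved document: a source id (for citation) plus its text."""
    source: str
    text: str


def build_grounded_prompt(question: str, docs: list[Doc]) -> str:
    """Build a prompt that asks the model to answer only from the given documents.

    Documents are numbered [1], [2], ... so the answer can cite them.
    """
    context = "\n\n".join(
        f"[{i}] ({doc.source})\n{doc.text}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up example documents for a hypothetical product.
    docs = [
        Doc("acme-pricing.md", "The Acme Pro plan is billed monthly."),
        Doc("acme-faq.md", "Acme offers a discount for annual billing."),
    ]
    print(build_grounded_prompt("How is the Acme Pro plan billed?", docs))
```

Keeping the source id next to each passage is what makes the answer citable: the application can map each `[n]` in the model's output back to a document.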
What this section will cover
- Why RAG: hallucination, freshness, citations, cost
- The RAG loop: chunk → embed → store → retrieve → rerank → generate
- Chunking strategies (fixed, semantic, recursive, document-aware)
- Embedding models (OpenAI, Cohere, Hugging Face, local)
- Vector stores (FAISS, Chroma, pgvector, Pinecone, Weaviate)
- Retrievers (similarity, MMR, hybrid BM25 + dense, multi-query)
- Reranking (cross-encoders, Cohere Rerank)
- Evaluation (faithfulness, context relevance, answer relevance)
- Production patterns: caching, observability, fallback
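The RAG loop listed above can be sketched end to end in plain Python. This is a toy under loud assumptions: `embed` is a term-frequency bag of words standing in for a real embedding model, `InMemoryStore` does brute-force cosine search in place of FAISS, Chroma or pgvector, `rerank` scores by query-term overlap instead of a cross-encoder, and `generate` only returns the prompt rather than calling an LLM. Every name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Fixed-size chunking by word count, with `overlap` words shared between windows."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a sparse term-frequency vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical vector store: keeps (chunk, vector) pairs, brute-force cosine search."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, texts: list[str]) -> None:
        self.items.extend((t, embed(t)) for t in texts)

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    # Placeholder reranker: fraction of query terms present in each candidate.
    # A production system would typically score (query, chunk) pairs with a cross-encoder.
    q_terms = set(tokenize(query))

    def score(text: str) -> float:
        return len(q_terms & set(tokenize(text))) / (len(q_terms) or 1)

    return sorted(candidates, key=score, reverse=True)[:top_n]


def generate(query: str, context: list[str]) -> str:
    # Stub for the LLM call: returns the grounded prompt that would be sent to the model.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {query}"


def answer(query: str, store: InMemoryStore, k: int = 3, top_n: int = 2) -> str:
    candidates = store.search(query, k=k)          # retrieve broadly
    best = rerank(query, candidates, top_n=top_n)  # narrow to what fits in the prompt
    return generate(query, best)


if __name__ == "__main__":
    corpus = ("Cats purr and sleep all day. Dogs bark at strangers. "
              "pgvector stores embeddings inside Postgres.")
    store = InMemoryStore()
    store.add(chunk(corpus, size=6, overlap=2))  # chunk -> embed -> store
    print(answer("Where are embeddings stored in Postgres?", store))  # retrieve -> rerank -> generate
```

Swapping each stub for a real component (an embedding model, a vector store, a cross-encoder, an LLM client) keeps the same shape: retrieve broadly with `k`, then rerank down to the `top_n` chunks that fit in the prompt.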
Currently available: related material
RAG components are already covered in the LangChain section:
- LangChain: Document Loaders
- LangChain: Text Splitters
- LangChain: Vector Stores
- LangChain: Retrievers
- LangChain: RAG
A dedicated RAG track (end-to-end pipelines, evaluation, production tips) is in progress.