Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique in which a system retrieves relevant external documents at query time and supplies them to a large language model (LLM) as additional context when the model generates its answer.
A typical RAG pipeline consists of the following steps:
- Encode the user query
- Search a knowledge source, for example a vector index or web/document corpus
- Rank and select the most relevant passages
- Assemble those passages into the prompt so the generator can produce an answer grounded in retrieved evidence
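The steps above can be sketched end to end in plain Python. This is a toy illustration, not a production implementation: bag-of-words counts stand in for a learned encoder, cosine similarity over those counts stands in for a vector index, and the corpus, function names, and prompt template are all illustrative assumptions.

```python
from collections import Counter
import math

# Toy corpus standing in for a real knowledge source (vector index, docs, web).
CORPUS = [
    "RAG retrieves external documents and passes them to the generator.",
    "A vector index stores dense embeddings for fast similarity search.",
    "Chunking strategy affects which passages a retriever can surface.",
]

def encode(text: str) -> Counter:
    # Stand-in for a learned query/passage encoder: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank all passages by similarity to the query and keep the top k.
    q = encode(query)
    return sorted(CORPUS, key=lambda p: cosine(q, encode(p)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Assemble the selected passages into the prompt for the generator.
    passages = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only this context:\n{passages}\n\nQuestion: {query}"

print(build_prompt("How does a vector index support retrieval?"))
```

In a real system, `encode` would call an embedding model, `retrieve` would query an approximate-nearest-neighbor index, and the assembled prompt would be sent to the LLM; the shape of the pipeline stays the same.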
RAG improves on purely parametric models in factuality, timeliness, and ease of updating, because you can refresh the knowledge source instead of retraining the whole model. However, output quality still depends heavily on retrieval coverage, chunking strategy, and ranking/selection quality.
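The update advantage is easy to see with a toy, list-backed knowledge source: adding a fact is an append, not a retraining run. Keyword overlap stands in for real retrieval here, and all names and documents are illustrative.

```python
# Toy knowledge source: refreshing it is a list append, not a retraining run.
knowledge = ["RAG passes retrieved passages to the generator as context."]

def search(query: str) -> list[str]:
    # Naive keyword overlap standing in for a real retriever.
    q = set(query.lower().split())
    return [doc for doc in knowledge if q & set(doc.lower().split())]

print(search("index update 2025"))  # [] -- nothing about 2025 yet
knowledge.append("the index was refreshed with 2025 documents")
print(search("index update 2025"))  # the new fact is now retrievable
```

A parametric-only model would need fine-tuning or retraining to pick up the same fact; here it becomes retrievable the moment the knowledge source changes.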
By Leodanis Pozo Ramos • Updated Oct. 24, 2025