retrieval-augmented generation (RAG)

Retrieval‑Augmented Generation (RAG) is a technique in which a system first retrieves relevant external documents at query time and then passes them to a large language model (LLM) as additional context for generating its answer.

A typical RAG pipeline involves these steps:

  • Encode the user query
  • Search a knowledge source, such as a vector index or web/document corpus
  • Rank and select the most relevant passages
  • Assemble those passages into the prompt so the generator can produce an answer grounded in retrieved evidence
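The steps above can be sketched end to end in plain Python. This is a minimal toy example: the bag-of-words embedding and the small in-memory corpus are stand-ins for a real neural encoder and vector index, and the function names (`embed`, `retrieve`, `build_prompt`) are illustrative, not from any particular library.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real pipelines use a neural encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "RAG retrieves documents before generating an answer",
    "Transformers use self-attention over token sequences",
    "Vector indexes support fast nearest-neighbor search",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=2):
    q = embed(query)                                         # 1. encode the query
    scored = [(cosine(q, vec), doc) for doc, vec in index]   # 2. search the knowledge source
    scored.sort(reverse=True)                                # 3. rank by similarity
    return [doc for _, doc in scored[:k]]                    #    and select the top-k passages

def build_prompt(query, passages):
    # 4. assemble retrieved passages into the prompt for the generator
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "How does RAG work?"
print(build_prompt(query, retrieve(query)))
```

In a production system, the final prompt would be sent to an LLM, which generates an answer grounded in the retrieved context rather than in its parameters alone.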

RAG improves on purely parametric models in factuality, timeliness, and ease of updates, because you can refresh the knowledge source instead of retraining the whole model. However, output quality still depends heavily on retrieval coverage, chunking strategy, and ranking/selection quality.
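Chunking, one of the quality factors mentioned above, determines how documents are split into retrievable passages. A minimal sketch of one common strategy, a fixed-size sliding window with overlap, might look like this (the sizes are arbitrary and would be tuned per corpus):

```python
def chunk(text, size=40, overlap=10):
    # Fixed-size sliding-window chunking: consecutive chunks share
    # `overlap` characters so sentences split at a boundary still
    # appear whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlapping windows trade a larger index for better recall at chunk boundaries; other strategies split on sentences, paragraphs, or document structure instead of raw character counts.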


By Leodanis Pozo Ramos • Updated Nov. 3, 2025