retrieval-augmented generation (RAG)
Retrieval‑augmented generation (RAG) is a technique in which a system first retrieves relevant external documents at query time and then passes them to a large language model (LLM) as additional context when generating an answer.
A typical RAG pipeline involves these steps, sketched in code after the list:
- Encode the user query
- Search a knowledge source, such as a vector index or web/document corpus
- Rank and select the most relevant passages
- Assemble those passages into the prompt so the generator can produce an answer grounded in retrieved evidence
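The sketch below shows those four steps end to end in plain Python. It's a minimal illustration, not a production pipeline: the bag-of-words `embed()` function stands in for a real embedding model, the `documents` list stands in for a vector index, and the assembled prompt would be sent to an LLM rather than printed:

```python
from collections import Counter
from math import sqrt

# Toy embedding: a bag-of-words term-count vector. A real pipeline would
# use a trained embedding model; this stand-in keeps the sketch runnable
# with no external dependencies.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Knowledge source: in practice a vector index or document corpus.
documents = [
    "RAG retrieves external documents and passes them to the generator.",
    "Vector indexes store embeddings for fast similarity search.",
    "Chunking splits long documents into retrievable passages.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)  # Step 1: encode the user query.
    # Steps 2 and 3: search the knowledge source and rank by similarity.
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]  # Select the most relevant passages.

def build_prompt(query: str) -> str:
    # Step 4: assemble the retrieved passages into the prompt.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG ground its answers?"))
```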
RAG improves over purely parametric models in terms of factuality, timeliness, and ease of updates, because you can refresh the knowledge source instead of retraining the whole model. However, the output quality still largely depends on retrieval coverage, chunking strategy, and ranking/selection quality.
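Chunking in particular deserves care, because passages that are too long dilute relevance while passages that are too short lose context. The sketch below shows one common strategy, fixed-size chunks with overlap; the character-based sizing and the default values are illustrative assumptions, not recommendations:

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks that overlap, so facts falling
    on a boundary still appear intact in at least one chunk. Sizes are
    in characters for simplicity; real pipelines usually count tokens.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and stored in the knowledge source, so refreshing that source amounts to re-running this indexing step over new or updated documents, with no retraining involved.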
Related Resources
Tutorial
Build an LLM RAG Chatbot With LangChain
Large language models (LLMs) have taken the world by storm, demonstrating unprecedented capabilities in natural language tasks. In this step-by-step tutorial, you'll leverage LLMs to build your own retrieval-augmented generation (RAG) chatbot using synthetic data with LangChain and Neo4j.
For additional information on related topics, take a look at the following resources:
- First Steps With LangChain (Course)
- Build an LLM RAG Chatbot With LangChain (Quiz)
By Leodanis Pozo Ramos • Updated Dec. 8, 2025