LlamaIndex in Python: A RAG Guide With Examples

Discover how to use LlamaIndex, a Python framework for building retrieval-augmented generation (RAG) apps, through practical examples. LlamaIndex lets you load your data and documents, create and persist searchable indexes, and query an LLM using your data as context.

In this tutorial, you’ll learn the basics of installing the package, setting up AI providers, spinning up a query engine, and running synchronous or asynchronous queries against remote or local models.

By the end of this tutorial, you’ll understand that:

  • You use LlamaIndex to connect your data to LLMs, allowing you to build AI agents, workflows, query engines, and chat engines.
  • You can perform RAG with LlamaIndex to retrieve relevant context at query time, helping the LLM generate grounded answers and minimize hallucinations.

You’ll start by preparing your environment and installing LlamaIndex. From there, you’ll learn how to load your own files, build and save an index, choose different AI providers, and run targeted queries over your data through a query engine.

Start Using LlamaIndex

Training or fine-tuning an AI model—like a large language model (LLM)—on your own data can be a complex and resource-intensive process. Instead of modifying the model itself, you can rely on a pattern called retrieval-augmented generation (RAG).

RAG is a technique where the system, at query time, first retrieves relevant external documents or data and then passes them to the LLM as additional context. The model uses this context as a source of truth when generating its answer, which typically makes the response more accurate, up to date, and on topic.

This technique also allows LLMs to provide answers to questions that they wouldn’t have been able to answer otherwise—for example, questions about your internal company information, email history, and similar private data.
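
To make the pattern concrete, here’s a minimal, framework-free sketch of the RAG flow. Everything in it is illustrative: the documents are hardcoded, the retriever is naive keyword matching, and the final LLM call is replaced with a print of the augmented prompt:

Python
# Illustrative sketch of the RAG pattern, with the LLM call stubbed out.
documents = [
    "LlamaIndex is a Python framework for building RAG apps.",
    "RAG retrieves relevant context and passes it to an LLM.",
]

def retrieve(question, docs):
    # Naive retrieval: pick the document with the most shared words.
    words = set(question.lower().split())
    return max(docs, key=lambda doc: len(words & set(doc.lower().split())))

question = "What is LlamaIndex?"
context = retrieve(question, documents)

# A real RAG app would send this augmented prompt to an LLM.
print(f"Context: {context}\n\nQuestion: {question}")

A real system replaces the keyword matching with semantic search over vector embeddings, which is exactly what LlamaIndex handles for you.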

LlamaIndex is a Python framework that enables you to build AI-powered apps capable of performing RAG. It helps you feed LLMs with your own data through indexing and retrieval tools. Next, you’ll learn the basics of installing, setting up, and using LlamaIndex in your Python projects.

Install and Set Up LlamaIndex

Before installing LlamaIndex, you should create and activate a Python virtual environment. Refer to Python Virtual Environments: A Primer for detailed instructions on how to do this.

Once you have the virtual environment ready, you can install LlamaIndex from the Python Package Index (PyPI):

Shell
(.venv) $ python -m pip install llama-index

This command downloads the framework from PyPI and installs it in your current Python environment. In practice, llama-index is a starter bundle that contains the following core packages:

  • llama-index-core
  • llama-index-llms-openai
  • llama-index-embeddings-openai
  • llama-index-readers-file
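
To confirm that the installation worked, you can import the core package and print its version. This assumes that your release of llama-index-core exposes a __version__ attribute, which current versions do:

Python
# Quick sanity check that the core package is importable.
import llama_index.core

print(llama_index.core.__version__)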

As you can see, OpenAI is the default LLM provider for LlamaIndex. In this tutorial, you’ll rely on this default setting, so after installation, you must set up an environment variable called OPENAI_API_KEY that points to a valid OpenAI API key:

Windows PowerShell
(.venv) PS> $ENV:OPENAI_API_KEY = "<your-api-key-here>"
Shell
(.venv) $ export OPENAI_API_KEY="<your-api-key-here>"

With either of these commands, you make the API key accessible under the environment variable OPENAI_API_KEY in your current terminal session. Note that you’ll lose it when you close your terminal. To persist this variable, add the export command to your shell’s configuration file—typically ~/.bashrc or ~/.zshrc on Linux and macOS—or use the System Properties dialog on Windows.
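
Before moving on, you can quickly verify from Python that the variable is visible in your session:

Python
# Check that the API key is visible to Python processes.
import os

print("OPENAI_API_KEY is set:", os.getenv("OPENAI_API_KEY") is not None)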

LlamaIndex also supports many other LLMs. For a complete list of models, visit the Available LLM integrations section in the official documentation.
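
For example, if you’d rather run a local model through Ollama, you can point LlamaIndex’s global settings at a different LLM. This sketch assumes that you’ve installed the separate llama-index-llms-ollama integration package and that Ollama is running locally with the llama3 model pulled:

Python
# Assumes the llama-index-llms-ollama package and a local Ollama server.
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Route all subsequent LlamaIndex calls to the local model.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)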

Run a Quick LlamaIndex RAG Example
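
Here’s a minimal sketch of what such an example can look like. It assumes a data/ directory with a few text files next to your script, plus the OPENAI_API_KEY variable set as shown earlier:

Python
# Minimal RAG example: load documents, index them, and run a query.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every supported file from the local data/ directory.
documents = SimpleDirectoryReader("data").load_data()

# Build an in-memory vector index over the loaded documents.
index = VectorStoreIndex.from_documents(documents)

# Retrieve relevant chunks and pass them to the LLM as context.
query_engine = index.as_query_engine()
response = query_engine.query("What are these documents about?")
print(response)

Behind the scenes, the query engine embeds your question, retrieves the most similar document chunks from the index, and sends them to the LLM together with your question.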
