Starting the Code Example - Loading

Steven Loyens

Using LlamaIndex for RAG in Python Steven Loyens 04:02

00:00 In this lesson, you’ll start coding, and it’s going to be a basic but fully functioning example of a query engine. And that query engine is going to be using my company policy as context, even though that is a secret document and it is not publicly available.

00:18 Now, although this is a basic example, there are some key concepts to get your head around, so I suggest splitting up the coding into two lessons. And just so that you can get your bearings, I’ve taken the liberty of writing down the steps you’ll be taking.

00:34 So in this lesson, you’ll start with loading, and then you’ll take a breather and come back refreshed for the remaining steps in the next lesson. Okay, so the first step in the process is to load the policy document.

00:48 And LlamaIndex uses data connectors, also called readers, also called loaders. So loaders, readers, and data connectors are used interchangeably. And the data connector you’ll be using is the SimpleDirectoryReader class, or rather an instance of that class.

01:08 So first, you need to import that class from llama_index.core. So all the way at the top, line one,

01:16 from llama_index.core import SimpleDirectoryReader.

01:20 Okay, that’s a good start. So now you want to create your reader or your data connector object, and that is part of loading. And so on line four, type reader, and that’s going to be an instance of the SimpleDirectoryReader class.

01:38 And that takes an input parameter, it’s called input_files,

01:43 and it needs to be a list of the documents that you want to feed into the process. So in your case, it’s going to be the company policy. Now, when I say it’s a list of documents, what I should say is a list of string representations of the path of the document, so the file path of that document.

02:04 So it needs to be a string.

02:07 Our company policy sits in the data subdirectory of the current directory, so I’m going to use dot forward slash data, and then the name of the file, which is company_policy.txt.

02:22 Now, I’m on Linux, so here I am using forward slashes, and the same goes for Mac. If you are on Windows, the native separator is the backslash, although Python generally accepts forward slashes there too. Later in the course, you will fix this, and you will make the code platform independent, but that is a later step. For now, please just use whatever is appropriate for your operating system.
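As a rough sketch of what a platform-independent path could look like, the standard library's pathlib joins path parts with the separator of the current operating system. This is only an illustration and not necessarily the approach taken later in the course. The variable names policy_path and input_files are chosen here for the example.

```python
from pathlib import Path

# Build the relative path from its parts; pathlib inserts the separator
# that matches the operating system the script runs on.
policy_path = Path("data") / "company_policy.txt"

# The lesson passes file paths as strings, so convert before handing
# the path to a reader.
input_files = [str(policy_path)]

print(input_files)
```

The path stays relative to the current working directory, just like the hand-written ./data version.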

02:43 So that is your reader. Now, you need to make that reader read or load the data. So create a variable called documents, then call the load_data() method on the reader object. I want to capture the data, so I’m going to use the reader object, and then .load_data, and spell it correctly, preferably. So I want to call the method and therefore use parentheses.

03:12 Now, this documents variable, or this object, is actually a list because the load_data() method returns a list, and it returns a list of document objects. Now, when I say document object, I don’t mean a text file.

03:27 The document objects here are part of the LlamaIndex framework, so they are LlamaIndex-specific objects, if you like. So that document object, it has data, of course, but it also has some metadata about the data source, such as a filename, a path, the date created, etc. So now you have your list of document objects ready to go. The next step is to index those documents so that the AI model can interpret them, and that you will cover in the next lesson.
