Creating DataFrames From CSV Files
00:00 Now let’s quickly go over how you would create a DataFrame from a file, say, a CSV file. pandas can save and load data and labels in many formats, for example CSV files, other spreadsheet files, SQL databases, JSON files, and more. Check out the pandas documentation for more details. What we’re going to do is take our job candidates DataFrame.
00:27
If you recall, this is what it looks like. We’re going to create a CSV file that contains this tabular data. Every DataFrame object has a .to_csv() method, and all you need to do is pass in the name of the CSV file.
00:43
We’ll call it "job_candidates.csv".
00:50
This statement will produce a CSV file called job_candidates.csv in your working directory.
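The job candidates DataFrame from the earlier lessons isn’t reproduced here, so this minimal sketch builds a small hypothetical stand-in (the names, cities, ages, and row labels are made up) and writes it out with .to_csv(). The path is relative, so the file lands in the current working directory.

```python
import pandas as pd

# Hypothetical stand-in for the job candidates DataFrame (made-up data)
df = pd.DataFrame(
    data={
        "name": ["Alice", "Bob", "Chen"],
        "city": ["Lisbon", "Oslo", "Seoul"],
        "age": [41, 28, 33],
    },
    index=[101, 102, 103],  # row labels
)

# Write the column labels, row labels, and data to job_candidates.csv
df.to_csv("job_candidates.csv")
```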
00:58 To show you what’s inside, I’m going to create a Markdown cell and paste in the contents of job_candidates.csv. pandas wrote a first line that contains the column labels, followed by one line per row: the row label first, then the data for that row. Notice that, technically, the first column has no column label in the header line. That column holds the row labels, or index, of the DataFrame we used to create the CSV file.
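If you’d rather inspect the output from Python than paste it into a Markdown cell, one option is to call .to_csv() with no path, which returns the CSV text as a string instead of writing a file. A sketch using the same kind of hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical stand-in data (made up for illustration)
df = pd.DataFrame(
    data={"name": ["Alice", "Bob"], "age": [41, 28]},
    index=[101, 102],
)

csv_text = df.to_csv()  # no path given: returns the CSV text as a str
print(csv_text)
# ,name,age
# 101,Alice,41
# 102,Bob,28
```

The empty field before the first comma in the header line is the missing label for the index column.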
01:36
So if our data is in this form and we want to read it into pandas, using the first column as the index of the new DataFrame, we have to pass a keyword argument to the read_csv() function that we’ll use to load the data. Let me get rid of this Markdown cell.
02:01 I could delete the cell by pressing Escape and then X.
02:07 Or, instead of pressing X, I can convert it back to a code cell. While I’m in command mode (again, you can tell because the cell is highlighted in blue), I press Y, which turns it back into a Python code cell.
02:22
From the pandas module, we’ll call the read_csv() function and pass it the file we just created. Then we use the index_col (index column) keyword to say that the first column, which is at position 0, should serve as the index of the DataFrame we’re constructing. When I run it, we get the exact same DataFrame that we had before.
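A sketch of that round trip, again with hypothetical stand-in data. index_col=0 tells read_csv() to use the column at position 0 as the row labels, and the .equals() check confirms the labels and data match the original.

```python
import pandas as pd

# Hypothetical stand-in data, written to the file name used in the lesson
df = pd.DataFrame(
    data={
        "name": ["Alice", "Bob", "Chen"],
        "city": ["Lisbon", "Oslo", "Seoul"],
        "age": [41, 28, 33],
    },
    index=[101, 102, 103],
)
df.to_csv("job_candidates.csv")

# Use the first column (position 0) as the row labels of the new DataFrame
df_loaded = pd.read_csv("job_candidates.csv", index_col=0)
print(df_loaded)

# Same shape, labels, and values as the DataFrame we saved
print(df_loaded.equals(df))  # True
```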
02:54
Now I want to show you what happens if we don’t pass the index_col keyword: pandas treats that first column like any other column. And because we didn’t tell it which column to use as the index, it uses default integers, starting at 0, for the row labels.
03:14
The first column, which contained the index of the original DataFrame, has no header, so pandas gives it a placeholder label, Unnamed: 0. But of course we do want that column to serve as the index, so we pass index_col=0. From there, we’d work with this DataFrame as needed.
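To see the difference side by side, this sketch (same kind of hypothetical data) reads the file once without index_col and once with it. Without it, the unlabeled first column comes back as an ordinary column named Unnamed: 0, and the rows get default integer labels starting at 0.

```python
import pandas as pd

# Hypothetical stand-in data, written to the file name used in the lesson
df = pd.DataFrame(
    data={"name": ["Alice", "Bob", "Chen"], "age": [41, 28, 33]},
    index=[101, 102, 103],
)
df.to_csv("job_candidates.csv")

# Without index_col, the saved row labels are read as an ordinary column
df_default = pd.read_csv("job_candidates.csv")
print(df_default.columns.tolist())  # ['Unnamed: 0', 'name', 'age']
print(df_default.index.tolist())    # [0, 1, 2]

# With index_col=0, the first column becomes the row labels again
df_indexed = pd.read_csv("job_candidates.csv", index_col=0)
print(df_indexed.index.tolist())    # [101, 102, 103]
```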
03:35 All right, so that’s a quick rundown of the different ways you can construct a DataFrame in pandas. Definitely check out the pandas documentation for all the ways to create a DataFrame and the options each one offers. Coming up next, we’ll take a closer look at how to retrieve labels and data from a DataFrame.