Reading CSVs With Pandas
This lesson covers a couple of different ways to import CSV data using the third-party pandas library. In this video, you’ll learn how to install pandas using pip and how to use it to read CSV files.
Here’s the example CSV file you’ll be using:
Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
The following example shows how to read a CSV file and print out its contents using pandas:
import pandas as pd
data_frame = pd.read_csv('hrdata.csv')
print(data_frame)
In addition to learning how to read CSV files and print their contents, you’ll see how to use pandas to modify the index of the data you read, parse dates, and add headers to CSV files that don’t have them.
00:00 One way to work with CSVs in Python is to use the data analysis library Pandas, short for panel data. It’s sometimes referred to as the Excel of Python as it stores data in DataFrames, which can be thought of as Excel spreadsheets.
So now in your editor, you can import pandas as pd, and let’s take a look at the data that we’re going to be working with. Over here, I have a file called hrdata.csv, and it just has the names, hire dates, salary, and sick days remaining for a number of employees. To load this into pandas, just go back, create a DataFrame that you can just call df, set that equal to pd.read_csv(), pass in the filename, 'hrdata.csv', and you can print that out just by calling a print() on the DataFrame.
01:05 Just try running that, and there you go! You can see that everything imported. Looks like there was an issue here. Let’s go back to the CSV, and it looks like I put a period instead of a comma there. So we’ll save that. And just to be safe, let’s rerun it.
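The transcript does not capture the printed output at this point. As a rough sketch, assuming the file contents match the sample shown earlier, the snippet below reads the same rows from an in-memory string and shows that read_csv() assigns a zero-based integer index when none is specified. The CSV_TEXT name and the use of io.StringIO are illustrative stand-ins for hrdata.csv, not part of the lesson.

```python
import io

import pandas as pd

# Assumption: same rows as the sample file, held in a string so the
# example runs without hrdata.csv on disk.
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# read_csv() accepts a file-like object as well as a filename.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df)          # the full table, with an integer index on the left
print(df.index)    # a RangeIndex, since no index column was specified
print(df.columns.tolist())
```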
So, this is pretty straightforward to fix. Just go back here when you read the CSV and add in a parameter called index_col and just set this equal to 'Name', just like that. Alrighty. Let’s rerun that.
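A minimal sketch of the index_col step, assuming the same sample rows as the lesson's file. The CSV_TEXT string and io.StringIO are hypothetical stand-ins for hrdata.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Assumption: same rows as the sample file, kept in memory for the example.
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# index_col='Name' uses the Name column as the row labels
# instead of the default integer index.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Name')

print(df)

# With names as the index, rows can be looked up by employee name.
print(df.loc['John Cleese', 'Salary'])
```

Once the names are the index, the Name column no longer appears among the regular columns, so the DataFrame has three data columns instead of four.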
pandas has us covered here as well. We just have to pass in another parameter, we can say parse_dates, and because you may have multiples in here, we’ll pass in a list and just say ['Hire Date']. Okay, let’s try to rerun that.
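The effect of parse_dates can be checked by looking at the column's dtype. This is a sketch under the assumption that the data matches the sample file. CSV_TEXT and io.StringIO are illustrative substitutes for reading hrdata.csv from disk.

```python
import io

import pandas as pd

# Assumption: same rows as the sample file, kept in memory for the example.
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# Without parse_dates, 'Hire Date' is read as plain strings.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Name')

# With parse_dates, the listed columns are converted to datetimes.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    index_col='Name',
    parse_dates=['Hire Date'],
)

print(as_text['Hire Date'].dtype)  # a string-like dtype
print(df['Hire Date'].dtype)       # a datetime dtype
print(df.loc['Graham Chapman', 'Hire Date'].year)
```

With real datetimes in the column, date attributes such as .year become available on each value.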
to make this a little easier to read. Because this one already has header information, you can pass in header=0 to ignore it, and we’ll add our own in. And just say names and we’ll pass in a list that’ll just be ['Employee', 'Hired', 'Salary', 'Sick Days'].
04:03 So, save that—oh, we would probably change this one too. Actually, let’s just get rid of that line. Save that, rerun it. Then you can see you have the new header information here. And that’s it!
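The transcript does not make clear which line was removed, so the following sketch shows only the header replacement on its own, assuming the same sample rows. CSV_TEXT and io.StringIO are hypothetical stand-ins for hrdata.csv.

```python
import io

import pandas as pd

# Assumption: same rows as the sample file, kept in memory for the example.
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# header=0 marks the first line as the existing header so it is not
# treated as data; names supplies the replacement column labels.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=['Employee', 'Hired', 'Salary', 'Sick Days'],
)

print(df.columns.tolist())
print(df)
```

If index_col or parse_dates are passed alongside names, they would likely need to refer to the new labels (for example 'Employee' and 'Hired') rather than the ones in the file.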