Importing CSV Data Into a Pandas DataFrame
This is the first video of the course, and it lays out the course objectives. You’ll also learn how to configure the packages used throughout the course, explore the dataset, and load it into a Pandas DataFrame.
Resources:
- Download kevin.csv from basketball-reference.com (click Share & more → Get table as CSV (for Excel))
00:00 We’re going to be talking about Pandas DataFrames. This time we’re going to dive deeper into the methods that DataFrames provide: inspecting datasets, slicing and dicing a DataFrame, gathering statistics such as the mean and median, and grouping data with .groupby().
00:23 Seeing as we’ll need some data to work with, I’ve decided to pull up Kevin Durant’s 2012-2013 basketball season stats. We have stats on the number of minutes he played, the teams he played against, his age, which game of the season it was, the number of free throws, and all kinds of other stuff.
00:41 That makes it an interesting dataset to explore. This is kind of cool, because this is my first year playing in a fantasy basketball league, so it might be fun to see what information we can gather on particular players: how they do against certain teams, their type of play, how long they’ve played, and how many points they score per minute. So, we have this data.
01:05 What I went ahead and did was set up our Python Notebook. First, I imported vincent, which we’ll use later on to visualize some of the statistics we gather. I also imported pandas, and specifically the DataFrame and Series objects.
01:22 We’ve initialized vincent to work with the Python Notebook.
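A minimal sketch of the setup cell described above. The pandas imports are standard. The vincent call is an assumption: vincent.core.initialize_notebook() is, as far as we know, the helper vincent provides for loading its JavaScript into a notebook page. The import is guarded so the rest of the lesson still runs with pandas alone if vincent isn’t installed (see the installation questions in the comments).

```python
# Setup cell: pandas for the data work, vincent (optional) for charts later on.
import pandas as pd
from pandas import DataFrame, Series

try:
    import vincent  # assumption: installed e.g. via conda-forge
    # Assumed vincent helper that prepares the notebook for its charts.
    vincent.core.initialize_notebook()
except ImportError:
    vincent = None  # the loading steps in this lesson only need pandas
```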
01:26 And we set the print.max_columns option to None. This lets us render wide tables like these. Tables with this many columns sometimes won’t render in full, and you get a compressed view of the DataFrame instead.
01:38 Setting it to None expands the view so we get as many columns as needed, with a scroll bar, as you can see here. I then wrote out all the column names in a list; those will be our column names.
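The display setting can be sketched as follows. In current pandas the option is named display.max_columns; print.max_columns, as said in the video, appears to be the name used by older pandas releases. A value of None removes the limit on how many columns are shown.

```python
import pandas as pd

# Show every column instead of eliding the middle ones with "...".
pd.set_option("display.max_columns", None)

# Equivalent attribute-style access to the same option:
pd.options.display.max_columns = None

# To restore the default later:
# pd.reset_option("display.max_columns")
```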
01:52 We then took the kevin.csv file you saw a second ago and imported it into pandas with the column names I defined above, and we end up with something similar to this.
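Here is a sketch of loading a CSV with your own column names. The column list and the rows are hypothetical placeholders (not real stats), and io.StringIO stands in for kevin.csv so the example runs on its own; with the real file you would pass its path instead, e.g. pd.read_csv("kevin.csv", names=columns, header=0).

```python
import io
import pandas as pd

# Hypothetical, shortened column list; the course writes out its own full list.
columns = ["Rk", "G", "Home_Away", "Opp", "Win_Loss", "MP", "PTS"]

# Placeholder CSV text with a header row that has blank cells (illustrative values only).
csv_text = """Rk,G,,Opp,,MP,PTS
1,1,@,AAA,L,40,20
2,2,,BBB,W,35,25
3,3,@,AAA,W,38,30
"""

# header=0 drops the file's own header line and uses `names` instead.
# Without it, that header line would be read in as a data row.
data = pd.read_csv(io.StringIO(csv_text), names=columns, header=0)
print(data)
```

Passing names also sidesteps the blank header cells in the downloaded file, since the file’s header row is discarded entirely.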
02:06 The .head() method returns the first rows of the DataFrame; we can pass it the number of rows we want back (five by default).
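A quick illustration of .head(), using a small placeholder DataFrame (the numbers are made up).

```python
import pandas as pd

# Placeholder frame: ten games with illustrative point totals.
data = pd.DataFrame({
    "G": range(1, 11),
    "PTS": [20, 25, 30, 22, 18, 27, 31, 24, 19, 26],
})

print(data.head())   # first 5 rows, the default
print(data.head(3))  # first 3 rows
```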
Schumi Chou on June 21, 2019
I was wondering the same thing: could Real Python share the “kevin.csv” file this course uses? That way we could play with it directly and get more hands-on experience. Thanks so much!
Dan Bader RP Team on June 21, 2019
Thanks for the heads up. You can download the CSV file at basketball-reference.com. Just click Share & more → Get table as CSV (for Excel), then copy the table and save it to a file named kevin.csv.
olagappanmuthu on Dec. 28, 2019
How do I install the “vincent” package?
pshekhar2707 on March 5, 2020
To install the vincent package, I used the following command at the Anaconda prompt (logged in as admin): conda install -c conda-forge vincent
pshekhar2707 on March 5, 2020
After reading the file from the site Dan mentioned, you’ll notice two columns in the DataFrame named 'Unnamed: 5' and 'Unnamed: 7'.
So we need to rename those columns: data.rename(columns={'Unnamed: 5': 'Home_Away', 'Unnamed: 7': 'Win_Loss'}, inplace=True)
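A self-contained sketch of the rename described in this comment. The CSV text is a hypothetical stand-in with blank header cells at positions 5 and 7 (counting from 0); pandas names blank header cells "Unnamed: <position>", which is where those column names come from. Here rename() returns a new DataFrame instead of using inplace=True; both leave data with the new names.

```python
import io
import pandas as pd

# Hypothetical header with blank cells at positions 5 and 7; values are placeholders.
csv_text = """Rk,G,Date,Age,Tm,,Opp,,PTS
1,1,d1,a1,T1,@,AAA,L,20
2,2,d2,a2,T1,,BBB,W,25
"""

data = pd.read_csv(io.StringIO(csv_text))
# pandas fills the blank header cells with "Unnamed: 5" and "Unnamed: 7".
print(data.columns.tolist())

# Give the two unnamed columns meaningful names.
data = data.rename(columns={"Unnamed: 5": "Home_Away", "Unnamed: 7": "Win_Loss"})
print(data.columns.tolist())
```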
gmodelgado on March 15, 2020
May I have the course notebook?
Ricky White RP Team on March 16, 2020
Hi gmodelgado. There is not a notebook that accompanies this course. Sorry.
The link to the CSV file, however, is above.
Hung Chua on May 17, 2020
pip3 list shows that I’ve got vincent, but somehow it won’t load in Jupyter. No problems with pandas.
Anonymous on June 1, 2019
Where do I go to get the “kevin.csv” file that you are using?