
Loading Your Dataset

Here are resources for the data used in this course:

00:00 Let’s load up a dataset. Here’s the URL for a CSV, or comma-separated file, containing basketball data from the website FiveThirtyEight. You can use another package, requests, to download that file.

00:17 requests is a package that wraps the urllib API provided by the Python standard library. It makes networking tasks with HTTP much easier.

00:27 In fact, the author calls it “HTTP for Humans.” If you’ve installed Anaconda, requests is included in the default environment, and if not, you can install it with pip.

00:40 First, import requests.

00:44 Then call the get() function and pass it the download_url. Store the response. Check the .status_code of the response. If it is 200, then everything should be good to go.

00:57 Open a file and write the .content of the response to it. Now the contents of the file are stored locally. Excellent! It’s time to load the CSV file into Pandas. Go ahead and import pandas.
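The download steps above can be sketched as a small helper. The URL here is a placeholder, since the transcript doesn't include the actual FiveThirtyEight link; substitute the one provided in the course resources.

```python
import requests

# Hypothetical placeholder -- use the actual CSV URL from the course resources.
download_url = "https://example.com/nba_all_elo.csv"

def download_csv(url, path):
    """Download a file over HTTP and save its bytes locally."""
    response = requests.get(url)
    if response.status_code == 200:      # 200 means the request succeeded
        with open(path, "wb") as f:      # .content is bytes, so open in binary mode
            f.write(response.content)
        return True
    return False
```

Checking .status_code before writing avoids silently saving an error page to disk, which is exactly the pitfall described in the comments below.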

01:13 Notice that the pandas package is aliased as pd. This is not a requirement but it is often how pandas is imported.

01:21 You’ll be making significant use of the pandas package and while shortening the package name by four letters might not seem like a lot right now, over time, it will reduce the amount that you need to type.

01:33 Now the dataset can be loaded from the CSV file. Use the function read_csv() and pass it the path of the CSV file. Look at the type of nba.
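A minimal sketch of this step, using an in-memory stand-in for the real file so it runs without the download (read_csv() accepts a path or any file-like object):

```python
import io
import pandas as pd

# Tiny stand-in for nba_all_elo.csv; the real file is loaded with
# pd.read_csv("nba_all_elo.csv") instead.
csv_text = "team_id,pts\nTRH,66\nNYK,68\n"

nba = pd.read_csv(io.StringIO(csv_text))
print(type(nba))  # <class 'pandas.core.frame.DataFrame'>
```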

01:47 So, what is this DataFrame? You’ll learn more about it later in the course but for now, think of a DataFrame as a way to store tabular data—that is, rows and columns. In fact, you can see how many rows are in the DataFrame by getting its length,

02:05 and you can see there are 126,314 rows. The rows and columns can be found in the .shape of nba.

02:16 The .shape attribute is a tuple. The first value is the number of rows and the second value is the number of columns. This means there are 23 columns in the dataset.

02:27 To see the first five rows, get the head of nba. If you wanted to see 10 rows, you could pass 10 to .head(). The default number is 5.
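The length, .shape, and .head() inspections can be tried on a small illustrative DataFrame (the real dataset has 126,314 rows and 23 columns):

```python
import pandas as pd

nba = pd.DataFrame({"team_id": ["TRH", "NYK", "CHS", "NYK", "TRH", "CHS"],
                    "pts": [66, 68, 63, 47, 70, 55]})

print(len(nba))     # number of rows: 6
print(nba.shape)    # (rows, columns): (6, 2)
print(nba.head())   # first 5 rows by default
print(nba.head(3))  # or pass the number of rows you want
```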

02:38 And you can see one of the benefits of using Jupyter Notebook with Pandas. The Notebook is displayed in a webpage and it takes advantage of rich formatting using HTML, CSS, and in some cases, interactivity with JavaScript.

02:54 The column names are bold and the rows are zebra-striped so they’re easier to distinguish. But where did the column names come from? Go back to the tab with the directory listing. You should see the CSV file.

03:08 Click on it to open it. Notice that the first row of the file contains the column names, also referred to as the header row. By default, the read_csv() function will assume the first row of the CSV file to be the column names.
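The header-row behavior can be seen directly. This is a sketch with in-memory data; for a file that genuinely lacks a header, read_csv() also accepts header=None plus a names list:

```python
import io
import pandas as pd

# The first row is treated as the header by default.
with_header = pd.read_csv(io.StringIO("team_id,pts\nTRH,66\nNYK,68\n"))
print(list(with_header.columns))  # ['team_id', 'pts']

# For a headerless file, supply the column names yourself.
no_header = pd.read_csv(io.StringIO("TRH,66\nNYK,68\n"),
                        header=None, names=["team_id", "pts"])
print(list(no_header.columns))    # ['team_id', 'pts']
```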

03:26 Something else interesting about this DataFrame is that not all of the columns are displayed. The columns in the middle have been omitted and an ellipsis is used as a placeholder to save space.

03:38 You can force Pandas to show all of the columns by setting the maximum number of columns. Also, notice that some of the numeric columns are showing up with six decimal places. Fix the number of decimal places to two with this option.
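Both display settings mentioned here are standard pandas options, set with pd.set_option():

```python
import pandas as pd

pd.set_option("display.max_columns", None)  # show every column instead of eliding
pd.set_option("display.precision", 2)       # render floats with two decimal places

df = pd.DataFrame({"forecast": [0.64006501, 0.35993499]})
print(df)  # the forecast column now renders as 0.64 and 0.36
```

These options affect only how DataFrames are displayed; the stored values keep their full precision.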

03:57 Now get the last five rows of the DataFrame with the .tail() function. You can see Pandas has applied the formatting. Also, you can get a specific number of rows using .tail(), the same as with .head(). To get the last 10 rows, pass the value 10 to the function .tail().
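A quick sketch of .tail(), which mirrors .head() but counts from the end:

```python
import pandas as pd

nba = pd.DataFrame({"pts": list(range(12))})

print(nba.tail())    # last 5 rows by default
print(nba.tail(10))  # pass a number to get that many rows from the end
```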

04:16 In the next lesson, you’ll start to explore your data using the statistical methods supplied by the DataFrame.

markcerv on June 6, 2021

You might want to check that the file you have downloaded is actually a CSV file with data in it. The first lines of the file SHOULD look like:

gameorder,game_id,lg_id,_iscopy,year_id,date_game,seasongame,is_playoffs,team_id,fran_id,pts,elo_i,elo_n,win_equiv,opp_id,opp_fran,opp_pts,opp_elo_i,opp_elo_n,game_location,game_result,forecast,notes
1,194611010TRH,NBA,0,1947,11/1/1946,1,0,TRH,Huskies,66,1300,1293.2767,40.29483,NYK,Knicks,68,1300,1306.7233,H,L,0.64006501,
1,194611010TRH,NBA,1,1947,11/1/1946,1,0,NYK,Knicks,68,1300,1306.7233,41.70517,TRH,Huskies,66,1300,1293.2767,A,W,0.35993499,
2,194611020CHS,NBA,0,1947,11/2/1946,1,0,CHS,Stags,63,1300,1309.6521,42.012257,NYK,Knicks,47,1306.7233,1297.0712,H,W,0.63110125,
...

When I first tried it, I ended up with HTML content from GitHub telling me, “Sorry about that, but we can’t show files that are this big right now.” So when I tried to run the command

nba = pd.read_csv('nba_all_elo.csv')

I ended up getting an error inside Pandas:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 80, saw 2

How did I solve this problem? I went to my browser to view the CSV file, navigated to the actual raw data, and then saved that data as nba_all_elo.csv.

Nick on Sept. 21, 2021

Am I looking in the wrong place or did you not provide the URL for the website in the transcript?

Become a Member to join the conversation.