Loading Your Dataset
Let’s load up a dataset. Here’s the URL for a CSV, or comma-separated values, file containing basketball data from the website FiveThirtyEight. You can use another package, requests, to download that file.
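As a rough sketch of the download step, the snippet below wraps requests.get() in a small helper. The URL and file name are placeholders rather than the ones used in the lesson, and download_csv() is a hypothetical helper name, so substitute the real values before running it.

```python
import requests

# Placeholders only: swap in the CSV URL from the lesson and a file name you like.
DOWNLOAD_URL = "https://example.com/basketball.csv"  # hypothetical URL
TARGET_PATH = "basketball.csv"  # hypothetical local file name


def download_csv(url, target_path, timeout=30):
    """Download url and save the raw response body to target_path."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # Stop early on HTTP errors such as 404
    with open(target_path, "wb") as f:
        f.write(response.content)
    return target_path


# Usage (needs network access and a real URL):
# download_csv(DOWNLOAD_URL, TARGET_PATH)
```

Calling raise_for_status() before writing means a failed request won't silently leave an error page saved under a .csv name.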
You’ll be making significant use of the pandas package, and while shortening the package name by four letters, from pandas to pd, might not seem like a lot right now, over time it will reduce the amount that you need to type.
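Here is a minimal sketch of the import alias and the read step. To keep it runnable without the downloaded file, it reads a tiny made-up CSV from memory. The column names and values are invented for illustration and are not from the basketball dataset.

```python
import io

import pandas as pd  # "pd" is the conventional short alias for pandas

# Made-up stand-in for the downloaded file; with a real file you would
# pass its path to pd.read_csv() instead of a StringIO object.
csv_text = "team,points\nA,101\nB,98\nC,110\n"

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))
```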
So, what is this DataFrame? You’ll learn more about it later in the course, but for now, think of a DataFrame as a way to store tabular data, that is, rows and columns. In fact, you can see how many rows are in the DataFrame by getting its length with Python’s built-in len() function.
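A short sketch of checking the size of a DataFrame follows. The DataFrame here is a small made-up one, not the course dataset. The .shape attribute is shown alongside len() because it reports the number of columns as well.

```python
import pandas as pd

# Small made-up table, used only to illustrate the calls
df = pd.DataFrame({"team": ["A", "B", "C"], "points": [101, 98, 110]})

row_count = len(df)          # Number of rows
rows, columns = df.shape     # (rows, columns) as a tuple

print(row_count)
print(rows, columns)
```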
02:54 The column names are bold and the rows are zebra-striped so they’re easier to distinguish. But where did the column names come from? Go back to the tab with the directory listing. You should see the CSV file.
Click on it to open it. Notice that the first row of the file contains the column names, also referred to as the header row. By default, the read_csv() function treats the first row of the CSV file as the column names.
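To see the header behavior in isolation, this sketch reads the same made-up CSV text twice: once with the default settings and once with header=None, which tells read_csv() not to treat any row as column names. The sample data is invented for illustration.

```python
import io

import pandas as pd

csv_text = "team,points\nA,101\nB,98\n"  # made-up sample data

# Default: the first row becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: columns get integer labels and the first row is kept as data
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns))
print(list(no_header.columns))
```

If a file really has no header row, you can combine header=None with the names parameter to supply column names yourself.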
03:38 You can force Pandas to show all of the columns by setting the maximum number of columns. Also, notice that some of the numeric columns are showing up with six decimal places. Fix the number of decimal places to two with this option.
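The two display settings described here can be sketched with pd.set_option(). The option names below are the ones pandas documents for these settings. Whether the lesson uses exactly these spellings is an assumption.

```python
import pandas as pd

# Show every column instead of truncating wide DataFrames with "..."
pd.set_option("display.max_columns", None)

# Display floating-point values with two decimal places
pd.set_option("display.precision", 2)

print(pd.get_option("display.max_columns"))
print(pd.get_option("display.precision"))
```

These options only change how values are displayed. The underlying data keeps its full precision.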
Now get the last five rows of the DataFrame with the .tail() method. You can see that Pandas has applied the formatting. You can also get a specific number of rows using .tail(), the same as with .head(). To get the last 10 rows, pass the value 10 to the method.
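Here is a brief sketch of .head() and .tail() on a made-up 20-row DataFrame. The game_id column is hypothetical and exists only so the row positions are easy to see.

```python
import pandas as pd

# Made-up DataFrame with 20 rows, numbered 1 through 20
df = pd.DataFrame({"game_id": range(1, 21)})

first_five = df.head()    # First 5 rows by default
last_five = df.tail()     # Last 5 rows by default
last_ten = df.tail(10)    # Last 10 rows

print(last_five)
print(len(last_ten))
```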