Exploring DataFrame and Series Objects
If you don’t want to run the code on your local machine, you can find the course demos on Google Colab. In the second demo, you will take a closer look at DataFrame and Series objects, including how to construct them out of lists and dictionaries.
00:00 Until now, you’ve been looking at DataFrames in terms of rows and columns, and while it makes sense to think of them intuitively using rows and columns, the internal structure is a little different.
00:12 What you’ve been thinking of as a column so far is, in reality, another Pandas data structure called a Series. You briefly saw a Series in the previous lesson, but it’s time to take a closer look.
00:25 The Series consists of two parts: values and identifiers. The values are a sequence, similar to a list in Python. The identifiers are mapped to the values.
00:38 The collection of identifiers is called the index.
00:43 It’s quite simple to create a Series in Pandas.
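As a minimal sketch of what creating a Series can look like, the snippet below builds one from a plain Python list. The name revenues is the one used later in this lesson; the numbers themselves are assumed sample values, not taken from the demo.

```python
import pandas as pd

# Build a Series from a list. The numbers are assumed sample values.
revenues = pd.Series([5555, 7000, 1980])

# Printing shows the identifiers on the left and the values on the right.
print(revenues)
```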
00:49 The values and index can be accessed with the .values and .index attributes. The values are returned as a NumPy array. The index here is a RangeIndex, a special index type defined by a lower and upper bound, and there are other types of indexes that you’ll see later on.
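The two attributes can be inspected as in the sketch below. It assumes a revenues Series built from a list of made-up numbers, so it gets the default numeric index.

```python
import pandas as pd

# Assumed sample values; a list input gets the default numeric index.
revenues = pd.Series([5555, 7000, 1980])

# .values exposes the data as a NumPy array.
print(type(revenues.values))

# .index exposes the identifiers; here it is a RangeIndex from 0 up to 3.
print(revenues.index)  # RangeIndex(start=0, stop=3, step=1)
```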
01:08 You can also explicitly declare the index with the index keyword argument. This time, the index is just an Index type. Keep in mind that the numeric positions of the RangeIndex are still valid as well.
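A sketch of declaring the index explicitly might look like this. The city names come from later in this lesson, while the revenue figures are assumed. Position-based access is shown with .iloc, which is one unambiguous way to reach a value by its numeric position.

```python
import pandas as pd

# City labels as the index; the revenue figures are assumed sample values.
city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"],
)

# The index is now a plain Index of labels rather than a RangeIndex.
print(city_revenues.index)

# Look up a value by its label...
print(city_revenues["Toronto"])

# ...or by its numeric position, which is still available.
print(city_revenues.iloc[1])
```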
01:22 Every Series will keep a numeric index by default. You may have noticed some similarities between the Series you have created and the built-in Python collection types. For example, the revenues Series is similar to a Python list.
01:38 The city_revenues Series is like a Python dictionary, as you can use the index to retrieve the associated values. The dictionary can also be used as the data for a Series.
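Passing a dictionary as the data could be sketched as follows. The name city_employee_count appears later in this lesson; the counts are assumed values.

```python
import pandas as pd

# The dictionary keys become the index and the dictionary values become
# the Series values. The counts are assumed sample values.
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

print(city_employee_count)
print(city_employee_count["Tokyo"])  # dictionary-style lookup by label
```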
01:53 The Series also supports the .keys() method and the in keyword.
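Both dictionary-like features can be tried as in this sketch, again with assumed sample counts. Note that the in keyword tests the identifiers in the index, not the values.

```python
import pandas as pd

# Assumed sample values.
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

# .keys() returns the index of the Series.
print(city_employee_count.keys())

# Membership is checked against the index labels.
print("Tokyo" in city_employee_count)     # True
print("New York" in city_employee_count)  # False
```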
02:00 Notice that a dictionary of Series can be used to create a DataFrame. The keys of the dictionary are used for the column names. The Series used to create the DataFrame are aligned on their index, so they should share the same identifiers.
02:15 Here, the Series share the identifiers 'Tokyo' and 'Amsterdam'. 'Toronto' exists in city_revenues, but not in city_employee_count, and that is why the value for 'Toronto' in the 'employee_count' column is NaN.
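The alignment described here can be reproduced with a short sketch. The Series names and the 'employee_count' column name come from the lesson; the 'revenue' column name, the variable city_data, and all numbers are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample values for both Series.
city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"],
)
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

# Dictionary keys become column names; rows are matched by index label.
city_data = pd.DataFrame({
    "revenue": city_revenues,
    "employee_count": city_employee_count,
})

# 'Toronto' has no employee count, so that cell is filled with NaN.
print(city_data)
```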
02:32 Look at the .axes of the DataFrame. This is a list of two Index objects. The first is the rows and the second is the columns. Keep this in mind as you go through the course.
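To see the two Index objects, something like the following sketch can be used. The DataFrame is a small assumed example with made-up values and an assumed variable name, city_data.

```python
import pandas as pd

# A small assumed DataFrame; names and numbers are for illustration only.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500], "employee_count": [5, 8]},
    index=["Amsterdam", "Tokyo"],
)

# .axes is a list holding the row index first and the column index second.
print(city_data.axes)
print(city_data.axes[0])  # rows, the same object as city_data.index
print(city_data.axes[1])  # columns, the same object as city_data.columns
```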
02:45 The axis of 0 is the row axis and the axis of 1 is the column axis. Try this exercise to test your knowledge of DataFrame internals.
02:57 The nba DataFrame has a column with the number of points scored in the game by a team. Was the column spelled 'points' or was it shortened to 'pts'?
03:10 You can check using the in keyword. Recall that a DataFrame has two axes, the first being the rows and the second being the columns. You will want to use the columns axis.
03:24 This shows that 'pts' is the correct column. Also, you could have used the .keys() method to get the columns.
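The check can be sketched as below. Because the course's data file is not available here, nba is replaced by a tiny hypothetical stand-in: the 'team_id' column and all values are made up, and only the 'pts' column name comes from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the course's nba DataFrame. The real one is
# loaded from a data file; these rows and the team_id column are made up.
nba = pd.DataFrame({"team_id": ["AAA", "BBB"], "pts": [100, 95]})

# Check the candidate spellings against the columns axis.
print("points" in nba.columns)  # False
print("pts" in nba.columns)     # True

# .keys() on a DataFrame returns the columns, so this check is equivalent.
print("pts" in nba.keys())      # True
```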
03:33 Now that you understand how Series and DataFrames work together, in the next lesson, you’ll see how to drill down into the data that they hold for you.