Exploring DataFrame and Series Objects

If you don’t want to run the code on your local machine, you can find the course demos on Google Colab. In the second demo, you will take a closer look at DataFrame and Series objects, including how to construct them out of lists and dictionaries.

00:00 Until now, you’ve been looking at DataFrames in terms of rows and columns, and while it makes sense to think of them intuitively using rows and columns, the internal structure is a little different.

00:12 What you’ve been thinking of as a column so far is, in reality, another Pandas data structure called a Series. You briefly saw a Series in the previous lesson, but it’s time to take a closer look.

00:25 The Series consists of two parts: values and identifiers. The values are a sequence, similar to a list in Python. The identifiers are mapped to the values.

00:38 The collection of identifiers is called the index.

00:43 It’s quite simple to create a Series in Pandas.

00:49 The values and index can be accessed with the .values and .index attributes. The values are returned as a NumPy array. The index here is a RangeIndex, a special type of index defined by a lower and an upper bound. There are other types of indexes that you’ll see later on.
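As a minimal sketch of the step described above, the snippet below builds a Series from a plain list and inspects both parts. The numbers are illustrative assumptions, since the transcript does not show the actual values used in the demo.

```python
import pandas as pd

# Illustrative values only; the transcript does not show the real numbers.
revenues = pd.Series([5555, 7000, 1980])

values = revenues.values  # the underlying data, returned as a NumPy array
index = revenues.index    # the default index, a RangeIndex

print(type(values))
print(index)
```

With no index given, pandas falls back to a RangeIndex that starts at 0 and stops at the length of the data.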

01:08 You can also explicitly declare the index with the index keyword argument. This time, the index is a plain Index rather than a RangeIndex. Keep in mind that access by numeric position is still valid as well.
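A rough sketch of an explicitly labeled Series might look like this. The city labels follow the ones mentioned later in the transcript, while the revenue figures are made up for illustration.

```python
import pandas as pd

# Revenue figures are hypothetical; only the city labels come from the lesson.
city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"],
)

print(city_revenues.index)       # a plain Index holding the labels
print(city_revenues["Toronto"])  # lookup by label
print(city_revenues.iloc[1])     # lookup by numeric position
```

The sketch uses .iloc for positional access because it makes the intent explicit and does not depend on how square brackets treat integers on a labeled Series.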

01:22 Every Series keeps an implicit numeric (positional) index alongside any labels. You may have noticed some similarities between the Series you have created and the built-in Python collection types. For example, the revenues Series is similar to a Python list.

01:38 The city_revenues Series is like a Python dictionary, as you can use the index to retrieve the associated values. The dictionary can also be used as the data for a Series.
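To illustrate the dictionary analogy, here is a small sketch that passes a dictionary directly as the data for a Series. The employee counts are assumed values, not taken from the demo.

```python
import pandas as pd

# Hypothetical counts; the dictionary keys become the index labels.
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

print(city_employee_count.index)     # labels taken from the dictionary keys
print(city_employee_count["Tokyo"])  # retrieve a value by its label
```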

01:53 The Series also supports the .keys() method and the in keyword.
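The dictionary-like behavior mentioned above can be sketched as follows, again with assumed values. Both .keys() and the in keyword operate on the index labels, not on the values.

```python
import pandas as pd

# Hypothetical counts used only to demonstrate the dictionary-like API.
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

keys = city_employee_count.keys()                # same object type as .index
has_tokyo = "Tokyo" in city_employee_count       # membership checks the labels
has_new_york = "New York" in city_employee_count

print(keys)
print(has_tokyo, has_new_york)
```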

02:00 Notice that Series can be combined to create a DataFrame. The keys of the dictionary are used for the column names. The Series do not need identical indexes: Pandas aligns them on their index labels, and a label that is missing from one Series shows up as NaN in that column.

02:15 Here, the Series share the identifiers 'Tokyo' and 'Amsterdam'. 'Toronto' exists in city_revenues but not in city_employee_count, and that is why the value for 'Toronto' in the 'employee_count' column is NaN.
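The alignment described above can be reproduced with a short sketch. The 'employee_count' column name and the city labels come from the transcript. The 'revenue' column name and all numbers are assumptions made for the example.

```python
import pandas as pd

# Hypothetical figures; 'Toronto' is deliberately absent from the second Series.
city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"],
)
city_employee_count = pd.Series({"Amsterdam": 5, "Tokyo": 8})

# Dictionary keys become column names; rows are aligned on the index labels.
city_data = pd.DataFrame({
    "revenue": city_revenues,
    "employee_count": city_employee_count,
})

print(city_data)
```

Because NaN is a floating-point value, the 'employee_count' column ends up with a float dtype even though the inputs were integers.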

02:32 Look at the .axes of the DataFrame. This is a list of two Index objects. The first is the rows and the second is the columns. Keep this in mind as you go through the course.

02:45 The axis of 0 is the row axis and the axis of 1 is the column axis. Try this exercise to test your knowledge of DataFrame internals.
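As a quick sketch of the two axes, the snippet below unpacks .axes on a small DataFrame. The data is hypothetical and only serves to show which axis is which.

```python
import pandas as pd

# Hypothetical data for illustrating the two axes.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500], "employee_count": [5, 8]},
    index=["Amsterdam", "Tokyo"],
)

# .axes is a list of two Index objects: rows first, then columns.
row_axis, column_axis = city_data.axes

print(row_axis)     # axis 0: the row labels, same as city_data.index
print(column_axis)  # axis 1: the column labels, same as city_data.columns
```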

02:57 The nba DataFrame has a column with the number of points scored in the game by a team. Was the column spelled 'points' or was it shortened to 'pts'?

03:10 You can check using the in keyword. Recall that a DataFrame has two axes, the first being rows and second being columns. You will want to use the columns axis.

03:24 This shows that 'pts' is the correct column. Also, you could have used the .keys() method to get the columns.
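The check from the exercise can be sketched like this. The real nba dataset is not reproduced here, so the DataFrame below is a tiny hypothetical stand-in that only shares the 'pts' column name with it.

```python
import pandas as pd

# Hypothetical stand-in for the nba DataFrame; the rows are made up.
nba = pd.DataFrame({"team": ["A", "B"], "pts": [66, 68]})

has_points = "points" in nba.columns  # check the column axis directly
has_pts = "pts" in nba.keys()         # .keys() also returns the columns

print(has_points, has_pts)
```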

03:33 Now that you understand how Series and DataFrames work together, in the next lesson, you’ll see how to drill down into the data that they hold for you.
