Accessing Data in a DataFrame

00:00 Since a DataFrame is a collection of Series, everything you learned in the previous lesson also applies to DataFrames. But DataFrames are two-dimensional, so indexing them is a little different.

00:13 A DataFrame is conceptually like a Python dictionary, where the keys are the column names and the values are Series objects. Recall the city_data DataFrame from the previous lesson.

00:26 It has a 'revenue' column and the values in the column are stored in a Series with the city names as the index. For column names that are strings, you can treat them like attributes of the DataFrame and get each Series using dot notation.
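As a minimal sketch of both access styles, the snippet below rebuilds a small city_data DataFrame. The numbers are illustrative stand-ins and are not taken from the lesson; only the 'revenue' column and the city names come from the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's city_data; the values are made up.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500, 8000], "employee_count": [5, 8, 3]},
    index=["Amsterdam", "Tokyo", "Toronto"],
)

# Indexing operator: works like a dictionary lookup on the column name.
by_key = city_data["revenue"]

# Dot notation: works here because "revenue" is a valid Python identifier
# and does not clash with an existing DataFrame attribute.
by_attr = city_data.revenue

print(type(by_key).__name__)   # Series
print(by_key.equals(by_attr))  # True
print(list(by_key.index))      # ['Amsterdam', 'Tokyo', 'Toronto']
```

Both expressions return the same Series, indexed by city name.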

00:43 Keep in mind that dot notation will not work if the column name is a DataFrame attribute or method name. For example, if you had a column named 'shape', you could access it with the indexing operator but not with dot notation. .shape is an attribute of the DataFrame and will always return the dimensions of the DataFrame. In general, dot notation should only be used in interactive sessions, such as a Jupyter Notebook.
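The name clash can be reproduced with a tiny example. The toys DataFrame and its contents are hypothetical and exist only to illustrate the point about a column named 'shape'.

```python
import pandas as pd

# Hypothetical DataFrame with a column whose name collides with DataFrame.shape.
toys = pd.DataFrame({"shape": ["sphere", "cube"], "color": ["red", "blue"]})

# The indexing operator returns the column as a Series.
shape_column = toys["shape"]
print(list(shape_column))  # ['sphere', 'cube']

# Dot notation resolves to the built-in attribute: (rows, columns).
dimensions = toys.shape
print(dimensions)  # (2, 2)
```

Because the attribute lookup wins silently, the indexing operator is the safer choice in scripts and library code.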

01:11 You can use the .loc attribute to get a particular row in a DataFrame by the row’s label. The .iloc attribute uses the zero-based positional index of the row instead.
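A short sketch of the two row accessors, again using a stand-in city_data with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's city_data; the values are made up.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500, 8000], "employee_count": [5, 8, 3]},
    index=["Amsterdam", "Tokyo", "Toronto"],
)

# .loc selects by label; .iloc selects by zero-based position.
by_label = city_data.loc["Amsterdam"]
by_position = city_data.iloc[0]

# Each result is a Series whose index holds the column names.
print(list(by_label.index))          # ['revenue', 'employee_count']
print(by_label.equals(by_position))  # True
```

In this stand-in, 'Amsterdam' is the first row, so both lookups return the same data.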

01:25 Additionally, you can slice rows using the .loc attribute.

01:31 This will select all rows, starting at the label 'Tokyo' up to and including 'Toronto'. The .loc attribute includes the upper bound.
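The inclusive upper bound is easiest to see next to a positional slice, which follows the usual Python rule and excludes its stop position. This is a sketch on a stand-in city_data with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's city_data; the values are made up.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500, 8000], "employee_count": [5, 8, 3]},
    index=["Amsterdam", "Tokyo", "Toronto"],
)

# Label-based slice: both 'Tokyo' and 'Toronto' are included.
labelled = city_data.loc["Tokyo":"Toronto"]
print(list(labelled.index))  # ['Tokyo', 'Toronto']

# Position-based slice: position 2 ('Toronto') is excluded.
positional = city_data.iloc[1:2]
print(list(positional.index))  # ['Tokyo']
```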

01:43 Another trick that works on Python lists is negative indexing. For example, the second to last item in a list could be found at index -2. The same goes for the second to last row of a DataFrame using the .iloc attribute. Try it out on the nba DataFrame.
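The parallel between lists and .iloc can be sketched as follows. The example uses a small stand-in city_data with made-up values rather than the nba dataset, which is loaded from a file that is not part of this transcript; the same .iloc[-2] call should apply to nba as well.

```python
import pandas as pd

# A plain Python list: index -2 is the second to last item.
cities = ["Amsterdam", "Tokyo", "Toronto"]
print(cities[-2])  # Tokyo

# Hypothetical stand-in DataFrame; the values are made up.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500, 8000], "employee_count": [5, 8, 3]},
    index=cities,
)

# .iloc accepts negative positions in the same way.
second_to_last = city_data.iloc[-2]
print(second_to_last.name)        # Tokyo
print(second_to_last["revenue"])  # 6500
```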

02:04 In the previous lesson, the .loc and .iloc attributes used only a single value, but DataFrames have a second dimension and the .loc and .iloc attributes have been extended to take advantage of it.

02:18 What if you wanted to get the revenue column for the cities 'Amsterdam' through 'Tokyo'? Similar to how you would index a multidimensional NumPy array, you would just add the column name.
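A minimal sketch of that two-dimensional selection, assuming a stand-in city_data with illustrative values:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's city_data; the values are made up.
city_data = pd.DataFrame(
    {"revenue": [4200, 6500, 8000], "employee_count": [5, 8, 3]},
    index=["Amsterdam", "Tokyo", "Toronto"],
)

# First argument selects rows by label, second selects the column.
revenue_slice = city_data.loc["Amsterdam":"Tokyo", "revenue"]
print(revenue_slice.to_dict())  # {'Amsterdam': 4200, 'Tokyo': 6500}

# The positional equivalent: rows 0 and 1, column 0.
same_by_position = city_data.iloc[0:2, 0]
print(revenue_slice.equals(same_by_position))  # True
```

Selecting a single column this way returns a Series rather than a DataFrame.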

02:33 And you can select multiple columns. Try this with the nba DataFrame. Select all games with the labels 5555 through 5559.

02:44 Then select the 'fran_id' (franchise ID), 'opp_fran' (opposition franchise), 'pts' (points), and 'opp_pts' (opposition points) columns.

02:54 Simply pass the column names as a list in square brackets. This way, you can leave out the columns you don’t need. In the next lesson, you’ll learn how to use queries to select data more precisely.
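The nba selection described above might look like the sketch below. The real nba DataFrame is loaded from a dataset that is not part of this transcript, so the frame here is a small hypothetical stand-in: the column names and the labels 5555 through 5559 come from the lesson, while the team names, scores, and the extra 'game_location' column are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the nba DataFrame; all values are made up.
labels = range(5553, 5562)  # nine rows labelled 5553 through 5561
nba = pd.DataFrame(
    {
        "fran_id": ["TeamA"] * 9,
        "opp_fran": ["TeamB"] * 9,
        "pts": list(range(90, 99)),
        "opp_pts": list(range(80, 89)),
        "game_location": ["H"] * 9,  # extra column that the selection drops
    },
    index=labels,
)

# Rows by label (upper bound included), columns as a list of names.
subset = nba.loc[5555:5559, ["fran_id", "opp_fran", "pts", "opp_pts"]]

print(subset.shape)        # (5, 4)
print(list(subset.index))  # [5555, 5556, 5557, 5558, 5559]
```

Passing the columns as a list keeps the result a DataFrame, even if the list holds a single name.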
