Modifying Values in DataFrames: Label Indices

The pandas DataFrame: Working With Data Efficiently Cesar Aguilar 04:44

00:00 So of course, one of the main things you’re going to be doing with a DataFrame is accessing and modifying its values. Let’s introduce some markup here. We’re going to give this section a level-two heading, and this will be ## Accessing and Modifying Data.

00:20 Now, we already touched upon this a little bit in our broad overview. We know that our DataFrame has a name column, and if we want to access a column, we can also use dot notation.

00:34 So in this case, name is a valid Python identifier, so we can do that. We also have the age, so age is another valid identifier. So we can use dot notation.
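As a minimal sketch of the dot notation described above: the DataFrame below is a hypothetical stand-in (its names, cities, and scores are made up for illustration, not taken from the lesson), with the four columns and the row labels 10 through 16 that the lesson mentions. Dot notation only works when the column name is a valid Python identifier, so a name like py-score needs square brackets.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Pune", "Kyiv", "Doha", "Bonn"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),  # row labels 10 through 16
)

names = df.name          # same column as df["name"]
ages = df.age            # same column as df["age"]
scores = df["py-score"]  # the hyphen rules out dot notation here
```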

00:48 Now, this accesses columns. We may also want to access rows directly. And to access rows, we have to use the .loc method, and this sort of works using list notation. So, for example, if I want to access the first row of the DataFrame, I just pass in the label.

01:08 So here, what’s going to happen is I’m going to get an error and that’s because the 0 integer—so this actual object, the 0—is not a label name of the index, right?

01:21 So I’m going to get an error. Let’s go all the way down here. And basically, I have a KeyError, right? This is not a valid index label. So if we take a look at the labels again, we know that the labels are 10, 11, 12, and so on. So actually, if I want to access an individual row using the .loc accessor, I need to use the actual label name.
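The behavior just described can be sketched as follows, again with a hypothetical DataFrame whose values are made up and whose row labels run from 10 through 16. Passing 0 to .loc raises a KeyError because 0 is not one of the labels, while the label of the first row, 10, works.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(10, 17),  # row labels 10 through 16
)

# 0 is a position, not a label, so .loc cannot find it.
raised_key_error = False
try:
    df.loc[0]
except KeyError:
    raised_key_error = True

# The first row carries the label 10, so that is what .loc expects.
first_row = df.loc[10]
```

A single label returns the row as a Series whose index holds the column names.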

01:47 So that means that the actual labels of the index can be any type of hashable Python object. So a label can be a string or an integer. You know, these are going to be the usual types of labels that you use. So in this case, for example, if I want to access the row with label 11, then I would use that with the .loc accessor. Now, .loc can also be used to access, say, subsets of your DataFrame, or a sub-DataFrame, just like you would with a NumPy array. And again, with .loc, they have to be actual labels of the rows and labels of the columns.

02:26 So I would have a comma separator, and then here I would pass in the row labels and here I would pass in the column labels, just like a two-dimensional NumPy array. And so, for example, if I wanted to get all of the rows for, say, the age and py-score columns, I’m passing in a list containing the two column names 'age' and 'py-score'.

02:57 Then I get a subset or a sub-DataFrame, which is itself a DataFrame containing all of the row labels and just the age and py-score columns.
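Here is a rough sketch of that selection. The transcript does not spell out the row selector, so the bare colon used below to mean "all rows" is an assumption, and the DataFrame is a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Pune", "Kyiv", "Doha", "Bonn"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),  # row labels 10 through 16
)

# Row selector first, column selector second, separated by a comma.
# The colon selects every row (assumed here); the list names the columns.
subset = df.loc[:, ["age", "py-score"]]
```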

03:09 Now let’s suppose I wanted to pick off, say, only the rows where the labels are, say, multiples of 2 or are even. So to do that, I could go df.loc[].

03:24 I want to pick off the labels, so I’ll say x for x in the .index if the index label, which is going to be an integer, is a multiple of 2. And so I’ll use the modulo operator (%) and then pass in not, because if x is even, x % 2 gives me 0, which counts as False, and so I want to flip the Boolean value from False to True.

03:52 So that will pick off only the row labels that are multiples of 2. And so there we go. We get 10, 12, 14, 16.
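The even-label filter can be sketched like this, using a hypothetical DataFrame with made-up values and row labels 10 through 16. The intermediate name even_labels is introduced here only for readability; the list comprehension can just as well be written directly inside the square brackets.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(10, 17),  # row labels 10 through 16
)

# x % 2 is 0 for even labels; 0 is falsy, so `not x % 2` is True for them.
even_labels = [x for x in df.index if not x % 2]

# Passing a list of labels to .loc returns just those rows.
even_rows = df.loc[even_labels]
```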

04:01 And then if I wanted, say, let’s say you wanted only the name and the city columns, then we would pass in the second part to the .loc accessor method,

04:14 and so then we get just the name and the city columns where the row labels are even. So with .loc, the key thing to remember is that the actual values that you pass in to .loc have to be the actual labels of either the rows or the columns.
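Putting the row filter and the column list together might look like the sketch below. The DataFrame is again a hypothetical stand-in with made-up values; only the column names and the row labels 10 through 16 follow the lesson.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Pune", "Kyiv", "Doha", "Bonn"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),  # row labels 10 through 16
)

# First argument: the even row labels. Second argument: the column labels.
result = df.loc[[x for x in df.index if not x % 2], ["name", "city"]]
```

Both arguments are lists of labels, so the result is a DataFrame even when only a few rows or columns match.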

04:32 However, sometimes you may want to access the values of a DataFrame or the rows or columns using integer indices, and this is what we’ll talk about in the next lesson.

medecau on June 6, 2022

we have to use the .loc method, and this sort of works using list notation.

.loc behaves like a dict, and the notation is indeed similar. The above comment primes the student to confuse these behaviors.

Martin Breuss RP Team on June 8, 2022

Thanks for the note medecau, we’ll look into updating this 👍

qubert on Jan. 20, 2023

In the video around 3:21, the instructor starts writing code that iterates through the indexes to operate on the data or indexes themselves. I’m just a little confused by the multiple uses of x in the iteration code. Could someone please briefly explain what the following code snippet does? ** [x for x in df.index …] **

Bartosz Zaczyński RP Team on Jan. 23, 2023

@qubert It’s a list comprehension, which is a shorthand notation for an equivalent loop:

result = []
for x in df.index:
    if not x % 2:  # 0 (even) is falsy, so not makes it True
        result.append(x)

You can use such an anonymous expression without assigning it to a variable, just like in the video. Comprehension expressions tend to be faster than regular loops, too.

Fuxuan Jia on March 11, 2023

how to apply markdown in it?? I cannot understand it

Martin Breuss RP Team on March 13, 2023

@Fuxuan Jia I think you mean using Markdown in a Jupyter Notebook?

You can learn how to do it in the lesson on Markdown Formatting in Jupyter Notebooks.
