Modifying Values in DataFrames: Label Indices
One of the main things you’re going to be doing with a DataFrame is accessing and modifying its values. Let’s add some Markdown here. We’ll give this section a level-two heading, and this will be
## Accessing and Modifying Data.
Now, we already touched upon this a little bit in our broad overview. We know that if we want to access a column (our DataFrame has a name column), we can also use dot notation. In this case, .name is a valid Python identifier, so we can do that. .age is another valid identifier, so we can use dot notation there too.
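As a rough sketch of the column access described above: the DataFrame below is hypothetical (its values and its 10 to 13 row labels are assumptions, not the lesson's data), and it only shows that dot notation and bracket notation return the same column.

```python
import pandas as pd

# Hypothetical sample data; the values and the row labels 10..13 are assumptions.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bo", "Cy", "Di"],
        "city": ["Toronto", "Prague", "Cairo", "Osaka"],
        "age": [28, 33, 31, 37],
        "py-score": [79.0, 81.0, 61.0, 84.0],
    },
    index=[10, 11, 12, 13],
)

# Dot notation works because "name" and "age" are valid Python identifiers.
names = df.name
ages = df.age

# Bracket notation works for any column label, including "py-score",
# which is not a valid identifier (df.py-score would parse as a subtraction).
scores = df["py-score"]

print(names.equals(df["name"]))  # True
```

Bracket notation is the safer default when a column label contains characters such as a hyphen or a space.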
Now, this accesses columns. We may also want to access rows directly. To access rows, we use the .loc accessor, which takes the row label in square brackets. So, for example, if I want to access the first row of the DataFrame, I might try passing in 0. What’s going to happen is I’m going to get an error, and that’s because the integer 0, this actual object, is not a label in the index. Let’s go all the way down the traceback. Basically, I have a KeyError: this is not a valid index label. If we take a look at the labels again, we know that the labels are 12, and so on. So if I want to access an individual row with the .loc accessor, I need to use the actual label name.
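Here is a minimal sketch of that failure and the fix, using a hypothetical DataFrame whose row labels 10 to 13 are assumed for illustration. Looking up 0 fails because 0 is not one of the labels, while an existing label works.

```python
import pandas as pd

# Hypothetical data; the row labels 10..13 are an assumption for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy", "Di"], "age": [28, 33, 31, 37]},
    index=[10, 11, 12, 13],
)

# 0 is a position, not one of the index labels, so .loc raises KeyError.
try:
    df.loc[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Passing an actual label returns that row as a Series.
row = df.loc[11]

print(lookup_failed, row["name"], row["age"])  # True Bo 33
```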
So that means that the labels of the index can be any type of hashable Python object: a string, an integer, and so on. Those are going to be the usual types of labels that you use. In this case, for example, if I want to access the row with label 11, then I would use that label with the .loc accessor. Now, .loc can also be used to access subsets of your DataFrame, or a sub-DataFrame, just like you would with a NumPy array. And again, with .loc, they have to be actual labels of the rows and labels of the columns.
So I would have a comma separator: before it I pass in the row labels, and after it I pass in the column labels, just like indexing a two-dimensional NumPy array. For example, if I wanted to get all of the rows for, say, the age and the py-score columns, I pass in a list containing the two column names. Then I get a subset or a sub-DataFrame, which is itself a DataFrame, containing all of the row labels and just the age and py-score columns.
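The row-and-column form can be sketched like this. The data is hypothetical (values and row labels are assumptions); the part that matters is the colon before the comma, which keeps every row, and the list of column labels after it.

```python
import pandas as pd

# Hypothetical sample data; the values and the row labels 10..13 are assumptions.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bo", "Cy", "Di"],
        "city": ["Toronto", "Prague", "Cairo", "Osaka"],
        "age": [28, 33, 31, 37],
        "py-score": [79.0, 81.0, 61.0, 84.0],
    },
    index=[10, 11, 12, 13],
)

# Rows go before the comma, columns after it. A bare colon keeps every row.
subset = df.loc[:, ["age", "py-score"]]

print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['age', 'py-score']
print(subset.shape)           # (4, 2)
```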
Now let’s suppose I wanted to pick off only the rows where the labels are multiples of 2, that is, even. To do that, I could use a list comprehension over the labels: x for x in the .index if the index label, which is going to be an integer, is a multiple of 2. I’ll use the modulo operator (%): if x is even, x % 2 gives 0, which is falsy, so I flip the Boolean value with not. That will pick off only the row labels that are multiples of 2. And so there we go: we get only the rows with even labels.
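A small sketch of that selection, assuming a hypothetical DataFrame with integer row labels 10 to 13: the list comprehension builds the list of even labels, and that list is then passed to .loc.

```python
import pandas as pd

# Hypothetical data; the integer row labels 10..13 are an assumption.
df = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy", "Di"], "age": [28, 33, 31, 37]},
    index=[10, 11, 12, 13],
)

# x % 2 is 0 (falsy) for even labels, so "not x % 2" is True for them.
even_labels = [x for x in df.index if not x % 2]

# Passing a list of labels to .loc returns a DataFrame with just those rows.
even_rows = df.loc[even_labels]

print(even_rows)
```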
And then, let’s say you wanted only the name and the city columns. We would pass those in as the second part to the .loc accessor, and then we get just the name and the city columns where the row labels are even. So with .loc, the key thing to remember is that the values you pass in have to be the actual labels of the rows and the columns.
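Putting both parts together might look like the following sketch. Again, the DataFrame is hypothetical (values and row labels are assumptions); the row labels come first and the column labels second.

```python
import pandas as pd

# Hypothetical sample data; the values and the row labels 10..13 are assumptions.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bo", "Cy", "Di"],
        "city": ["Toronto", "Prague", "Cairo", "Osaka"],
        "age": [28, 33, 31, 37],
        "py-score": [79.0, 81.0, 61.0, 84.0],
    },
    index=[10, 11, 12, 13],
)

# First argument: the even row labels. Second argument: the column labels.
even_labels = [x for x in df.index if not x % 2]
result = df.loc[even_labels, ["name", "city"]]

print(result)
```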
However, sometimes you may want to access the rows or columns of a DataFrame using integer indices, and this is what we’ll talk about in the next lesson.
Thanks for the note medecau, we’ll look into updating this 👍
In the video around 3:21, the instructor starts writing code that iterates through the indexes to operate on the data or indexes themselves. I’m just a little confused by the multiple uses of x in the iteration code. Could someone please briefly explain what the following code snippet does? [x for x in df.index …]
@qubert It’s a list comprehension, which is a shorthand notation for an equivalent loop:
for x in df.index:
    if not x % 2:
        yield x
You can use such an anonymous expression without assigning it to a variable, just like in the video. Comprehension expressions tend to be faster than regular loops, too.
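For anyone who wants to run the comparison, here is a sketch under the assumption of a hypothetical index with labels 10 to 13. Since yield is only valid inside a function, the loop version is shown both as a plain loop that appends to a list and as a generator function (the name even_labels is made up for this example).

```python
import pandas as pd

# Hypothetical index; the labels 10..13 are an assumption for illustration.
index = pd.Index([10, 11, 12, 13])

# The list comprehension in question.
from_comprehension = [x for x in index if not x % 2]

# The same logic as a plain loop that builds a list.
from_loop = []
for x in index:
    if not x % 2:
        from_loop.append(x)


# The same logic as a generator function, where yield is allowed.
def even_labels(labels):
    for x in labels:
        if not x % 2:
            yield x


from_generator = list(even_labels(index))

print(from_comprehension == from_loop == from_generator)  # True
```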
How do I apply Markdown to it? I cannot understand it.
@Fuxuan Jia I think you mean using Markdown in a Jupyter Notebook?
You can learn how to do it in the lesson on Markdown Formatting in Jupyter Notebooks.
medecau on June 6, 2022
.loc behaves like a dict, and the notation is indeed similar. The above comment primes the student to confuse these behaviors.