Modifying Values in DataFrames: Label Indices
00:00
So of course, one of the main things you’re going to be doing with a DataFrame is accessing and modifying its values. Let’s introduce some markup here. We’re going to give this section a level-two heading, and this will be ## Accessing and Modifying Data.
00:20
Now, we already touched upon this a little bit in our broad overview. We know that our DataFrame has a name column, and if we want to access that column, we can also use dot notation.
00:34
So in this case, name is a valid Python identifier, so we can write .name. We also have the age column, and age is another valid identifier, so .age works too. So we can use dot notation.
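Here is a minimal sketch of that column access. The column names and the index labels 10 to 16 come from the lesson, but the values in the rows are made-up placeholders, so treat the data as an assumption.

```python
import pandas as pd

# Hypothetical data: only the column names and the 10..16 index labels
# come from the lesson; the values are placeholders.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "city": ["Oslo", "Lima", "Rome", "Pune", "Kyiv", "Bonn", "Doha"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),
)

names = df.name          # dot notation: "name" is a valid identifier
ages = df.age            # same for "age"
scores = df["py-score"]  # "py-score" has a hyphen, so it needs brackets
```

Dot notation is only a shortcut. Bracket notation works for every column, including ones whose names are not valid identifiers.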
00:48
Now, this accesses columns. We may also want to access rows directly. And to access rows, we have to use the .loc accessor, and this works using square-bracket notation, much like looking up a key in a dictionary. So, for example, if I want to access the first row of the DataFrame, I might try passing in 0 as the label.
01:08
So here, what’s going to happen is I’m going to get an error, and that’s because the integer 0, this actual object 0, is not a label in the index, right?
01:21
So I’m going to get an error. Let’s go all the way down here. And basically, I have a KeyError, right? This is not a valid index label. So if we take a look at the labels again, we know that the labels are 10, 11, 12, and so on. So actually, if I want to access an individual row with the .loc accessor, I need to use the actual label name.
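As a rough sketch of what happens here, assuming a small stand-in DataFrame with placeholder values and the index labels 10 to 16 from the lesson, looking up a label that is not in the index raises a KeyError, while an existing label returns the row:

```python
import pandas as pd

# Hypothetical data; only the column names and index labels follow the lesson.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(10, 17),
)

caught = False
try:
    df.loc[0]        # 0 is a position, not one of the labels 10..16
except KeyError:
    caught = True    # .loc only understands labels, so the lookup fails

first_row = df.loc[10]  # the first row actually has the label 10
```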
01:47
So that means that the actual labels of the index can be any type of hashable Python object. So it can be a string or an integer. These are going to be the usual types of labels that you use. So in this case, for example, if I want to access the row with label 11, then I would use that with the .loc accessor. Now, .loc can also be used to access subsets of your DataFrame, or a sub-DataFrame, just like you would with a NumPy array. And again, with .loc, they have to be actual labels of the rows and labels of the columns.
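To make the point about label types concrete, here is a small sketch with placeholder data. The second half re-indexes the frame by the name column using set_index, which is not something the lesson does. It is only there to show string labels working the same way as integer ones.

```python
import pandas as pd

# Hypothetical data; only the column names and index labels follow the lesson.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "city": ["Oslo", "Lima", "Rome", "Pune", "Kyiv", "Bonn", "Doha"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(10, 17),
)

row = df.loc[11]              # a single label returns that row as a Series
row_label = row.name          # the Series remembers which label it came from

# Assumption for illustration: use the names as string labels instead.
by_name = df.set_index("name")
ben_age = by_name.loc["Ben", "age"]  # row label first, then column label
```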
02:26
So I would have a comma separator, and then here I would pass in the row labels and here I would pass in the column labels, just like a two-dimensional NumPy array. And so, for example, if I wanted to get all of the rows for, say, the age and the py-score columns, I’m passing in a list containing the two column names 'age' and 'py-score'.
02:57
Then I get a subset or a sub-DataFrame, which is itself a DataFrame containing all of the row labels and just the age and py-score columns.
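A minimal sketch of that selection, again with placeholder values. A bare colon in the row position is one way to say "all rows", in the same way as slicing a NumPy array:

```python
import pandas as pd

# Hypothetical data; only the column names and index labels follow the lesson.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "city": ["Oslo", "Lima", "Rome", "Pune", "Kyiv", "Bonn", "Doha"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),
)

# Row selector before the comma, column selector after it.
sub = df.loc[:, ["age", "py-score"]]
```

The result is itself a DataFrame that keeps every row label and only the two requested columns.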
03:09
Now let’s suppose I wanted to pick off, say, only the rows where the labels are multiples of 2, that is, even. So to do that, I could go df.loc[].
03:24
I want to pick off the labels, so I’ll say x for x in the .index if the index label, which is going to be an integer, is a multiple of 2. So I’ll use the modulo operator (%) and put not in front of it, because if x is even, x % 2 gives me 0, and I want to flip the Boolean value from False to True.
03:52
So that will pick off only the row labels that are multiples of 2. And so there we go. We get 10, 12, 14, 16.
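The filtering step can be sketched like this, using placeholder values and the 10 to 16 labels from the lesson. The list comprehension builds a plain list of the even labels, and that list is what gets passed to .loc:

```python
import pandas as pd

# Hypothetical data; only the column names and index labels follow the lesson.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "city": ["Oslo", "Lima", "Rome", "Pune", "Kyiv", "Bonn", "Doha"],
    },
    index=range(10, 17),
)

# x % 2 is 0 (falsy) for even labels, so "not" turns that into True.
even_labels = [x for x in df.index if not x % 2]

even_rows = df.loc[even_labels]  # keep only the rows with those labels
```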
04:01
And then if I wanted, let’s say, only the name and the city columns, then we would pass in the second part to the .loc accessor,
04:14
and so then we get just the name and the city columns where the row labels are even. So with .loc, the key thing to remember is that the values that you pass in to .loc have to be the actual labels of either the rows or the columns.
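Putting both parts together, here is a sketch with placeholder values that selects the even row labels and just the name and city columns in one .loc call:

```python
import pandas as pd

# Hypothetical data; only the column names and index labels follow the lesson.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev", "Eve", "Finn", "Gia"],
        "city": ["Oslo", "Lima", "Rome", "Pune", "Kyiv", "Bonn", "Doha"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(10, 17),
)

# Row labels go before the comma, column labels after it.
subset = df.loc[[x for x in df.index if not x % 2], ["name", "city"]]
```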
04:32
However, sometimes you may want to access the values of a DataFrame, or its rows or columns, using integer indices, and this is what we’ll talk about in the next lesson.
Martin Breuss RP Team on June 8, 2022
Thanks for the note medecau, we’ll look into updating this 👍
qubert on Jan. 20, 2023
In the video around 3:21, the instructor starts writing code that iterates through the indexes to operate on the data or indexes themselves. I’m just a little confused by the multiple uses of x in the iteration code. Could someone please briefly explain what the following code snippet does? [x for x in df.index …]
Bartosz Zaczyński RP Team on Jan. 23, 2023
@qubert It’s a list comprehension, which is a shorthand notation for an equivalent loop:
for x in df.index:
    if not x % 2:
        yield x
You can use such an anonymous expression without assigning it to a variable, just like in the video. Comprehension expressions tend to be faster than regular loops, too.
Fuxuan Jia on March 11, 2023
How do I apply Markdown in it? I cannot understand it.
Martin Breuss RP Team on March 13, 2023
@Fuxuan Jia I think you mean using Markdown in a Jupyter Notebook?
You can learn how to do it in the lesson on Markdown Formatting in Jupyter Notebooks.
medecau on June 6, 2022
.loc behaves like a dict, and the notation is indeed similar. The above comment primes the student to confuse these behaviors.