Access Data in a Multi-Index DataFrame
00:00 In this lesson, you’ll use the location indexer to access data in a multi-index DataFrame.
00:07
So in the previous video, you learned how you can use the keys argument to make it clearer where your data comes from when you concatenate two pandas DataFrames. And through a multi-index, you can create unique labels, so you can still find every row, even if some of the column names or row labels repeat in the two DataFrames that you’re sticking together.
00:31
And to show you this real quick, let’s make an example where you’re not actually using keys. I’ll call it single and assign the DataFrame to that.
00:44
So this is the first concatenation that you did. It just sticks these two together, and you can see that the row labels, the row indices here, are repeating: 0, 1, 2, 3 from the fruits DataFrame and 0, 1, 2 from the vegetables DataFrame.
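As a rough sketch of what this plain concatenation looks like, here is a version you can run yourself. The column name and the row values are placeholders made up for illustration (only pear, tomato, and potato are mentioned in the lesson), so your own fruits and vegetables DataFrames will look different.

```python
import pandas as pd

# Placeholder data: the column name and most values are assumptions,
# not the exact DataFrames from the video.
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "plum"]})
vegetables = pd.DataFrame({"name": ["carrot", "potato", "onion"]})

# Concatenate without keys: each DataFrame keeps its own row labels.
single = pd.concat([fruits, vegetables])

print(single.index.tolist())   # [0, 1, 2, 3, 0, 1, 2]
print(single.index.is_unique)  # False
```

The labels 0, 1, and 2 each show up twice, which is what causes trouble for label-based access.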
01:00
And now if you tried to access these by saying .loc[] and you want to maybe pick out 1:2 (you want to get these two, pear and tomato, and all of the columns), you will run into an error. And it is a KeyError.
01:17
It tells you that the problem is that you have a non-unique label here. 1 appears more than once in the whole DataFrame. And that’s true. You have it up here with the pear, and you have it again with the potato down there.
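To see the error without crashing your session, you could wrap the lookup in a try block, as in the sketch below. The data is again made-up placeholder data, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Placeholder data standing in for the lesson's DataFrames (assumed values).
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "plum"]})
vegetables = pd.DataFrame({"name": ["carrot", "potato", "onion"]})
single = pd.concat([fruits, vegetables])

raised = False
try:
    # Label 1 exists twice, so pandas can't decide where the slice starts.
    single.loc[1:2, :]
except KeyError as error:
    raised = True
    print(f"KeyError: {error}")
```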
01:29
And this is why you can’t just use the location indexer as you might expect to pick out specific data if you don’t create a multi-index DataFrame. Now, as a quick aside, you can still use the integer-location indexer with .iloc[].
01:43
That actually goes just from 0 through however many rows there are: 1, 2, 3, 4, 5, 6.
01:49
So you could use .iloc[] to still get information, but it might not be exactly what you expect. It’s going to be a bit different because here you’re talking about zero-based positions, whereas .loc[] uses the index labels, so they’re not the same. Okay, so this is a quick aside.
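Here’s a small sketch of that aside, again with placeholder data that isn’t taken from the video. It shows that .iloc[] counts positions from 0 to 6 across the whole concatenated DataFrame and leaves the end of a slice out, so position 5 is the row that carries the label 1 in the vegetables part.

```python
import pandas as pd

# Placeholder data standing in for the lesson's DataFrames (assumed values).
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "plum"]})
vegetables = pd.DataFrame({"name": ["carrot", "potato", "onion"]})
single = pd.concat([fruits, vegetables])

# Positions 1 and 2 (the end position 3 is excluded) are the fruit rows.
fruit_rows = single.iloc[1:3]
print(fruit_rows["name"].tolist())  # ['pear', 'tomato']

# The vegetable row labeled 1 sits at position 5 (4 fruit rows come first).
veggie_row = single.iloc[5]
print(veggie_row["name"])  # potato
```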
02:03 But how can you still use your favorite tool to access data in a DataFrame, the location indexer, if you have a concatenated DataFrame that has either repeated column names or repeated row labels?
02:17
And you can do this using the multi-index that you create here. Let me clean it up. We don’t need the default in here, and I will assign that to an aptly named multivitamin DataFrame.
02:34
So this is again your multi-index DataFrame. And now you can still use the location indexer to access specific items in here, even though you have repeated labels for the rows. Instead of giving only the row labels, you pass a tuple that first gives the outer index label (that’s fruits here) and then the inner one, which is going to be 1.
03:00
And from there, you want to go all the way to fruits
03:06
and 2, and then let’s get all the columns again. And now you get the expected result. You get 1 and 2, but only the labels of the fruits DataFrame.
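Put together, the tuple-based slice could look roughly like the sketch below. The keys fruits and veggies follow the lesson, while the column name and row values are placeholders assumed for illustration.

```python
import pandas as pd

# Placeholder data standing in for the lesson's DataFrames (assumed values).
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "plum"]})
vegetables = pd.DataFrame({"name": ["carrot", "potato", "onion"]})

# Passing keys adds an outer index level, which makes every row label unique.
multivitamin = pd.concat([fruits, vegetables], keys=["fruits", "veggies"])

# Slice from (outer, inner) to (outer, inner) and keep all columns.
selection = multivitamin.loc[("fruits", 1):("fruits", 2), :]
print(selection.index.tolist())  # [('fruits', 1), ('fruits', 2)]
```

Unlike .iloc[], a label slice with .loc[] includes its end label, so the row labeled 2 is part of the result.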
03:18
And keep in mind that this is not a zero-based index that you’re working with here, but these are labels, which means that you could also get, for example, 1 and 2 from the veggies DataFrame in a similar way …
03:35
without needing to know the actual zero-based index of this row, which can get really hard to figure out if you’re working with a large DataFrame. And that’s why you usually like to work with labels for accessing data, but labels need to be unique so that you can use the location indexer to get them, and you can make them unique with a multi-index, which is quick to make by passing the keys argument.
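As a sketch of that comparison, the example below picks the rows labeled 1 and 2 from the veggies part once by label and once by position. The data is placeholder data assumed for illustration, and the positions 5 and 6 only hold for this particular seven-row example.

```python
import pandas as pd

# Placeholder data standing in for the lesson's DataFrames (assumed values).
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "plum"]})
vegetables = pd.DataFrame({"name": ["carrot", "potato", "onion"]})
multivitamin = pd.concat([fruits, vegetables], keys=["fruits", "veggies"])

# By label: you only need to know the key and the original row labels.
by_label = multivitamin.loc[("veggies", 1):("veggies", 2), :]

# By position: you'd have to work out that these are rows 5 and 6 overall.
by_position = multivitamin.iloc[5:7]

print(by_label["name"].tolist())  # ['potato', 'onion']
print(by_label.equals(by_position))  # True
```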
04:02
In this lesson, you’ve learned how you can access data in a multi-index DataFrame that results from passing values to the keys argument in pd.concat().
04:12 And you did that using the location indexer, passing a tuple that gives the outer label as the first item and the inner label as the second. That way, you can use the location indexer and slice and dice your DataFrame just as you want to. In the next lesson, you’ll learn about an alternative way of dealing with the issue of duplicate labels: you will learn how you can create a new index after concatenation, throwing the old one out and just getting a new zero-based one instead.