Access Data in a Multi-Index DataFrame
So in the previous video, you learned how you can use the keys argument to make it clearer where your data comes from in the pandas DataFrame when you concatenate two DataFrames together. And through a multi-index, you can create unique labels, so you can still find every row, even if some of the column names or some of the row labels repeat in the two DataFrames that you’re sticking together.
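As a rough sketch of that recap, the snippet below concatenates two small DataFrames with the keys argument of pd.concat(). The fruits and veggies contents, the column names, and the variable name combined are made up for illustration and are not taken from the video.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames from the lesson.
fruits = pd.DataFrame(
    {
        "name": ["apple", "pear", "tomato", "banana"],
        "color": ["red", "green", "red", "yellow"],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "potato", "spinach"],
        "color": ["orange", "brown", "green"],
    }
)

# keys adds an outer index level that records where each row came from.
combined = pd.concat([fruits, veggies], keys=["fruits", "veggies"])

print(combined)
print(combined.index.nlevels)    # number of index levels
print(combined.index.is_unique)  # whether every (outer, inner) pair is unique
```

The inner labels still repeat, but each pair of outer and inner label occurs only once, which is what makes label-based lookups possible again.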
So this is the first concatenation that you did. It just sticks these two together, and you can see that the row labels, the row indices here, are repeating: 3 from the fruits DataFrame and 2 from the vegetables DataFrame.
And now if you tried to access these by saying .loc, and you want to maybe pick out 1:2 (you want to get these two, tomato, and all of the columns), you will run into an error. It tells you that the problem is that you have a non-unique label here.
1 appears more than once in the whole DataFrame. And that’s true. You have it up here with the pear, and you have it again with the potato down there.
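Here is a minimal sketch of that failure, assuming the same kind of made-up fruits and veggies data (the contents and the names plain and message are illustrative only). Without keys, the concatenated index contains the label 1 twice, and in current pandas versions the label slice raises a KeyError that mentions the non-unique label.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames from the lesson.
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "banana"]})
veggies = pd.DataFrame({"name": ["carrot", "potato", "spinach"]})

# Plain concatenation keeps both original indices, so labels repeat.
plain = pd.concat([fruits, veggies])
print(plain.index.is_unique)

message = None
try:
    # Label-based slice over rows 1 through 2 and all columns.
    plain.loc[1:2, :]
except KeyError as error:
    # pandas cannot decide where the slice starts when a label repeats.
    message = str(error)

print(message)
```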
And this is why you can’t just use the location indexer as you might expect to pick out specific data if you don’t create a multi-index DataFrame. Now, as a quick aside, you can still use the integer-location indexer, .iloc, to get information, but it might not be exactly what you expect. It’s going to be a bit different because with .iloc you’re talking about zero-based positions, whereas with .loc you’re talking about the labels. Okay, so this is a quick aside.
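To make the aside concrete, here is a small sketch with made-up data (the DataFrame contents and the names plain and by_position are assumptions). It shows that .iloc works on the plain concatenation because it counts positions rather than reading labels, and that the end of a positional slice is excluded.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames from the lesson.
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "banana"]})
veggies = pd.DataFrame({"name": ["carrot", "potato", "spinach"]})

plain = pd.concat([fruits, veggies])

# Positions 1 and 2 (position 3 is excluded), regardless of the labels.
by_position = plain.iloc[1:3]
print(by_position)

# The potato has the label 1, but it sits at position 5 of the combined rows.
print(plain.iloc[5])
```

Note that you need to count rows across the whole combined DataFrame to find the potato, which is the part that gets hard with larger data.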
So this is again your multi-index DataFrame. And now you can still use the location indexer to access specific items in here, even though you have repeated labels for the rows. Instead of giving only the row label, you pass a tuple that first tells you what’s the first index (that’s fruits here), and then the second one is going to be the inner row label.
And keep in mind that this is not a zero-based index that you’re working with here, but these are labels, which means that you could also get, for example, 2 up here from the veggies DataFrame in a similar way, without needing to know the actual index, the zero-based index of this row, which can get really hard if you’re working with a large DataFrame. And that’s why you usually like to work with labels for accessing data, but labels need to be unique so that you can use this location indexer to get them, and you can make them unique with a multi-index, which is quick to make by passing the keys argument when you concatenate.
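The tuple-based lookups described here can be sketched as follows. The data and the variable names (combined, pear, spinach, pair) are hypothetical, and which row the video actually picks out is not shown in this transcript, so treat the specific labels as examples.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames from the lesson.
fruits = pd.DataFrame({"name": ["apple", "pear", "tomato", "banana"]})
veggies = pd.DataFrame({"name": ["carrot", "potato", "spinach"]})

combined = pd.concat([fruits, veggies], keys=["fruits", "veggies"])

# (outer label, inner label) picks one row; the colon keeps all columns.
pear = combined.loc[("fruits", 1), :]
spinach = combined.loc[("veggies", 2), :]

# Slicing between two full tuples works too, and both ends are included.
pair = combined.loc[("fruits", 1):("fruits", 2)]

print(pear)
print(spinach)
print(pair)
```

Because these are labels, you don’t have to work out that the last vegetable sits at position 6 of the combined DataFrame.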
And you did that using the location indexer, and then you could pass a tuple that gives the outer label as the first input and the inner label as the second input, and that way you can use the location indexer and slice and dice your DataFrame just as you want to.

In the next lesson, you’ll learn about an alternative way of dealing with the issue of duplicate labels, and you will learn how you can create a new index after concatenation, throwing the old one out and just getting a new zero-based one instead.