Mark Your DataFrames With Keys
In this lesson, you'll look at the optional keys argument to pd.concat(). It helps you deal with the situation where a concatenation leaves you with multiple labels of the same name, whether you concatenate along the column axis or the row axis.
You can see in here that there's the keys argument that you'll use. There's even an explanatory docstring entry for the keys argument, which says that it can add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same or overlapping on the passed axis number.
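To make the docstring concrete, here's a minimal sketch of a column-wise concatenation with and without keys. The contents of fruits and veggies below are made-up stand-ins, since the actual data isn't shown in this transcript. The only thing carried over from the lesson is that veggies has one more column than fruits.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (contents are assumptions).
fruits = pd.DataFrame(
    {"name": ["apple", "banana"], "color": ["red", "yellow"]}
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "leek"],
        "color": ["orange", "green"],
        "root": [True, False],
    }
)

# Without keys, the result has two "name" and two "color" columns.
plain = pd.concat([fruits, veggies], axis="columns")
print(plain.columns.tolist())

# With keys, each source DataFrame gets its own outer column label,
# so the columns become a two-level MultiIndex.
keyed = pd.concat(
    [fruits, veggies], axis="columns", keys=["fruits", "veggies"]
)
print(keyed.columns.tolist())

# Selecting an outer key returns only the columns from that source.
print(keyed["fruits"])
```

The labels passed to keys are arbitrary. Here they simply mirror the variable names so you can tell at a glance where each column came from.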
Now, if you turn this around and use the same call, but instead of axis="columns" you use the default of rows, then you get the same concatenation that you got at the beginning. I'll put the axis in explicitly, but you could also just skip this whole argument. And again, you can see what you've done here: this part relates to your veggies DataFrame, which has no NaN values, while the part up there does have them because fruits has one less column.
This is the fruits DataFrame, and here are the NaN values that it needs to fill in to create a full table down here. So this is how you can use the keys argument to pd.concat() to get a better idea of where the data came from and to get rid of the ambiguity of having multiple 0 indices, for example, or multiple columns of the same name.
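The row-wise case can be sketched the same way. Again, the contents of fruits and veggies are hypothetical stand-ins, and the sketch spells the row axis as axis="index", which is the name the pandas documentation lists for it.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (contents are assumptions).
fruits = pd.DataFrame(
    {"name": ["apple", "banana"], "color": ["red", "yellow"]}
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "leek"],
        "color": ["orange", "green"],
        "root": [True, False],
    }
)

# Without keys, both DataFrames bring their own 0 and 1 row labels along.
plain_rows = pd.concat([fruits, veggies], axis="index")
print(plain_rows.index.tolist())

# With keys, each row is labeled with its source first, then its old index.
stacked = pd.concat(
    [fruits, veggies], axis="index", keys=["fruits", "veggies"]
)
print(stacked)

# fruits has no "root" column, so its rows are filled with NaN there.
print(stacked.loc["fruits"])
```

Because every row label is now a pair of source and original index, the index is unique even though both inputs started counting at 0.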
In the next lesson, you'll learn how to access specific data items in such a multi-index DataFrame. That means you'll take a quick break from the different arguments to pd.concat(), stick with the keys one, and figure out the advantages of using it and which errors you'd run into if you don't.