Join DataFrames Using Index Columns
and now there’s a couple of other options. You have
left_on, which allows you to specify which columns or combination of columns to use in the left DataFrame that you want to combine with the right DataFrame.
So you can mix and match here. And then you also have the options of
right_index, which default to
False, both of them, but by setting these ones to
True, you can ask pandas to use the index columns instead of some of the named columns for performing the join.
I will give it a try in here by using both index columns at first. So in the
fruits DataFrame, it looks like this, and you can see it has an index column that starts at
0 and numbers the rows up to
3, and also the
veggies DataFrame has an index column that starts at
0 and goes up to
2. If you join these too on the index column, and this is what you will get.
01:59 Now again, you can pause and think about what’s going to happen when you execute this line of code. I’m going to run it, and you can see that it combines all the data in the matching rows. Here, the matching rows are based on data that might not have meaning. In this case, it probably doesn’t, but depending on what your DataFrame is, this could be some sort of ID, for example.
So pandas combines these two and puts the information of the
fruits DataFrame in here, prepending the repetitive column names with
_x and then next to it in the same row, it puts the information that comes from the
veggies DataFrame, and same with the other matches on index
1, and index
2 and index
3 in the
fruits DataFrame doesn’t have a corresponding match in the
veggies DataFrame. So, because you’re performing the default inner join, it just gets dropped and isn’t part of the resulting DataFrame. This might make sense, like I said, if you have, for example, product IDs as your indices, it might make sense. In this case, it doesn’t really, but let’s look at the example where we assume that the index in the
veggies DataFrame is maybe actually an amount. Now for this, I will slightly edit the data we’ve been working with.
fruits has three columns and the third one holds an amount. It still has the index column, right? And the veggies DataFrame looks the same as it did before. Now, we could assume that maybe in this dataset here, it means that there’s zero tomatoes, one potato, and two carrots. You know, that’s probably not the greatest index column because there might be repetitive values, but let’s give it a try and merge these two DataFrames on the
amount column for the
fruits DataFrame and on the index of the
veggies DataFrame, just to show you how you can mix and match all of these different ways of combining the data. Again, you can pause the video and try this yourself before you see the solution.
And if I execute this, you get a combined DataFrame that might or might not make sense, but it found the row. In this case, the index row
0 in the
veggies DataFrame was matched with index row
2 of the
fruits DataFrame, because the amount here was zero, and that’s the column that you specified would be used to perform the merge.
So in here you see that the tomato here with the amount
0 got matched with the tomato over here from the
veggies DataFrame, and you can also pick apart the other ones. Let’s do one more.
apple here has an amount of
1, and here you specified in the
fruits DataFrame that the
amount column would be the one to consider for the merge.
And here it found a matching column in the
potato that has the index
1, which is why the apple and the potato got combined into a new row in this new DataFrame. And again, pandas keeps track of where does the data come from and adds this automatic
_x for the left DataFrame and automatic
_y for potential duplicate column names from the right DataFrame.
In this way, you can mix and match. You can define
right_on, you can define
left_index, and pick and choose which columns or even indices in your DataFrames contain the information that you want to perform the merge on.
In the next lesson, you will learn how you can modify
_y, that default appended strings that pandas creates, and make it a little more descriptive of which DataFrame did it actually come from, or what was the name of the DataFrame, for example.
Become a Member to join the conversation.