Understand the Differences Between Left and Right Joins
In the previous lesson, you’ve seen what happens when you use an outer join on two DataFrames when using the
pd.merge() function. In this lesson, you’ll take a look at using
pd.merge() with yet another argument to the
how parameter, and you’ll use first left, and then try out a right join.
Now let’s take a look in here when I type
pd.merge() passing in, again, left the
fruits DataFrame and right the
veggies DataFrame, and now pass to
how just the string
"left". pandas is going to focus on the left DataFrame, so
fruits in this case, and it’s going to take the key indices from that DataFrame to perform the merge.
This means that the result is somewhat reminiscent of the
fruits DataFrame that you started out with, but it was able to find a match on the tomato again, and therefore added the value of
red, so the only one that had a value for the color column was the tomato.
is first part of this join, which relates to the keys from your original
fruits DataFrame. These were taken as the keys, and you can see them represented in here as well. Again, pandas performed a join operation, so it merged the two DataFrames.
It didn’t do any concatenation, but it performed a join. It found that inside of the vegetables DataFrame, there was a match on the
tomato row, both the name and the image, and therefore it added the
color row and a value for that one column while filling the ones that didn’t have a match with
NaN values. With this in mind, you might already have a good idea what happens when you do a right join.
02:58 and figured out where do you have a match with the other DataFrames and then created a merge operation on the rows that have a match and keep the data based on the indices of the right DataFrame that don’t have a match in the left DataFrame as well.
So what you are getting here looks very similar to the initial `veggies1 DataFrame. This is your
veggies DataFrame, and this is your output that you get when you perform a right outer join with
pd.merge() and passing
03:32 And the reason, in this case, that you essentially just get a copy of that same DataFrame, but it isn’t quite a copy—as you can see, the order is different of the columns, but the information in there is the same—and that’s simply because there is no additional data in the fruits DataFrame if you start off by considering the indices of the vegetables DataFrame.
So if you focus on
image columns of the vegetable DataFrame, you will again get one match that has tomato and the image also in the fruit DataFrame, which means that you will combine both of those.
The second one is
potato and the image of the potato, which doesn’t get a match, and then
carrot and the image of the carrot, which also doesn’t get a match. But since you’re performing an outer join, the information doesn’t get thrown away, but it’s kept around, but there’s no additional information to add from the fruits DataFrame,
05:00 Keep in mind though that if your vegetables DataFrame looked differently, or if your fruits DataFrame looked differently, this results could be quite different. For example, if you added a column to the fruits DataFrame—so let’s say,
05:30 I guess the lemon should probably not be sweet. Let me fix that. So we’ve got a sour tomato and a sour lemon and a sweet apple and a sweet pear. So there’s some additional information now inside of that fruits DataFrame that wasn’t present before.
And now if I merge these two DataFrames using
pd.merge(fruits, veggies), again using
how="right"—so focusing on the keys of the vegetable DataFrame, you will see that the output is quite different than before, because now you get the
taste column included because there was on one of the match rows, which in this case is only the row that contains information about the tomato, there was also additional information about its taste inside of the fruits DataFrame, and that had to be included. Now, again, you’re only focusing on the indices of the right DataFrame, which is the vegetables DataFrame in this case.
when you ran this left outer join here, and you got the
NaN values for the
color column that got added from the vegetable DataFrame. So in short, the difference between a left and the right join is that with the left join, pandas uses only the keys from the left frame, and with the right join, it uses only the keys from the right frame. This doesn’t change which key columns are used for the join, but you can define that as well,
Become a Member to join the conversation.