Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Hint: You can adjust the default video playback speed in your account settings.
Hint: You can set the default subtitles language in your account settings.
Sorry! Looks like there’s an issue with video playback 🙁 This might be due to a temporary outage or because of a configuration issue with your browser. Please see our video player troubleshooting guide to resolve the issue.

Understand the Differences Between Left and Right Joins

00:00 In the previous lesson, you’ve seen what happens when you use an outer join on two DataFrames when using the pd.merge() function. In this lesson, you’ll take a look at using pd.merge() with yet another argument to the how parameter, and you’ll use first left, and then try out a right join.

00:19 These are somewhat equivalent to SQL left outer joins and right outer joins. So as you might expect, they do something similar to what you’ve seen with pd.merge() outer.

00:33 Now let’s take a look in here when I type pd.merge() passing in, again, left the fruits DataFrame and right the veggies DataFrame, and now pass to how just the string "left". pandas is going to focus on the left DataFrame, so fruits in this case, and it’s going to take the key indices from that DataFrame to perform the merge.

00:57 This means that the result is somewhat reminiscent of the fruits DataFrame that you started out with, but it was able to find a match on the tomato again, and therefore added the value of red, so the only one that had a value for the color column was the tomato.

01:16 And so it added the value there and filled the rest with NaN values. So basically what you get as an output here

01:27 is first part of this join, which relates to the keys from your original fruits DataFrame. These were taken as the keys, and you can see them represented in here as well. Again, pandas performed a join operation, so it merged the two DataFrames.

01:45 It didn’t do any concatenation, but it performed a join. It found that inside of the vegetables DataFrame, there was a match on the tomato row, both the name and the image, and therefore it added the color row and a value for that one column while filling the ones that didn’t have a match with NaN values. With this in mind, you might already have a good idea what happens when you do a right join.

02:19 So let’s try that out next.

02:24 You can pause the video for a second and try to figure out what’s going to happen.

02:31 So here’s your result for this join, and against what you might have expected, it’s not the remaining columns down here—it’s still potato and carrot, but something different happened.

02:46 What happened here is that pandas took into account the indices of the right DataFrame—in your case, that’s the vegetables DataFrame—

02:58 and figured out where do you have a match with the other DataFrames and then created a merge operation on the rows that have a match and keep the data based on the indices of the right DataFrame that don’t have a match in the left DataFrame as well.

03:13 So what you are getting here looks very similar to the initial `veggies1 DataFrame. This is your veggies DataFrame, and this is your output that you get when you perform a right outer join with pd.merge() and passing how="right".

03:32 And the reason, in this case, that you essentially just get a copy of that same DataFrame, but it isn’t quite a copy—as you can see, the order is different of the columns, but the information in there is the same—and that’s simply because there is no additional data in the fruits DataFrame if you start off by considering the indices of the vegetables DataFrame.

03:55 So if you focus on name and image columns of the vegetable DataFrame, you will again get one match that has tomato and the image also in the fruit DataFrame, which means that you will combine both of those.

04:10 But because your focus is on the indices of the veggies DataFrame on the right one, in this case, it means that you’re thinking about this key, this key, and this key.

04:23 They’re all composite keys of name plus image. So the first one is tomato and the image of the tomato, which gets a match.

04:33 The second one is potato and the image of the potato, which doesn’t get a match, and then carrot and the image of the carrot, which also doesn’t get a match. But since you’re performing an outer join, the information doesn’t get thrown away, but it’s kept around, but there’s no additional information to add from the fruits DataFrame,

04:53 which is why in this specific case, you essentially end up with a copy of the right DataFrame.

05:00 Keep in mind though that if your vegetables DataFrame looked differently, or if your fruits DataFrame looked differently, this results could be quite different. For example, if you added a column to the fruits DataFrame—so let’s say,

05:30 I guess the lemon should probably not be sweet. Let me fix that. So we’ve got a sour tomato and a sour lemon and a sweet apple and a sweet pear. So there’s some additional information now inside of that fruits DataFrame that wasn’t present before.

05:47 And now if I merge these two DataFrames using pd.merge(fruits, veggies), again using how="right"so focusing on the keys of the vegetable DataFrame, you will see that the output is quite different than before, because now you get the taste column included because there was on one of the match rows, which in this case is only the row that contains information about the tomato, there was also additional information about its taste inside of the fruits DataFrame, and that had to be included. Now, again, you’re only focusing on the indices of the right DataFrame, which is the vegetables DataFrame in this case.

06:32 So the other two rows that are going to stay over are about the potato and the carrot. And since they don’t have information about the taste, these again get filled with NaN values.

06:44 So this result is similar to what you’ve seen before when you saw the color column get added to the fruit DataFrame When you ran left outer join—let me scroll up there—

06:58 when you ran this left outer join here, and you got the NaN values for the color column that got added from the vegetable DataFrame. So in short, the difference between a left and the right join is that with the left join, pandas uses only the keys from the left frame, and with the right join, it uses only the keys from the right frame. This doesn’t change which key columns are used for the join, but you can define that as well,

07:28 and you’ll learn how to specify the join columns explicitly in the next lesson.

Become a Member to join the conversation.