Join Columns When Concatenating on Rows
In this lesson, you’ll explore the optional
join parameter that you can pass values to when you’re calling
pd.concat(). So far, you haven’t explicitly passed any argument to the
join parameter, so all the calls look like this:
pd.concat() with
fruits and veggies passed in, possibly followed by a couple of other keyword arguments.
But implicitly, you were using the
join parameter with its default value, which is
"outer". So passing it explicitly gives you the same call as if you left out this keyword argument.
And this performs an outer join on the axis that you’re not concatenating along. The default axis here (let’s put it in to make this explicit) is
0, which means that you’re performing the concatenation along the rows. You can see that the second DataFrame gets stuck onto the bottom of the first one, along the row axis.
And the join is an outer join by default. These are both default values up here. And this is the result that you get with that. The
axis argument defines where the concatenation should happen, and you’ve heard about this in a previous lesson. The
join parameter and the argument that you pass to it define what happens on the other axis. So in this case, the join defines what should happen on the column axis, and the default says that it should be an outer join, which as a set operation just means that everything should be included: the result keeps every column that appears in either DataFrame.
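To make the default behavior concrete, here is a minimal sketch. The lesson’s actual data isn’t shown in the transcript, so the fruits and veggies DataFrames below, including their column names and values, are hypothetical stand-ins that only match the shapes mentioned later (four rows by two columns, and three rows by three columns).

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (names and values are assumptions).
fruits = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "date"],
    "price": [1.2, 0.5, 3.0, 4.5],
})
veggies = pd.DataFrame({
    "name": ["carrot", "leek", "kale"],
    "price": [0.8, 1.5, 2.0],
    "color": ["orange", "green", "green"],
})

# Relying on the defaults ...
implicit = pd.concat([fruits, veggies])
# ... is the same as spelling them out.
explicit = pd.concat([fruits, veggies], join="outer", axis=0)

print(implicit.equals(explicit))  # True
print(explicit.shape)             # (7, 3)
print(explicit)                   # the four fruit rows show NaN in "color"
```

With this sample data, the column that only exists in veggies is kept, and the rows that come from fruits are filled with NaN in that column.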
Now you can change that to the alternative value of
"inner", which is going to perform an inner join, which means that columns that would contain
NaN values in the outer join are just not going to appear in here. So before I run this, I’ll again open a little drawing pad, and then we can explore what happens with an inner join in this case.
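Before running anything, it can help to picture the two join values as set operations on the column labels. The sketch below uses plain Python sets as an analogy only. It isn’t how pandas implements the join internally, and the column names are hypothetical.

```python
# Hypothetical column labels for the two DataFrames (assumed names).
fruit_columns = {"name", "price"}
veggie_columns = {"name", "price", "color"}

# An outer join keeps every label that appears in either DataFrame (set union).
outer_columns = fruit_columns | veggie_columns

# An inner join keeps only the labels that appear in both (set intersection).
inner_columns = fruit_columns & veggie_columns

print(sorted(outer_columns))  # ['color', 'name', 'price']
print(sorted(inner_columns))  # ['name', 'price']
```

Under these assumptions, the one label missing from the intersection is the column that an inner join leaves out.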
Think about the familiar DataFrames that you know from before:
the fruits one consisting of two columns and four rows,
and then the vegetables one consisting of three columns and three rows.
You’re concatenating the two of them along the row axis, which means here. Now, what about the column axis? You did an outer join there before. This is the axis that the
join parameter operates on, and the default is
"outer". So this is why you got those
NaN values filling up the shape to produce a full, clean, rectangular shape for your new table. But now if, instead of
"outer", you use (and that goes in quotes)
"inner", then it performs an inner join, which means that it’s going to get rid of every column that doesn’t have a counterpart in the other DataFrame.
When I now run this command, using the
"inner" argument to the
join parameter, then you’ll see the output has a different shape than if you’d use the
"outer" default argument.
And to bring this back in connection with the drawing that I just did before, you can still see here that you have the fruits DataFrame, and you also have the vegetables DataFrame, but it changed shape.
You cut off the one extra column that it had because there wasn’t any equivalent in the fruits DataFrame. So pandas got rid of this column because you were using
"inner" as the method to join the column axis.
In this lesson, you learned how you can perform an inner join on columns while you’re concatenating your DataFrames on rows. And to do this, you had to explicitly pass the keyword argument
join with the value of
"inner" instead of the default value of
"outer". And the result of this was that instead of filling your DataFrame up with
NaN values, what pandas does is that it cuts off the column that would contain the
NaN values otherwise. In the next lesson, you will learn how you can use the same
join keyword argument to perform an inner join on the rows while you’re concatenating on columns.
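As a recap in code, here is a minimal sketch of the inner join on columns while concatenating on rows. The DataFrames are hypothetical stand-ins for the lesson’s fruits and veggies, so the column names and values are assumptions. Only the shapes follow the transcript.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (names and values are assumptions).
fruits = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "date"],
    "price": [1.2, 0.5, 3.0, 4.5],
})
veggies = pd.DataFrame({
    "name": ["carrot", "leek", "kale"],
    "price": [0.8, 1.5, 2.0],
    "color": ["orange", "green", "green"],
})

outer = pd.concat([fruits, veggies], join="outer")  # the default
inner = pd.concat([fruits, veggies], join="inner")  # keep shared columns only

print(outer.shape)  # (7, 3)
print(inner.shape)  # (7, 2)

# The column that the inner join cut off:
dropped = outer.columns.difference(inner.columns)
print(list(dropped))  # ['color']

# No NaN values are left, because nothing had to be filled in.
print(bool(inner.isna().any().any()))  # False
```

Both results have seven rows. Only the number of columns differs, because the join parameter acts on the axis that you’re not concatenating along.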