Join Columns When Concatenating on Rows

Combining Data in pandas With concat() and merge() Martin Breuss 04:46

00:00 In this lesson, you’ll explore the optional join parameter that you can pass a value to when you’re calling pd.concat(). So far, you haven’t explicitly passed any argument to the join parameter, so all the calls looked like this: pd.concat() with fruits and veggies passed in, plus a couple of other potential keyword arguments.

00:29 But implicitly, you were using the join parameter with its default value, which is "outer". So passing join="outer" gives you the same call as leaving out this keyword argument.

00:43 And this performs an outer join on the axis that you’re not concatenating along. The default axis here (let’s put it in to make this explicit) is "rows", that is, axis=0,

00:55 which means that you’re performing the concatenation along the rows. You can see that the second DataFrame gets attached to the bottom of the first one, along the row axis.

01:07 And the join is an outer join by default. These are both default values up here. And this is the result that you get with that. The axis argument defines where the concatenation should happen, and you’ve heard about this in a previous lesson.
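To make the two defaults concrete, here is a minimal sketch. The fruits and veggies DataFrames below are hypothetical stand-ins with the same shapes as the ones in the lesson (the column names and values are assumptions, not the course data), and the row axis is written as axis=0, which is the documented spelling.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (names and values are assumed).
fruits = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "orange"],
        "price": [1.2, 0.5, 3.0, 0.8],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "kale", "leek"],
        "price": [0.4, 2.1, 1.5],
        "color": ["orange", "green", "green"],
    }
)

# Relying on the defaults...
implicit = pd.concat([fruits, veggies])
# ...is the same as spelling them out.
explicit = pd.concat([fruits, veggies], axis=0, join="outer")

print(implicit.equals(explicit))  # True
print(explicit.shape)  # (7, 3)
```

Both calls stack the three veggies rows below the four fruits rows and keep all three columns.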

01:22 The join parameter and the argument that you pass here define what happens on the other axis. So in this case, the join defines what should happen on the column axis, and the default says that it should be an outer join, which as a set operation just means that everything should be included.
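As a rough illustration of "everything should be included", the sketch below checks that the outer join keeps the union of the column labels and fills the missing cells with NaN. The DataFrames are again hypothetical stand-ins, not the course data.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (names and values are assumed).
fruits = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "orange"],
        "price": [1.2, 0.5, 3.0, 0.8],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "kale", "leek"],
        "price": [0.4, 2.1, 1.5],
        "color": ["orange", "green", "green"],
    }
)

outer = pd.concat([fruits, veggies], axis=0, join="outer")

# The result's columns are the union of both inputs' columns.
print(list(outer.columns))  # ['name', 'price', 'color']

# The four fruits rows have no "color" value, so pandas fills them with NaN.
print(outer["color"].isna().sum())  # 4
```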

01:41 Now you can change that to the alternative value of "inner", which is going to perform an inner join, which means that columns that would contain NaN values in the outer join are just not going to appear in here. So before I run this, I’ll again open a little drawing pad, and then we can explore what happens with an inner join in this case.

02:07 If you think about the familiar DataFrame that you know from before—fruits one consisting of two columns and four rows,

02:16 and then the vegetables one consisting of three columns and three rows—

03:10 When I now run this command, passing "inner" as the argument to the join parameter, then you’ll see the output has a different shape than if you’d used the "outer" default argument.

03:24 And to connect this back to the drawing that I just did, you can still see here that you have the fruits DataFrame, and you also have the vegetables DataFrame, but the vegetables one changed shape.

03:41 The one extra column that it had got cut off because there wasn’t any equivalent in the fruits DataFrame. pandas got rid of it because you were using "inner" as the method to join on the column axis.
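The effect of the inner join can be sketched like this. The DataFrames are hypothetical stand-ins with the same shapes as in the lesson (two columns and four rows for fruits, three columns and three rows for veggies), so the name of the dropped column is an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames (names and values are assumed).
fruits = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "orange"],
        "price": [1.2, 0.5, 3.0, 0.8],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["carrot", "kale", "leek"],
        "price": [0.4, 2.1, 1.5],
        "color": ["orange", "green", "green"],
    }
)

inner = pd.concat([fruits, veggies], axis=0, join="inner")

# All seven rows are still there, but only the shared columns survive.
print(inner.shape)  # (7, 2)
print(list(inner.columns))  # ['name', 'price']

# The column that exists only in veggies is the one that was cut off.
dropped = veggies.columns.difference(inner.columns)
print(list(dropped))  # ['color']

# Nothing had to be filled in, so the result contains no NaN values.
print(inner.isna().any().any())  # False
```

Note that the rows are untouched here. With axis=0, the join argument only decides which columns are kept.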

04:03 In this lesson, you learned how you can perform an inner join on columns while you’re concatenating your DataFrames on rows. And to do this, you had to explicitly pass the keyword argument join with the value of "inner" instead of the default value of "outer".

04:20 And the result of this was that instead of filling your DataFrame up with NaN values, what pandas does is that it cuts off the column that would contain the NaN values otherwise. In the next lesson, you will learn how you can use the same join keyword argument to perform an inner join on the rows while you’re concatenating on columns.
