Concatenate Along the Row Axis
00:00
In this lesson, you’ll learn about the module-level function concat() in the pandas library that you can use to concatenate DataFrames together.
00:11 Before I actually run this in the lesson, I’m going to give you a blank screen and draw it out a bit. So remember, we have these two DataFrames that we’re working with here. One is the fruits DataFrame, which consists of two columns and four rows. The second one is your vegetables, and that has a different shape: it’s got three rows and three columns, so it looks kind of like this.
00:43 If you would just stick those two things together, there’s this empty space that appears up here. And this is literally what pandas does when you concatenate these two DataFrames.
00:53
It just needs to fill this space with something as well. What pandas puts in here are NaN values, where NaN stands for not a number. And this is your resulting DataFrame, essentially.
01:07
Let’s look at that when you’re actually using pd.concat() on these two example DataFrames. Okay, say pd.concat(), and then for the function call, I need to pass an iterable of DataFrames, for example, a list of two of them.
01:21
So I can put in here fruits and veggies.
01:25
And when I run this concatenation, you can see that you get exactly what we were expecting. You still have your fruits DataFrame here, which consists of two columns and four rows. And you have your vegetable DataFrame down here, just stuck below it, with three columns and three rows.
01:47
And just like in the schematic that I showed before over here, you have this empty space that pandas needed to fill with something, and what it puts in here are NaN values.
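To make that concrete, here is a minimal sketch of the same call. The lesson’s actual data isn’t shown in this transcript, so the two DataFrames below are hypothetical stand-ins with the shapes described: fruits with two columns (assumed to be name and image) and four rows, and veggies with three columns (name, color, image) and three rows.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames; all values are invented.
# Assumption: fruits has the columns "name" and "image" (2 columns, 4 rows).
fruits = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "grape"],
    "image": ["apple.png", "banana.png", "cherry.png", "grape.png"],
})

# veggies has one extra column, "color" (3 columns, 3 rows).
veggies = pd.DataFrame({
    "name": ["carrot", "broccoli", "eggplant"],
    "color": ["orange", "green", "purple"],
    "image": ["carrot.png", "broccoli.png", "eggplant.png"],
})

# Plain concatenation along the row axis, no extra arguments.
result = pd.concat([fruits, veggies])

print(result)
print(result.shape)                 # 7 rows, 3 columns
print(result["color"].isna().sum()) # the 4 fruit rows have no color
```

Note that the original row labels are kept as they were, so the index of the result runs 0 to 3 for the fruit rows and then starts again at 0 for the vegetable rows.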
02:00
So this gives you a quick overview of what pd.concat() does without passing any other arguments. You’re just putting in these two DataFrames, and you’re getting your result down here. Now, the order matters.
02:14
What do you think is going to happen when you do the same thing, but pass veggies before you pass fruits? You can pause for a second and try it out yourself, but probably your intuition is going to be right about this.
02:27
You get the same result, but in a different order. So now you can see you have the vegetable DataFrame up top, and then down here you have the fruits DataFrame. What is maybe a little bit surprising is where the NaN values go, but that’s just because the column order of the first DataFrame gets preserved. The vegetable DataFrame had name as its first column, color as its second, and image as its third, so that order also gets applied to the one that gets concatenated to it.
02:59
And because the fruits DataFrame didn’t have a color column, it gets filled up here with NaN values.
03:07 The information that you get after this concatenation is the same, but the order of items is going to be different depending on how you pass them in here to the function.
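If you want to try the reversed order yourself, the following sketch shows one way to do it. The DataFrames are again hypothetical stand-ins with invented values, and the column names of fruits (name and image) are an assumption based on the description in this lesson.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's DataFrames; all values are invented.
fruits = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "grape"],
    "image": ["apple.png", "banana.png", "cherry.png", "grape.png"],
})
veggies = pd.DataFrame({
    "name": ["carrot", "broccoli", "eggplant"],
    "color": ["orange", "green", "purple"],
    "image": ["carrot.png", "broccoli.png", "eggplant.png"],
})

# Pass veggies first this time.
veggies_first = pd.concat([veggies, fruits])

# The column order of the first DataFrame is preserved.
print(list(veggies_first.columns))  # ['name', 'color', 'image']

# The NaN values now sit in the middle column, in the four fruit rows at the bottom.
print(veggies_first["color"].isna().tolist())

# Compared with the other order, the shape and the set of columns are unchanged.
fruits_first = pd.concat([fruits, veggies])
print(fruits_first.shape == veggies_first.shape)
print(sorted(fruits_first.columns) == sorted(veggies_first.columns))
```

Only the arrangement changes: which rows come first, and where the color column ends up.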
03:18
This shows you the plain, no-arguments version of running pd.concat() on two DataFrames. In the next lesson, you’ll explore some of the arguments that you can pass to change the behavior.