Concatenate Along the Column Axis
One thing that I like to do when exploring a new function is to look at the documentation. In IPython notebooks, or just the IPython interpreter, you can do this by adding a ? (question mark) at the end of a function name instead of actually calling it.
00:25 And that brings up the documentation right here in the notebook, or in your console if you’re running the IPython console. You can also open it up on the pandas website, but this will do for us right now.
So here you get the function signature, and you can see that pd.concat() takes an iterable as its first argument, which is what you did up here by passing a list of two DataFrames, vegetables and fruits.
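If you're not in IPython, the ? shortcut isn't available, but you can get at the same information with the standard library. This is just a small sketch of one way to do it, using inspect and help(), which aren't part of the lesson itself:

```python
import inspect

import pandas as pd

# Look up the signature of pd.concat() without calling it.
signature = inspect.signature(pd.concat)
print(signature)

# The first parameter is the iterable of objects to concatenate.
first_parameter = list(signature.parameters)[0]
print(first_parameter)

# The axis parameter and its default value.
axis_default = signature.parameters["axis"].default
print(axis_default)

# For the full docstring in a plain Python session, you could also run:
# help(pd.concat)
```

In IPython or a notebook, typing pd.concat? gives you the signature and the docstring in one go.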
So what that means is just that it is going to stick the second DataFrame at the bottom of the first one, so to speak. And you can change that behavior. So instead of doing the default, you can say
pd.concat(), pass in the iterable—and I’ll start with
And then instead of just taking the default axis=0, or "rows", which is equivalent, I will say that I want to concatenate on the columns. Again, you could say axis=1, which stands for columns, but actually writing out the term is, in my opinion, much more descriptive.
So I always like to do that. Now, before you run this, take a moment and think about what’s going to happen. You have the fruits DataFrame, you have the veggies DataFrame, and you’re concatenating them together along the columns axis.
02:15 And this time, you don’t want to use the default of concatenating on the row axis, which would put the second DataFrame down below the first one. Here, you want it concatenated by the columns. So it’ll look like that.
Looks like I’m going to draw all the lines after all. There you go. And again, you end up with some space that pandas needs to fill down here, a couple of cells. And pandas does that by filling the empty spaces with NaN values. Now, press Shift + Enter to execute this cell, and you can see that the result is as expected. Start off again, highlighting this. Here, you have your fruits DataFrame.
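If you want to try this outside of the notebook, here's a minimal sketch. The data in fruits and veggies is made up for illustration and isn't the data from the lesson. The only thing that matters is that the two DataFrames have a different number of rows:

```python
import pandas as pd

# Hypothetical sample data, not the DataFrames used in the lesson.
fruits = pd.DataFrame(
    {"name": ["apple", "banana", "cherry"], "color": ["red", "yellow", "red"]}
)
veggies = pd.DataFrame(
    {"name": ["carrot", "spinach"], "color": ["orange", "green"]}
)

# Default (axis=0): veggies lands below fruits.
stacked = pd.concat([fruits, veggies])

# axis="columns" (same as axis=1): veggies lands to the right of fruits.
side_by_side = pd.concat([fruits, veggies], axis="columns")

print(stacked.shape)
print(side_by_side)
```

Because veggies has one row fewer than fruits in this sketch, the last row of the veggies columns is where pandas fills in NaN.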
So that’s how concatenation across the columns axis works. Now, if you turn it around again, and feel free to experiment with this, it’s going to take the order into account. So if you have veggies first, that’s going to put the veggies columns first, and the NaN values are going to be in a different spot, but you might be confused about all those same-named columns here now.
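Here's a quick sketch of turning the order around, again with made-up data rather than the lesson's DataFrames. It also shows one way to check for those same-named columns:

```python
import pandas as pd

# Hypothetical sample data, not the DataFrames used in the lesson.
fruits = pd.DataFrame(
    {"name": ["apple", "banana", "cherry"], "color": ["red", "yellow", "red"]}
)
veggies = pd.DataFrame(
    {"name": ["carrot", "spinach"], "color": ["orange", "green"]}
)

# veggies first: its columns come first, and the NaN values move with them.
swapped = pd.concat([veggies, fruits], axis="columns")
print(swapped)

# The column labels are no longer unique.
print(swapped.columns.is_unique)

# Selecting a repeated label gives you every column with that name.
both_name_columns = swapped["name"]
print(both_name_columns)
```

So selecting "name" here hands you back a two-column DataFrame instead of a single Series, which is one reason the same-named columns can get confusing.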
And you might not have noticed this before, but actually the same thing also happened up here, only it wasn’t going across the columns, but across the rows. So here you have the row index, which starts off with 2 for the vegetable DataFrame, and then starts again at 3 for the fruits DataFrame.
04:21 So there’s just a lot of repetition here, and if you wanted to pick out one of those rows, for example, by using the location indexer, and you wanted to go by label, you couldn’t single out one row because these labels aren’t unique.
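To see what that looks like in practice, here's a small sketch with made-up data and default integer row labels, which may differ from the labels in the lesson. With duplicate labels, .loc doesn't give you one row, it gives you every row that carries that label:

```python
import pandas as pd

# Hypothetical sample data, not the DataFrames used in the lesson.
fruits = pd.DataFrame({"name": ["apple", "banana", "cherry"]})
veggies = pd.DataFrame({"name": ["carrot", "spinach"]})

# Row-wise concatenation keeps each DataFrame's original row labels.
stacked = pd.concat([veggies, fruits])
print(stacked)

# The row labels repeat, so the index isn't unique.
print(stacked.index.is_unique)

# Asking for label 0 returns one row from each original DataFrame.
rows_for_label_0 = stacked.loc[0]
print(rows_for_label_0)
```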
In this lesson, you learned how to concatenate two DataFrames along the column axis. And you also heard about a couple of problems that you might run into, but you will learn more about them later on. To make this happen, you used the pd.concat() call, passing in your iterable of two DataFrames and then also adding the optional keyword argument axis with the value of "columns". Now using "columns" instead of the default of "rows" allowed you to concatenate the two DataFrames along the column axis.
In the next lesson, you will learn how you can mark your DataFrame using another argument, the keys argument, which allows you to create a multi-index, which mitigates some of the problems that can come from having duplicate labels.