Concatenate Along the Column Axis

Combining Data in pandas With concat() and merge() Martin Breuss 05:45

00:00 In this lesson, you’ll start taking a look at the optional arguments that you can pass to pd.concat() to change the way that it behaves.

00:10 One thing that I like to do when exploring a new function is to look at its documentation. In IPython notebooks, or just the IPython interpreter, you can do this by putting a ? (question mark) at the end of the function name instead of actually calling it.

00:25 And that brings up the documentation right here in the notebook, or in your console if you’re running the IPython console. You can also open it up on the pandas website, but this will do for us right now.

00:38 So here you get the function signature, and you can see that pd.concat() takes as the first argument an iterable, which is what you did up here by passing a list of two DataFrames, vegetables and fruits.

00:53 And then the second argument that we’re going to take a look at now is axis, and this defaults to 0, which stands for rows.
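If you're not working in IPython, here is a minimal sketch of one way to check the same two facts from plain Python. It uses the standard library's inspect module rather than the ? helper shown in the lesson, and the variable names are only illustrative.

```python
import inspect

import pandas as pd

# Read the signature programmatically instead of using IPython's `?` helper.
signature = inspect.signature(pd.concat)

# The first parameter is the iterable of objects to concatenate.
first_parameter = next(iter(signature.parameters))

# The default for axis is what you get when you don't pass it at all.
axis_default = signature.parameters["axis"].default

print(first_parameter)
print(axis_default)
```

Calling help(pd.concat) prints the full docstring if you want the longer description as well.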

01:02 So what that means is just that it is going to stick the second DataFrame at the bottom of the first one, so to say. And you can change that behavior. So instead of doing the default, you can say pd.concat(), pass in the iterable—and I’ll start with fruits again, fruits and veggies.

01:24 And then instead of taking the default, which is axis=0 or the equivalent "rows", I will say that I want to concatenate on the columns by passing axis="columns". Again, you could say axis=1, which stands for columns, but actually writing the term is, in my opinion, much more descriptive.

01:46 So I always like to do that. Now, before you run this, take a moment and think about what’s going to happen. You have the fruits DataFrame, you have the veggies DataFrame, and you’re concatenating them together at the columns axis.

01:58 And let’s draw that out just to take a look.

02:03 So you’re starting off again with the fruits DataFrame, two columns, four rows, and then you want to concatenate the vegetables DataFrame, which is three by three.

02:15 And this time, you don’t want to use the default of concatenating it on the row axis, which would be down below the first DataFrame, but here you want it concatenated by the columns. So it’ll look like that.

02:27 And that’s why we draw the columns in here. So this column’s here, and these are the cells.

02:35 Looks like I’m going to draw all the lines after all. There you go. And again, you end up with some space that pandas needs to fill down here, a couple of cells, and again, pandas is going to fill them with NaN values.

02:52 So this is always just a question of how can you make a complete shape for the table when you just concatenate like that.

03:00 And pandas does that by filling the empty spaces with NaN values. Now, press Shift + Enter to execute this cell, and you can see that the result is as expected. Start off again, highlighting this. Here, you have your fruits DataFrame.

03:20 Here, you have your vegetables DataFrame. And just like again on the whiteboard, what you saw down here, you have the NaN values that pandas fills.
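To try this outside the notebook, here is a small sketch. The contents of fruits and veggies are made-up stand-ins, since the lesson's actual values aren't shown in the transcript. The only thing that matters is that fruits has one more row than veggies, so pandas has something to fill.

```python
import pandas as pd

# Hypothetical stand-in data, not the lesson's actual values.
fruits = pd.DataFrame({
    "name": ["apple", "orange", "banana", "grape"],
    "image": ["apple.png", "orange.png", "banana.png", "grape.png"],
})
veggies = pd.DataFrame({
    "name": ["potato", "onion", "carrot"],
    "color": ["brown", "yellow", "orange"],
    "image": ["potato.png", "onion.png", "carrot.png"],
})

# Place veggies to the right of fruits, aligning rows on the index labels.
side_by_side = pd.concat([fruits, veggies], axis="columns")

# axis=1 is the numeric spelling of the same thing.
same_result = pd.concat([fruits, veggies], axis=1)

print(side_by_side)
print(side_by_side.shape)                 # 4 rows, 2 + 3 columns
print(side_by_side.equals(same_result))   # both spellings agree

# Row label 3 exists only in fruits, so the three veggies cells are NaN.
missing_in_last_row = side_by_side.iloc[3].isna().sum()
print(missing_in_last_row)
```

With this stand-in data, the only NaN values sit in the last row under the veggies columns, which is the empty corner from the whiteboard drawing.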

03:32 So that’s how concatenation across the columns axis works. Now, if you turn it around again, and feel free to experiment with this, it’s going to take the order into account. So if you have veggies first, that’s going to put the veggies columns first, and the NaN values are going to be in a different spot. But you might be confused about all those same-named columns here now.

03:53 Now you have name, image, name, color, image. So the same exact names for the columns appear more than once.
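Here is a quick sketch of how the order of the list changes the result. It again uses made-up stand-in data for fruits and veggies, so treat the values as assumptions.

```python
import pandas as pd

# Hypothetical stand-in data, not the lesson's actual values.
fruits = pd.DataFrame({
    "name": ["apple", "orange", "banana", "grape"],
    "image": ["apple.png", "orange.png", "banana.png", "grape.png"],
})
veggies = pd.DataFrame({
    "name": ["potato", "onion", "carrot"],
    "color": ["brown", "yellow", "orange"],
    "image": ["potato.png", "onion.png", "carrot.png"],
})

fruits_first = pd.concat([fruits, veggies], axis="columns")
veggies_first = pd.concat([veggies, fruits], axis="columns")

# The column order follows the order of the list you pass in.
fruits_first_columns = list(fruits_first.columns)
veggies_first_columns = list(veggies_first.columns)
print(fruits_first_columns)
print(veggies_first_columns)

# With veggies first, the NaN values move to the left side of the last row.
left_side_is_empty = bool(veggies_first.iloc[3, :3].isna().all())
print(left_side_is_empty)
```

Either way, name and image each show up twice, because pd.concat() keeps the column labels of both DataFrames as they are.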

04:00 And you might not have noticed this before, but actually the same thing also happened up here, only there it wasn’t going across the columns, but across the rows. So here you have the row index, which starts off with 0, 1, and 2 for the vegetables DataFrame, and then starts again at 0, 1, 2, 3 for the fruits DataFrame.

04:21 So there’s just a lot of repetition here. If you wanted to pick out one of those rows by label, for example with the .loc indexer, you couldn’t single it out because these labels aren’t unique.

04:34 You have two times 0 in here. Or here, if you wanted to address a specific column, you couldn’t do that unambiguously because you have multiple columns with the same name.
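A short sketch of what those duplicate labels do in practice, again with made-up stand-in data. Note that pandas doesn't raise an error here. A label lookup simply returns every match, so you get back more than the single row or column you asked for.

```python
import pandas as pd

# Hypothetical stand-in data, not the lesson's actual values.
fruits = pd.DataFrame({
    "name": ["apple", "orange", "banana", "grape"],
    "image": ["apple.png", "orange.png", "banana.png", "grape.png"],
})
veggies = pd.DataFrame({
    "name": ["potato", "onion", "carrot"],
    "color": ["brown", "yellow", "orange"],
    "image": ["potato.png", "onion.png", "carrot.png"],
})

# Default axis: fruits is stacked below veggies and both keep their labels.
stacked = pd.concat([veggies, fruits])
row_labels = list(stacked.index)
print(row_labels)
print(stacked.index.is_unique)

# Asking for label 0 returns two rows, one from each original DataFrame.
rows_labeled_zero = stacked.loc[0]
print(rows_labeled_zero)

# The same ambiguity applies to column labels after a column-wise concat.
side_by_side = pd.concat([fruits, veggies], axis="columns")
name_columns = side_by_side["name"]
print(name_columns)
```

In both lookups you get a DataFrame back instead of a single row or a single column, which is the ambiguity the lesson is pointing at.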

05:20 You also visually saw what the different results are that happen when you concatenate along the columns instead of the rows.

05:30 In the next lesson, you will learn how you can label your DataFrames using another argument, the keys argument, which allows you to create a MultiIndex and mitigates some of the problems that can come from having duplicate labels.
