Re-create a New Index After Concatenation
00:00
In this lesson, you’ll learn how to use the ignore_index argument to, well, ignore the index and create a new one. In the previous lesson, you learned how you can use a multi-index to still uniquely address specific data inside your DataFrame, even after concatenation, and to avoid the problems that come from repeated labels across the two DataFrames. In this video, I’ll show you another way to handle this, but some information is going to get lost. So if you don’t care about the labels of your rows, then you can just use the ignore_index argument that I’m going to show you now. But let’s first open up the documentation and take a look at it.
00:45
I can see here that ignore_index takes a Boolean value that is False by default. If you pass True instead, then pandas is not going to keep the labels along the concatenation axis, such as the row labels 0, 1, 2, 3, 0, 1, 2 here, or the column labels, depending on which way you are concatenating.
01:04
Instead, it’s going to reassign them as increasing numbers starting from 0. So let’s take a look so this doesn’t get too theoretical. If I just stick the two DataFrames, fruits and veggies, together in the most straightforward way,
01:24
you get this DataFrame where the row labels repeat. Now, if you instead pass ignore_index=True, you’ll see that pandas drops that information.
01:36
So it doesn’t remember that tomato here used to be row label 0 in the original vegetables DataFrame. It just renumbers the rows, starting from 0 and going up to one less than the number of rows. The same goes for the columns.
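As a rough sketch of what this looks like in code: the actual data from the video isn’t reproduced in this transcript, so the fruits and veggies DataFrames below are made-up stand-ins with four and three rows, and only the tomato entry at row label 0 comes from the lesson.

```python
import pandas as pd

# Hypothetical stand-in data; only "tomato" at row label 0 is from the lesson.
fruits = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "grape"],
        "color": ["red", "yellow", "red", "purple"],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["tomato", "carrot", "spinach"],
        "color": ["red", "orange", "green"],
    }
)

# Default behavior (ignore_index=False): the original row labels are kept.
repeated = pd.concat([fruits, veggies])
print(repeated.index.tolist())  # [0, 1, 2, 3, 0, 1, 2]

# ignore_index=True: the row labels are discarded and renumbered from 0.
renumbered = pd.concat([fruits, veggies], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```

In the renumbered result, tomato sits at row label 4, and nothing records that it used to be row 0 of veggies.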
01:57
If you concatenate by columns instead,
02:01
remember that you get repeated column names. But if you also pass ignore_index=True, then pandas is going to discard that information and just label the columns starting from 0 and going up to one less than the number of columns. As you can see, this brings some information loss with it because now you no longer know what those column names were, which might be a problem.
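Here’s a small sketch of the column-wise case, again using made-up stand-in data because the DataFrames from the video aren’t shown in this transcript. The column names name and color are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in data with the same column names in both DataFrames.
fruits = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "grape"],
        "color": ["red", "yellow", "red", "purple"],
    }
)
veggies = pd.DataFrame(
    {
        "name": ["tomato", "carrot", "spinach"],
        "color": ["red", "orange", "green"],
    }
)

# Concatenating along the column axis keeps the repeated column names.
side_by_side = pd.concat([fruits, veggies], axis=1)
print(side_by_side.columns.tolist())  # ['name', 'color', 'name', 'color']

# With ignore_index=True, the column names are replaced by 0, 1, 2, 3.
relabeled = pd.concat([fruits, veggies], axis=1, ignore_index=True)
print(relabeled.columns.tolist())  # [0, 1, 2, 3]
```

Note that with axis=1, ignore_index applies to the column labels, while the row labels are still aligned and kept as they are.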
02:25
In the same way, a unique identifier that you used as the labels for your rows is going to get lost if you use ignore_index on the rows axis. So you might only want to do this if you really don’t care about those original labels.
02:38
Let’s say there’s some sort of number-based index for the rows. This is probably where you’d use ignore_index more often, because the column names usually carry more information. But the row labels might just as well be the IDs of certain products that you don’t want to lose. In that case, you don’t want to use ignore_index=True. Instead, stick with the default of False and use a multi-index, which still lets you account for any duplicates that might be in there and makes it possible to keep working with your DataFrame.
03:10
So creating a multi-index using the keys argument, or just dropping that information and letting pandas do the numbering by using ignore_index, are two different ways to deal with the ambiguity that can come from repeated labels, either in your column labels or in your row labels.
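To put the two approaches side by side, here’s a minimal sketch. The data and the key names "fruits" and "veggies" are assumptions made up for illustration, not the values used in the video.

```python
import pandas as pd

# Hypothetical stand-in data.
fruits = pd.DataFrame({"name": ["apple", "banana", "cherry", "grape"]})
veggies = pd.DataFrame({"name": ["tomato", "carrot", "spinach"]})

# Option 1: keep the original labels and add an outer level with keys.
keyed = pd.concat([fruits, veggies], keys=["fruits", "veggies"])
print(keyed.index.is_unique)  # True
print(keyed.loc[("veggies", 0), "name"])  # tomato

# Option 2: drop the original labels and let pandas renumber the rows.
flat = pd.concat([fruits, veggies], ignore_index=True)
print(flat.index.is_unique)  # True
print(flat.loc[4, "name"])  # tomato
```

Both results have unique row labels, but only the keyed version still tells you which DataFrame each row came from and what its original label was.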
03:30
That wraps up using these additional arguments for pd.concat(). In the next lesson, you’re going to look at one final optional argument, which will lead over to the .join() and merge() functionality that we’ll talk about later.