Using Indices
00:00 In this lesson, you’ll be looking at the books dataset indices, how to use them, and how to work with them. Indices, then. You usually want something to refer to the rows, something unique with which you can refer to each row, something like an index number, but it doesn’t have to be an index number.
00:17 It can be a string. It can be a hash value, a checksum. There are many different options you have there. It doesn’t have to be unique, but it generally is much more helpful if it is. The DataFrame, as you’ve seen, will add one automatically if it doesn’t find one.
00:32 And usually that will just be a list of incrementing numbers starting from 0, one for each row. Again, it’s better if it’s unique, but it doesn’t have to be. pandas won’t enforce that on you.
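To make the default index concrete, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for the books dataset, and the column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in data; the real books dataset is not reproduced here.
books = pd.DataFrame({"title": ["A", "B", "C"], "year": [1901, 1902, 1903]})

# No index was specified, so pandas supplies a default integer index.
default_labels = list(books.index)
print(books.index)

# pandas does not enforce uniqueness: duplicate index labels are accepted.
dupes = pd.DataFrame({"title": ["A", "B", "C"]}, index=["x", "x", "y"])
print(dupes.index.is_unique)
```

The second DataFrame is built without any error even though the label "x" appears twice, which is why checking uniqueness yourself is worthwhile.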
00:44 If you already have an identifier column, you can first check to see if the values are unique, and then you can explicitly set that DataFrame’s index to that column that you want to be the identifier.
00:58 So there is one clear candidate to be our identifier, our index, which is the id column. We can take a look at this. books.loc[:] will get all the rows, and we’ll look at the id column.
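A sketch of what that selection might look like follows. The transcript only shows books.loc[:], so the full expression with "id" as the column selector is an assumption, and the data is a made-up stand-in for the real dataset.

```python
import pandas as pd

# Hypothetical rows standing in for the books dataset; values are made up.
books = pd.DataFrame({"id": [206, 216, 218], "title": ["A", "B", "C"]})

# The colon selects every row; the second argument picks the id column.
# (Assumed form of the expression used in the lesson.)
id_column = books.loc[:, "id"]
print(id_column)
```

The result is a Series, which is what the uniqueness check in the next step operates on.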
01:18 That looks like a bunch of numbers. They go up quite a bit, far higher than the number of rows there are. But the thing that you’re interested in right now is whether it’s unique or not.
01:28 So, one thing you can do is to use this property called is_unique right on the Series that was returned, basically the id column.
01:37 This is not a method, it’s just a property. So if you run this, it will return a True or False value indicating whether it’s unique.
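As a small illustration of the property, here is a sketch using invented data in place of the books dataset. The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the books dataset.
books = pd.DataFrame({"id": [206, 216, 218], "title": ["A", "B", "C"]})

# is_unique is a property, so it is accessed without parentheses.
ids_are_unique = books["id"].is_unique

# For contrast, a Series with a repeated value is not unique.
repeated = pd.Series([206, 216, 216])
repeated_is_unique = repeated.is_unique

print(ids_are_unique, repeated_is_unique)
```

Writing is_unique() with parentheses would fail, because the property evaluates to a plain Boolean and a Boolean is not callable.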
01:44 And as you can see, it is unique. To set that as the index, you can call the .set_index() method on the DataFrame, again chaining this all together and passing in the column title.
02:07 And inspect the data. As you can see, the index is now the id that comes with the dataset. It’s not incremental, but it’s the ID that came with the dataset.
02:19 You’re not exactly sure what this refers to, but perhaps it has some significance. There’s no real downside to using it, so you might as well.
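Putting the check and the assignment together might look like the sketch below. The data is again a hypothetical stand-in, and reassigning the result back to books is one possible choice rather than necessarily what the lesson does.

```python
import pandas as pd

# Hypothetical stand-in for the books dataset.
books = pd.DataFrame({"id": [206, 216, 218], "title": ["A", "B", "C"]})

# Only promote the column to the index once its values are confirmed unique.
if books["id"].is_unique:
    # set_index returns a new DataFrame by default, so keep the result.
    books = books.set_index("id")

print(books.head())

# Rows can now be looked up by the dataset's own identifier.
row = books.loc[216]
print(row["title"])
```

After this step the id values are the row labels and id is no longer a regular column, so lookups such as books.loc[216] go by the identifier rather than by position.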
02:30 Now that you’ve tweaked the indices of the books dataset, in the next lesson, you’re going to get into cleaning the date column.