Exploring the Books Dataset
In this lesson, you’ll move on and do some initial exploration of the books dataset, the third and final dataset of this course. This data is typical of what a library might have: title, author, place of publication, year of publication, that type of thing. However, in this dataset, there are many NaN values. NaN stands for “not a number,” which is the value pandas uses by default when a value is missing or can’t be read. You also have many columns that are pretty much full of these NaN values, so you don’t really need them.
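One way to see which columns are mostly NaN is to count the missing values per column. The sketch below is only an illustration: the small DataFrame is a made-up stand-in for the books data, and its values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the books data; the values are invented.
books = pd.DataFrame(
    {
        "Title": ["Book A", "Book B", "Book C", "Book D"],
        "Author": ["Smith", np.nan, "Jones", "Lee"],
        "Edition Statement": [np.nan, np.nan, np.nan, "2nd ed."],
    }
)

# Count the missing values in each column.
nan_counts = books.isna().sum()

# Fraction of each column that is missing (0.0 = none, 1.0 = all).
nan_share = books.isna().mean()

print(nan_counts)
print(nan_share)
```

A column whose share is close to 1.0 is a likely candidate for dropping.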
working away … and it’s done. So now, if you look at books.head(), Control + Enter to run it, and there you go: it’s done pretty well by itself, but as you can see, there are lots of NaN values.
There are also these extra bits in a lot of the columns. As you can see, most of the values are just one place, but then this one has some extra information that really is just noise in this case, and you’ll want to clean that up. Likewise, with date of publication, there are lots of these square bracket ([]) things going on.
The first thing you want to do is some renaming with the .rename() method. You can do it like you did before, where, say, you have Edition Statement, and you pass in columns= followed by a dictionary that maps the existing title to the title that you want.
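As a minimal sketch of that call, assuming a tiny made-up DataFrame in place of the real books data:

```python
import pandas as pd

# Hypothetical stand-in for the books data; the values are invented.
books = pd.DataFrame({"Edition Statement": ["2nd ed."], "Title": ["Book A"]})

# Map the existing column title to the new one. rename() returns a new
# DataFrame and leaves the original untouched.
renamed = books.rename(columns={"Edition Statement": "edition_statement"})

print(renamed.columns.tolist())
```

Columns that are not listed in the dictionary keep their existing names.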
And then take a look at this. As you can see, it’s renamed the column Edition Statement into one in snake case, with no spaces and all lowercase. However, since all the columns are actually quite well named, and you don’t really want to change any of them, you just want them all to be in this sort of snake case format, you can actually pass in a lambda function here, and this will pass each header into this function, and then you can transform it and return it however you want. Since it’s a string, you could call the .lower() method on it, which will convert it to lowercase, and then you can call the .replace() method on the result of that, and you can replace any space with an underscore (_).
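The lambda version described here might look like the following sketch. The DataFrame is again a made-up stand-in, and header is just an illustrative parameter name.

```python
import pandas as pd

# Hypothetical stand-in for the books data; the values are invented.
books = pd.DataFrame(
    {
        "Title": ["Book A"],
        "Place of Publication": ["London"],
        "Date of Publication": ["1879"],
    }
)

# Each column header is passed to the function, which lowercases it
# and swaps every space for an underscore.
renamed = books.rename(columns=lambda header: header.lower().replace(" ", "_"))

print(renamed.columns.tolist())
```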
So now let’s try running that. Up (↑) to get the last command. Control + Enter. And there you go. All the titles are now lowercase. To see that more clearly, you can use the .columns attribute of the DataFrame to see all the column names, and as you can see, they’re all in a nice snake case format. However, there’s still one that you’d probably like to be shorter.
04:17 Running here … and now let’s look at the columns here. Okay, so now we have all our columns renamed. You’ll notice that this is being reformatted to this sort of method-chaining style. It’s wrapped in parentheses, which allows each call to go on its own line, because otherwise the whole chain would have to sit on a single line.
04:42 And this is the typical way you’ll see a lot of pandas code written, in this sort of long chain, which at the start is great because it shows you every step that’s needed to clean the data before you actually do any science on it.
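A parenthesized chain of this kind could be sketched as follows. The DataFrame is a made-up stand-in, and the shorter name pub_place is a hypothetical choice for illustration, not necessarily the column or name used in the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the books data; the values are invented.
books = pd.DataFrame(
    {
        "Title": ["Book A"],
        "Place of Publication": ["London"],
        "Date of Publication": ["1879"],
    }
)

# The outer parentheses let each step of the chain sit on its own line.
books_clean = (
    books
    # Step 1: put every header into snake case.
    .rename(columns=lambda header: header.lower().replace(" ", "_"))
    # Step 2: shorten one long name (pub_place is an assumed name).
    .rename(columns={"place_of_publication": "pub_place"})
)

print(books_clean.columns.tolist())
```

Each step returns a new DataFrame, so the original books object is left as it was.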