Applying the Same Strategies to NaNs

00:00 Now that you understand the difference between null values and NaN values, you’ll learn how to work with NaN values.

00:07 And you’ll realize the API is pretty much the same as with null values. You’ll start by manufacturing some NaN values to work with, and you’ll do this by first filling in the columns “total” and “tip”, so that instead of nulls you have the value zero.

00:26 And this will create some rows for which both total and tip are zero. And what you will do next is compute something like the percentage of tip per order, because this division will force some NaN values to show up.

00:45 So you will divide the column “tip” by the column “total”, and you will call this the tip percentage. And you will see that some NaN values will show up.

00:56 For example, row four has a NaN value because zero divided by zero is a mathematically indeterminate result. Now that you have some NaNs, you can see that the API for working with NaNs and nulls is pretty much the same.

01:10 For example, you can use the expression is_nan to check for the special value NaN.

01:18 For instance, you can take a look at the column tip percentage

01:27 and use the expression is_nan to get a Boolean mask identifying the NaN values.

01:39 Now, unlike the null values, this is computed on the fly. The null mask, the validity mask, is stored in memory and so is_null is free in terms of computational resources. is_nan() is not free.

01:54 It has to be computed when you ask for it.

01:58 Now, to count NaN values, you do not have a dedicated function. You have to get there by using the functions or the expressions is_nan and sum().

02:14 So you would do, for example, to figure out how many NaNs you have in each column,

02:20 let’s do a select. Now you will refer to all columns with a float type because no other column type can have a NaN value. And then you check the values to figure out which ones are NaN, and then you sum that.

02:36 And this will give you a count of the NaN values in each column. And you can see that only the column “tip %” contains NaN values.

02:46 Now to fill NaNs, you use the same API, but in a different expression. So to fill NaN values, you use

02:57 the expression fill_nan, which works in much the same way as the expression fill_null. So you can fill by using a constant value or by using an arbitrary expression. Unlike fill_null, fill_nan does not accept a fill strategy.

03:15 And in this example, you’ll just use a constant value.

03:21 Again, in real life, if you are doing this operation, you need to make sure that it makes sense from the point of view of your data. But as an example here, you can replace all of the NaNs in the “tip %” column with a zero.

03:38 And you can see, for example, row four, where you have a total of zero and a tip of zero. Now the tip percentage is also zero. In your context, you need to make sure that this operation is actually reasonable. And finally, you can drop NaNs with the method .drop_nans(), which works in the same way as the method .drop_nulls().

04:10 So maybe I should present it this way to make it clear that we’re talking about these methods. So for example, if you type in tips.drop_nans() you will be dropping only two rows.

04:26 So from 180, you went to 178, and you dropped the two rows that have a NaN value in the tip percentage column. And finally, a last tip might be that if you’re working with data that comes from a tool that does not distinguish between the special value NaN and the null value, it might make sense to replace all of the NaNs with nulls and then tackle everything in one go.

04:55 And you can easily replace NaNs with nulls by using the expression .fill_nan() and typing None. So this will replace all NaNs with nulls.

05:14 Alright, with that said, you’ve reached the end of this lesson and you’re almost at the end of this video course. So in the next lesson, you’ll just review everything you’ve learned, you’ll take a look at a summary, and you’ll be sent off to use your newfound skills in the real world.
