Applying the Same Strategies to NaNs
00:00
Now that you understand the difference between null values and NaN values, you’ll learn how to work with NaN values.
00:07
And you’ll realize the API is pretty much the same as with null values. You’ll start by manufacturing some NaN values to work with, and you’ll do this by first filling in the columns “total” and “tip”, so that instead of nulls you have the value zero.
00:26
And this will create some rows for which both total and tip are zero. What you will do next is compute something like the percentage of tip per order, because this division will force some NaN values to show up.
00:45
So you will divide the column “tip” by the column “total”, and you will call this the tip percentage. And you will see that some NaN values will show up.
00:56
For example, row four has a NaN value because zero divided by zero is a mathematically indeterminate result. Now that you have some NaNs, you can see that the API for working with NaNs and nulls is pretty much the same.
01:10
For example, you can use the expression is_nan to check for the special value
01:18
NaN. For instance, you can take a look at the column tip percentage
01:27
and use the expression is_nan to get a Boolean mask identifying the NaN values.
01:39
Now, unlike the null mask, this one is computed on the fly. The null mask, also called the validity mask, is stored in memory, so is_null is essentially free in terms of computational resources. is_nan() is not free.
01:54
It has to be computed when you ask for it.
01:58
Now, to count NaN values, you do not have a dedicated function. You have to get there by combining the expressions is_nan and sum().
02:14
So, for example, to figure out how many NaNs you have in each column,
02:20
let’s do a select. Now you will refer to all columns with a float type because no other column type can have a NaN value. And then you check the values to figure out which ones are NaN, and then you sum that.
02:36
And this will give you a count of the NaN values in each column. And you can see that only the column “tip %” contains NaN values.
02:46
Now to fill NaNs, you use the same API, but in a different expression. So to fill NaN values, you use
02:57
the expression fill_nan, which works in much the same way as the expression fill_null. So you can fill by using a constant value or an arbitrary expression. Unlike fill_null, though, fill_nan takes only a value and has no fill strategy argument.
03:15
And in this example, you’ll just use a constant value.
03:21
Again, in real life, if you are doing this operation, you need to make sure that it makes sense from the point of view of your data. But as an example here, you can replace all of the NaNs in the “tip %” column with a zero.
03:38
And you can see, for example, row four, where you have a total of zero and a tip of zero. Now the tip percentage is also zero. In your context, you need to make sure that this operation is actually reasonable. And finally, you can drop NaNs with the method .drop_nans(), which works in the same way as the .drop_nulls() method.
04:10
To make it clear that these are methods: for example, if you type in tips.drop_nans(), you will be dropping only two rows.
04:26
So from 180 rows, you went to 178, and you dropped the two rows that have a NaN value in the tip percentage column. And finally, a last tip might be that if you’re working with data that comes from a tool that does not distinguish between the special value NaN and the null value, it might make sense to replace all of the NaNs with nulls and then tackle everything in one go.
04:55
And you can easily replace NaNs with nulls by using the expression .fill_nan() and passing None. So this will replace all NaNs with nulls.
05:14
Alright, with this said, you’ve reached the end of this lesson and you’re almost at the end of this video course. So in the next lesson, you’ll just review everything you’ve learned, you’ll take a look at a summary, and you’ll be sent off to use your newfound skills in the real world.