Understanding the Difference Between NaN and null
00:00
You’re now familiar with the tools that Polars provides for working with missing data. Now you need to learn about a different special value that can be really annoying, but that isn’t the same as null, and that is NaN, which stands for not a number.
00:20 This is a very special value that Polars treats differently from the null value because they are different things. And in this lesson, you will learn about these differences.
00:32 So the most important thing, or one of the most important things you need to understand right away, is that the null value is the special value that Polars uses to represent missing data, and this value can show up in columns of all data types.
00:46
For example, in your tips data, you’ve seen null in float columns, but also in string columns, which were the columns “tip”, “total”, and “time”.
00:57
And it’s always the same null value that represents this missing data. Whereas NaN is a value that can only show up in float columns.
01:09
So NaN is a special float value, and this is very important to understand.
01:16
And the semantic meaning of these two values is very distinct. Again, I really want you to grasp the difference here, so I’m going to say it again. The null value represents a missing value.
01:29
You can think of it kind of like the None in vanilla Python, whereas NaN represents the result of a mathematically undefined operation.
01:40 It’s not strictly true, but intuitively you can think of it as the value that shows up when math blows up in your face. And you’ll see an example in just a second.
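To make the analogy concrete before moving to Polars, here is a minimal sketch in plain Python. The variable names are only illustrative, and infinity minus infinity is just one convenient way to trigger an undefined result.

```python
import math

missing = None       # "there is no value here", like null in Polars
nan = float("nan")   # an actual float value meaning "not a number"

# NaN is a float; None is not.
print(isinstance(nan, float))      # True
print(isinstance(missing, float))  # False

# One way math can blow up: infinity minus infinity has no defined result.
blown_up = math.inf - math.inf
print(math.isnan(blown_up))        # True

# NaN is not even equal to itself, so test for it with math.isnan().
print(nan == nan)                  # False
```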
01:51 And the only reason you are going through this, and the only reason you’re being presented with this in a course about missing values is that while Polars enforces this distinction quite strictly, not all tools do.
02:06
So if you’re dealing with data that came from other tools, you might encounter situations where your data has these NaN values, and they were supposed to represent missing data.
02:19
But in Polars, you will not get this behavior because Polars supports the value NaN, and it interprets it as a different thing: not a missing value, just a special value.
02:32
So be wary if you’re importing data from other libraries, from other tools. Make sure that you check for this. And in the next lesson, you’ll also learn how to work with these NaNs.
02:42
For now, I just want you to see an example of a NaN showing up.
02:47 You can go ahead and open Jupyter Notebook. It can be the notebook you’ve been working on, it can be a new notebook. It doesn’t matter a lot. And what you will see now is that you can create a Series with strings.
03:02
And if you insert the None value in Python,
03:06
this represents the missing data. Instead of a string, you put a None there and you get a null in Polars. And this shows you that, for example, in a string column, you can have null, but NaN you only get in float columns.
03:23
For example, if you create a column with a couple of floating-point values, and then you type in None, this will result in a null value because this is a missing value, it’s not a float.
03:37
But you can also make a NaN show up if you create it explicitly with the built-in float() and the string "nan".
03:48
This forces the NaN to show up. And these are two different values, and Polars represents them differently. And if you take this Series that you just created and count the nulls, you will see that you have exactly one null value, which is the third value, because NaN is not null.
04:09
And finally, as an example of how a NaN might show up after a mathematical computation: if you divide zero by zero, you get a NaN.
04:21 So some arithmetic operations might result in this special value.
04:25 So now that you understand the special value, in the next lesson, you will learn how to work with it.