The is Operator
00:00
In this lesson, I’ll take you through how to compare the identities of two objects with the is and is not operators.
00:09
So, why do I say that is compares identity rather than equality? Well, in Python, or at least in CPython (the reference implementation of Python that you’re probably using), identity refers to the memory address at which an object is stored.
00:25 And you can think of this as kind of analogous to thinking about the identities of two people or something like that, right? Even if you have a pair of identical twins or two people who look really similar, they might be equal in many features, right?
00:39
They might have equally-sized noses and length of hair or something, but they’re still not the same person. They don’t have the same identity, right? The way that CPython defines identity is simply by memory address, or ID number, and I’ll inspect that and show you how that works under the hood with the id() function later in the terminal. Importantly, this is different from the variable name, which points to an object. You might have 10, 20, 50, a hundred, or a million different variables that all reference the same underlying object, like a list or a string. To continue the people analogy, you might have one person that goes by many different names, right?
01:21 You might have a person named Jonathan who goes by the name Jack with his friends, by the name John at work, and by the name Jonathan with his parents, right?
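The many-names-one-person idea can be sketched in a few lines. This is only an illustration: the variable names and list contents below are made up to mirror the analogy and are not from the lesson.

```python
# Hypothetical names, chosen only to mirror the Jonathan/Jack/John analogy.
jonathan = ["likes", "hiking"]
jack = jonathan   # a second name for the same list, not a copy
john = jonathan   # a third name for the same list

# All three names refer to one object, so the identity checks succeed.
same_object = jonathan is jack and jack is john

# A change made through one name is visible through the others.
jack.append("chess")

# A copy has equal contents but is a different object (the "identical twin").
twin = list(jonathan)

print(same_object)         # True
print(john)                # ['likes', 'hiking', 'chess']
print(twin == jonathan)    # True
print(twin is jonathan)    # False
```

Assignment never copies an object, it only attaches another name to it, which is why the append through jack shows up under john.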
01:31
So an object can have many different variable names, but it has only one identity, and that is its memory address. With all that in mind, what should you actually use the is operator for?
01:41
And the is not operator, its counterpart? Well, you should use it to compare with None, and many people in the Python community would go so far as to say that’s the only use case for it.
01:52
I don’t go quite that far. I think that sometimes it can be useful to compare the memory addresses of objects (maybe you’re debugging, maybe you have a very specific program that really needs to work with these memory addresses), but in general usage, you should use it to compare with None pretty much exclusively.
02:07
Let’s take a look at how all this works in the terminal. Well, I’ll need some variables to operate on, so I’ll have a, which has the content "This is a string", and then I’m going to have b, and it’s actually also just going to say, "This is a string".
02:22
To illustrate the difference between equality and identity, first, let’s take a look at a == b. This is True because the two strings have the same value.
02:32
They both say "This is a string", right? So they’re equal to one another.
02:36
But if you say a is b, you get False because a and b were declared separately from one another, right? So they’re technically separate objects even though they have the same value.
02:50
And you can check this by saying id(a) and id(b), and those are two different numbers, meaning a and b are two separate objects stored at different memory addresses.
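The same comparison can be sketched as a script. One assumption to flag: the strings are built at runtime with join() rather than written as two literals, because identical literals inside a single script file can end up sharing one object, which would hide the effect shown in the REPL.

```python
# Build both strings at runtime so each call produces its own object.
# (Two identical literals in one file may share an object; that sharing
# is an implementation detail and not something to rely on.)
words = ["This", "is", "a", "string"]
a = " ".join(words)
b = " ".join(words)

print(a == b)           # True: the values are equal
print(a is b)           # False on CPython: two separate objects
print(id(a) == id(b))   # False: the id() comparison agrees with "is"
```

While both objects are alive, a is b gives the same answer as id(a) == id(b).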
02:58
And be careful when you look at these IDs, because you might be tempted to think that because the ID of b is larger, it was declared later than a. This may or may not be true, and it’s certainly not always the case. In CPython the ID is the object’s memory address, and where an object ends up in memory is up to the memory allocator, so don’t rely on anything about the ID number itself to tell you about the declaration pattern. But you might get confused if you do something like this.
03:28
You might say x = 20, y = 20. And in this case, of course, they’re equal. And if you do x is y, you actually also get True, and so that might confuse you if you’ve just watched the first part of this lesson, because you’d think, “Well, they’re declared separately, right?
03:44
So they should be different objects, so the is relation shouldn’t be satisfied.” Well, this is because of a cool feature of the Python interpreter under the hood, which is called interning. In CPython, the integers from -5 to 256 (I’ll put this in a comment just so that you can see it) are interned by default. What does this mean?
04:06 It means that each value in this range has a distinct memory location that it occupies. And every variable that’s declared to be equal to that value—or assigned to that value is maybe more accurate to say—each of those variables actually just points to the same underlying memory address.
04:28
This is one of the ways that the Python interpreter optimizes, because these numbers are pretty small and so they’re used really often in code. But if you take two larger integers and say a = 257 and b = 257, each on its own line in the REPL, and then say a is b, that’s actually False, because 257 is not included in the interned numbers.
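A small sketch of the cached range follows. It assumes CPython, since the cache is an implementation detail. It also builds the integers with int("...") at runtime, a choice made for this example, so the outcome does not depend on how literal constants in a script happen to be stored.

```python
import platform

# int("...") creates the integers at runtime instead of using literals.
x = int("20")
y = int("20")
p = int("257")
q = int("257")

print(x == y, p == q)   # True True: equality holds in both cases

# The identity results below are CPython behavior, not a language guarantee.
if platform.python_implementation() == "CPython":
    print(x is y)       # True: 20 lies inside the cached range -5..256
    print(p is q)       # False: 257 lies outside it
```

Because the identity results can differ between implementations and contexts, == stays the right tool for comparing numbers.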
04:49 And this happens as well for various small strings that are often used, and it’s difficult to tell sometimes what those strings are, but they might be interned when they’re really frequently used.
04:59
You can see how this works by importing the function, import, sorry, from sys. That teaches me not to talk and code at the same time. from sys import intern.
05:13 So, those strings that we used earlier, let’s reinstantiate those because in a silly fashion, I wrote over them.
05:21
But if you have two strings a and b, "This is a string",
05:25
as I showed you earlier, the is relationship is not satisfied. But if I say a = intern(a) and b = intern(b), then all of a sudden the ID of a and the ID of b are the same, and so a is b is now True.
05:43
And that’s because what the intern() function does is exactly what the interpreter does: it stores this a in a table of interned strings, and then any other string with the same value that gets passed to intern() comes back as that exact same object, at that exact same location in memory.
06:01 So this is an optimization tactic, and if you’re going to declare or work with many different variables, all of which have the same value, it might be useful to you to intern that value just so that you get a little more speed.
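Here is a minimal sketch of that interning step as a script. As an assumption for the example, the strings are built with join() so that they start out as two separate objects.

```python
from sys import intern

# Two equal strings created at runtime, so they begin as separate objects.
words = ["This", "is", "a", "string"]
a = " ".join(words)
b = " ".join(words)
before = a is b     # False on CPython

# intern() returns the single canonical object for a given string value.
a = intern(a)
b = intern(b)
after = a is b      # True: both names now refer to the interned string

print(before, after)
```

Note that intern() returns the canonical object rather than changing its argument in place, which is why the result has to be assigned back to a and b.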
06:14
So, I’ve taken you through all this, but I haven’t yet shown you the actual use case that I told you was best for the is and is not operators, which is comparing with None. I’ll show you that now.
06:25
So, a is of course not equal to None, so it’s False that a is None, and that’s because a has a value "This is a string".
06:33
And so if I say a is not None, then I get True, right? But it might not be obvious why this is useful or why you might want to compare with None. Well, I’ll come up with a contrived example and then I’ll leave you to kind of extrapolate. So, imagine you’re making a web crawler, and so you had a function that gave you a list of web addresses, and those were the web addresses maybe that had a certain picture on them or something.
06:59
You’re searching the web for a particular picture and you want all of the addresses which had that picture. So I’ll say address1, let’s actually make them strings: ["address1.com", "address2.net"]. But when you’re crawling the web, sometimes you can’t get a response from a particular website.
07:20
So this will probably have some None values in it. And then you’ll have "address3.org".
07:27
You definitely have a None in here, and so if you just try to say for address in address:,
07:35
and then maybe you want to do some string manipulation on it, right? Maybe you say print(address.split(".")) to split on the period, right?
07:44
You want to split those up so that you can get both the domain name and then the ".com" or ".net" part there. So this would be great except, oh, I’m sorry. I said for address in address, so be careful: talking and coding at the same time is a very dangerous game. But the error that I wanted to show you was this: 'NoneType' object has no attribute 'split'.
08:10
So if you’re trying to operate on all of these things but some of them are None because your other function had to return None in some cases, then you’re going to run into issues.
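The failure can be reproduced with a short sketch. The list name web_addresses and the exact position of None in it are assumptions based on the description above. The try/except is added here only so the script can show the message instead of stopping.

```python
# Assumed contents: three reachable addresses and one None placeholder.
web_addresses = ["address1.com", "address2.net", None, "address3.org"]

error_message = None
try:
    for address in web_addresses:
        # Works for strings, fails as soon as the loop reaches None.
        print(address.split("."))
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)   # 'NoneType' object has no attribute 'split'
```

The first two addresses print normally. The loop then stops at None, so "address3.org" is never processed.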
08:21
So what you need to do instead is say for address in web_addresses:, then if address is not None:, and then you know you’re safe to print the address.split("."), and now you get exactly what I wanted, which was the actual address part of it and then the "com", "net", or other suffix of the web address.
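A minimal sketch of the guarded loop, again assuming the list contents described above. The results are collected in a list named parts, which is a name introduced for this example.

```python
# Assumed contents: three reachable addresses and one None placeholder.
web_addresses = ["address1.com", "address2.net", None, "address3.org"]

parts = []
for address in web_addresses:
    # Only real strings get split; None entries are skipped.
    if address is not None:
        parts.append(address.split("."))

print(parts)   # [['address1', 'com'], ['address2', 'net'], ['address3', 'org']]
```

Checking is not None rather than plain truthiness (if address:) is a deliberate choice. It skips only the None placeholders and would still let through other falsy values, such as an empty string, if those ever needed handling.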
08:42
So, that’s one example, and this is a problem that comes up a lot when you’re working with real-world data: you can’t always get the data that you want, and sometimes you have to have things like None values (NoneType) in there to kind of fill the empty space.
08:56 But often, you’ll want to either ignore them or treat them in a special fashion, and so you have to compare. And so I’ll just do one more quick example.
09:03
I’ll add in an else clause. You could print, you know, "Web address not reachable".
09:13
Right? You can just say something like that, that might be nice. You have one here, Web address not reachable. So you know at least that this one address could not be reached, and of course you would have to do some more work so that you know what the actual thing that you wanted to reach was. But regardless of all that, that’s a use case for this is None and is not None comparison, which is really, for most cases, the only use case that you need the is and is not operators for. All the stuff with memory addresses is super cool and super interesting, and it’s fun to think about why it is the way it is, but when you’re doing kind of casual or more general-purpose programming, you’ll probably just need the is operator to compare with None.
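One possible way to do that extra work is to record which entry was unreachable. The sketch below is an assumption-laden illustration: the function name report and the use of enumerate() are not from the lesson, and the list contents are the ones assumed earlier.

```python
def report(addresses):
    """Return one item per entry, flagging the None placeholders by position."""
    lines = []
    for position, address in enumerate(addresses):
        if address is not None:
            lines.append(address.split("."))
        else:
            # Keep the position so the missing entry can be traced later.
            lines.append(f"Web address not reachable (entry {position})")
    return lines


for line in report(["address1.com", "address2.net", None, "address3.org"]):
    print(line)
```

In a real crawler the function that produced the list would probably need to return the requested URL alongside each result. A bare None carries no information about what failed.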
Liam Pulsifer RP Team on April 3, 2020
@jamesbrown68 hmm, that’s really interesting! I don’t get that behavior with my REPL, and interning is generally only preserved over each invocation of Python, so I’m not sure what’s causing that to happen. Out of curiosity, what OS and REPL are you running?
jamesbrown68 on April 4, 2020
I’m on Windows 10, Python 3.8.1.
python
>>> x = 20
>>> id(x)
1617684704
>>> exit()
python
>>> x = 20
>>> id(x)
1617684704
Liam Pulsifer RP Team on April 9, 2020
Wow @jamesbrown68, I have to admit this one stumps me. I wonder if there’s some system setting in your OS that’s causing Python to run in exactly the same way every time?
Vedang Joshi on April 10, 2020
Wondering why I’m getting this output?
[root@vedang]# python
Python 2.7.5 (default, Aug 7 2019, 00:51:29)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "hello"
>>> b = "hello"
>>> a is b
True
>>> a == b
True
>>> id(a)
140575040349312
>>> id(b)
140575040349312
>>>
Vedang Joshi on April 10, 2020
the interning feature seems random?
Dan Bader RP Team on April 10, 2020
“Interning is an implementation-dependent optimization that depends on many factors. It can be interesting to understand how it works, but never depend on it working any particular way.” (Source)
lordcommandermay on April 30, 2020
i have not been able to recreate the above example.
a='bro!'
b='bro!'
print(a is b)
a='bro '
b='bro '
print(a is b)
True
True
[Program finished]
lordcommandermay on April 30, 2020
nevermind
a='bro'
b='bro'
print(a is b)
a=a+'!'
b=b+'!'
print(a is b)
True
False
[Program finished]
lordcommandermay on April 30, 2020
is this use case safe?
if a is True:
Ricky White RP Team on May 1, 2020
Hi @lordcommandermay,
Whether it’s safe depends on what you’re doing within the if statement. But checking if an object is True is common in Python. However, if you are checking for the presence of the object a, then you can shorten it to just: if a:. If a is a boolean, and you want to check its truthiness, then you should use == instead of is. Hope that helps.
jamesbrown68 on March 31, 2020
About the interned numbers (am I spelling that right?) I noticed that the id’s for ‘x = 20’ was the same, even after I exited Python and started a new REPL. I was expecting the assignments to occur when the REPL started, but I suppose not. So what’s determining the id’s for -5 to 256? My OS?