Small Integer Caching

Variables in Python Martin Breuss 04:11

What is small integer caching? Python caches small integers, which are integers between -5 and 256. These numbers are used so frequently that it’s better for performance to already have these objects available. So these integers will be assigned at startup. Then, each time you refer to one, you’ll be referring to an object that already exists.
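A minimal sketch of the behavior described above, assuming CPython. The helper name `is_same_object` is made up for illustration. The integers are parsed from strings at run time so that compile-time constant sharing, which the comments further down run into, cannot influence the result. The value `1000000` is chosen to sit well outside the cached range in any CPython version.

```python
def is_same_object(text):
    """Parse the same text twice and report whether both ints are one object."""
    first = int(text)
    second = int(text)
    return first is second


# 30 lies inside the small-integer cache, so both names share one object.
print(is_same_object("30"))       # True on CPython

# 1000000 lies far outside the cache, so each call to int() builds a new object.
print(is_same_object("1000000"))  # False on CPython
```

Other Python implementations may behave differently, so this is something to observe rather than rely on.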

00:00 I’ll put it for you under the title of small integer caching. Okay? Keep that in mind. And let’s take a look at what’s happening here.

00:09 If I say a = 30 and b = 30, so same thing as before, but we’re using 30 instead of 300. What we would expect is a == b to be True, which it is.

00:24 And the id(a) == id(b), remember that this points to the location of the object in memory, so we would expect that to be False. However, in Python this is True.

00:37 And that’s kind of weird and could trip you up at the beginning if you don’t know about small integer caching, which is the solution of “Why is this happening?” And very simply, Python caches small integers. So, what’s a small integer? In Python, a small integer is anything between -5 and 256.

00:55 Everything in there gets cached. Let’s talk about what that means. I’ll roll it up again from the beginning with a couple of slides.

01:04 So, what we did is we assigned n and m both to the value 300, but they were both pointing to different objects, right?

01:12 So n was pointing to an integer object 300 and m was pointing to a new, different integer object with the same value, also 300.

01:23 So that’s why if we said n == m we would get True, same value, and id(n) did not equal id(m). You see? It’s two different objects.

01:34 Now, what happened when I said n = 30 and m = 30 as well—so, same type of assignment? Python actually is pointing us to the same object.

01:45 The reason being this cache, the small integer cache. At startup of an interpreter session, Python simply creates the objects for all of the small integers from -5 up to 256. It already creates these integer objects simply because these numbers are used so frequently that it’s better for performance to already have the objects there and just make it every time refer to the same object.

02:10 So this is why id(a) and id(b)—in this case, because it’s 30, which is smaller than 256—is actually pointing to the exact same object in memory. We can check on its number, in that case.

02:26 This is its location in memory. It’s going to be different when I start off a new IPython interpreter session because these integer objects get created on startup.

02:37 So, this is interesting and important to remember, because what I noted here is that essentially what happens is the same as if we would assign m to n. We saw this graphic before—n pointing to 30 and m pointing to 30—when we did this type of assignment.

02:54 And this is what happens with small integers even if we assign them on different lines and you would expect them to create different objects.

03:02 So, important to remember: any integer from -5 to 256 is going to be just assigned at startup, and every time you would refer to it in your program, it’s going to refer to an already existing object.
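One way to check which integers your own interpreter hands out as shared objects is to probe a range at run time. This is only a sketch, assuming CPython; `cached_integers` is a hypothetical helper, not part of any library. It builds every candidate twice from a string and keeps the values for which both results turn out to be the same object.

```python
def cached_integers(start, stop):
    """Return the integers in range(start, stop) that come back as shared objects."""
    return [n for n in range(start, stop) if int(str(n)) is int(str(n))]


# Probe around the lower bound mentioned in the lesson.
low = cached_integers(-20, 20)
print(low[0])  # -5 on CPython

# Probe around the upper bound mentioned in the lesson (256).
high = cached_integers(250, 300)
print(high)
```

The exact upper bound is an implementation detail, so the second result is printed rather than assumed.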

03:15 Anything above that, 257 upwards—it’s going to create new integer objects. So that’s a bit weird, but it’s a fun thing to know and it’s going to help you if you ever want to be in a Python pub quiz—which is something that happened to me at the PyCon! We got this question, exactly this question, and we had a couple of minutes to solve it.

03:37 So I want to give this to you as a challenge, because it’s a fun thing to think about. And there’s also a nice explanation that is going to be linked down here under the video. So check it out, but give it a try first and see if you can solve this.

03:49 The question is: what is a going to be? Without running the program, can you figure out what is a going to be? Just try to think through it, the different steps in this for loop, and keep in mind what happens with small integers in Python. I hope you’ll have fun. And don’t check the solution too quickly—just give it a go.

Victor Amadi on Dec. 11, 2019

On my system, even when using larger integer values a and b both equate True. Maybe the video can be updated?

John DB on Dec. 11, 2019

I get 257 which is my first guess - because that’s the first “non-cached” number outside the pre-cached range of -5…256.

But Victor has a point - the video should be updated to show the answer, in case we make typos and get a misleading and misunderstood result.

Also - is “small integer caching” a fixed and guaranteed facet of the Python language for all time, on which developers can make assumptions? Or is it an implementation quirk of C-python that might not occur in other implementations like Jython?

I found this snippet below useful to reassure me that Python automatically auto-vivifies a new object as soon as a variable’s value changes. Chaos otherwise!

x, y = 3000, 3000
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)

Result:

x==y? True  :  x is y? True
x==y? True  :  x is y? False

Albrecht on Dec. 12, 2019

My system (Windows 10, Python 3.7 with IDLE out of the same package) gives the expected results:

Quiz: 257

John DB’s code from Dec. 11, 2019:

x==y? True  :  x is y? False
x==y? True  :  x is y? False

So it’s not a problem of updating. Where does this different behavior come from?

John DB on Dec. 12, 2019

I confirm Albrecht’s claim!

On an old MS-Windows 7 VM, I installed the latest Miniconda 32-bit with Python 3.7.4.

Code

""" test """
str = "/usr/bin/ls"
print ("-> file:", str)

x, y = 3000, 3000
print("id_x:", id(x), ", id y:", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("id_x:", id(x), ", id y:", id(y))

Launch IDLE from MS-DOS (aka cmd or prompt or something else):

(base) C:\Users\JDB> python --version
Python 3.7.4

(base) C:\Users\JDB> idle
... editor appears ...

IDLE run shows this:

Python 3.7.4 (default, Aug  9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
=== RESTART: C:/Users/JDB/work/2019-12.Win_Py/is_it.py ===
-> file: /usr/bin/ls
id_x: 32776752 , id y: 32776768
x==y? True  :  x is y? False
x==y? True  :  x is y? False
id_x: 32776704 , id y: 32776784

** What gives? **

I suspect that this MS-Windows implementation doesn’t perform the optimization of sharing objects on that first assignment: the first id(x) and id(y) show different numbers (addresses) when run on MS-Windows, but the same addresses on my MacOS C-Python 3.8.0.

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

Albrecht on Dec. 13, 2019

@John DB:

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

I fully agree with your conclusion. “Small integers caching” looks more like a bug than a feature ;-)

John DB on Dec. 13, 2019

<like>

Martin Breuss RP Team on Dec. 14, 2019

Hello @John DB and @Albrecht!

The code snippet from @John DB should never return True for x is y, since the values you’re starting out with (3000) are much higher than the cached numbers from -5 to 256.

I’ve double-checked the code also on my MacOS machine, and the results are that the ids of both integer objects are different for both cases (3000 and 3001) just as @Albrecht reported back as well. @John DB can you double-check what was the exact code you ran when you received the surprising result?

As for when small integer caching occurs: I know it’s part of the CPython implementation, but it might be different in, e.g., Jython.

John DB on Dec. 15, 2019

I double-checked and confirmed my original result. (This is done within Pycharm, on MacOS Mojave.)

Can I post a picture here to show the screenshot? Otherwise I repeat the test-case here:

""" test """
str = "/usr/bin/ls"
print ("-> file:", str)

x, y = 3000, 3000
print("[1] id(x):", id(x), ", id(y):", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("[2] id(x):", id(x), ", id(y):", id(y))

Result:

/usr/local/bin/python3.8 /Users/jdb/Library/Preferences/PyCharm2019.3/scratches/scratch.py
-> file: /usr/bin/ls
[1] id(x): 4366603504 , id(y): 4366603504
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4366603728 , id(y): 4366603696
Process finished with exit code 0

Actually, if optimization is involved and the “is” case works in some cases, why not assume the interpreter might (or might not) do it in other situations as well - and unpredictably?

Martin Breuss RP Team on Dec. 15, 2019

Wow @John DB! This is still a very surprising result so I tried a couple of things to reproduce it:

Python 3.7

Running your code in the Python Interpreter of PyCharm on MacOS Mojave 10.14.3 using Python 3.7:

>>> x, y = 3000, 3000
...print("[1] id(x):", id(x), ", id(y):", id(y))
...print("x==y?", x==y, " :  x is y?", x is y)
...x+=1
...y+=1
...print("x==y?", x==y, " :  x is y?", x is y)
...print("[2] id(x):", id(x), ", id(y):", id(y))

Returns for me the expected results:

[1] id(x): 4584244528 , id(y): 4584245712
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4584244272 , id(y): 4584244912

Same when saving your code as a file test.py and running it from the CLI:

(variables) ➜  variables python --version
Python 3.7.4
(variables) ➜  variables python test.py 
[1] id(x): 4458048272 , id(y): 4458046544
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4457285360 , id(y): 4457284496

Python 3.8

So, I also installed 3.8 just to make sure, not really expecting a difference since that would be a very strange change. But indeed:

(variables) ➜  variables python --version
Python 3.8.0
(variables) ➜  variables python test.py 
[1] id(x): 4502927920 , id(y): 4502927920
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4502927472 , id(y): 4502927152

My mind is blown! Let’s consult the folks who did the articles on changes in Python 3.8 @Geir Arne Hjelle and @Chris Bailey!

John DB on Dec. 15, 2019

@Dan Bader:

FYI: Another way to build or burnish your Pythonista credentials: Jump into the Real Python tutorials, run and test the given examples including corner cases… then if a RealPythonista(tm) says you’ve blown his mind, you know you’re on the right track! :-)

I almost skipped this “variables” course, thinking it was far too basic - but decided to spend a bit time here anyway, thinking there might be useful insight into subtle areas. Sure enough!

Geir Arne Hjelle RP Team on Dec. 15, 2019

First of all, I guess this is an example as good as any as to why you should use == and not is to do comparisons (with ... is None as the main exception) :) Reuven Lerner has a nice article exploring this in more detail: lerner.co.il/2015/06/16/why-you-should-almost-never-use-is-in-python/
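A short sketch of that advice: compare values with == and reserve is for None. The function `describe` and its parameters are invented for this illustration.

```python
def describe(value, expected=None):
    """Compare value to expected by value; use identity only for the None check."""
    if expected is None:       # identity is the right tool for the None singleton
        return "nothing to compare"
    if value == expected:      # value comparison works whether or not objects are shared
        return "equal"
    return "different"


print(describe(int("3000"), 3000))  # equal
print(describe(3000))               # nothing to compare
print(describe(30, 31))             # different
```

Written this way, the result does not depend on whether the interpreter happens to reuse an integer object.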

I haven’t been able to track down the full explanation of what is happening, but here are a few discoveries and theories.

First of all, small integers (-5 through 256) are interned at start-up, as covered in Martin’s video.
Next, Python can also reuse objects during compilation. When running in a REPL each line is compiled separately, while a Python script is compiled as a whole.

This explains why you get different results for the following code snippets:
```
>>> x = 3000
>>> y = 3000
>>> x is y
False
```
In the REPL each line is compiled separately, so x and y point to different objects. However, in a script, everything is compiled together, so x and y point to the same object:
```
$ cat > x_is_y.py
x = 3000
y = 3000
print(x is y)

$ python x_is_y.py 
True
```
You can see the same effect by defining a function in the REPL:
```
>>> def f():
...     x = 3000
...     y = 3000
...     return x is y
... 
>>> f()
True
```
The function is compiled in one go, so again x and y point to the same object.
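The "compiled in one go" idea can be made visible with the built-in compile() and the co_consts attribute of the resulting code object. This is a sketch assuming CPython; the file name "<example>" and the variable names are placeholders.

```python
source = "x = 3000\ny = 3000\n"
code = compile(source, "<example>", "exec")

# Both assignments load the same constant, so 3000 is stored only once.
shared_count = code.co_consts.count(3000)
print(shared_count)

shared = {}
exec(code, shared)
print(shared["x"] is shared["y"])  # True: one code object, one constant

# Compiling each statement separately mimics typing them on separate REPL lines.
separate = {}
exec(compile("x = 3000", "<line 1>", "exec"), separate)
exec(compile("y = 3000", "<line 2>", "exec"), separate)
print(separate["x"] is separate["y"])  # often False, but not guaranteed
```

The last result is deliberately left open, since it depends on the interpreter version and build.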
Using semicolons is another way to force the REPL to compile several statements at once, so the following gives the (by now?) expected result:
```
>>> x = 3000; y = 3000
>>> x is y
True
```
Now things are about to get weird, though. Using tuple unpacking to assign several variables should also force the REPL to compile the objects at the same time (which I assume it does), and reuse the 3000-object. However:
```
>>> x, y = 3000, 3000
>>> x is y
False
```
It turns out that this last result–as has been noted earlier–is very dependent on Python version. As far as I’m able to tell, this only is False on Python 3.7.

Here I’m using Docker to test this out on many different versions of Python:
```
$ docker run --rm -it python:2.7.17-slim python -c "x, y = 3000, 3000; print (x is y)"
True

$ docker run --rm -it python:3.6.9-slim python -c "x, y = 3000, 3000; print(x is y)"
True

$ docker run --rm -it python:3.7.0a3-slim python -c "x, y = 3000, 3000; print(x is y)"
True

$ docker run --rm -it python:3.7.0a4-slim python -c "x, y = 3000, 3000; print(x is y)"
False

$ docker run --rm -it python:3.7.5-slim python -c "x, y = 3000, 3000; print(x is y)"
False

$ docker run --rm -it python:3.8.0a1-slim python -c "x, y = 3000, 3000; print(x is y)"
True
```
So, it seems that there was some change introduced in Python 3.7 alpha 4 that made the unpacking work differently. Then, that behavior was reverted again for the whole Python 3.8 series.
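If Docker is not at hand, a similar check can be run against whichever interpreter is installed locally. The sketch below is an assumption-laden stand-in for the Docker commands above, not a replacement for them: `identity_after` and the `probes` dictionary are hypothetical names, and the unpacking result is printed without any claim about what it should be.

```python
import platform
import sys


def identity_after(source):
    """Compile and run source as one unit, then report whether x is y."""
    namespace = {}
    exec(compile(source, "<probe>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


probes = {
    "unpacking": "x, y = 3000, 3000",
    "semicolon": "x = 3000; y = 3000",
    "two lines": "x = 3000\ny = 3000",
}

print(platform.python_implementation(), sys.version.split()[0])
for label, source in probes.items():
    print(label, "->", identity_after(source))
```

Running the same file under several installed versions gives a side-by-side comparison like the one in the Docker output.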

The changes in Python 3.7 alpha 4 are listed at docs.python.org/3/whatsnew/changelog.html#python-3-7-0-alpha-4. I haven’t been able to see where this can have happened, although bugs.python.org/issue30416 may be one candidate. I don’t really have a clue as to where it may have been reverted, though. The changelog for Python 3.8 alpha 1 is humongous: docs.python.org/3/whatsnew/changelog.html#python-3-8-0-alpha-1
For a final mind-bender, this is what I see when using IPython, seemingly for all versions of Python:
```
In [1]: x = 3000; y = 3000

In [2]: x is y
Out[2]: False
```
It seems that IPython treats semicolon-expressions as completely separate expressions and sends them off to the interpreter one by one?

Running the unpacking examples in IPython (x, y = 3000, 3000) seems to be consistent with the Python version differences seen above.

hikerguy on Jan. 7, 2020

I just ran the last piece of code in PyCharm and got the opposite result:

x = 3000; y = 3000
print(x is y)

returns True

So, what is the general consensus here? It doesn’t appear that the -5 to 256 rule applies.

hikerguy on Jan. 7, 2020

Well, to confuse things more, I just ran the “Pub quiz” presented earlier and it stops at 257 (in PyCharm). Not sure if that helps, but hopefully someone can clear the air on this matter.

Thanks,

Andy

Geir Arne Hjelle RP Team on Jan. 7, 2020

I’d say the general consensus is that you should use == when comparing numbers (and most other things) and not is ;)

The -5 to 256 rule does apply, but Python does other optimizations that may intern other numbers as well. In most REPLs, one such optimization is that repeated numbers separated by a semicolon end up pointing to the same object, as you note.

There are differences though, both between different REPLs and between different versions of Python. In my earlier comment above, I show some of these differences.

Martin Breuss RP Team on Jan. 7, 2020

Thanks for the in-depth analysis and the one-liner takeaway @Geir Arne Hjelle :)

use == when comparing numbers (and most other things) and not is

Here’s again the link to Reuven Lerner’s blog post on this topic (for everyone who scrolled straight to the bottom) :)

keyurratanghayra on April 26, 2020

Hi There,

This is weird but if I am using python3 repl, this observation stands true.

>>> a = 3000
>>> b = 3000
>>> a == b
True
>>> id(a) == id(b)
False

But PyCharm tells a different story: both results are True in PyCharm. Any pointers?

Martin Breuss RP Team on April 27, 2020

Hello @keyurratanghayra. Some results here can be surprising due to a couple of factors. Did you read over the comments on this page? What are your results in the different environments (e.g. which Python version is each of them using, etc.)?

Harsh Chaklasiya on May 3, 2020

a = 3400
b = 3400
c = a+1
d = b+1
print(a == b)
print(id(a) == id(b))

I did this and I got True, False in Pycharm. MacOS Catalina!

Martin Breuss RP Team on May 9, 2020

That makes sense, since both of the integers are much higher than 256, which means that Python will refer to different integer objects. Double-check this course section if it’s unclear to you why you got this output.

DoubleA on Jan. 21, 2021

Hi there. Running the same code as in the video in a) VS code and in b) cmd and getting different results :)

microb1tch on Dec. 5, 2021

For the “pub quiz,” I figured that the final print(a) statement would output 257, which is what I got when I ran the code in interpretive Python.

Interestingly enough, it seems that when I assign multiple variables to the same value (where value > 256), using one line on the interpreter, something interesting happens - Python seems to create one object, to which both variable names refer. Example code:

x = 3000; y = 3000
x == y
True
x is y
True
id(x)
37560752
id(y)
37560752

This seems to be the same phenomenon others have observed. Maybe this is yet another optimization?

If I assign the values separately, I see the behavior I would expect given what I learned in the video - Python creates two different objects with the same value. Example code:

x = 3000
y = 3000
x == y
True
x is y
False
id(x)
37451488
id(y)
36512832

Incrementing the value for each variable by 1 does what I expect, whether I make the assignment on one line, or do so separately. In either case, Python creates new, unique objects for each incremented value.

microb1tch on Dec. 5, 2021

Re: my last point about incrementing by 1, I checked whether new/unique objects were created by using the equality (==) and identity (is) operators. The equality operator returned True, but the identity operator returned False, as expected.

I also checked this using the id() function. After the initial assignment statement, id() indicated that both variables referred to the same memory location (ie. they referred to the same object). After incrementing the variables, id(x) and id(y) returned unique numbers, suggesting that the incremented values were new, unique objects.

So it seems that if you do multiple assignment statements w/ the same value all at once (regardless of whether the integer is greater or less than 256), you create one object in one memory location, which has multiple references to it. But when you manipulate that object (eg. by incrementing its value), Python will create new objects for the changed values (whether you change the values all at once on one interpreter line, or do so on multiple lines w/ separate statements), which occupy different locations in memory than the original object.

It seems that Python copies the original integer object to a new location in memory when you manipulate it, rather than overwriting the area in memory it occupied. Presumably, the original value(s) (eg. the integer object 3000 that was created w/ the initial assignment statement in my first code example above) get taken out with the trash when Python collects the garbage, since they no longer have any references to them.
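A small sketch of what happens on an increment, assuming CPython. Strictly speaking nothing is copied: integers are immutable, so x += 1 builds a new object for the result and rebinds the name, while any other reference still sees the original. The name `alias` is only there to keep the original object observable.

```python
x = int("3000")  # built at run time, outside the small-integer cache
alias = x        # a second name for the same object
x += 1           # int is immutable: x is rebound to a new object

print(alias)       # 3000, the original object is untouched
print(x)           # 3001
print(alias is x)  # False
```

On CPython, an object with no remaining references is reclaimed through reference counting, which matches the "taken out with the trash" description above.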

Martin Breuss RP Team on Dec. 6, 2021

Great investigative research on this topic @RheaRevolver and your conclusions all sound correct as far as my current knowledge of Python internals goes! :)

You might enjoy reading through the CPython Internals Book!

Bartosz Zaczyński RP Team on Dec. 6, 2021

@RheaRevolver That’s correct. I briefly touched on it in a tutorial about the bitwise operators in Python.

jyotirmoyr1 on Feb. 11, 2022

Hi, for me on Ubuntu 20.04.3 LTS with Python 3.9.7, the result for the code below is:

x=21
y=20
print('Is x==y, where x=21 and y=20 :', x==y) 
## results False
print('is id(x)==id(y),where x=21 and y=20  :',id(x)==id(y),id(x),id(y)) 
## Results False 
x=900
y=900
print()
print('Is x==y, where x=900 and y=900 :', x==y)
## results True
print('is id(x)==id(y),where x=900 and y=900  :',id(x)==id(y),id(x),id(y))
## results True

After going through the discussion list, it seems that the results differ depending on the Python version, the platform, and the editor being used.

Thanks, JR

jyotirmoyr1 on Feb. 11, 2022

More on this: the fun part is that code written in an editor like VS Code and run in a terminal window gives True for bigger integers, but running the same code from the Python interpreter itself gives False. Why?

Thanks, JR

Kumar Abhishek on Aug. 28, 2022

@Martin the id’s for any variables storing any range of values are just the same.

Martin Breuss RP Team on Sept. 13, 2022

@Kumar Abhishek I’m not sure what you mean with this:

>>> a = range(100)
>>> b = range(200)

>>> id(a)
4375712320

>>> id(b)
4376351824

>>> id(a) == id(b)
False

Also not quite sure how it relates to the lesson. Could you try to explain your comment a bit more?

Valdemar on Dec. 22, 2022

I wanted to test out RheaRevolver’s point, so I ran this program:

a, b = 300, 300
c = 300
print(a is b is c)
print(id(a),id(b), id(c))


for _ in range(250, 260):
    if a is not b:
        break
    a += 1
    b += 1

print(a is b)
print(id(a),id(b))
print(a)

d = 300

print(id(c) is id(d))
print(id(c), id(d))

And it does seem that a, b, and c all point to the same object, but d points to a different object, even though c remains unchanged. At least print(id(c) is id(d)) returns False. So it seems that assigning the same value at the same time means pointing to the same object, but assigning the same value at different times points to a different object. However, the final print statement returns the same address for the two objects.

I thought that was a little strange, so I wanted to share it. Does ‘is’ not compare the two addresses?
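A possible explanation, sketched under the assumption of CPython: is does compare identities, but id(c) is id(d) compares the two integer objects returned by id(), not c and d themselves. Each call to id() returns a large integer that is created fresh, so the two return values are equal in value without being the same object. In the example below, d is deliberately bound to the same object as c so that the difference stands out.

```python
c = int("300")
d = c  # d is deliberately the very same object as c

print(c is d)          # True: identity of c and d themselves
print(id(c) == id(d))  # True: the two addresses are equal as numbers
print(id(c) is id(d))  # False on CPython: two separate int objects hold that number
```

So to compare addresses, id(c) == id(d) or simply c is d would be the form to use.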
