# Small Integer Caching


What is small integer caching? Python caches small integers, which are integers between -5 and 256. These numbers are used so frequently that it’s better for performance to already have these objects available. So these integers will be assigned at startup. Then, each time you refer to one, you’ll be referring to an object that already exists.
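A minimal sketch for probing the cache yourself. It assumes CPython, where the cache is an implementation detail rather than a language guarantee, and the helper name `is_cached` is made up for this illustration. Building each integer from a string at runtime keeps the compiler's constant sharing out of the picture.

```python
def is_cached(n):
    """Return True if two runtime-built ints with value n are the same object."""
    # int(str(n)) builds the value at runtime, so the compiler cannot
    # share a single constant between the two operands.
    return int(str(n)) is int(str(n))


# Every value in the -5 to 256 range should come from the cache on CPython.
print(all(is_cached(n) for n in range(-5, 257)))

# A value far outside that range gets a fresh object each time on CPython.
print(is_cached(10**6))
```

Because both operands stay alive while `is` is evaluated, two uncached results cannot end up reusing the same memory address by accident.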

On my system, even when using larger integer values, a == b and a is b both evaluate to True. Maybe the video can be updated?

John DB

I get 257 which is my first guess - because that’s the first “non-cached” number outside the pre-cached range of -5…256.

But Viktor has a point - the video should be updated to show the answer, in case we make typos and get a misleading and misunderstood result.

Also - is “small integer caching” a fixed and guaranteed facet of the Python language for all time, on which developers can make assumptions? Or is it an implementation quirk of CPython that might not occur in other implementations like Jython?

I found this snippet below useful to reassure me that Python automatically auto-vivifies a new object as soon as a variable’s value changes. Chaos otherwise!

x, y = 3000, 3000
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)

Result:

x==y? True  :  x is y? True
x==y? True  :  x is y? False

Albrecht

My system (Windows 10, Python 3.7 with IDLE out of the same package) gives the expected results:

Quiz: 257

John DB’s code from Dec. 11, 2019:

x==y? True  :  x is y? False
x==y? True  :  x is y? False

So it’s not a problem of updating. Where does this different behavior come from?

John DB

I confirm Albrecht’s claim!

On an old MS-Windows 7 VM, I installed the latest Miniconda 32-bit with Python 3.7.4.

Code

""" test """
str = "/usr/bin/ls"
print ("-> file:", str)

x, y = 3000, 3000
print("id_x:", id(x), ", id y:", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("id_x:", id(x), ", id y:", id(y))

Launch IDLE from MS-DOS (aka cmd or prompt or something else):

(base) C:\Users\JDB> python --version
Python 3.7.4

(base) C:\Users\JDB> idle
... editor appears ...

IDLE run shows this:

Python 3.7.4 (default, Aug  9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)] on win32
>>>
=== RESTART: C:/Users/JDB/work/2019-12.Win_Py/is_it.py ===
-> file: /usr/bin/ls
id_x: 32776752 , id y: 32776768
x==y? True  :  x is y? False
x==y? True  :  x is y? False
id_x: 32776704 , id y: 32776784

** What gives? **

I suspect that this MS-Windows implementation doesn’t perform the optimization of sharing objects on that first assignment: the first id(x) and id(y) show different numbers (addresses) when run on MS-Windows, but the same address on my macOS CPython 3.8.0.

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

Albrecht

@John DB:

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

I fully agree with your conclusion. “Small integers caching” looks more like a bug than a feature ;-)

John DB

<like>

Martin Breuss RP Team

Hello @John DB and @Albrecht!

The code snippet from @John DB should never return True for x is y, since the values you’re starting out with (3000) are much higher than the cached numbers from -5 to 256.

I’ve also double-checked the code on my macOS machine, and the ids of both integer objects are different in both cases (3000 and 3001), just as @Albrecht reported. @John DB, can you double-check the exact code you ran when you got the surprising result?

As for when small integer caching occurs: I know it’s part of the CPython implementation, but it might be different in, e.g., Jython.

John DB

I double-checked and confirmed my original result. (This was done within PyCharm, on macOS Mojave.)

Can I post a picture here to show the screenshot? Otherwise I repeat the test-case here:

""" test """
str = "/usr/bin/ls"
print ("-> file:", str)

x, y = 3000, 3000
print("[1] id(x):", id(x), ", id(y):", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("[2] id(x):", id(x), ", id(y):", id(y))

Result:

/usr/local/bin/python3.8 /Users/jdb/Library/Preferences/PyCharm2019.3/scratches/scratch.py
-> file: /usr/bin/ls
[1] id(x): 4366603504 , id(y): 4366603504
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4366603728 , id(y): 4366603696
Process finished with exit code 0

Actually, if optimization is involved and the “is” case works in some cases, why not assume the interpreter might (or might not) do it in other situations as well - and unpredictably?

Martin Breuss RP Team

Wow @John DB! This is still a very surprising result so I tried a couple of things to reproduce it:

Python 3.7

Running your code in the Python Interpreter of PyCharm on MacOS Mojave 10.14.3 using Python 3.7:

>>> x, y = 3000, 3000
... print("[1] id(x):", id(x), ", id(y):", id(y))
... print("x==y?", x==y, " :  x is y?", x is y)
... x+=1
... y+=1
... print("x==y?", x==y, " :  x is y?", x is y)
... print("[2] id(x):", id(x), ", id(y):", id(y))

Returns for me the expected results:

[1] id(x): 4584244528 , id(y): 4584245712
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4584244272 , id(y): 4584244912

Same when saving your code as a file test.py and running it from the CLI:

(variables)   variables python --version
Python 3.7.4
(variables)   variables python test.py
[1] id(x): 4458048272 , id(y): 4458046544
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4457285360 , id(y): 4457284496

Python 3.8

So, I also installed 3.8 just to make sure, not really expecting a difference since that would be a very strange change. But indeed:

(variables)   variables python --version
Python 3.8.0
(variables)   variables python test.py
[1] id(x): 4502927920 , id(y): 4502927920
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4502927472 , id(y): 4502927152

My mind is blown! Let’s consult the folks who did the articles on changes in Python 3.8 @Geir Arne Hjelle and @Chris Bailey!

John DB

FYI: Another way to build or burnish your Pythonista credentials: Jump into the Real Python tutorials, run and test the given examples including corner cases… then if a RealPythonista(tm) says you’ve blown his mind, you know you’re on the right track! :-)

I almost skipped this “variables” course, thinking it was far too basic - but decided to spend a bit of time here anyway, thinking there might be useful insight into subtle areas. Sure enough!

Geir Arne Hjelle RP Team

First of all, I guess this is an example as good as any as to why you should use == and not is to do comparisons (with ... is None as the main exception) :) Reuven Lerner has a nice article exploring this in more detail: lerner.co.il/2015/06/16/why-you-should-almost-never-use-is-in-python/

I haven’t been able to track down the full explanation of what is happening, but here are a few discoveries and theories.

• First of all, small integers (-5 through 256) are interned at start-up - as covered in Martin’s video

• Next, Python can also reuse objects during compilation. When running in a REPL each line is compiled separately, while a Python script is compiled as a whole.

This explains that you get different results for the following code snippets:

>>> x = 3000
>>> y = 3000
>>> x is y
False

In the REPL each line is compiled separately, so x and y point to different objects. However, in a script, everything is compiled together, so x and y point to the same object:

$ cat > x_is_y.py
x = 3000
y = 3000
print(x is y)

$ python x_is_y.py
True

You can see the same effect by defining a function in the REPL:

>>> def f():
...     x = 3000
...     y = 3000
...     return x is y
...
>>> f()
True

The function is compiled in one go, so again x and y point to the same object.

• Using semicolons is another way to force the REPL to compile several statements at once, so the following gives the (by now?) expected result:

>>> x = 3000; y = 3000
>>> x is y
True

• Now things are about to get weird, though. Using tuple unpacking to assign several variables should also force the REPL to compile the objects at the same time (which I assume it does), and reuse the 3000-object. However:

>>> x, y = 3000, 3000
>>> x is y
False

It turns out that this last result, as has been noted earlier, is very dependent on Python version. As far as I’m able to tell, this is only False on Python 3.7.

Here I’m using Docker to test this out on many different versions of Python:

$ docker run --rm -it python:2.7.17-slim python -c "x, y = 3000, 3000; print (x is y)"
True

$ docker run --rm -it python:3.6.9-slim python -c "x, y = 3000, 3000; print(x is y)"
True

$ docker run --rm -it python:3.7.0a3-slim python -c "x, y = 3000, 3000; print(x is y)"
True

$ docker run --rm -it python:3.7.0a4-slim python -c "x, y = 3000, 3000; print(x is y)"
False

$ docker run --rm -it python:3.7.5-slim python -c "x, y = 3000, 3000; print(x is y)"
False

$ docker run --rm -it python:3.8.0a1-slim python -c "x, y = 3000, 3000; print(x is y)"
True

So, it seems that there was some change introduced in Python 3.7 alpha 4 that made the unpacking work differently. Then, that behavior was reverted again for the whole Python 3.8 series.

The changes in Python 3.7 alpha 4 are listed at docs.python.org/3/whatsnew/changelog.html#python-3-7-0-alpha-4, but I haven’t been able to see where this might have happened, although bugs.python.org/issue30416 may be one candidate? I don’t really have a clue as to where it may have been reverted, though. The changelog for Python 3.8 alpha 1 is humongous: docs.python.org/3/whatsnew/changelog.html#python-3-8-0-alpha-1

• For a final mind-bender, this is what I see when using IPython, seemingly for all versions of Python:

In [1]: x = 3000; y = 3000

In [2]: x is y
Out[2]: False

It seems that IPython treats semicolon-expressions as completely separate expressions and sends them off to the interpreter one by one?

Running the unpacking examples in IPython (x, y = 3000, 3000) seems to be consistent with the Python version differences seen above.
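The point about a function being compiled in one go can be checked by looking at the constants stored on its code object. This is a sketch that assumes CPython, where equal constants within one code object are stored only once. The exact contents of `co_consts` vary between versions, so only the count of 3000 is of interest here.

```python
def f():
    x = 3000
    y = 3000
    return x is y


# All constants used by f live in one tuple on its code object.
print(f.__code__.co_consts)

# The value 3000 is stored once, so both assignments load the same object.
print(f.__code__.co_consts.count(3000))
print(f())
```

Both names end up bound to the single 3000 object held by the code object, which is why the identity check succeeds inside the function.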

hikerguy

I just ran the last piece of code in PyCharm and got the opposite result:

x = 3000; y = 3000
print(x is y)

returns True

So, what is the general consensus here? It doesn’t appear that the -5 to 256 rule applies.

hikerguy

Well, to confuse things more, I just ran the “Pub quiz” presented earlier and it stops at 257 (in PyCharm). Not sure if that helps, but hopefully someone can clear the air on this matter.

Thanks,

Andy

Geir Arne Hjelle RP Team

I’d say the general consensus is that you should use == when comparing numbers (and most other things) and not is ;)

The -5 to 256 rule does apply, but Python does other optimizations that may intern other numbers as well. In most REPLs, one such optimization is that repeated numbers separated by semicolon end up pointing to the same object - as you note.

There are differences though, both between different REPLs and between different versions of Python. In my earlier comment above, I show some of these differences.
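As a small illustration of that advice, here is a sketch that uses == for values and reserves is for the None singleton. The function name `find_first` is made up for this example.

```python
def find_first(items, target):
    """Return the index of the first item equal to target, or None."""
    for index, item in enumerate(items):
        if item == target:  # compare values, not object identity
            return index
    return None


# int("3000") is built at runtime, so it is a different object than the
# 3000 stored in the list, yet == still finds it.
position = find_first([100, 3000, 70000], int("3000"))

if position is None:  # identity is the right check for the None singleton
    print("not found")
else:
    print("found at index", position)
```

The result of the == comparison does not depend on the Python version, the REPL, or how the integers were created.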

Martin Breuss RP Team

Thanks for the in-depth analysis and the one-liner takeaway @Geir Arne Hjelle :)

use == when comparing numbers (and most other things) and not is

Here’s again the link to Reuven Lerner’s blog post on this topic (for everyone who scrolled straight to the bottom) :)

keyurratanghayra

Hi There,

This is weird, but when I use the python3 REPL, this observation holds true:

a = 3000
b = 3000
a == b
True
id(a) == id(b)
False

But PyCharm tells a different story: both results are True in PyCharm. Any pointers?

Martin Breuss RP Team

Hello @keyurratanghayra. Some results here can be surprising due to a couple of factors. Did you read over the comments on this page? What are your results in the different environments (e.g. which Python version is each of them using, etc.)?

Harsh Chaklasiya

a = 3400
b = 3400
c = a+1
d = b+1
print(a == b)
print(id(a) == id(b))

I did this and I got True, False in PyCharm. macOS Catalina!

Martin Breuss RP Team

That makes sense, since both of the integers are much higher than 256, which means that Python will refer to different integer objects. Double-check this course section if it’s unclear to you why you got this output.

DoubleA

Hi there. I’m running the same code as in the video in a) VS Code and b) cmd, and I’m getting different results :)

microb1tch

For the “pub quiz,” I figured that the final print(a) statement would output 257, which is what I got when I ran the code in the interactive Python interpreter.

Interestingly enough, it seems that when I assign multiple variables to the same value (where value > 256), using one line on the interpreter, something interesting happens - Python seems to create one object, to which both variable names refer. Example code:

x = 3000; y = 3000
x == y
True
x is y
True
id(x)
37560752
id(y)
37560752

This seems to be the same phenomenon others have observed. Maybe this is yet another optimization?

If I assign the values separately, I see the behavior I would expect given what I learned in the video - Python creates two different objects with the same value. Example code:

x = 3000
y = 3000
x == y
True
x is y
False
id(x)
37451488
id(y)
36512832

Incrementing the value for each variable by 1 does what I expect, whether I make the assignment on one line, or do so separately. In either case, Python creates new, unique objects for each incremented value.

microb1tch

Re: my last point about incrementing by 1, I checked whether new/unique objects were created by using the equality (==) and identity (is) operators. The equality operator returned True, but the identity operator returned False, as expected.

I also checked this using the id() function. After the initial assignment statement, id() indicated that both variables referred to the same memory location (ie. they referred to the same object). After incrementing the variables, id(x) and id(y) returned unique numbers, suggesting that the incremented values were new, unique objects.

So it seems that if you do multiple assignment statements w/ the same value all at once (regardless of whether the integer is greater or less than 256), you create one object in one memory location, which has multiple references to it. But when you manipulate that object (eg. by incrementing its value), Python will create new objects for the changed values (whether you change the values all at once on one interpreter line, or do so on multiple lines w/ separate statements), which occupy different locations in memory than the original object.

It seems that Python copies the original integer object to a new location in memory when you manipulate it, rather than overwriting the area in memory it occupied. Presumably, the original value(s) (eg. the integer object 3000 that was created w/ the initial assignment statement in my first code example above) get taken out with the trash when Python collects the garbage, since they no longer have any references to them.
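A minimal sketch of what happens on increment. Integers are immutable in Python, so `x += 1` builds a new object and rebinds the name instead of changing or moving the old one. The name `alias` is only there to keep a second reference to the original object.

```python
x = int("3000")  # built at runtime so no constant sharing is involved
alias = x        # a second name for the very same object

x += 1           # creates a new int object and binds x to it

print(alias, x)    # the original object still holds its old value
print(alias is x)  # the two names now refer to different objects
```

Once no name refers to the original object anymore, CPython can reclaim it, which matches the garbage collection guess above.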

Martin Breuss RP Team

Great investigative research on this topic @RheaRevolver and your conclusions all sound correct as far as my current knowledge of Python internals goes! :)

You might enjoy reading through the CPython Internals Book!

Bartosz Zaczyński RP Team

@RheaRevolver That’s correct. I briefly touched on it in a tutorial about the bitwise operators in Python.

jyotirmoyr1

Hi, for me on Ubuntu 20.04.3 LTS with Python 3.9.7, the result for the code below is:

x=21
y=20
print('Is x==y, where x=21 and y=20 :', x==y)
## results False
print('is id(x)==id(y),where x=21 and y=20  :',id(x)==id(y),id(x),id(y))
## Results False
x=900
y=900
print()
print('Is x==y, where x=900 and y=900 :', x==y)
## results True
print('is id(x)==id(y),where x=900 and y=900  :',id(x)==id(y),id(x),id(y))
## results True

After going through the discussion list, it seems that the results differ depending on the Python version, the platform, and the editor being used.

Thanks, JR

jyotirmoyr1

More on this: the fun part is that code written in an editor like VS Code and run from a terminal window gives True for bigger integers, but when run from the Python interpreter itself, the result is False. Why??

Thanks, JR
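One way to see why a script and an interactive session can disagree is to imitate both with `compile()`. This is a rough sketch that assumes CPython. The helper names are made up, and compiling each line in "single" mode only approximates what a real REPL does.

```python
def run_as_script(source):
    """Compile the whole source as one unit, like running a .py file."""
    namespace = {}
    exec(compile(source, "<script>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


def run_line_by_line(source):
    """Compile every line on its own, roughly like typing into a REPL."""
    namespace = {}
    for line in source.splitlines():
        exec(compile(line + "\n", "<stdin>", "single"), namespace)
    return namespace["x"] is namespace["y"]


SOURCE = "x = 3000\ny = 3000"

print("as a script: ", run_as_script(SOURCE))
print("line by line:", run_line_by_line(SOURCE))
```

Compiled as one unit, both assignments share a single constant. Compiled separately, each line gets its own code object with its own 3000, so the identity check may come out differently depending on the interpreter build.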

Kumar Abhishek

@Martin the id’s for any variables storing any range of values are just the same.

Martin Breuss RP Team

@Kumar Abhishek I’m not sure what you mean with this:

>>> a = range(100)
>>> b = range(200)

>>> id(a)
4375712320

>>> id(b)
4376351824

>>> id(a) == id(b)
False

Also not quite sure how it relates to the lesson. Could you try to explain your comment a bit more?

Valdemar

I wanted to test out RheaRevolver’s point, so I ran this program:

a, b = 300, 300
c = 300
print(a is b is c)
print(id(a),id(b), id(c))

for _ in range(250, 260):
    if a is not b:
        break
    a += 1
    b += 1

print(a is b)
print(id(a),id(b))
print(a)

d = 300

print(id(c) is id(d))
print(id(c), id(d))

And it does seem that a, b and c all point to the same object, but d points to a different object, even though c remains unchanged. At least print(id(c) is id(d)) returns False. So it seems that assigning the same value at the same time means pointing to the same object, but assigning the same value at different times means pointing to a different object. However, the final print statement returns the same address for the two objects.

I thought that was a little strange, so I wanted to share it. Does ‘is’ not compare the two addresses?
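A short sketch that may clear this up. The is operator compares the identity of its two operands, and in `id(c) is id(d)` the operands are the two integers returned by `id()`, not `c` and `d` themselves. On CPython those integers are large numbers outside the small integer cache, so each call hands back a separate object even when the numbers are equal.

```python
c = int("300")  # built at runtime to sidestep constant sharing
d = c           # d is a second name for the same object

print(c is d)          # compares the identities of c and d directly
print(id(c) == id(d))  # compares the two id numbers by value
print(id(c) is id(d))  # compares two temporary int objects returned by id()
```

To ask whether two names refer to the same object, use `c is d` or `id(c) == id(d)`.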
