
Small Integer Caching


What is small integer caching? Python caches small integers, which are integers between -5 and 256. These numbers are used so frequently that it’s better for performance to have these objects already available, so they are created when the interpreter starts up. Then, each time you refer to one, you’re referring to an object that already exists.
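A minimal sketch of how you might observe this yourself, assuming CPython and its -5 to 256 cache. Each value is built at runtime with `int(str(n))` so that the compiler can’t hand both operands the same constant object. The helper names `is_cached` and `cached_range` are hypothetical, and because this probes an implementation detail, other interpreters may report something different.

```python
def is_cached(n: int) -> bool:
    """Return True if two independently created ints equal to n are the same object."""
    # int(str(n)) builds each integer at runtime, sidestepping compile-time constant sharing
    return int(str(n)) is int(str(n))


def cached_range(lo: int = -20, hi: int = 300) -> tuple:
    """Return the smallest and largest values in [lo, hi] that appear to be cached."""
    cached = [n for n in range(lo, hi + 1) if is_cached(n)]
    return min(cached), max(cached)


if __name__ == "__main__":
    print(cached_range())                   # on CPython: (-5, 256)
    print(is_cached(256), is_cached(257))   # on CPython: True False
```

The first value outside the cache on the positive side is 257, which is where identity checks on freshly created integers start to fail.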

Victor Amadi on Dec. 11, 2019

On my system, even when using larger integer values, comparing a and b still gives True. Maybe the video can be updated?

John DB on Dec. 11, 2019

I get 257, which was my first guess, because that’s the first “non-cached” number outside the pre-cached range of -5…256.

But Victor has a point: the video should be updated to show the answer, in case we make typos and get a misleading result.

Also, is “small integer caching” a fixed and guaranteed facet of the Python language that developers can rely on? Or is it an implementation quirk of CPython that might not occur in other implementations like Jython?

I found the snippet below useful to reassure me that Python automatically creates a new object as soon as a variable’s value changes. Chaos otherwise!

x, y = 3000, 3000
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)

Result:

x==y? True  :  x is y? True
x==y? True  :  x is y? False
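Why `x += 1` never drags `y` along: integers are immutable, so augmented assignment binds the name to a new object. A small sketch contrasting this with a mutable `list`, where `+=` extends the existing object in place; the function name `rebinding_demo` is hypothetical.

```python
def rebinding_demo() -> dict:
    """Compare what += does to an int versus a list."""
    x = y = 3000          # one int object, two names
    x += 1                # ints are immutable: x is rebound to a new object, 3001
    int_result = (x, y, x is y)

    a = b = [3000]        # one list object, two names
    a += [3001]           # lists are mutable: += extends the same object in place
    list_result = (a, b, a is b)

    return {"int": int_result, "list": list_result}


if __name__ == "__main__":
    print(rebinding_demo())
    # {'int': (3001, 3000, False), 'list': ([3000, 3001], [3000, 3001], True)}
```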

Albrecht on Dec. 12, 2019

My system (Windows 10, Python 3.7 with IDLE out of the same package) gives the expected results:

Quiz: 257

John DB’s code from Dec. 11, 2019:

x==y? True  :  x is y? False
x==y? True  :  x is y? False

So it’s not a problem of updating. Where does this different behavior come from?

John DB on Dec. 12, 2019

I confirm Albrecht’s claim!

On an old MS-Windows 7 VM, I installed the latest Miniconda 32-bit with Python 3.7.4.

Code

""" test """
path = "/usr/bin/ls"  # avoid shadowing the built-in str
print("-> file:", path)

x, y = 3000, 3000
print("id_x:", id(x), ", id y:", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("id_x:", id(x), ", id y:", id(y))

Launch IDLE from the Windows command prompt (cmd):

(base) C:\Users\JDB> python --version
Python 3.7.4

(base) C:\Users\JDB> idle
... editor appears ...

IDLE run shows this:

Python 3.7.4 (default, Aug  9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
=== RESTART: C:/Users/JDB/work/2019-12.Win_Py/is_it.py ===
-> file: /usr/bin/ls
id_x: 32776752 , id y: 32776768
x==y? True  :  x is y? False
x==y? True  :  x is y? False
id_x: 32776704 , id y: 32776784

What gives?

I suspect that this MS-Windows implementation doesn’t perform the optimization of sharing objects on that first assignment: the first id(x) and id(y) show different numbers (addresses) when run on MS-Windows, but the same addresses on my macOS CPython 3.8.0.

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

Albrecht on Dec. 13, 2019

@John DB:

I therefore assume that programmers shouldn’t depend on this optimization - but be aware that it might exist, so be careful with “is” comparisons and initialization assumptions.

I fully agree with your conclusion. “Small integer caching” looks more like a bug than a feature ;-)

John DB on Dec. 13, 2019

<like>

Martin Breuss RP Team on Dec. 14, 2019

Hello @John DB and @Albrecht!

The code snippet from @John DB should never return True, since the values you’re starting out with (3000) are much higher than the cached numbers from -5 to 256.

I’ve double-checked the code also on my MacOS machine, and the results are that the ids of both integer objects are different for both cases (3000 and 3001) just as @Albrecht reported back as well. @John DB can you double-check what was the exact code you ran when you received the surprising result?

As for when small integer caching occurs: I know it’s part of the CPython implementation, but it might be different in other implementations such as Jython.
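Since the caching is an implementation detail, a test script like the ones in this thread is easier to compare across machines if it also reports which interpreter and version it ran on. A minimal sketch using the standard `sys.implementation.name` and `platform.python_version()`; the helper name `describe_interpreter` is hypothetical, and the identity check only tells you what this particular interpreter did, not what the language guarantees.

```python
import platform
import sys


def describe_interpreter() -> str:
    """Report the interpreter, its version, and whether a runtime-created 257 is shared."""
    name = sys.implementation.name          # e.g. "cpython" or "pypy"
    version = platform.python_version()     # e.g. "3.8.0"
    shared_257 = int("257") is int("257")   # False on CPython; not guaranteed elsewhere
    return f"{name} {version}: int('257') is int('257') -> {shared_257}"


if __name__ == "__main__":
    print(describe_interpreter())
```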

John DB on Dec. 15, 2019

I double-checked and confirmed my original result. (This is done within Pycharm, on MacOS Mojave.)

Can I post a screenshot here? Otherwise, here is the test case again:

""" test """
path = "/usr/bin/ls"  # avoid shadowing the built-in str
print("-> file:", path)

x, y = 3000, 3000
print("[1] id(x):", id(x), ", id(y):", id(y))
print("x==y?", x==y, " :  x is y?", x is y)
x+=1
y+=1
print("x==y?", x==y, " :  x is y?", x is y)
print("[2] id(x):", id(x), ", id(y):", id(y))

Result:

/usr/local/bin/python3.8 /Users/jdb/Library/Preferences/PyCharm2019.3/scratches/scratch.py
-> file: /usr/bin/ls
[1] id(x): 4366603504 , id(y): 4366603504
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4366603728 , id(y): 4366603696
Process finished with exit code 0

Actually, if optimization is involved and the “is” case works in some cases, why not assume the interpreter might (or might not) do it in other situations as well - and unpredictably?

Martin Breuss RP Team on Dec. 15, 2019

Wow @John DB! This is still a very surprising result so I tried a couple of things to reproduce it:

Python 3.7

Running your code in the Python Interpreter of PyCharm on MacOS Mojave 10.14.3 using Python 3.7:

>>> x, y = 3000, 3000
...print("[1] id(x):", id(x), ", id(y):", id(y))
...print("x==y?", x==y, " :  x is y?", x is y)
...x+=1
...y+=1
...print("x==y?", x==y, " :  x is y?", x is y)
...print("[2] id(x):", id(x), ", id(y):", id(y))

Returns for me the expected results:

[1] id(x): 4584244528 , id(y): 4584245712
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4584244272 , id(y): 4584244912

Same when saving your code as a file test.py and running it from the CLI:

(variables)   variables python --version
Python 3.7.4
(variables)   variables python test.py 
[1] id(x): 4458048272 , id(y): 4458046544
x==y? True  :  x is y? False
x==y? True  :  x is y? False
[2] id(x): 4457285360 , id(y): 4457284496

Python 3.8

So, I also installed 3.8 just to make sure, not really expecting a difference since that would be a very strange change. But indeed:

(variables)   variables python --version
Python 3.8.0
(variables)   variables python test.py 
[1] id(x): 4502927920 , id(y): 4502927920
x==y? True  :  x is y? True
x==y? True  :  x is y? False
[2] id(x): 4502927472 , id(y): 4502927152

My mind is blown! Let’s consult the folks who did the articles on changes in Python 3.8 @Geir Arne Hjelle and @Chris Bailey!

John DB on Dec. 15, 2019

@Dan Bader:

FYI: Another way to build or burnish your Pythonista credentials: Jump into the Real Python tutorials, run and test the given examples including corner cases… then if a RealPythonista(tm) says you’ve blown his mind, you know you’re on the right track! :-)

I almost skipped this “variables” course, thinking it was far too basic, but decided to spend a bit of time here anyway, thinking there might be useful insight into subtle areas. Sure enough!

Geir Arne Hjelle RP Team on Dec. 15, 2019

First of all, I guess this is an example as good as any as to why you should use == and not is to do comparisons (with ... is None as the main exception) :) Reuven Lerner has a nice article exploring this in more detail: lerner.co.il/2015/06/16/why-you-should-almost-never-use-is-in-python/

I haven’t been able to track down the full explanation of what is happening, but here are a few discoveries and theories.

  • First of all, small integers (-5 through 256) are interned at start-up - as covered in Martin’s video

  • Next, Python can also reuse objects during compilation. When running in a REPL each line is compiled separately, while a Python script is compiled as a whole.

    This explains that you get different results for the following code snippets:

    >>> x = 3000
    >>> y = 3000
    >>> x is y
    False
    

    In the REPL each line is compiled separately, so x and y point to different objects. However, in a script, everything is compiled together, so x and y point to the same object:

    $ cat > x_is_y.py
    x = 3000
    y = 3000
    print(x is y)
    
    $ python x_is_y.py 
    True
    

    You can see the same effect by defining a function in the REPL:

    >>> def f():
    ...     x = 3000
    ...     y = 3000
    ...     return x is y
    ... 
    >>> f()
    True
    

    The function is compiled in one go, so again x and y point to the same object.

  • Using semicolons is another way to force the REPL to compile several statements at once, so the following gives the (by now?) expected result:

    >>> x = 3000; y = 3000
    >>> x is y
    True
    
  • Now things are about to get weird, though. Using tuple unpacking to assign several variables should also force the REPL to compile the objects at the same time (which I assume it does), and reuse the 3000-object. However:

    >>> x, y = 3000, 3000
    >>> x is y
    False
    

    It turns out that this last result, as has been noted earlier, is very dependent on the Python version. As far as I’m able to tell, it is only False on Python 3.7.

    Here I’m using Docker to test this out on many different versions of Python:

    $ docker run --rm -it python:2.7.17-slim python -c "x, y = 3000, 3000; print (x is y)"
    True
    
    $ docker run --rm -it python:3.6.9-slim python -c "x, y = 3000, 3000; print(x is y)"
    True
    
    $ docker run --rm -it python:3.7.0a3-slim python -c "x, y = 3000, 3000; print(x is y)"
    True
    
    $ docker run --rm -it python:3.7.0a4-slim python -c "x, y = 3000, 3000; print(x is y)"
    False
    
    $ docker run --rm -it python:3.7.5-slim python -c "x, y = 3000, 3000; print(x is y)"
    False
    
    $ docker run --rm -it python:3.8.0a1-slim python -c "x, y = 3000, 3000; print(x is y)"
    True
    

    So, it seems that there was some change introduced in Python 3.7 alpha 4 that made the unpacking work differently. Then, that behavior was reverted again for the whole Python 3.8 series.

    The changes in Python 3.7 alpha 4 are listed at docs.python.org/3/whatsnew/changelog.html#python-3-7-0-alpha-4 but I haven’t been able to find where this might have happened, although bugs.python.org/issue30416 may be one candidate. I don’t have a clue where it may have been reverted, though. The changelog for Python 3.8 alpha 1 is humongous: docs.python.org/3/whatsnew/changelog.html#python-3-8-0-alpha-1

  • For a final mind-bender, this is what I see when using IPython, seemingly for all versions of Python:

    In [1]: x = 3000; y = 3000
    
    In [2]: x is y
    Out[2]: False
    

    It seems that IPython treats semicolon-expressions as completely separate expressions and sends them off to the interpreter one by one?

    Running the unpacking examples in IPython (x, y = 3000, 3000) seems to be consistent with the Python version differences seen above.
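The compilation-unit theory above can be probed outside any REPL with the built-in `compile()` and `exec()`: compile both assignments as one unit (like a script or function body), or each on its own (like separate REPL lines). A rough sketch assuming CPython; the helper names are hypothetical, and only the one-unit result is stable enough to rely on for a demo. The separately compiled and tuple-unpacking results may vary by build and version, as the Docker runs above show.

```python
def same_unit() -> bool:
    """Compile both assignments as one unit, like a script or a function body."""
    ns: dict = {}
    exec(compile("x = 3000\ny = 3000", "<one-unit>", "exec"), ns)
    return ns["x"] is ns["y"]


def separate_units() -> bool:
    """Compile each assignment on its own, like separate REPL lines."""
    ns: dict = {}
    exec(compile("x = 3000", "<line-1>", "exec"), ns)
    exec(compile("y = 3000", "<line-2>", "exec"), ns)
    return ns["x"] is ns["y"]


def unpacking() -> bool:
    """The tuple-unpacking case whose result varied across Python versions."""
    ns: dict = {}
    exec(compile("x, y = 3000, 3000", "<unpack>", "exec"), ns)
    return ns["x"] is ns["y"]


if __name__ == "__main__":
    print("one unit:      ", same_unit())       # True on CPython
    print("separate units:", separate_units())  # typically False on CPython
    print("unpacking:     ", unpacking())       # depends on the Python version
```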

hikerguy on Jan. 7, 2020

I just ran the last piece of code in PyCharm and got the opposite result:

x = 3000; y = 3000
print(x is y)

prints True

So, what is the general consensus here? It doesn’t appear that the -5 to 256 rule applies.

hikerguy on Jan. 7, 2020

Well, to confuse things more, I just ran the “Pub quiz” presented earlier and it stops at 257 (in PyCharm). Not sure if that helps, but hopefully someone can clear the air on this matter.

Thanks,

Andy

Geir Arne Hjelle RP Team on Jan. 7, 2020

I’d say the general consensus is that you should use == when comparing numbers (and most other things) and not is ;)

The -5 to 256 rule does apply, but Python does other optimizations that may intern other numbers as well. In most REPLs, one such optimization is that repeated numbers separated by semicolons end up pointing to the same object, as you note.

There are differences though, both between different REPLs and between different versions of Python. In my earlier comment above, I show some of these differences.

Martin Breuss RP Team on Jan. 7, 2020

Thanks for the in-depth analysis and the one-liner takeaway @Geir Arne Hjelle :)

use == when comparing numbers (and most other things) and not is

Here’s again the link to Reuven Lerner’s blog post on this topic (for everyone who scrolled straight to the bottom) :)
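The takeaway translates directly into code: compare values with ==, and keep `is` for identity checks such as `is None`. A minimal sketch; the function `find_first` is hypothetical and only shows the two operators side by side.

```python
from typing import Optional, Sequence


def find_first(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of the first item equal to target, or None."""
    for index, item in enumerate(items):
        if item == target:      # value comparison: reliable for ints of any size
            return index
    return None


if __name__ == "__main__":
    numbers = [int("3000"), int("3001")]   # built at runtime, like values read from input
    result = find_first(numbers, 3001)
    if result is None:                      # identity check against the None singleton
        print("not found")
    else:
        print("found at index", result)     # prints: found at index 1
```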

keyurratanghayra on April 26, 2020

Hi There,

This is weird, but when I use the python3 REPL, this observation holds true:

>>> a = 3000
>>> b = 3000
>>> a == b
True
>>> id(a) == id(b)
False

But PyCharm tells a different story: both results are True in PyCharm. Any pointers?

Martin Breuss RP Team on April 27, 2020

Hello @keyurratanghayra. Some results here can be surprising due to a couple of factors. Did you read over the comments on this page? What are your results in the different environments (e.g. which Python version is each of them using, etc.)?

Harsh Chaklasiya on May 3, 2020

a = 3400
b = 3400
c = a+1
d = b+1
print(a == b)
print(id(a) == id(b))

I did this and got

True
False

in PyCharm on macOS Catalina!

Martin Breuss RP Team on May 9, 2020

That makes sense, since both of the integers are much higher than 256, which means that Python will refer to different integer objects. Double-check this course section if it’s unclear to you why you got this output.
