Intern Objects

Pointers and Objects in Python Austin Cepalia 02:54

In this lesson, you’ll learn about interned objects, a subset of objects that Python pre-creates in memory and keeps available to the whole program. These objects are extremely likely to be used in many programs, so pre-creating them avoids repeated memory allocation calls for consistently used objects.

00:00 If you were thinking that Python’s method of creating new PyObjects and redirecting names to point to them is inefficient, then you’d be correct.

00:10 The core Python developers knew this, and so they created something called interned objects. Interned objects are pre-created objects in memory that can be accessed from anywhere in your program. Before creating a new object in memory, Python will check to see if it already exists as one of these interned objects.

00:33 If it does, the name will point to it. If not, a new object is created for the name to point to. You can think of these interned objects as a cache that saves Python from repeatedly creating some of the most common objects used in Python programs.

00:52 You can manually intern objects, but Python defaults to interning some integers and strings for you. What gets interned automatically depends on the Python implementation, but in CPython 3.7, integers between -5 and 256 are interned, as are strings that are shorter than 20 characters and contain only ASCII letters, digits, or underscores.

01:20 These sets were chosen because they are the most commonly used in Python programs.
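As a minimal sketch of the manual interning mentioned above, the standard library’s sys.intern() can be used on strings. The variable names here are made up for illustration, and the strings are built at runtime on the assumption that this keeps the compiler from merging them ahead of time.

```python
import sys

# Build two equal strings at runtime so they start out as separate objects.
words = ["hello", "world!"]
first = " ".join(words)
second = " ".join(words)

print(first == second)  # True: the values are equal
print(first is second)  # typically False in CPython: two separate objects

# sys.intern() returns the one canonical copy kept in the intern table.
canonical_first = sys.intern(first)
canonical_second = sys.intern(second)
print(canonical_first is canonical_second)  # True: both names share one object
```

Note that the string contains an exclamation mark, so it wouldn’t qualify for automatic interning under the criteria above, yet it can still be interned by hand.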

01:27 Here’s an example of this with integers. As you can see on the left, x and y point to the same memory address because their assigned value falls between -5 and 256.

01:41 If we increase that to something like 1500, then we see that Python creates separate objects for the names.
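The comparison can be reproduced outside the slides with a short sketch like the one below. It assumes CPython, and it creates the integers with int() at runtime so that equal literals in the same file aren’t merged by the compiler. The variable names are illustrative only.

```python
# Create the integers at runtime so the compiler cannot merge equal literals.
small_x = int("256")
small_y = int("256")
big_x = int("1500")
big_y = int("1500")

# In CPython, 256 falls inside the pre-created range, so both names
# point to the same object.
print(small_x is small_y)

# 1500 falls outside that range, so CPython builds two separate objects
# that merely compare equal.
print(big_x is big_y)
print(big_x == big_y)
```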

01:51 Some string objects are interned, too. These two short strings on the left are interned because they meet the interning criteria. But on the right, the strings contain an exclamation mark, which causes them to fail the criteria for automatic interning.

02:10 As I mentioned before, what gets interned automatically is implementation-specific. It’s ultimately a trade-off between memory usage and time complexity.

02:22 Also, interning rules may appear different when running code in a script rather than directly in the interactive shell. This is because a script is compiled as a whole, and the compiler is smart enough to reuse one object for different names if it detects that they’re assigned equal constant values, even if those values fall outside the range of pre-interned objects.
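The difference between a script and the interactive shell can be approximated with the built-in compile() and exec() functions. This is only a sketch under the assumption that CPython is in use: the first part compiles two assignments together, the way a script file is compiled, while the second part compiles them one at a time, the way the shell handles separate inputs. The namespace dictionaries and the "<script>" and "<shell>" labels are made up for illustration.

```python
# Two assignments compiled together as one unit, the way a script is.
script_source = "a = 1500\nb = 1500\n"
script_namespace = {}
exec(compile(script_source, "<script>", "exec"), script_namespace)

# CPython stores the equal constants once, so both names share one object.
print(script_namespace["a"] is script_namespace["b"])

# The same assignments compiled separately, the way the shell handles them.
shell_namespace = {}
exec(compile("a = 1500", "<shell>", "exec"), shell_namespace)
exec(compile("b = 1500", "<shell>", "exec"), shell_namespace)

# Each compilation typically creates its own int object in CPython.
print(shell_namespace["a"] is shell_namespace["b"])
```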

02:45 Python might not always be as fast as other languages, but it does try to make up for it with techniques like object interning.

devavrat on Oct. 22, 2020

As we have seen in the previous chapter, Python creates an object if it doesn’t exist. And when the value repeats in the future, it just creates a new binding with the existing object.

However, in this video, you showed x and y both being 1500. Since 1500 is outside the interned range, a new object was created, and x pointed to it. But why didn’t y get pointed to it even though it had the same value? This seems to contradict the previous chapter completely.

Please help.

Bartosz Zaczyński RP Team on Oct. 23, 2020

@devavratk96 Let’s break it down to see what’s going on in more detail.

When you assign an expression to a Python variable, a few things happen:

1. The expression to the right of the assignment operator (=) is evaluated.
2. Your variable on the left side of the assignment operator is pointed to the object on the right. (It doesn’t matter what value it pointed to previously, if any.)

When the expression is a value like a string literal ("Lorem ipsum"), a number (1500), or even a list, evaluating it will almost always allocate memory for a new object:

>>> x = "Lorem ipsum"
>>> y = "Lorem ipsum"
>>> x is y
False

Variables x and y refer to different memory locations, which are two distinct objects despite having the same value!

The only exception, in this case, would be interned objects, which are singletons. These only have one global instance in the whole application:

>>> x = "Lorem"
>>> y = "Lorem"
>>> x is y
True

The first assignment creates a brand new object in memory, but the second one reuses a “cached” string. It’s the same optimization technique as with small integers in Python, which may speed dictionary lookups in some cases.

When you assign another variable, you’re effectively aliasing some existing object, which will be referenced by more than one variable:

>>> x = "Lorem ipsum"
>>> y = x
>>> x is y
True

So, instead of creating a copy of the original string, now you have two ways of referencing the same object.

Things get more interesting when you try modifying the value of a variable. Depending on the type of the underlying object that the variable is pointing to, you’ll get different results. This touches upon the mutability of an object.

Mutable objects let you change their value without changing their identity. A Python list is an example of such an object:

>>> x = ["apple", "banana"]
>>> y = x
>>> x.append("orange")
>>> x is y
True
>>> x
['apple', 'banana', 'orange']
>>> y
['apple', 'banana', 'orange']

Aliasing allows you to update the list with variable x, and observe the modification with both variables.

This isn’t the case with immutable objects such as numbers:

>>> x = 1500
>>> y = x
>>> x is y
True
>>> x += 1  # Same as: x = x + 1
>>> x is y
False
>>> x, y
(1501, 1500)

When you apply an augmented assignment such as += to an immutable object, you’re actually making a new assignment with an expression that includes a reference to the original variable. Because numbers in Python are immutable, addition creates a new object with the value of 1501, which is assigned back to the x variable. At the same time, variable y still points to the old value of 1500.
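To contrast the two cases side by side, here’s a small sketch that applies += to a list and to an integer. The variable names are illustrative. With a list, += extends the existing object in place, so the alias sees the change. With an integer, += rebinds the name to a new object.

```python
# Mutable: += extends the same list object in place.
numbers = [1, 2]
alias = numbers
numbers += [3]
print(alias)             # [1, 2, 3]
print(numbers is alias)  # True

# Immutable: += rebinds the name to a brand new int object.
count = 1500
other = count
count += 1
print(count, other)      # 1501 1500
print(count is other)    # False
```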

devavrat on Oct. 23, 2020

Thank you very much for the swift explanation. It is much clearer now. Also, as we saw in the previous video, a PyObject has a Value, Type, and Reference Count. When we have a mutable object such as a list, what will its diagram look like? I assume the Value section will hold references to multiple value objects, because the main memory location of the PyObject does not change even when we add or remove values. Am I correct?

Bartosz Zaczyński RP Team on Oct. 23, 2020

@devavratk96 Python lists are special because they exhibit some useful features that would otherwise be associated with a few data structures in a low-level language like C.

On the one hand, Python lists are like traditional arrays because they provide so-called random access to their elements. In other words, if you know the index of an element, you can get the corresponding value immediately without any searching because the index maps directly to a memory address. That effect can be achieved by laying out your elements as a sequence in one contiguous block of memory. However, it’ll only work as long as every element occupies precisely the same number of bytes. If so, you’ll be able to multiply that index by the size of a single element and add the result to the address of the first element in the sequence. One way of enforcing the same size for all elements is to keep only elements of the same type.
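The address arithmetic described above can be sketched with the standard library’s array and ctypes modules. This isn’t how you’d normally read an element. It’s only meant to show that, for a contiguous block of same-sized items, the address of an element is the base address plus the index times the item size. The variable names are made up for illustration.

```python
import array
import ctypes

# A typed array stores its C doubles in one contiguous block of memory.
numbers = array.array("d", [1.5, 2.5, 3.5, 4.5])

# buffer_info() returns the address of the first element and the item count.
base_address, length = numbers.buffer_info()

# Address of element i = base address + i * size of one element.
index = 2
element_address = base_address + index * numbers.itemsize

# Read the double stored at the computed address.
value = ctypes.c_double.from_address(element_address).value
print(value)           # 3.5
print(numbers[index])  # 3.5
```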

However, as you well know, lists in Python are heterogeneous, letting you store elements of arbitrary types:

elements = ["apple", 42, True, 3.14, 1 + 6j]

That’s possible with a linked list, which keeps track of elements scattered all over computer memory. Since elements no longer need to occupy a contiguous block of memory, each can have its own size. The tradeoff is a significantly slower lookup.

How is it possible that Python lists get the best of both worlds?

It seems they’re a hybrid of arrays and linked lists, which requires a little bit more memory. Python maintains a dynamic array of pointers to the actual elements. Every pointer, which represents some memory address, is a number that occupies a known number of bytes. Elements, on the other hand, can be anywhere and can have any size.
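One way to observe this in CPython is with sys.getsizeof(), which reports the size of the list object itself and not of the elements it refers to. The sketch below assumes CPython. The make_list() helper is hypothetical and exists only so that both lists are built the same way.

```python
import sys


def make_list(first, second, third):
    # Hypothetical helper: builds every list the same way, so lists of
    # equal length get the same internal capacity.
    return [first, second, third]


apple = "apple"
small = make_list(1, 2, 3)
large = make_list("a" * 10_000, "b" * 10_000, apple)

# Both lists hold three pointers, so the list objects have the same size
# no matter how big the elements are.
print(sys.getsizeof(small) == sys.getsizeof(large))

# The list stores a reference to the existing string, not a copy of it.
print(large[2] is apple)
```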

If you’re really curious about it, you can take a peek at the clever implementation in the CPython source code.

karka on March 14, 2022

Hi,

I’m checking the case when a=280 and b=280. 280 is outside the [-5, 256] range, but their ids are the same in the output. I’m using Python 3.9. Has this range changed in newer versions?

Bartosz Zaczyński RP Team on March 14, 2022

@karka Nothing’s changed. Here’s your example executed in Python 3.9:

Python 3.9.9 (main, Feb  3 2022, 09:45:44) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 280
>>> b = 280
>>> a is b
False
>>> id(a) == id(b)
False

How did you define those two variables? When both literals appear in the same statement, they’re compiled together, and CPython reuses a single object for the equal values:

>>> a, b = 280, 280
>>> a is b
True
>>> id(a) == id(b)
True

karka on March 14, 2022

I am using PyCharm.

a = 280
b = 280
print(a is b)

Output is True

Bartosz Zaczyński RP Team on March 14, 2022

@karka That’s strange because I’m unable to reproduce such a behavior.
