CPython Internals
00:00 In the previous lesson, I talked about new features to the static typing system. In this lesson, I’ll give you an overview of some of the changes happening inside the CPython interpreter.
00:11 Python is an interpreted language. A compiled language uses a compiler to translate your code into the machine language of the computer it runs on, whereas an interpreted language reads a file and runs the instructions inside the interpreter itself.
00:28 You can think of the interpreter as a simulated computer running on your computer. One advantage of this is that the same code can run on different machines, because it’s the interpreter’s responsibility to speak the machine’s language.
00:42 A disadvantage of this is it tends to be slower, as it adds a layer of abstraction between the code and the machine. In all likelihood, the interpreter that you’re running is CPython.
00:53 There are other interpreters out there, but if you’re not sure which one you’re running, you’re probably running CPython. When you hear about performance improvements in a new release of Python, those are technically performance improvements in the CPython interpreter.
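If you want to check which interpreter you’re on, the standard library can tell you. A minimal sketch:

```python
import platform
import sys

# Both of these report the interpreter implementation; on the
# reference interpreter they say "CPython" / "cpython".
print(platform.python_implementation())  # e.g. "CPython"
print(sys.implementation.name)           # e.g. "cpython"
```

Other implementations, such as PyPy, report their own names here.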
01:08 When you run a Python program using CPython, it instantiates an instance of an interpreter. As you can have more than one instance, this first one is called the main interpreter.
01:20 The main interpreter can spawn copies of itself, those copies being called subinterpreters.
01:27 Most parts of a subinterpreter are independent of the others. The key word in that sentence is most. The subinterpreter concept isn’t new. It’s actually been around since Python 1.5, but it operates below the level of the language.
01:42 Unless you’re writing an extension, you don’t have to be aware of it.
01:48 Remember when I said most? Well, there are things you have to be careful with when you run code in parallel. You can’t have two threads of execution changing a single value at the same time. This causes consistency bugs and other problems. To work around this, Python has the GIL. That’s short for global interpreter lock, and it is the bane of people trying to write parallel code.
02:11 There is a lot of work going on to try and shrink the GIL’s impact and/or get rid of it completely. In fact, there are two PEPs that I’m going to talk about that affect the structure of subinterpreters.
02:21 With respect to the GIL, PEP 684 moves the GIL from being global to the subinterpreter level, while PEP 554 adds Python-level access to this mechanism. This feature won’t actually be exposed until the next release, in Python 3.13.
02:38 Moving the GIL means moving almost all the global state, which is a whole bunch of work. It also causes some problems. Any existing extension code likely makes the assumption that the GIL is global, not part of the subinterpreter.
02:54 So part of the work here is to create a path for the extensions. Extensions can mark themselves as being aware of this change. If they are, they can take advantage of it. If they aren’t, then the old mechanism stays in place for backward compatibility.
03:08 All of this is under the hood. It isn’t until 3.13 that this is going to be exposed to the Python level. In the meantime, if you aren’t an extension writer, you won’t notice the change at all.
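For the curious, subinterpreters are reachable today only through a private module. This is a hedged sketch of CPython’s internal API, not something to rely on; the module’s name has changed between releases, and PEP 554’s public module doesn’t land until 3.13:

```python
# Sketch using CPython's *private* subinterpreter module. This is an
# internal implementation detail, not a public API.
try:
    try:
        import _interpreters as interpreters        # newer CPython builds
    except ImportError:
        import _xxsubinterpreters as interpreters   # older CPython builds

    interp = interpreters.create()                  # a new, mostly isolated interpreter
    interpreters.run_string(interp, "x = 40 + 2")   # runs in its own namespace
    interpreters.destroy(interp)
    supported = True
except Exception:
    supported = False  # the private API differs on this build

print(supported)
```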
03:20 For more details about subinterpreters and the changes connected to that in 3.12, see this tutorial.
03:29 I really need to get a T-shirt printed up that says everything in Python is an object. I seem to say it enough. Inside the interpreter, those very same objects are tracked by C structures that are kind of object-like.
03:42 Each one of these structures contains the object’s actual data, as well as some metadata that goes with it. The metadata is there to help track when an object is being referenced and therefore whether it can be deleted. Because all objects use a similar structure, even those objects that can’t be changed have mutable metadata sections.
04:04 Some immutable objects exist for the lifetime of the interpreter, which means they’ll never be garbage collected, and the metadata associated with them is unnecessary overhead.
04:16 PEP 683 proposes a way of doing something about this. In addition to having immutable objects, the interpreter will also have immortal objects. Those that are immortal don’t need the extra metadata, and it can be optimized away. There are more immutable objects than you might think, and some things that will be able to become immortal include the None object, certain integers, and some strings.
04:42 These things are used so much that marking them as immortal can actually make a big difference. They don’t change, so they don’t need cache-handling code.
04:52 They don’t need to be synchronized across multiprocess instances, and by getting rid of some of the metadata, memory can be saved. This mechanism is purely internal to CPython. It won’t affect your code.
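You can observe some of this indirectly from Python. Two hedged notes: the huge fixed reference count is how immortality shows up in CPython 3.12+, and the small-integer cache is a long-standing CPython implementation detail, not a language guarantee:

```python
import sys

# Immortal objects report a huge, fixed reference count in 3.12+
# rather than a real, changing one.
print(sys.getrefcount(None))

# Small integers are among the shared singletons: CPython caches the
# ints from -5 through 256, so equal values are the same object.
a = int("100")   # built at runtime so the compiler can't fold them
b = int("100")
print(a is b)    # True under CPython's small-int cache
```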
05:05 The PEP was brought by the folks at Instagram, and they have seen a significant improvement in memory usage and startup time in some of their large Django clusters by introducing immortal objects.
05:17 You’re probably familiar with list, dictionary, and set comprehensions in Python. This is an example of a list comprehension, which iterates over the numbers list and creates a new list containing the squares of the values in numbers. Generally speaking, comprehensions tend to be faster than their plain-loop equivalents, and this has to do with how the interpreter can optimize them. Internally, these comprehensions got turned into a nested function, and it turns out functions have overhead and can be expensive.
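The numbers-and-squares example can be sketched like this (the exact variable names are an assumption about what’s on screen):

```python
numbers = [1, 2, 3, 4, 5]

# The comprehension form...
squares = [n * n for n in numbers]

# ...is equivalent to this plain loop, which the interpreter
# can't optimize as aggressively.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

print(squares)                  # [1, 4, 9, 16, 25]
print(squares == squares_loop)  # True
```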
05:50 So, PEP 709 changes how comprehensions are represented internally, making them inline code instead of nested functions. The contents of a comprehension are their own little namespace.
06:04 Consider the n inside of my example there. It isn’t actually in the local stack. This is why comprehensions were originally created using a nested function: the nested function’s namespace could be used for scoping. To change them to inline code, a bit of wizardry is required to properly deal with the variables, putting them on the stack before the comprehension and removing them just after, essentially mimicking this part of a function’s role but removing the need to perform a call.
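You can see that separate namespace from Python: even with the comprehension inlined in 3.12, its loop variable still doesn’t leak into the surrounding scope. A small sketch:

```python
squares = [n * n for n in range(5)]

try:
    n            # n only existed inside the comprehension's namespace
except NameError:
    leaked = False
else:
    leaked = True

print(leaked)  # False -- the comprehension's n is gone
```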
06:33 This change roughly doubles the performance of comprehensions. Of course, that’s just the comprehension part, not your entire script, but if you have some heavy comprehensions or you use them a lot, this is a free speedup for your code.
06:47 The last change I’m going to talk about here is a Linux-specific feature. Linux comes with a tool called perf, which is a profiler. It tracks most hardware events, as well as some software events in the OS.
07:00 With it, you can build call graphs, and there are a large number of tools out there that add additional functionality on top of perf’s output. Prior to 3.12, if you ran perf on a Python program, you wouldn’t see anything about Python, just the entry point to the interpreter and any underlying C code that got called.
07:21 Python 3.12 has added hooks to interact with the perf profiler. This means Python calls can now be monitored as if they were native calls, making it easier for you to profile the entire code stack.
07:36 One advantage of this is it can help you see where the GIL is getting in your way, at least while it’s still there. Core devs are busy trying to get rid of it.
07:44 As I mentioned, this is a Linux-only feature, and it isn’t enabled by default. You have to set an environment variable to turn it on. If you’re on Linux and you want to learn more about this feature, this article has a deep dive for you.
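Enabling it looks roughly like this. The environment variable and the `-X perf` option are the documented switches in 3.12; `myscript.py` is a placeholder for your own program:

```shell
# Enable CPython 3.12's perf support (Linux only).
export PYTHONPERFSUPPORT=1
echo "$PYTHONPERFSUPPORT"

# With support on, a typical perf session looks like:
#   perf record -g python3 myscript.py
#   perf report
# The -X option is an alternative to the environment variable:
#   python3 -X perf myscript.py
```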
08:00 That’s the key parts of Python 3.12. Last up, I’ll summarize the course and point you at some sources of further investigation.