00:11 Python is an interpreted language. By contrast, a compiled language uses a compiler to translate your code into the machine language used by the computer, whereas an interpreted language reads a file and runs the instructions in the interpreter itself.
00:28 You can kind of think of it like the interpreter being a simulation of a computer running on your computer. One advantage of this is the same code can run on different machines because it’s the interpreter’s responsibility to speak the machine’s language.
00:53 The interpreter most people use, and the reference implementation of the language, is called CPython. There are other interpreters out there, but if you’re not sure which one you’re running, you’re probably running CPython. When you hear about performance improvements in a new release of Python, technically those are performance improvements in the CPython interpreter.
01:27 That brings me to subinterpreters. Most of the parts of a subinterpreter are independent of each other. The keyword in that sentence is most. The subinterpreter concept isn’t new: it’s actually been around since Python 1.5, but it operates below the level of the language.
01:48 Remember when I said most? Well, there are things you have to be careful with when you run code in parallel. You can’t have two different threads changing a single value at the same time; that causes consistency bugs and other problems. To work around this, Python has the GIL. That’s short for global interpreter lock, and it is the bane of people trying to write parallel code.
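To see why unsynchronized updates are dangerous, here’s a minimal sketch. The counter, the worker function, and the thread count are all invented for illustration; the point is that the read-modify-write on the shared value is only safe because a `threading.Lock` serializes it:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        # counter += 1 is a read, an add, and a write. Without the lock,
        # two threads could interleave those steps and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # prints 400000
```

With the lock removed, the final count can come up short, which is exactly the class of consistency bug the GIL guards the interpreter’s own internals against.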
02:11 There is a lot of work going on to try and shrink the GIL’s impact and/or get rid of it completely. In fact, there are two PEPs that I’m going to talk about that affect the structure of subinterpreters.
02:21 With respect to the GIL, PEP 684 moves the GIL from being global to the subinterpreter level, while PEP 554 adds Python-level access to this mechanism. This feature won’t actually be exposed until the next release, in Python 3.13.
02:38 Moving the GIL means moving almost all the global state, which is a whole bunch of work. It also causes some problems. Any existing extension code likely makes the assumption that the GIL is global, not part of the subinterpreter.
02:54 So part of the work here is to create a path for the extensions. Extensions can mark themselves as being aware of this change. If they are, they can take advantage of it. If they aren’t, then the old mechanism stays in place for backward compatibility.
03:29 I really need to get a T-shirt printed up that says everything in Python is an object. I seem to say it enough. Inside the interpreter, those very same objects are tracked by C structures that are kind of object-like.
03:42 Each one of these structures contains the object’s actual data, as well as some metadata that goes with it. The metadata is there to help track when an object is being referenced and therefore whether it can be deleted. Because all objects use a similar structure, even those objects that can’t be changed have mutable metadata sections.
PEP 683 proposes a way of doing something about this. In addition to having immutable objects, the interpreter will also have immortal objects. Objects that are immortal don’t need the extra metadata, so it can be optimized away. There are more immutable objects than you might think, and some of the things that will be able to become immortal include the None object, certain integers, and some strings.
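You can already observe some of these shared, always-alive objects from Python. A quick sketch; nothing here is part of PEP 683’s API, it just shows the singletons in question:

```python
import sys

# None is a singleton: every use refers to one and the same object.
a, b = None, None
assert a is b

# CPython also caches the small integers -5 through 256.
x = 256
y = 256
assert x is y

# These objects are referenced all over the interpreter, so their
# reference counts are always large; on 3.12+ immortal objects report
# a fixed, very large count that never changes.
print(sys.getrefcount(None))
```

Objects like these never die in practice, which is what makes skipping their reference bookkeeping safe.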
04:52 They don’t need to be synchronized across multiprocess instances, and by getting rid of some of the metadata, memory can be saved. This mechanism is purely internal to CPython. It won’t affect your code.
05:05 The PEP was brought by the folks at Instagram, and they have seen a significant improvement in memory usage and startup time in some of their large Django clusters by introducing immortal objects.
You’re probably familiar with list, dictionary, and set comprehensions in Python. Take a list comprehension that iterates over a numbers list and creates a new list containing the squares of the values in numbers. Generally speaking, comprehensions tend to be faster than their straight Python equivalents, and this has to do with how the interpreter can optimize them. Internally, these comprehensions used to get turned into a nested function, and it turns out functions have overhead and can be expensive.
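The comprehension described above can be sketched like this (the contents of the numbers list are assumed):

```python
numbers = [1, 2, 3, 4, 5]

# List comprehension: build a new list of squares in one expression.
squares = [n ** 2 for n in numbers]
print(squares)  # prints [1, 4, 9, 16, 25]

# The straight-Python equivalent it outperforms:
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)
assert squares == squares_loop
```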
Take the loop variable n inside the example. It isn’t actually on the local stack. This is why comprehensions were originally implemented as a nested function: the nested function’s namespace could be used for scope. To change them to inline code, a bit of wizardry is required to properly deal with the variables, putting them on the stack before the comprehension and removing them just after, essentially mimicking this part of a function’s role but removing the need to make a function call.
06:33 This change is doubling the performance of comprehensions. Of course, that’s just the comprehension part, not your entire script, but if you have some heavy comps or you use them a lot, this is free speedup for your code.
The last change I’m going to talk about here is a Linux-specific feature. Linux comes with a tool called perf, which is a profiler. It tracks most hardware events, as well as some software events in the OS. With it, you can build call graphs, and there are a large number of tools out there that add additional functionality on top of perf’s output. Prior to 3.12, if you ran perf on a Python program, you wouldn’t see anything about Python, just the entry point to the interpreter and any underlying C code that got called.
Python 3.12 has added hooks to interact with the perf profiler. Doing this means Python calls can now be monitored as if they were native calls, making it easier for you to do profiling throughout the entire code stack.
07:44 As I mentioned, this is a Linux-only feature, and it isn’t enabled by default. You have to set an environment variable to make it go. If you’re on Linux and you want to learn more about this feature, this article has a deep dive for you.
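On Linux with Python 3.12, enabling the hooks looks roughly like this. The script name `my_script.py` is a placeholder; the `PYTHONPERFSUPPORT` environment variable and the `-X perf` option are the documented switches, and the `perf` invocations assume the tool is installed:

```shell
# Enable the perf trampoline via the environment variable...
PYTHONPERFSUPPORT=1 python my_script.py

# ...or via a command-line option:
python -X perf my_script.py

# Record a profile with call graphs, then inspect it:
perf record -F 999 -g -- python -X perf my_script.py
perf report
```

With the hooks on, Python frames show up in `perf report` alongside the native C frames instead of everything collapsing into the interpreter’s entry point.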