Examining CPython Internals
00:00 In the previous lesson, I showed you t-strings. This lesson is about some of the internal changes to the CPython interpreter. For this lesson, understand that sometimes when you say Python, you mean the language, and sometimes you mean the program you run to execute your code.
00:17 The most common implementation of the interpreter is called CPython, but it’s not the only one. If you download Python from python.org, it’s CPython that you’re getting.
00:27 Everything in this lesson is CPython-specific, and so it might be different if you’re using PyPy or other alternative interpreters.
00:37 Before Python 3.14, if you wanted to run a debugger with CPython, you needed to hook your code with a call to breakpoint() or set_trace().
00:45 This is problematic if you want to attach to already-running code, and it makes debugging long-running processes, like those in high-availability situations, difficult.
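For example, the old workflow meant editing the source itself. This hypothetical handler just shows where the hook goes:

```python
def handle_request(data):
    # The pre-3.14 way: edit the source, add a hook, restart the process.
    breakpoint()  # drops into pdb when execution reaches this line
    return data.upper()

handle_request("hello")
```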
00:54 Python 3.14 has added a mechanism for attaching to running processes. This is useful for debugging code without stopping it, but it also opens up a wide range of possibilities for examining running code.
01:07 It’ll be interesting to see what tools come out of this. For security reasons, you might not want to allow this kind of hooking, so there are flags to turn it off at both the interpreter level and the build level, letting you compile a version of the interpreter that doesn’t support this feature if you’re safety conscious.
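As a rough sketch of what attaching looks like (PEP 768): pdb gains a -p option for attaching to a process by its PID, and underneath there’s a new sys.remote_exec() call that asks a running interpreter to execute a script file. The stand-in target process and the probe.py file here are just for illustration:

```python
import subprocess
import sys
import time

# Start a stand-in for a long-running process you'd want to inspect.
target = subprocess.Popen(
    [sys.executable, "-c", "import time\nwhile True: time.sleep(1)"]
)
time.sleep(1)  # give the target a moment to start up

# Write a small script for the target to run at its next safe point.
with open("probe.py", "w") as f:
    f.write("print('hello from inside the target process')\n")

# New in 3.14: inject the script into the running process.
sys.remote_exec(target.pid, "probe.py")

time.sleep(2)  # let the injected script print, then clean up
target.terminate()
```

You can also attach a full debugging session with python -m pdb -p <pid>.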
01:28 The global interpreter lock, known as the GIL to its frenemies, is an implementation detail of CPython that stops race conditions during memory management.
01:38 When a thread holds the lock, only that thread can execute Python bytecode and everything else has to wait. This is good for memory management but bad for concurrency. Even if you’ve got threads, there’s a good chance they’ll just be waiting around for the lock to be released, meaning you don’t get all the concurrency that you want.
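You can see this for yourself with a small demo: on a standard GIL build, four CPU-bound threads take roughly as long as doing the same work serially, because only one of them can execute bytecode at a time. Exact timings will vary by machine:

```python
import threading
import time

def burn():
    # Pure-Python CPU work, so the GIL is held almost the whole time.
    total = 0
    for _ in range(5_000_000):
        total += 1

start = time.perf_counter()
threads = [threading.Thread(target=burn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# On a GIL build, this is roughly 4x the time of a single thread.
print(f"4 CPU-bound threads took {time.perf_counter() - start:.2f}s")
```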
01:57 Free-threaded mode is an ongoing attempt to remove the GIL. In Python 3.13, it was experimental and not part of the default build. The experiment worked well enough that it’s now included in some of the 3.14 builds.
02:13 It isn’t the default though. You need to install the separate free-threaded build, usually named python3.14t, to get access to it. This work is ongoing, both within CPython and within all the libraries out there, to make sure their code can run in this mode. Things are looking good, but the steering council maintains the right to change its mind if problems pop up. Why might they change their minds?
02:34 Well, a lot of Python code is single-threaded, and although free-threaded mode means better concurrency, so far it comes at the cost of slightly slower single-threaded performance.
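If you want to check what you’re running on, there are a couple of introspection hooks. A minimal sketch; note that sys._is_gil_enabled() is an internal API that appeared in 3.13:

```python
import sys
import sysconfig

# 1 if this interpreter was compiled without the GIL.
print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# Reports whether the GIL is actually active right now. A free-threaded
# build can still re-enable it, for example via PYTHON_GIL=1.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())
```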
02:45 At this point, it’s looking good, but more testing in the real world will be what finally sells it. For a long time now, CPython has allowed multiple interpreters to run simultaneously within a single process.
02:58 This was only available at the C API level though, so it was restricted to extension writers. Python 3.14 opens this up to everyone else with the new concurrent.interpreters module.
03:09 If you’ve done any concurrent programming in Python, you’ve probably come across the pool executors for threads and multiprocessing. This release sees a new one added for multiple interpreters: InterpreterPoolExecutor.
03:20 Conceptually, this is similar to using the multiprocessing library as each interpreter has its own distinct state. Unlike with threads, memory sharing is an explicit mechanism, meaning you’re less likely to have race conditions.
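A minimal sketch of what that looks like; the worker function and the numbers are just placeholders:

```python
from concurrent.futures import InterpreterPoolExecutor

def crunch(n):
    # Runs inside its own subinterpreter. Arguments and results are
    # passed across the interpreter boundary, not shared directly.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with InterpreterPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(crunch, [100_000, 200_000, 300_000])))
```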
03:33 This is slightly heavier weight than threads, but safer, and lighter weight than multiprocessing, so hopefully a best of both worlds kind of thing. A couple of versions back, in Python 3.12, the interpreter was even changed so that each subinterpreter gets its own GIL, so subinterpreters don’t even contend for a single lock. On the downside, your options for sharing memory are a little limited, so modifying single-threaded code to use this can be a bit of work. And not all extensions are able to work with this, so you may find some libraries that can’t deal with it.
04:04 And since it’s kind of new, there’s still a bit of memory overhead as well as performance overhead that needs to be worked on. Developers are confident that this can be sorted out in future versions, though.
04:17 I hear you like interpreters in your interpreter. Well, how about another interpreter? CPython now has another interpreter built into it. The original interpreter has a giant if-then-else block for each of the operations in the bytecode language.
04:32 Actually, it’s a switch statement if I’m being specific. This new interpreter uses a tail call mechanism instead, which is something that modern compilers are able to optimize better.
04:42 So far, this is limited to specific compilers and platforms, but the performance improvements look good, so I suspect there’ll be more to come with this in a future release.
04:54 A Just-In-Time compiler, or JIT, dynamically replaces bytecode with machine code, which in theory makes things faster. In practice, there’s overhead to do the compilation step, so it isn’t a performance boost in all situations.
05:09 A JIT was added as an experiment in 3.13, and like free-threading, it’s now shipped in the 3.14 binaries, but this time only for macOS and Windows. It isn’t enabled by default.
05:22 You have to set a flag, the PYTHON_JIT environment variable, and there’s a _jit namespace in the sys module with calls allowing your code to know if the JIT is available and whether it’s currently enabled.
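A minimal sketch of those checks; keep in mind sys._jit is explicitly an internal, unstable interface:

```python
import sys

# Was this interpreter built with JIT support at all?
print("JIT available:", sys._jit.is_available())

# Is the JIT enabled for this process, e.g. launched with PYTHON_JIT=1?
print("JIT enabled:", sys._jit.is_enabled())
```

If all that isn’t enough, there have been a bunch of performance improvements as well.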
05:38 22 different modules have had their import times improved, including common ones like string, subprocess, tomllib, and threading. Garbage collection is the process of freeing up the memory of objects you’re no longer using.
05:52 In Python, the work is divided into generations. Measurements in the field have shown that certain generations weren’t actually doing much work, so the collector has been reduced to just two.
06:03 They’re called young and old. This has simplified things and changes the pause times between collections, making the process smoother.
06:12 Speaking of performance improvements, I nominate b16decode() in the base64 module for most improved. It’s a hefty 16x speedup in some cases.
06:24 And if you’re using universally unique identifiers, well, versions 3, 4, and 5 have had performance improvements as well.
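Nothing about these APIs changed, so the faster code paths come for free. A quick sketch:

```python
import uuid
from base64 import b16decode

# base64.b16decode: up to ~16x faster in 3.14 on some inputs.
print(b16decode("48656C6C6F"))  # b'Hello'

# Generating version 3, 4, and 5 UUIDs is faster too.
print(uuid.uuid4())
print(uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))
```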
06:33 There are still a few odds and ends to cover in this Python 3.14 release. I’ll show you a bunch of small stuff next.
