Faster, Faster, Faster
00:10 CPython is currently undergoing a multi-release project that focuses on speeding up the interpreter. The improvements in 3.11 are quite significant, with the average speed in Python’s benchmark suite being a little over 1.2 times faster.
00:35 PEP 659 proposed a specializing adaptive interpreter. Just what does that mean? Well, it means that the interpreter is now dynamically adapting the instructions based on the code that is being run.
00:48 The proposal states that the intent is to specialize the code aggressively over a small region. Python is an interpreted language. Your script gets compiled into byte-code, which is executed by a runtime.
01:01 This is distinct from a purely compiled language, where the program is compiled into machine language. The advantage of an interpreted language is that it can be run on any platform where there is a runtime implemented, whereas a purely compiled language needs to be compiled specifically to a platform.
01:19 This is also why purely compiled languages tend to be faster than interpreted ones. There’s one less level of indirection. What PEP 659 proposed is having the interpreter watch what is being executed and modify the byte-code on the fly, optimizing the choices.
01:46 The interpreter then watches what is being loaded and replaces the adaptive call with a more specific call. This examination of the attribute being loaded might discover that it’s loading an instance value or loading something from a module or loading something from a class’s slot.
02:05 This kind of optimization is only done for code that is called repeatedly, and typically gets triggered in loops. As programs can spend a lot of time in loops, this can make a difference in the execution time of that same loop.
Using the dis() function in the dis module, I can see what is involved in the function I just called. I’m far from an expert in the underlying interpreter, but you get the general idea of what’s going on.
Your computer has specialized hardware for doing floating-point calculations, and I’m guessing that this specialty operation takes advantage of that, improving your speed. Future calls to feet_to_meters() should be faster. You may be wondering why it decided to change things when it did.
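The experiment described here can be sketched roughly as follows. This is only an illustration: the transcript names feet_to_meters() but does not show its body, so the multiplication below is an assumption, and disassemble() is a hypothetical helper. The adaptive flag on dis.dis() is available from Python 3.11, so older interpreters fall back to the plain listing.

```python
import dis
import io
import sys


def feet_to_meters(feet):
    # Hypothetical body: the transcript names the function but does not show it.
    return 0.3048 * feet


def disassemble(func, adaptive=False):
    """Return the disassembly of func as a string (hypothetical helper)."""
    buffer = io.StringIO()
    if sys.version_info >= (3, 11):
        # The adaptive flag exists from Python 3.11 onward.
        dis.dis(func, file=buffer, adaptive=adaptive)
    else:
        dis.dis(func, file=buffer)
    return buffer.getvalue()


# Listing taken before the function has been called.
before = disassemble(feet_to_meters)

# Call the function repeatedly with floats so the interpreter
# has a chance to specialize the instructions.
for _ in range(1000):
    feet_to_meters(5280.0)

# Listing that asks for the adapted instructions, where supported.
after = disassemble(feet_to_meters, adaptive=True)

print(before)
print(after)
```

Which instructions change, and after how many calls, depends on the interpreter version, so compare the two listings on your own install rather than expecting a particular result.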
Another optimization is in the performance of code in except blocks. This change reduces the amount of overhead in the case where an exception doesn’t fire. Java and C++ have similar mechanisms.
05:32 Using the table method, there is almost no work to be done if the exception doesn’t fire. This doesn’t mean that exceptions are free. They still have overhead to handle them, but as you generally code exceptions to be outside the happy path, this could mean a performance improvement for you.
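One way to get a feel for this is a small timing sketch. It is not a rigorous benchmark, and parse_plain() and parse_guarded() are hypothetical names made up for the illustration. It times a call with no try block, a guarded call where nothing is raised, and a guarded call where the exception does fire. It also peeks at co_exceptiontable, the attribute on code objects that holds the table in Python 3.11 and later.

```python
import timeit


def parse_plain(text):
    # No exception handling at all.
    return int(text)


def parse_guarded(text):
    # Same work, wrapped in a try block.
    try:
        return int(text)
    except ValueError:
        return None


runs = 200_000
plain = timeit.timeit(lambda: parse_plain("42"), number=runs)
guarded = timeit.timeit(lambda: parse_guarded("42"), number=runs)
failing = timeit.timeit(lambda: parse_guarded("not a number"), number=runs)

print(f"no try block:          {plain:.4f}s")
print(f"try, nothing raised:   {guarded:.4f}s")
print(f"try, exception raised: {failing:.4f}s")

# Python 3.11+ keeps handler locations in a table on the code object;
# on older versions the attribute is absent, so this prints None.
table = getattr(parse_guarded.__code__, "co_exceptiontable", None)
print("exception table:", table)
```

The exact numbers will vary by machine and version. The interesting comparison is how close the first two timings are, and how much more the third one costs.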
05:50 Before this improvement, there was some memory overhead attached to each function call that is now no longer necessary. Removing it may cause some speed-up for function calls as a nice side effect.
That means if you run a script a second time without making any changes, the interpreter can skip the compilation step. The typical process when running a script that has contents in __pycache__ is to read the cache, unmarshal the objects (that means to deserialize them from their disk format into their memory format), and allocate memory on the heap for the objects and the code before executing the code. Certain modules in the interpreter are frozen.
06:43 This means they’re put into a state where most of these steps can be skipped. What Python 3.11 is doing is freezing more of the key modules. This freezing process means the code is statically allocated, resulting in the ability to load it directly, essentially combining those first three steps into one operation.
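The cache, unmarshal, and execute steps for ordinary, non-frozen code can be imitated by hand with the marshal module. This is a simplified sketch, not what the import system literally does: a real .pyc file also carries a header, and the source string and variable names here are made up for the example.

```python
import marshal

source = "answer = 6 * 7"

# Compile step: source text becomes a code object.
code = compile(source, "<example>", "exec")

# Roughly the payload that gets cached on disk (minus the .pyc header).
payload = marshal.dumps(code)

# On a later run: take the bytes, unmarshal them into objects
# allocated on the heap, then execute the resulting code object.
restored = marshal.loads(payload)
namespace = {}
exec(restored, namespace)

print(namespace["answer"])
```

A frozen module skips the read and unmarshal work because its code is already laid out in the interpreter itself.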
07:04 This change has resulted in a 10 to 15 percent improvement in interpreter loading times. This can be a big difference for small scripts, as Python’s startup is relatively expensive. For smaller scripts, a big chunk of execution time is the startup cost.
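If you want to see the startup cost on your own machine, a rough sketch like the one below times how long it takes to launch the interpreter and exit immediately. startup_seconds() is a hypothetical helper, and the measurement includes process creation overhead, so treat the result as an approximation.

```python
import subprocess
import sys
import time


def startup_seconds(runs=5):
    """Return the best wall-clock time to start the interpreter and exit."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Launch the same interpreter that is running this script.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


best = startup_seconds()
print(f"best startup time: {best * 1000:.1f} ms")
```

Running the same script under two different Python versions gives you a comparison for your own setup.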
07:19 A 10 to 15 percent improvement in startup might mean a 10 percent improvement in your shorter scripts. But wait, there’s more! Trademark insert. There have been some improvements in how the frame that describes the function is created, as well as some other optimizations.
Recursive calls are now more efficient, the method that translates ASCII into Unicode is now order one (constant time) in execution, the comb() and perm() functions in the math library have been improved, and some optimizations have been done for regular expressions.