Faster, Faster, Faster
00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll be talking about the things that have changed in Python 3.11 to make it faster.
00:10 CPython is currently undergoing a multi-release project that focuses on speeding up the interpreter. The improvements in 3.11 are quite significant, with Python’s benchmark suite running, on average, a little over 1.2 times faster.
00:25 As with all speed changes, you may or may not notice the differences depending on your use cases. Let’s take a look at how some of this speed was achieved.
00:35 PEP 659 proposed a specializing adaptive interpreter. Just what does that mean? Well, it means that the interpreter is now dynamically adapting the instructions based on the code that is being run.
00:48 The proposal states that the intent is to specialize the code aggressively over a small region. Python is an interpreted language. Your script gets compiled into byte-code, which is executed by a runtime.
01:01 This is distinct from a purely compiled language, where the program is compiled into machine language. The advantage of an interpreted language is that it can be run on any platform where there is a runtime implemented, whereas a purely compiled language needs to be compiled specifically to a platform.
01:19 This is also why purely compiled languages tend to be faster than interpreted ones. There’s one less level of indirection. What PEP 659 proposed is having the interpreter watch what is being executed and modifying the byte-code on the fly, optimizing the choices.
01:37 Consider, for example, the byte-code operation LOAD_ATTR, which is responsible for loading attributes. It can be replaced with LOAD_ATTR_ADAPTIVE.
01:46 The interpreter then watches what is being loaded and replaces the adaptive call with a more specific one. This examination of the attribute being loaded might discover that it’s loading an instance value, something from a module, or something from a class’s slot.
02:05 This kind of optimization is only done for code that is called repeatedly, and typically gets triggered in loops. As programs can spend a lot of time in loops, this can make a difference in the execution time of that same loop.
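For instance, here’s a minimal sketch of the kind of hot attribute access that benefits. The class and names here are made up for illustration:

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(2.0)
total = 0.0
for _ in range(10_000):
    # This attribute access compiles to LOAD_ATTR. Once the loop has
    # warmed up, 3.11 can swap in a specialized instruction for
    # loading an instance value.
    total += p.x
```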
02:19 Let’s go take a look at this in practice.
02:23 I’m inside of the Python 3.11 REPL, and I’m going to write a small function that converts between the imperial measurement of feet and the metric measurement of meters.
02:40 Nothing complicated here: there are roughly three feet in a meter. This multiplication does the exact conversion. Now I’m going to run this in a loop seven times …
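If you want to follow along, the session looks something like this. The exact names in the video may differ; note that the argument is passed as a float so the multiplication stays float-by-float:

```python
def feet_to_meters(feet):
    return 0.3048 * feet

# Run the conversion in a loop seven times.
for feet in range(7):
    print(feet_to_meters(float(feet)))
```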
03:12 The dis module allows you to disassemble Python code and look at the corresponding byte-code generated by the compiler.
03:25 Using the dis() function in the dis module, I can see what is involved in the function I just called. I’m far from an expert in the underlying interpreter, but you get the general idea of what’s going on.
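The call itself is simple; the sample output below is illustrative of 3.11 and may vary slightly between releases:

```python
from dis import dis

dis(feet_to_meters)
# The body of the function disassembles to something like:
#   LOAD_CONST     1 (0.3048)
#   LOAD_FAST      0 (feet)
#   BINARY_OP      5 (*)
#   RETURN_VALUE
```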
03:37 A constant and a variable are loaded, and a multiplication operation is called on them. The result is then returned. Now let me do one more conversion.
03:55 This is similar to what I was doing in the loop above. Let’s look at dis() again.
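In 3.11, dis() grew an adaptive flag that shows the specialized instructions. A sketch of the step being described, with the caveat that the specialized instruction names are internal details:

```python
feet_to_meters(10.0)  # one more conversion, crossing the warm-up threshold

dis(feet_to_meters, adaptive=True)
# The generic BINARY_OP now shows up as a float-specific instruction,
# e.g. BINARY_OP_MULTIPLY_FLOAT.
```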
04:06 Notice that it has changed. The adaptive interpreter mechanism has seen that the operation is between two floats and changed the code to be float-multiplication specific.
04:17 Your computer has specialized hardware for doing floating-point calculations, and I’m guessing that this specialty operation takes advantage of that, improving your speed. Future calls to feet_to_meters() should be faster. You may be wondering why it decided to change things when it did.
04:34 The adaptive mechanism triggers after repeated use of the same code. In this case, it kicked in on the eighth execution, so on the eighth call to the function, the byte-code was adapted.
04:45 A variety of byte-code operations have been adjusted to be adaptive, and more may get added in the future.
04:54 Another optimization is in the performance of code in try … except blocks. This change reduces the amount of overhead in the case where an exception doesn’t fire. Java and C++ have similar mechanisms.
05:09 Good artists copy, great artists steal. The underlying compiler now generates a table for all the code blocks inside of try … except situations.
05:20 That table contains references to the code to be run if an exception fires. Previously, this was done explicitly on the stack, and work was done for each try … except case.
05:32 Using the table method, there is almost no work to be done if the exception doesn’t fire. This doesn’t mean that exceptions are free. Handling them still has overhead, but as you generally code exceptions to be outside the happy path, this could mean a performance improvement for you.
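You can see the table for yourself: in 3.11, disassembling code that contains try … except appends an ExceptionTable section to the output. A quick sketch:

```python
import dis

def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 0.0

dis.dis(safe_divide)
# The listing ends with an "ExceptionTable:" section mapping byte-code
# ranges to handler targets, replacing the per-call block setup that
# older versions executed even when nothing went wrong.
```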
05:50 Before this improvement, there was some memory overhead attached to each function call that is now no longer necessary. Removing it may cause some speed-up for function calls as a nice side effect.
06:04 Remember all that byte-code stuff I was just talking about? Well, creating it takes effort, so Python caches the results in a __pycache__ directory.
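As a rough illustration of what those cached files hold, byte-code is serialized with the marshal module. This is a sketch, not CPython’s actual caching code:

```python
import marshal

code = compile("print(6 * 7)", "<demo>", "exec")
blob = marshal.dumps(code)      # roughly the disk format a .pyc stores
restored = marshal.loads(blob)  # unmarshal: rebuild the code object
exec(restored)                  # prints 42
```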
06:13 That means if you run a script a second time without making any changes, the interpreter can skip the compilation step. The typical process when running a script that has contents in __pycache__ is to read the cache, unmarshal the objects (that means to deserialize them from their disk format into their in-memory format), and allocate memory on the heap for the objects and the code before executing the code. Certain modules in the interpreter are frozen.
06:43 This means they’re put into a state where most of these steps can be skipped. What Python 3.11 is doing is freezing more of the key modules. This freezing process means the code is statically allocated, resulting in the ability to load it directly, essentially combining those first three steps into one operation.
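One way to peek at this on your own machine, assuming a standard 3.11 installation (development builds may have frozen modules turned off):

```python
import os

# Modules loaded from the frozen set report a "frozen" origin
# instead of a path to a .py file.
print(os.__spec__.origin)
```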
07:04 This change has resulted in a 10 to 15 percent improvement in interpreter loading times. This can be a big difference for small scripts, as Python’s startup is relatively expensive. For smaller scripts, a big chunk of execution time is the startup cost.
07:19 A 10 to 15 percent improvement in startup might mean a 10 percent improvement in your shorter scripts. But wait, there’s more! Trademark insert. There have been some improvements in how the frame that describes the function is created, as well as some other optimizations.
07:38 Recursive calls are now more efficient, the method that translates ASCII into Unicode now runs in constant time, the comb() and perm() functions in the math library have been improved, and some optimizations have been done for regular expressions.
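The comb() and perm() speed-ups require no changes on your end; for reference, the functions work like this:

```python
from math import comb, perm

print(comb(52, 5))  # 2598960 possible 5-card poker hands
print(perm(10, 3))  # 720 ordered picks of 3 items from 10
```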
07:56 Well, that was fast. See what I did there? Next up, how tracebacks give finer-grained information when something goes wrong.