Threads in Python
00:00 In the previous lesson, I introduced you to the concept of concurrency and different patterns it can take. In this lesson, I’ll be talking about threads in Python.
00:10 As I showed you in the lesson on latency, most programs spend a lot of their time waiting for input and output. Threads allow you to time slice your computation. While one thread is waiting for input, another thread can continue to do processing work. Threads work within the Python interpreter, and therefore with the GIL.
00:29 Significant speed-up can be obtained if your software does a lot of disk or network activity.
00:37 All of the software that I demonstrate in this course is available in the supporting materials dropdown if you want to follow along. In order to demonstrate the difference that threading can make, I need something to compare against, so I’m going to start with a synchronous version of a small program.
00:53 This program pulls down two different web pages many, many times. Line 14 is where you’ll find the key entry point to the code. This function, called download_all_sites(), takes a list of sites, loops over each of the URLs in the list, and calls the download_site() function.
01:12 The download_site() function, defined on line 8, gets a session from requests, which is the library I’m using to download web pages. On line 10, it fetches the content. In order to provide a little bit of clarity about what it’s doing, I’m printing out either a "J" or an "R", depending on which website is being read.
01:32 The definition of the get_session() function is a little bit overkill in this case. You probably wouldn’t do it this way in reality, but it’s necessary for the threading library, so to keep the code consistent I’ve done it this way.
01:45 Let me scroll down so that you can see how this program is called.
01:50 The list in line 21 is made up of 80 copies of the two different websites that the requests library is going to fetch. Line 26 tells you that it’s starting. Line 27 starts a timer. Line 28 is the meat, where it actually downloads all of the sites inside of the sites list. Line 29 calculates how long it took for this to run. And then line 30 prints out some statistics. Let’s see this program in action.
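For readers following along without the video, here is a minimal sketch of the shape of the synchronous program described above. It is not the course's file, so the line numbers in the narration will not match it. To keep it runnable offline, fetch() is a hypothetical stand-in that sleeps instead of calling requests, and the two URLs are assumed placeholders.

```python
import time

# Assumed placeholder URLs: 80 copies of two sites, 160 downloads in total.
SITES = ["https://www.jython.org", "https://realpython.com"] * 80


def fetch(url):
    """Hypothetical stand-in for session.get(url): wait briefly, return fake bytes."""
    time.sleep(0.001)  # simulated network latency
    return b"x" * 100


def download_site(url):
    content = fetch(url)
    # Print "J" or "R" to show which site is being read.
    indicator = "J" if "jython" in url else "R"
    print(indicator, end="", flush=True)
    return len(content)


def download_all_sites(sites):
    # One URL at a time, in list order: nothing overlaps.
    for url in sites:
        download_site(url)


if __name__ == "__main__":
    print("Starting...")
    start_time = time.time()
    download_all_sites(SITES)
    duration = time.time() - start_time
    print(f"\nDownloaded {len(SITES)} sites in {duration} seconds")
```

Because each call must finish before the next one starts, the total run time is roughly the sum of all the individual waits.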
02:21 There are 160 URLs being downloaded, 80 from Jython and 80 from Real Python.
02:27 The J and R indicate when the Jython site or the Real Python site is being downloaded from. This synchronous program is alternating between the two sites.
02:37 The end result is that 160 sites were downloaded in about 14 seconds. I’ve run this program several different times. The run time varies wildly, and a lot of that depends on how quickly the sites respond and how quickly the network interface on my computer decides to respond.
02:55 I’ve seen run times for this program as much as triple this one.
03:02 And now for a threaded version. First off, you’re going to need two more imports. Both are part of the standard library. Line 2 introduces concurrent.futures and line 4, the threading library. Line 7 sets up the local environment for each of the threads. I’ll describe more of this later.
03:22 The download_site() function on line 15 hasn’t changed. It’s the same as before. But in order to use the threading library, get_session() has to be a little different. Line 9 defines the function get_session(). Inside of this function, line 11 gets the Session from the requests library, but it only does this if the current thread’s thread_local doesn’t already hold one. The combination of the thread environment in line 7 and the assignment of the requests.Session to that environment in line 11 allows you to change the number of threads in the program and not break anything.
04:00 This ensures that there’s only one requests.Session per thread. Now let’s see the download_all_sites() function. It’s changed a little.
04:10 The concurrent.futures library includes a class called ThreadPoolExecutor. This is what determines how many threads there are.
04:19 You can instantiate this as a context manager using the with statement. The executor then has a .map() method, which maps a function to some data.
04:30 Each of the URLs in the sites list gets mapped to a function, and the thread executor determines when that function is called on which thread. Varying max_workers in the executor definition changes how many threads are active at the same time. As a function finishes, its thread is put back into the pool, and the executor then assigns the next piece of data to the next available thread.
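The executor usage described above can be sketched roughly as follows. This is an illustration rather than the course's file: download_site() here is a hypothetical stand-in that sleeps instead of fetching anything, the URLs are assumed placeholders, and the function returns the worker's thread name only so the pool's behavior is visible.

```python
import concurrent.futures
import threading
import time


def download_site(url):
    """Hypothetical stand-in for a real download: sleep, then report the worker."""
    time.sleep(0.01)  # simulated network wait
    return url, threading.current_thread().name


def download_all_sites(sites, max_workers=5):
    # The executor owns the pool; leaving the with block waits for all workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() hands each URL to whichever pool thread is free next and
        # yields the results in the same order as the input.
        return list(executor.map(download_site, sites))


sites = ["https://www.jython.org", "https://realpython.com"] * 10
results = download_all_sites(sites)
threads_used = {name for _, name in results}
print(f"{len(results)} downloads handled by {len(threads_used)} threads")
```

The number of distinct thread names never exceeds max_workers, no matter how many URLs are in the list.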
04:59 Let me scroll down to show you the calling.
05:04 This is no different. So, with some minor modifications to this script, I’ve changed it from being synchronous to threaded. Now let me show you this in action.
05:18 Wow. That’s significantly faster than before, almost ten times faster. To be honest, this is kind of lucky. That’s one of the best times I’ve seen. Let me try it again, just to show you.
05:38 Not as good this time. 7 and a half seconds is still impressive though. That’s almost twice as fast as the synchronous program. One thing to notice here is the pattern of the J and R. In the synchronous program, it was always J then R, J then R. In this program, it isn’t, and that’s because the threads are waiting different amounts of time. As Jython or Real Python is more or less responsive, the threads are executing at different rates. At any given time the executor makes sure that only 5 of them are running, but the order that the download_site() calls finish in is going to be dependent on the network and the server on the other end.
06:21 The threaded version of the program was using the N-workers pattern that I introduced in the previous lesson. download_all_sites() is the producer.
06:30 It is what manages the list of sites that need to be done. The download_site() function acts as the worker and, in this case, concurrent.futures is dictating that there are 5 workers.
06:43 And then finally, the executor acts as a collection point. It waits until all of the threads are finished, and once the pool passes execution on, the program continues as before.
06:55 In this case, the print('Downloaded') gets called. To be picky about it, this program technically doesn’t have a consumer. The download_site() function is throwing out the data and not really doing any computation, so there was nothing to be passed on to the consumer.
07:10 There’s just a collection point where the synchronous program resumes.
07:16 In the previous lesson when I described the GIL for you, I mentioned race conditions. These are something that you have to be very careful with inside of threads.
07:25 The threading library acts inside of the Python interpreter. All of the memory is shared across all of the threads. Consider a case where there are two threads using a single requests.Session object.
07:38 Thread 1 starts downloading from Jython but then gets interrupted. Thread 2 then starts downloading from Real Python but the session object from the first thread wasn’t finished.
07:51 This is going to cause the requests library to fail. One solution to this is to use a low-level mechanism called locking. You manage your resources and lock them so that only one thread can use a resource at a time.
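As a small illustration of locking, the sketch below protects a shared counter with threading.Lock. The counter is a hypothetical stand-in for any shared resource, such as the session object in the scenario above, and the names add_many and counter_lock are made up for this example.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

The with block acquires the lock on entry and releases it on exit, even if the body raises an exception, so no increment can be interleaved with another.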
08:04 This kind of locking is exactly what the GIL is for, but it’s at the global level inside of the interpreter. Your code has the same problem. Fortunately, Python comes with a library method that makes this easier.
08:17 This is the threading.local() call that you saw on line 7 of the code. It looks like a global variable, but it isn’t. The threading library is creating a space that is private to each thread, so the objects you store there are created once per thread. In the get_session() function, a new requests.Session object was created inside of this threading.local() space.
08:39 This guaranteed that each thread got its own requests.Session object, and also means that you don’t end up with 160 requests.Session objects for your 160 URLs.
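The effect of threading.local() can be checked with a rough sketch like the one below. FakeSession is a hypothetical stand-in for requests.Session so the example runs offline, the example.com URLs are placeholders, and sessions_created exists only to count how many sessions get built.

```python
import concurrent.futures
import threading
import time

thread_local = threading.local()
sessions_created = []  # bookkeeping for the demo only


class FakeSession:
    """Hypothetical stand-in for requests.Session."""

    def get(self, url):
        time.sleep(0.005)  # simulated network wait
        return f"content of {url}"


def get_session():
    # Each thread sees its own attributes on thread_local, so this check
    # is true only the first time a given thread calls get_session().
    if not hasattr(thread_local, "session"):
        thread_local.session = FakeSession()
        sessions_created.append(thread_local.session)
    return thread_local.session


def download_site(url):
    return get_session().get(url)


urls = [f"https://example.com/{i}" for i in range(160)]
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    pages = list(executor.map(download_site, urls))

print(f"{len(pages)} downloads, {len(sessions_created)} sessions")
```

With 5 workers, at most 5 sessions are ever created for the 160 downloads, one per thread that actually did some work.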
08:52 In the example code I showed you, the max number of workers was set to 5. There were only five threads happening at a time. This was done on purpose.
09:02 Although you’re downloading 160 URLs, you probably don’t want 160 threads. There’s overhead for creating threads. There’s also overhead for switching between the threads.
09:15 If you have too many threads, your code can end up spending most of its time managing the threads. So, how do you know how many threads to have? Well, unfortunately there’s no easy answer. It’s going to depend on how I/O-bound each of your threads is, so you may want to experiment a little bit based on your program.
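One way to run that kind of experiment is to time the same workload with different pool sizes. The sketch below is only an illustration: fake_io_task() is a hypothetical stand-in that sleeps for 20 milliseconds, and the worker counts tried are arbitrary. Real network or disk work will behave differently, so measure your own program.

```python
import concurrent.futures
import time


def fake_io_task(_):
    time.sleep(0.02)  # hypothetical stand-in for a network or disk wait


def time_with_workers(max_workers, task_count=20):
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consume the iterator so every task has finished before timing stops.
        list(executor.map(fake_io_task, range(task_count)))
    return time.perf_counter() - start


timings = {n: time_with_workers(n) for n in (1, 5, 20)}
for n, seconds in timings.items():
    print(f"{n:>2} workers: {seconds:.3f} s")
```

For a workload that only waits, more workers keep helping until there is one thread per task. With real downloads, the gains usually flatten out much sooner.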
09:34 An extremely common pattern in GUI software is for there to be a thread for the GUI itself, and another thread for the execution going on behind it. This ensures that the GUI is always responsive to the user and any expensive computation is done on a separate thread.
09:52 If you’re coming from another programming language or you’ve seen the Python threading mechanisms before, you might be wondering about the primitives. Python also supports the typical thread primitives: .start(), .join(), and Queue. .start() is responsible for setting a thread running and calling the appropriate function, .join() is the point in the program that waits for a thread to finish, and Queue, which lives in the queue module, is a thread-safe mechanism for communicating between threads. Python has these primitives, but introduced the concurrent.futures library in order to minimize the amount of code that you have to write when managing threads.
10:30 As you saw in the sample program, an Executor is responsible for managing threads. It maps data to a function and then maps those function calls onto a managed pool of threads, abstracting away the complexities caused by .start(), .join(), and Queue. This library was first introduced in Python 3.2, so if you’re using something older than that, you’ll have to stick with the basic primitives. But if you’re using 3.2 or later, you’re better off looking at the concurrent.futures library.
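To give a sense of what the executor abstracts away, here is one possible sketch of the same five-worker idea built directly on threading.Thread and queue.Queue. It is not code from the course: worker() and the None sentinel are conventions chosen for this illustration, and the download is simulated with a short sleep.

```python
import queue
import threading
import time


def worker(tasks, results):
    while True:
        url = tasks.get()
        if url is None:  # sentinel: no more work for this worker
            break
        time.sleep(0.005)  # hypothetical stand-in for the download
        results.put(url)


def download_all_sites(sites, worker_count=5):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()  # begin running worker() in a new thread
    for url in sites:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()  # wait for each worker to finish
    # All workers are done, so qsize() is exact at this point.
    return [results.get() for _ in range(results.qsize())]


sites = [f"https://example.com/{i}" for i in range(20)]
done = download_all_sites(sites)
print(f"{len(done)} of {len(sites)} sites processed")
```

Unlike executor.map(), the results here come back in completion order rather than input order. Creating the threads, signalling shutdown, and collecting results are all left to you.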
11:00 So, that’s threading in Python! Next up, I’m going to show you what can cause race conditions and how they’re problematic.
msarabi95 on April 23, 2021
I’m wondering what the threaded version of the program would look like if we use threading primitives instead of the executor. How would the download_all_sites function change?
Christopher Trudeau RP Team on April 23, 2021
Without the executor, you’d need to write code that creates an individual thread (or 5 of them if you were doing the exact same thing) and then start the thread. You’d also need to do a join on all of them afterwards.
The key part would be how you distribute the data amongst those threads – you could just put a thread constructor in a loop, creating 5 instances, but you’d be missing the mapping to the subset of sites. You likely would slice the list of sites, giving the first thread 1/5 of the work.
To see code examples without the executor, take a look at the following article:
andrewodrain on May 5, 2023
I always come out of your lectures with an extremely deep understanding of the topics you present. After seeing concurrency in action, I think I am addicted. Excellent work Christopher! Thank You!
Christopher Trudeau RP Team on May 6, 2023
Glad you’re finding it useful Andrew. Happy coding!
Tony Ngok on Feb. 6, 2024
- I’ve learnt later in the course that asyncio only uses 1 CPU. So, is it why asyncio is better than thread for I/O bound programming?
- Also, is it that threading uses multiple CPUs (i.e., n CPUs for 2n threads)?
Bartosz Zaczyński RP Team on Feb. 6, 2024
@tonyn1999 The async/await approach and threading are two alternative paradigms in concurrent programming, both of which have their pros and cons, so you can’t say that one is always better than the other. That said, asynchronous programming is generally more scalable for I/O-bound tasks thanks to the cheap cost of context switching compared to threads.
The major downside of asyncio in Python is that not every library provides an asynchronous API. Also, mixing synchronous and asynchronous code can be challenging. If you miss a single blocking call, then that will affect your entire program, bringing it to a complete halt. Finally, getting used to the asynchronous paradigm takes time since it has a steep learning curve.
Threads in Python can use multiple CPU cores but not simultaneously because of the global interpreter lock (GIL), which ensures that only one thread runs at a time. However, there are clever ways to bypass the GIL for true parallelism.
Christopher Trudeau RP Team on Feb. 7, 2024
Hi @tony,
Bartosz has been doing a great job answering questions, but I thought I’d stick my $0.02 in here as well.
Threading and multi-processing have a long history in computing and in most languages you’ll find multiple ways of attacking the same problems.
Way back in olden times, multi-processing was the only way of doing concurrency. But it has a lot of overhead, as you’re essentially keeping two copies of the program active at a time. Threads were invented as a lighter weight solution. Both threading and multi-processing were first offered as features of the operating system.
Programming languages then provided interfaces to those operating system features. But, as cross-platform languages started popping up, some of them decided they didn’t want to have to deal with the OS and invented their own kind of threading, known as “Green Threads”. The idea is the same, but now the programming language is responsible for when the threads switch, rather than it being the OS’s scheduler.
As Python has been around for a while, it has flavours of all these things. The async/await really is just another way of tackling threads. At the most abstract level, it is no different than using threads. But, because it is built into the language there are optimizations there which may mean it is better in certain cases.
Threading and async/await are both I/O-bound concurrency solutions.
Multi-processing, as its name implies, uses multiple processors and so works in the CPU-bound case. Of course, to make things more complicated, there is nothing stopping a thread from using the multi-processing library, or MP code from spinning up threads.
As with all things “performance” in computing, the first thing you should do if you’re trying to make it faster is measure it. Figure out what the bottlenecks are and address those with the appropriate tools.
Thankfully, threads, async, and MP, all use very similar mechanisms, so it doesn’t take much to swap them out and see which gives you the most improvement.
Hope that was worth $0.02 :)
Polo on Dec. 25, 2020
I would recommend to attach a short code example in this section.