Threads in Python
00:00 In the previous lesson, I introduced you to the concept of concurrency and different patterns it can take. In this lesson, I’ll be talking about threads in Python.
00:10 As I showed you in the lesson on latency, most programs spend a lot of their time waiting for input and output. Threads allow you to time slice your computation. While one thread is waiting for input, another thread can continue to do processing work. Threads work within the Python interpreter, and therefore with the GIL.
00:29 Significant speed-up can be obtained if your software does a lot of disk or network activity.
00:37 All of the software that I demonstrate in this course is available in the supporting materials dropdown if you want to follow along. In order to demonstrate the difference that threading can make, I need something to compare against, so I’m going to start with a synchronous version of a small program.
This program pulls down two different web pages many, many times. Line 14 is where you’ll find the key entry point to the code. This function, called
download_all_sites(), takes a list of sites, loops over each of the URLs in the list, and calls the
download_site() function. download_site(), defined on line 8, gets a session from
requests, which is the library I’m using to download web pages. On line 10, it fetches the content. To give you a little bit of clarity about what it’s doing, I’m printing out either a
"J" or an
"R", depending on which website is being read.
The definition of the
get_session() function is a little bit of overkill in this case. You probably wouldn’t do it this way in reality, but it’s necessary for the
threading library, so to keep the code consistent I’ve done it this way.
01:45 Let me scroll down so that you can see how this program is called.
The list in line 21 is made up of 80 copies of the two different websites that the
requests library is going to fetch. Line 26 tells you that it’s starting. Line 27 starts a timer. Line 28 is the meat, where it actually downloads all of the sites inside of the
sites list. Line 29 calculates how long it took for this to run. And then line 30 prints out some statistics. Let’s see this program in action.
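Based on the narration above, the synchronous program might look roughly like the following sketch. The URLs, the main() wrapper, and the exact layout are my reconstruction, not the lesson’s file verbatim:

```python
import time

import requests  # third-party: pip install requests


def get_session():
    # A Session reuses the underlying TCP connection between requests,
    # which is much faster than opening a new connection per URL.
    return requests.Session()


def download_site(url, session):
    with session.get(url) as response:
        # Print a one-letter marker so you can watch which site responded.
        marker = "J" if "jython" in url else "R"
        print(marker, end="", flush=True)


def download_all_sites(sites):
    with get_session() as session:
        for url in sites:
            download_site(url, session)


def main():
    sites = ["https://www.jython.org", "https://realpython.com"] * 80
    start = time.time()
    download_all_sites(sites)
    duration = time.time() - start
    print(f"\nDownloaded {len(sites)} sites in {duration:.2f} seconds")
```

Calling main() kicks off all 160 downloads, one after another.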
02:21 There are 160 URLs being downloaded, 80 from Jython and 80 from Real Python.
The J and R indicate whether the Jython site or the Real Python site is being downloaded from. This synchronous program alternates between the two sites.
02:37 The end result is that 160 downloads ran in about 14 seconds. I’ve run this program several different times. The wait time for it varies wildly, and a lot of that depends on how quickly the sites respond and how quickly the network interface on my computer decides to respond.
02:55 I’ve seen execution times for this program as much as triple this run’s.
And now for a threaded version. First off, you’re going to need two more imports. Both are part of the standard library. Line 2 introduces
concurrent.futures and line 4, the
threading library. Line 7 sets up the local environment for each of the threads. I’ll describe more of this later.
The download_site() function on line 15 hasn’t changed. It’s the same as before. But the definition of
get_session() has to be a little different. Line 9 defines the function
get_session(). Inside of this function, line 11 gets the
Session from the
requests library, but only if a Session hasn’t already been attached to
thread_local. The combination of the thread-local environment set up in line 7 and the assignment of the
requests.Session to that environment in line 11 allow you to change the number of threads in the program and not break anything.
This ensures that there’s only one
requests.Session per thread. Now let’s see the
download_all_sites() function. It’s changed a little. The
concurrent.futures library includes a class called
ThreadPoolExecutor. This is what determines how many threads there are.
You can instantiate this as a context manager using the
with statement. And then the
executor has a
.map() method mapping a function to some data.
Each of the URLs in the
sites listing gets mapped to a function and the thread executor determines when that function is called for which thread. Varying the number of
max_workers in the execution definition will change how many threads are active at the same time. As a function finishes, the thread will be put back into the pool and the executor will then assign the next piece of data to the next available thread.
04:59 Let me scroll down to show you the calling.
05:04 This is no different. So, with some minor modifications to this script, I’ve changed it from being synchronous to threaded. Now let me show you this in action.
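Putting those pieces together, the threaded version might look roughly like this sketch. Again, the URLs and exact layout are my reconstruction from the narration, not the lesson’s file verbatim:

```python
import concurrent.futures
import threading
import time

import requests  # third-party: pip install requests

# One namespace per thread: attributes stored here by one thread
# are invisible to every other thread.
thread_local = threading.local()


def get_session():
    # Lazily create one Session per thread, the first time that thread asks.
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session


def download_site(url):
    session = get_session()
    with session.get(url) as response:
        print("J" if "jython" in url else "R", end="", flush=True)


def download_all_sites(sites):
    # max_workers caps how many threads run at once; .map() hands each
    # URL to the next idle worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)


def main():
    sites = ["https://www.jython.org", "https://realpython.com"] * 80
    start = time.time()
    download_all_sites(sites)
    print(f"\nDownloaded {len(sites)} sites in {time.time() - start:.2f} seconds")
```

Note that get_session() returns the same Session on every call within one thread, but a different Session in each thread.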
05:18 Wow. That’s significantly faster than before—almost ten times. To be honest, this is kind of lucky. That’s one of the best times I’ve seen. Let me try it again, just to show you.
Not as good this time. 7 and a half seconds is still impressive though. That’s still almost twice as fast as the synchronous program. One thing to notice here is the pattern of the J and
R output. In the synchronous program, it was always a strict alternation of J and
R. In this program, it isn’t, and that’s because the threads are waiting different amounts of time. As Jython or Real Python is more or less responsive, the threads are executing at different rates. At any given time the executor makes sure that only
5 of them are running, but the order that the
download_site() function finishes in is going to be dependent on the network and the server on the other end.
The threaded version of the program was using the N-workers pattern that I introduced in the previous lesson.
download_all_sites() is the producer.
It is what manages the list of sites that need to be done. The
download_site() function acts as the worker and, in this case,
concurrent.futures is dictating that there are five workers at a time.
06:43 And then finally, the executor acts as a collection point. It waits until all of the threads are finished, and once they are, the pool passes execution back and the program continues as before.
In this case, the
print('Downloaded') gets called. To be picky about it, this program technically doesn’t have a consumer. The
download_site() function is throwing out the data and not really doing any computation, so there was nothing to be passed on to the consumer.
07:10 There’s just a collection point where the synchronous program resumes.
07:16 In the previous lesson when I described the GIL for you, I mentioned race conditions. These are something that you have to be very careful with inside of threads.
The threading library acts inside of the Python interpreter. All of the memory is shared across all of the threads. Consider a case where there are two threads using a single Session object.
07:38 Thread 1 starts downloading from Jython but then gets interrupted. Thread 2 then starts downloading from Real Python but the session object from the first thread wasn’t finished.
This is going to cause the
requests library to fail. One solution to this is to use a low-level mechanism called locking. You manage your resources and lock them so that only one thread can use a resource at a time.
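As a minimal sketch of that idea (a shared counter protected by a lock; the example and its names are mine, not from the lesson):

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may hold the lock, so the
        # read-modify-write below can't interleave with another thread's.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

The lock guarantees the final count, at the cost of some bookkeeping that you have to get right yourself.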
08:04 This kind of locking is exactly what the GIL does, but at the global level inside of the interpreter. Your code can have the same problem at the level of its own objects. Fortunately, Python comes with a library mechanism that makes this easier.
This is the
threading.local() method that you saw on line 7 of the code. It looks like a global variable, but it isn’t. The
threading library is creating a locked space for your objects that are created once per thread. In the
get_session() method, a new
requests.Session object was created inside of this thread-local space.
This guaranteed that each thread got its own
requests.Session object, and also means that you don’t end up with 160
requests.Session objects for your 160 URLs.
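To see that per-thread isolation in a self-contained example (the variable names here are mine, not from the lesson), you can set an attribute on a threading.local object in the main thread and check whether worker threads can see it:

```python
import threading

local = threading.local()
local.value = "set in the main thread"

results = {}


def worker(name):
    # The attribute set by the main thread is invisible here: each new
    # thread starts with an empty threading.local namespace.
    results[name] = hasattr(local, "value")
    local.value = name  # this assignment is visible only to this thread


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(seen is False for seen in results.values()))  # True
print(local.value)  # still: set in the main thread
```

Each thread reads and writes its own private copy of local.value, even though they all share the one local object.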
In the example code I showed you, the max number of workers was set to
5. There were only five threads happening at a time. This was done on purpose.
09:02 Although you’re downloading 160 URLs, you probably don’t want 160 threads. There’s overhead for creating threads. There’s also overhead for switching between the threads.
09:15 If you have too many threads, that means your code spends all of its time managing the threads. So, how do you know how many threads to have? Well, unfortunately it’s not an easy answer and it’s going to be dependent on how I/O-bound each of your threads are, so you may want to experiment a little bit based on your program.
09:34 An extremely common pattern in GUI software is for there to be a thread for the GUI itself, and another thread for execution behind the scenes. This ensures that the GUI is always responsive to the user and any expensive computation is done on a separate thread.
If you’re coming from another programming language or you’ve seen the Python threading mechanisms before, you might be wondering about the primitives. The Python
threading library also supports the typical thread primitives:
.start() is responsible for launching a thread and calling its target function,
.join() is the point in the program that waits for all the threads to finish, and
Queue is a thread-safe mechanism for communicating between threads. Python has these primitives, but introduced the
concurrent.futures library in order to minimize the amount of code that you have to write when managing threads.
As you saw in the sample program, an
Executor is responsible for managing threads. It maps data to a function and then farms those function calls out to a managed pool of threads, abstracting away the complexity of coordinating through a
Queue. This library was first introduced in Python 3.2, so if you’re using something older than that, you’ll have to stick with the basic primitives. But if you’re using 3.2 or later, you’re better off looking at the concurrent.futures library.
11:00 So, that’s threading in Python! Next up, I’m going to show you what can cause race conditions and how they’re problematic.
I’m wondering what the threaded version of the program would look like if we used threading primitives instead of the executor. How would the
download_all_sites function change?
Without the executor, you’d need to write code that creates an individual thread (or 5 of them if you were doing the exact same thing) and then start the thread. You’d also need to do a join on all of them afterwards.
The key part would be how you distribute the data amongst those threads – you could just put a thread constructor in a loop, creating 5 instances, but you’d be missing the mapping to the subset of sites. You likely would slice the list of sites, giving the first thread 1/5 of the work.
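As a sketch of that answer, here is one way to do it with the primitives. I’m using a Queue to hand out URLs instead of slicing the list, and a caller-supplied work function standing in for download_site(); both of those are my choices, not code from the lesson:

```python
import queue
import threading


def download_all_sites(sites, work, num_workers=5):
    # Load every URL into a thread-safe queue; each worker pulls the
    # next URL whenever it finishes one, so faster threads do more work.
    tasks = queue.Queue()
    for url in sites:
        tasks.put(url)

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            work(url)

    # Create the five threads by hand...
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()  # ...start each one...
    for t in threads:
        t.join()   # ...and wait here until they have all finished.
```

Calling download_all_sites(sites, download_site) would reproduce the executor version’s behavior; the ThreadPoolExecutor simply does all of this bookkeeping for you.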
To see code examples without the executor, take a look at the following article: