Thread Pool
In this lesson, you’ll refactor your previous code by using a thread pool executor from the concurrent.futures module. If you download the sample code, you can get your own copy of 07-thread_pool.py.
To learn more, you can also check out the documentation for concurrent.futures.ThreadPoolExecutor and concurrent.futures.Executor.map.
00:00
In the previous video, you created three functions and executed them on three separate threads, and you had to write code that was very repetitive. You also saw how some threads finished in an order that you wouldn’t necessarily expect. In this video, we’re going to reduce the amount of code we have to write by using something called a ThreadPoolExecutor to manage the starting and the joining of threads.
00:27
We’re going to import the concurrent.futures package. And let’s import time as well.
00:36
And we’ll define myfunc() again. Same function, pretty much: we’re just going to print f'myfunc started with {name}', and we’ll sleep() for 10 seconds,
00:51
and then we’ll print f'myfunc ended with {name}'.
00:57
Let’s create our entry point,
01:01
if __name__ == '__main__':. Now in here, this is where we’re going to create our thread pool. We’re going to use the context manager pattern here, using the with keyword, so with concurrent.futures.ThreadPoolExecutor().
01:25
And we have to give this a name, so we’ll say as. I’m just going to call mine e, so e for executor. And in here we can pass it a parameter called max_workers, and this is the number of threads we want to create in the pool.
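A quick sketch of the setup being described, with the body to be filled in over the next steps:

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as e:
        ...  # work submitted here runs on up to three pooled threads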
01:43
Let’s create 3, just as we did in the previous lesson, where we created three extra threads. And now we have this block here. Within the context of the ThreadPoolExecutor, we want to execute three threads on myfunc(), and we want to pass in a different word for name on each one.
02:04
The way we do that is by using the .map() function. .map() will map arguments to a function or a callable, in our case myfunc, and the arguments we want to pass for each thread are passed in here in a list. For the first one, let’s do 'foo',
02:27
and then the second thread will be 'bar', and then 'baz'. So this iterable here refers to the arguments that are passed to myfunc() in each thread.
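So the call being described looks like this, with each element of the list becoming the name argument for one thread:

    e.map(myfunc, ['foo', 'bar', 'baz'])

One detail the lesson doesn’t need but that’s worth knowing: .map() also returns the results of the calls as an iterator, in the same order as the input list.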
02:40
Notice how we don’t have to do t.start() and t.join(). We’re just going to let the ThreadPoolExecutor manage the starting and the joining of the threads for us.
02:51
Let’s go ahead and print 'main begins' to show when our main thread starts, and then we’ll do 'main ended'. And just to flash back to our previous code, here in the main thread of the last lesson, look at all this stuff that we can now delete by using the ThreadPoolExecutor.
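For reference, the repetitive version from the previous lesson presumably looked something like this manual start/join boilerplate (a sketch, not the exact file):

    import threading

    t1 = threading.Thread(target=myfunc, args=('foo',))
    t2 = threading.Thread(target=myfunc, args=('bar',))
    t3 = threading.Thread(target=myfunc, args=('baz',))
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()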
03:14
Let’s go ahead and drop down into the terminal, and we’ll execute python 07-thread_pool.py. We have main begins, and then we’ve fired off three threads. There are 3 max_workers in our thread pool, and each one has a different argument thanks to this .map() function.
03:34
We see that it started foo, bar, baz, then it slept for 10 seconds in each thread, and then it returned foo, bar, baz. And then we see main ended down here.
03:46
This shows a very nice way of using multi-threading without writing repetitive code.
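Putting the walkthrough together, 07-thread_pool.py should look roughly like this (reconstructed from the transcript, so minor details may differ from the downloadable file):

    import concurrent.futures
    import time

    def myfunc(name):
        print(f'myfunc started with {name}')
        time.sleep(10)
        print(f'myfunc ended with {name}')

    if __name__ == '__main__':
        print('main begins')
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as e:
            e.map(myfunc, ['foo', 'bar', 'baz'])
        print('main ended')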
nightfury on Dec. 26, 2019
Hi,
Would this be a cool way to find the number of threads supported by an OS? Out of curiosity, I modified the code a bit to see if it crashes beyond a certain point.
On my system (running macOS), the code crashes beyond max_workers=2048 with the following error:
Any comments on what’s happening here?
Lee RP Team on Jan. 12, 2020
Hey @nightfury, that’s cool :) I imagine there is a limit to the number of threads per process that the OS can create. It looks like it’s 2048 on yours. I just ran it on my machine, and it stopped at 4096.
Ahmed on April 16, 2020
Hello Lee,
I am trying to make my code run faster. I have 6,000 product SKUs in a file, and I make an API call for each product. This is taking about 20-30 minutes to finish all the API calls. Can you recommend a faster way to do this? Below is my code.
import os
import csv
import json
import time
import requests
import concurrent.futures

product = []
parent = "ProductFiles"
filename3 = os.path.join(parent, 'product_info.csv')
file3 = open(filename3)
wrapper = csv.reader(file3)
for row in wrapper:
    product.append(row[0])

def get_product_info():
    product_response = {}
    for row in product[1:]:
        url = url
        response = requests.request("GET", url)
        response = json.loads(response.text)
        if product_response is None:
            product_response[row] = response
        else:
            product_response.update({row: response})
    print(product_response)

if __name__ == '__main__':
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as e:
        for i in product:
            e.map(get_product_info(), i)
    end = time.time()
    print("Total Time: " + str(end - start))
A few questions:
1. If I don’t mention any number in ThreadPoolExecutor, then how many threads does it start, or what’s the default number of threads that run?
2. How do I return values from all the threads and gather them or append them to the same variable?
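One pattern that addresses both questions is to give .map() a function that takes a single SKU and returns that product’s data; .map() then yields each call’s return value in input order. A minimal sketch, where fetch_product() and the URL template are hypothetical stand-ins, not Ahmed’s actual endpoint:

    import concurrent.futures
    import requests

    skus = ['sku-1', 'sku-2', 'sku-3']  # stand-in for the SKUs read from the CSV

    def fetch_product(sku):
        # Hypothetical endpoint; substitute the real URL template.
        url = f'https://example.com/api/products/{sku}'
        response = requests.get(url, timeout=10)
        return sku, response.json()

    if __name__ == '__main__':
        # With no max_workers argument, Python 3.8+ defaults to
        # min(32, os.cpu_count() + 4) threads.
        with concurrent.futures.ThreadPoolExecutor(max_workers=32) as e:
            # Pass the function itself (not a call) plus the iterable of
            # arguments; dict() gathers all (sku, data) pairs in one place.
            product_response = dict(e.map(fetch_product, skus))
        print(product_response)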
khurram703 on July 19, 2020
For me, I just checked how many threads I can create, and the number reached up to 500,000 without any issue. Is that possible, Mr. Lee?
Manish Sharma on Oct. 5, 2023
How do we decide how many workers to spin up in an application? I can’t just create an arbitrary number like 1000-2000 threads, because other parts of the application are still running, and this might cause CPU overuse for a small part of the application.
Bartosz Zaczyński RP Team on Oct. 8, 2023
@Manish Sharma It depends on the type of task at hand. For I/O-bound tasks, it’s not uncommon for modern operating systems to handle thousands of threads that correspond to concurrent connections. On the other hand, for CPU-bound tasks, you’re practically limited to the number of logical cores in your CPU.
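A minimal sketch of that rule of thumb; the numbers are illustrative, not prescriptive:

    import concurrent.futures
    import os

    def make_pool(io_bound):
        if io_bound:
            # I/O-bound: threads mostly wait, so far more threads than
            # cores is fine; cap it to keep memory use reasonable.
            workers = min(32, (os.cpu_count() or 1) * 5)
        else:
            # CPU-bound: more threads than logical cores rarely helps
            # (and the GIL limits pure-Python CPU parallelism anyway).
            workers = os.cpu_count() or 1
        return concurrent.futures.ThreadPoolExecutor(max_workers=workers)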
Kartik Pidurkar on May 12, 2024
When I tried executing the same code, the order was not the same. Below was the output:
main thread started
myfunc started with realpython
myfunc started with woo
myfunc started with foo
myfunc ended with realpython
myfunc ended with foo
myfunc ended with woo
main thread ended