Thread Pool
In this lesson, you’ll refactor your previous code by using a thread pool executor from the concurrent.futures module. If you download the sample code, you can get your own copy of 07-thread_pool.py.
To learn more, you can also check out the documentation for concurrent.futures.ThreadPoolExecutor and concurrent.futures.Executor.map.
00:00
In the previous video, you created three functions and executed them on three separate threads, and you had to write code that was very repetitive. You also saw how some threads finished in an order that you wouldn’t necessarily expect. In this video, we’re going to reduce the amount of code we have to write by using something called a ThreadPoolExecutor to manage the starting and the joining of threads.
00:27
We’re going to import the concurrent.futures package. And let’s import time as well.
00:36
And we’ll define myfunc() again. Same function, pretty much: we’re just going to print f'myfunc started with {name}', and we’ll sleep() for 10 seconds,
00:51
and then we’ll print f'myfunc ended with {name}'.
00:57
Let’s create our entry point,
01:01
if __name__ == '__main__':. Now in here, this is where we’re going to create our thread pool. We’re going to use the context manager pattern here, using the with keyword, so with concurrent.futures.ThreadPoolExecutor().
01:25
And we have to give this a name, so we’ll say as. I’m just going to call mine e, so e for executor. And in here we can pass it a parameter called max_workers, and this is the number of threads we want to create in the pool.
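A quick sketch of the setup being described, with the body to be filled in over the next steps:

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as e:
        ...  # work submitted here runs on up to three pooled threads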
01:43
Let’s create 3, just as we did in the previous lesson, where we created three extra threads. And now we have this block here. Within the context of the ThreadPoolExecutor, we want to execute three threads on myfunc(), and we want to pass in a different word for name on each one.
02:04
The way we do that is by using the .map() function. .map() will map arguments to a function or a callable, in our case myfunc, and the arguments we want to pass for each thread are passed in here in a list. For the first one, let’s do 'foo',
02:27
and then the second thread will be 'bar', and then 'baz'. So this iterable here refers to the arguments that are passed to myfunc() in each thread.
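So the call being described looks like this, with each element of the list becoming the name argument for one thread:

    e.map(myfunc, ['foo', 'bar', 'baz'])

One detail the lesson doesn’t need but that’s worth knowing: .map() also returns the results of the calls as an iterator, in the same order as the input list.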
02:40
Notice how we don’t have to do t.start() and t.join(). We’re just going to let the ThreadPoolExecutor manage the starting and the joining of the threads for us.
02:51
Let’s go ahead and print 'main begins' to show when our main thread starts, and then we’ll do 'main ended'. And just to flash back to our previous code, here in the main thread of the last lesson, look at all this stuff that we can now delete by using the ThreadPoolExecutor.
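For reference, the repetitive version from the previous lesson presumably looked something like this manual start/join boilerplate (a sketch, not the exact file):

    import threading

    t1 = threading.Thread(target=myfunc, args=('foo',))
    t2 = threading.Thread(target=myfunc, args=('bar',))
    t3 = threading.Thread(target=myfunc, args=('baz',))
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()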
03:14
Let’s go ahead and drop down into the terminal, and we’ll execute python 07-thread_pool.py. We have main begins, and then we’ve fired off three threads. There are 3 max_workers in our thread pool, and each one has a different argument thanks to this .map() function.
03:34
We see that it started foo, bar, baz, then it slept for 10 seconds in each thread, and then it returned foo, bar, baz. And then we see main ended down here.
03:46
This shows a very nice way of using multi-threading without writing repetitive code.
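Putting the walkthrough together, 07-thread_pool.py should look roughly like this (reconstructed from the transcript, so minor details may differ from the downloadable file):

    import concurrent.futures
    import time

    def myfunc(name):
        print(f'myfunc started with {name}')
        time.sleep(10)
        print(f'myfunc ended with {name}')

    if __name__ == '__main__':
        print('main begins')
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as e:
            e.map(myfunc, ['foo', 'bar', 'baz'])
        print('main ended')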
nightfury on Dec. 26, 2019
Hi,
Would this be a cool way to find the number of threads supported by an OS? Out of curiosity, I modified the code a bit to see if it crashes beyond a certain point.
On my system (running macOS), the code crashes beyond max_workers=2048 with the following error:
Any comments on what’s happening here?
Lee RP Team on Jan. 12, 2020
Hey @nightfury, that’s cool :) I imagine there is a limit to the number of threads per process that the OS can create. It looks like it’s 2048 on yours. I just ran it on my machine, and it stopped at 4096.
Ahmed on April 16, 2020
Hello Lee,
I am trying to make my code run faster. I have 6,000 product SKUs in a file, and I make an API call for each product. This is taking about 20-30 minutes to finish all the API calls. Can you recommend a faster way to do this? Below is my code.
import os
import csv
import json
import time
import requests
import concurrent.futures

product = []
parent = "ProductFiles"
filename3 = os.path.join(parent, 'product_info.csv')
file3 = open(filename3)
wrapper = csv.reader(file3)
for row in wrapper:
    product.append(row[0])

def get_product_info():
    product_response = {}
    for row in product[1:]:
        url = url
        response = requests.request("GET", url)
        response = json.loads(response.text)
        if product_response is None:
            product_response[row] = response
        else:
            product_response.update({row: response})
    print(product_response)

if __name__ == '__main__':
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as e:
        for i in product:
            e.map(get_product_info(), i)
    end = time.time()
    print("Total Time: " + str(end - start))
A few questions:
1. If I don’t mention any number in ThreadPoolExecutor, then how many threads does it start, or what’s the default number of threads that run?
2. How do I return values from all the threads and gather them or append them to the same variable?
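One pattern that addresses both questions is to give .map() a function that takes a single SKU and returns that product’s data; .map() then yields each call’s return value in input order. A minimal sketch, where fetch_product() and the URL template are hypothetical stand-ins, not Ahmed’s actual endpoint:

    import concurrent.futures
    import requests

    skus = ['sku-1', 'sku-2', 'sku-3']  # stand-in for the SKUs read from the CSV

    def fetch_product(sku):
        # Hypothetical endpoint; substitute the real URL template.
        url = f'https://example.com/api/products/{sku}'
        response = requests.get(url, timeout=10)
        return sku, response.json()

    if __name__ == '__main__':
        # With no max_workers argument, Python 3.8+ defaults to
        # min(32, os.cpu_count() + 4) threads.
        with concurrent.futures.ThreadPoolExecutor(max_workers=32) as e:
            # Pass the function itself (not a call) plus the iterable of
            # arguments; dict() gathers all (sku, data) pairs in one place.
            product_response = dict(e.map(fetch_product, skus))
        print(product_response)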
khurram703 on July 19, 2020
For me, I just checked how many threads I can create, and the number reached up to 500,000 without any issue. Is that possible, Mr. Lee?
Manish Sharma on Oct. 5, 2023
How do we decide how many workers to spin up in an application? I can’t just create an arbitrary number like 1000-2000 threads, because other parts of the application are still running, and this might cause CPU overuse for a small part of the application.
Bartosz Zaczyński RP Team on Oct. 8, 2023
@Manish Sharma It depends on the type of task at hand. For I/O-bound tasks, it’s not uncommon for modern operating systems to handle thousands of threads that correspond to concurrent connections. On the other hand, for CPU-bound tasks, you’re practically limited to the number of logical cores in your CPU.
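A minimal sketch of that rule of thumb; the numbers are illustrative, not prescriptive:

    import concurrent.futures
    import os

    def make_pool(io_bound):
        if io_bound:
            # I/O-bound: threads mostly wait, so far more threads than
            # cores is fine; cap it to keep memory use reasonable.
            workers = min(32, (os.cpu_count() or 1) * 5)
        else:
            # CPU-bound: more threads than logical cores rarely helps
            # (and the GIL limits pure-Python CPU parallelism anyway).
            workers = os.cpu_count() or 1
        return concurrent.futures.ThreadPoolExecutor(max_workers=workers)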
Kartik Pidurkar on May 12, 2024
When I tried executing the same code, the order was not the same. Below was the output:
main thread started
myfunc started with realpython
myfunc started with woo
myfunc started with foo
myfunc ended with realpython
myfunc ended with foo
myfunc ended with woo
main thread ended