Parallel Processing With concurrent.futures: Overview
In this section, you’ll learn how to do multithreading and parallel programming in Python using functional programming principles and the concurrent.futures module.
You’ll work with the example data set, based on an immutable data structure, that you previously transformed using the built-in map() function. But this time, you’ll process the data in parallel, across multiple threads or processes, using Python 3’s concurrent.futures module, which is available in the standard library.
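As a reminder of the starting point, here is a minimal sketch of the sequential version: immutable records processed one at a time with the built-in map(). The Scientist namedtuple, its fields, the sample records, the fixed REFERENCE_YEAR, and the short sleep are illustrative assumptions, not necessarily the exact data set or transform used in the series.

```python
import collections
import time

# Hypothetical immutable record type; field names are assumptions.
Scientist = collections.namedtuple("Scientist", ["name", "born"])

scientists = (
    Scientist(name="Ada Lovelace", born=1815),
    Scientist(name="Emmy Noether", born=1882),
    Scientist(name="Marie Curie", born=1867),
)

REFERENCE_YEAR = 2020  # arbitrary fixed year so the output is deterministic


def transform(record):
    """Derive a new dict from one record without mutating the record."""
    time.sleep(0.1)  # stand-in for slow per-record work (assumed delay)
    return {"name": record.name, "age": REFERENCE_YEAR - record.born}


# Sequential baseline: map() handles one record after another.
result = tuple(map(transform, scientists))
print(result)
```

Because transform() only reads its input and returns a new value, it can later be handed to a parallel map() without any changes.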
You’ll see, step by step, how to parallelize an existing piece of Python code so that it can run faster and take advantage of all of your available CPU cores. You’ll learn how to use the ProcessPoolExecutor and ThreadPoolExecutor classes and their parallel map() implementations, which make parallelizing most Python code written in a functional style a breeze.
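A minimal sketch of the shared interface: both executor classes provide Executor.map(), so switching between threads and processes comes down to passing a different class. The transform() function and the run_parallel() helper are hypothetical names chosen for illustration.

```python
import concurrent.futures
import time


def transform(x):
    """Hypothetical per-item work: sleep briefly, then square the input."""
    time.sleep(0.1)
    return x * x


def run_parallel(executor_class, items):
    # Executor.map() has the same signature on ThreadPoolExecutor and
    # ProcessPoolExecutor; leaving the with-block waits for all workers.
    with executor_class() as executor:
        return list(executor.map(transform, items))


if __name__ == "__main__":
    items = range(8)
    print(run_parallel(concurrent.futures.ThreadPoolExecutor, items))
    print(run_parallel(concurrent.futures.ProcessPoolExecutor, items))
```

The __main__ guard matters for ProcessPoolExecutor: on platforms that start workers by re-importing the main module, unguarded pool creation would run again in every child.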
By knowing the difference between the two executors available in the concurrent.futures module, you’ll be able to parallelize your Python functions across multiple threads and across multiple processes. You’ll get a brief introduction to the Python Global Interpreter Lock, also known as the GIL, and see how you can work around its limitations by choosing the right executor implementation.
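To make the GIL point concrete, here is a rough sketch, assuming a standard CPython build: pure-Python arithmetic holds the GIL while it runs, so a ThreadPoolExecutor usually can’t run several of these calls at once, while a ProcessPoolExecutor gives each worker its own interpreter and its own GIL. The cpu_bound() and timed_map() names and the workload size are hypothetical; the timings you see will depend on your machine, so none are claimed here.

```python
import concurrent.futures
import time


def cpu_bound(n):
    """Pure-Python loop: CPU-heavy and holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_class, func, items):
    """Run func over items with the given executor; return results and seconds."""
    start = time.time()
    with executor_class(max_workers=4) as executor:
        results = list(executor.map(func, items))
    return results, time.time() - start


if __name__ == "__main__":
    work = [2_000_000] * 4
    for cls in (concurrent.futures.ThreadPoolExecutor,
                concurrent.futures.ProcessPoolExecutor):
        _, elapsed = timed_map(cls, cpu_bound, work)
        print(f"{cls.__name__}: {elapsed:.2f}s")
```

On a multicore machine you would typically expect the process pool to finish this CPU-bound work sooner; for work that mostly waits (sleeping, network, disk), threads are usually sufficient because blocking calls like time.sleep() release the GIL.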
Once again, you’ll use your little testbed program from the last video to measure the execution time with the time.time() function. This allows you to compare the single-threaded and multithreaded implementations of the same algorithm.
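A minimal sketch of that kind of comparison, timing a sequential map() against ThreadPoolExecutor.map() with time.time(). The measure() helper, the 0.2-second simulated delay, and the item count are assumptions for illustration only.

```python
import concurrent.futures
import time


def transform(x):
    time.sleep(0.2)  # simulated slow work; sleeping releases the GIL
    return x + 1


def measure(label, fn):
    """Call fn(), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.time()
    result = fn()
    end = time.time()
    print(f"{label}: {end - start:.2f}s")
    return result, end - start


items = list(range(5))
seq_result, seq_time = measure("sequential", lambda: list(map(transform, items)))

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    par_result, par_time = measure(
        "threaded", lambda: list(executor.map(transform, items))
    )
```

With five items and five workers, the sequential run should take roughly five times the per-item delay and the threaded run roughly one. For interval timing, time.perf_counter() is generally a better clock than time.time(), but either works for a rough comparison like this.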
00:00
Hey there and welcome to another video in my Functional Programming in Python series. In the last video, you saw how to take a piece of code that used the built-in map() function and refactor it to run in parallel, processing multiple records at the same time. That can lead to large speedups in execution time.
00:24
We did that using the multiprocessing module that’s available in Python 2 and Python 3. Now, I already hinted at this towards the end of the previous video: there are other ways to implement parallel processing using a functional programming style in Python. So, what I want to show you in this video is how to use the concurrent.futures module that’s built into Python 3. It isn’t built into Python 2, but it gives you a nice and clean interface for parallel processing and parallel programming in Python 3. All right.
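Here is a small sketch of what that interface looks like in practice. Two details worth knowing: Executor.map() returns a lazy iterator rather than a list (multiprocessing.Pool.map() returns a list), and submit() schedules a single call and returns a Future. The double() function is a hypothetical stand-in for real work.

```python
import concurrent.futures


def double(x):
    """Trivial stand-in for real per-record work."""
    return 2 * x


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Executor.map() yields results lazily, in input order;
    # wrap it in list() when you need all results at once.
    mapped = executor.map(double, [1, 2, 3])
    doubled = list(mapped)

    # submit() returns a Future immediately; result() blocks
    # until that particular call has finished.
    future = executor.submit(double, 21)
    answer = future.result()

print(doubled)  # [2, 4, 6]
print(answer)   # 42
```

For most functional-style code, map() is all you need; submit() is useful when calls are independent and you want to handle each result as its own Future.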
01:00
So, let’s bring back the multiprocessing implementation for a second and run this example program again. What you can see here is that we’re taking this input data set and generating this output, and this takes about two seconds to complete using multiprocessing.
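For comparison, a minimal sketch of a multiprocessing.Pool version of the same pattern. The transform() function, the record shape, the sleep duration, and the default of four processes are assumptions for illustration, not the exact program from the video.

```python
import multiprocessing
import time


def transform(x):
    """Hypothetical per-record work that returns a new dict."""
    time.sleep(0.1)
    return {"value": x, "square": x * x}


def run_with_pool(items, processes=4):
    # Pool.map() blocks until every result is ready and returns a list.
    # Leaving the with-block calls terminate(), which is safe here
    # because map() has already collected all results.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(transform, items)


if __name__ == "__main__":
    start = time.time()
    result = run_with_pool(range(8))
    print(f"Time to complete: {time.time() - start:.2f}s")
    print(result)
```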
01:18
Based on the logging output I set up, we can see that the work is distributed across a bunch of different processes. We have four worker processes, which we can identify by their process IDs, and they’re working on these records in parallel.
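One way to produce that kind of output is to have each worker report os.getpid(). This sketch assumes a multiprocessing.Pool and a simple print-based log; the message format and the run_and_collect_pids() helper are hypothetical.

```python
import multiprocessing
import os


def transform(x):
    pid = os.getpid()  # ID of the worker process handling this record
    print(f"Process {pid} working on record {x}", flush=True)
    return pid, x * 10


def run_and_collect_pids(items, processes=4):
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.map(transform, items)
    pids = {pid for pid, _ in results}
    values = [value for _, value in results]
    return pids, values


if __name__ == "__main__":
    pids, values = run_and_collect_pids(range(8))
    print(f"Parent {os.getpid()}; workers used: {sorted(pids)}")
    print(values)
```

With very short tasks, a few workers may pick up most of the records, so the number of distinct process IDs can be lower than the pool size.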
01:36
Then, at the end, the multiprocessing.Pool reassembles all the results and gives us a list of all these derived dictionaries.
01:46
Or, I guess I’d call them transformed dictionaries, since they’re based on the input data.
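That reassembly step preserves input order: Pool.map(), like Executor.map(), returns results in the order of the inputs, no matter which worker finishes first. The sketch below shows this with a ThreadPoolExecutor (chosen only so it runs quickly) and a hypothetical slow_then_fast() function in which earlier items take longer; as_completed() is used alongside it to show the actual finishing order.

```python
import concurrent.futures
import time


def slow_then_fast(x):
    # Earlier inputs sleep longer, so they tend to finish last.
    time.sleep(0.05 * (4 - x))
    return {"input": x, "output": x * 2}


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    ordered = list(executor.map(slow_then_fast, range(4)))

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(slow_then_fast, x) for x in range(4)]
    # as_completed() yields futures as they finish, not in submission order.
    completion_order = [
        f.result()["input"] for f in concurrent.futures.as_completed(futures)
    ]

print([d["input"] for d in ordered])  # [0, 1, 2, 3]
print(completion_order)  # finishing order; typically the reverse
```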
brayo on Sept. 19, 2020
Felt it when he said ‘hopefully runs’
mikkoskilpelainen on July 7, 2020
Such great content about Multiprocessing module and how it relates to map. Keep it up!