When to Use concurrent.futures or multiprocessing
In this lesson, you’ll see which situations might be better suited to using either concurrent.futures or multiprocessing. You’ll also learn about how that ties in with the Global Interpreter Lock (GIL).
Because of the GIL, no two threads can execute Python code at the same time. So even if you have multiple threads running in your Python program, only one of them can execute at a time. The best way to get around this is to use process-based parallel programming, or process-based parallelism.
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 Now, of course, the question is, “Well, when should you use one over the other?” The dark secret of Python is something called the Global Interpreter Lock.
00:10 What it means, basically, is that no two threads can execute Python code at the same time. Even if you have multiple threads running in your Python program, only one of them can execute at a time. Now, this sounds like a huge limitation and in some cases it is, but, really, what happens most of the time is that your thread will be waiting on I/O to complete. In this case, if I’m calling time.sleep(), that’s an I/O operation. That’s blocking this thread. That means while that time.sleep() call is blocking this one thread, other threads can execute, and then at the end, the thread will resume and they will all finish their processing.
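As a rough illustration of the point above, the sketch below runs a blocking call in several threads at once. The function names and the 0.2-second delay are assumptions made for this example, not code from the video; time.sleep() stands in for any blocking call that releases the GIL while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_and_double(x):
    # Hypothetical worker: the sleep blocks only this thread,
    # so other threads are free to run in the meantime.
    time.sleep(0.2)
    return x * 2


def run_threaded(values):
    # Run one thread per value and measure the total wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(values)) as executor:
        results = list(executor.map(wait_and_double, values))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threaded([1, 2, 3, 4])
    print(results, f"{elapsed:.2f}s")
```

Because the four sleeps overlap, the total time should be close to a single 0.2-second wait rather than the 0.8 seconds a sequential loop would need.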
00:52 So in this case, it doesn’t really make a difference. However, if I was doing heavy number crunching in these threads, I would run into the global interpreter lock problem because the end result wouldn’t really be faster than running a single-threaded version.
01:06 So, there’s lots more to say about this, but really what you need to remember is that the best way to get around this is to use process-based parallel programming in Python, or process-based parallelism.
01:18 And this is where this concurrent.futures module is extremely handy because, you know... See? I did it again. I just switched from thread-based execution, or thread-based parallelism, to process-based parallelism.
01:31 This gets around the global interpreter lock problem because every single process has its own interpreter. Therefore, they can all run in parallel, you can actually spread them out across multiple CPU cores, and this solves the global interpreter lock problem.
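A minimal sketch of that idea, assuming a made-up CPU-bound function (sum_of_squares is a hypothetical name, not from the video): each worker process runs its own interpreter with its own GIL, so the calls can be spread across CPU cores.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    # Pure number crunching with no I/O: in a thread, this would
    # hold the GIL for the whole computation.
    return sum(i * i for i in range(n))


def crunch_in_processes(inputs):
    # With no max_workers argument, the pool size defaults to the
    # number of processors available on the machine.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(sum_of_squares, inputs))


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by
    # re-importing this module, it keeps them from re-running this code.
    print(crunch_in_processes([10, 100, 1_000]))
```

The worker function has to live at module level so it can be pickled and sent to the worker processes.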
01:48 But this is definitely something that you need to keep in mind when you’re writing parallel programs in Python. This is also where this concurrent.futures module is kind of nice, because you can change the execution strategy very, very easily.
02:02 And, really, the ProcessPoolExecutor is just a wrapper around the multiprocessing.Pool, but if you’re using this interface, it just becomes so simple to swap out the different execution strategies here.
02:17 Now, we’re back to a ThreadPoolExecutor again, and we’re getting a different result.
konk on July 5, 2019
Good tutorial. I’d read about python multiprocessing/threading, but had not yet implemented it. Seeing it in action in this tutorial - wow - quite easy to get going. No reason not to implement when it will help.
az on Nov. 29, 2019
You’ve cleared my doubt about functional programming. Now, the time for question. Can we say that a piece of functional program should only contain function calls with no procedural logic and it should ideally act on immutable data ?
Dan Bader RP Team on Nov. 30, 2019
Glad you liked the course!
Can we say that a piece of functional program should only contain function calls with no procedural logic and it should ideally act on immutable data?
Great question. Well, my take on this is that I’ll use whatever makes my life and the lives of my colleagues easier :) I’m not a purist when it comes to functional programming.
I find it useful as a technique that I can use when appropriate, but I’m not going to lock myself into writing only pure FP code with 100% immutable data structures.
It might make for a fun exercise to try and attain that, but at the end of the day I’m usually writing code to solve a problem. So I’ll use whatever tools and techniques that make me the most effective in getting to my goal. I don’t feel bad about mixing functional, procedural, and object-oriented programming styles.
Pygator on Jan. 20, 2020
I liked taking builtin map from python3 and connecting it with a parallel map idiom from two separate modules to speed up code execution; great place to use FP on a big data structure.
mikesult on March 3, 2020
Great tutorial in functional programming. I learned a lot. I’d like to share a newbie mistake I made in the last section. I typed in the code from the video but I named it concurrent.py (bad mistake) and when I tried to run, it caused an error:
ModuleNotFoundError: No module named 'concurrent.futures'; 'concurrent' is not a package
I fumbled with this for most of the day trying to figure out the problem before I finally found a post from a few years back on stackoverflow that had the same error and one of the answering comments included ‘…Either that or you’re shadowing concurrent. Do you have a concurrent.py?’
Umm, yes I do.
Once I renamed the file it worked as expected.
So I learned that you shouldn’t name your file the same name as a package name. Of course it seems so obvious now.
Thanks for a great intro to the functional programming style.
Axel FAUVEL on March 27, 2020
Thanks a lot for this course, very well explained :)
Ola Ajibode on March 27, 2020
Got it now! That concurrent.futures bit was very useful, particularly when the GIL is in the picture. Thanks and kudos!
Dr VINOD KUMAR VERMA on March 28, 2020
nice contents.
ibrahim suleiman on March 29, 2020
is there any project you can suggest to apply the lesson learnt from this course
yashtronp on March 29, 2020
Thanks a lot for this course, very well explained is there any project you can suggest to apply the lesson learnt from this course plz
sroder on April 1, 2020
That was cool !
Cristian Palau on April 6, 2020
Thank you Dan for this great tutorial! :)
zorion on April 8, 2020
Awesome, thanks Dan! I finally understand what GIL blocks. It was always a black box for me, I knew that there was something wrong in Python parallelism but I didn’t know that it was restricted to threads while computing. Good to know, Good to know!
George Yeboah on April 10, 2020
Good tutorial I really enjoyed watching it and picking up some cool techniques from it Great work keep it up
darth88vader88 on April 10, 2020
Thanks for the course, Dan! key takeaway was definitely parallel processing. the discussion was “pure gold” to jump start its use in my coding
radupopa21 on April 10, 2020
Never fully understood the GIL problem and how concurrent.futures solves it for us. Thank you very much for that.
Paul Ricketts on April 11, 2020
I’m super impressed with the clarity of the explanations. And finally I understand what functional programming is, and how handy it can be for multiprocessing. Many thanks!
bennjuguna0 on April 13, 2020
Honestly thought it would be harder than this. Many thanks to you for the awesome tutorials.
berry4 on April 13, 2020
Thank you for this course. I learned a lot from it!
Dave on April 14, 2020
That was excellent! You do a great job presenting this info.
Your examples were working with a small dataset. How would you populate this type of immutable/named tuples data structure? Would you import to pandas first and then set this up?
pcordero on April 15, 2020
Really nice overview and explanation! congrats.
Tomas Menito on April 22, 2020
Great tutorial, thanks!
Javier Ruiz on April 22, 2020
This was a very nice present! Thanks Dan and Real Python!
nareshhdfs on April 25, 2020
has anyone idea about given error like below while using multiprocessing pool?
cls(buf, protocol).dump(obj)
TypeError: can't pickle SSLContext objects
milosvblagojevic on April 25, 2020
Thanks for the course, it is very clear and concise and helpful.
milangnjatovic on April 28, 2020
Great course and even greater presentation. Keep it up.
Zarata on May 7, 2020
Processes and Threads and GIL, oh my! Processes and Threads and Gil, oh my! … It’s a mind stretcher!! The fact the (implied superior) not-GIL-limited ProcessPoolExec executes in 2 sec while the ThreadPoolExec different result is 1 sec (half) teaches much … but it’s going to be awhile gaining the skill and insight to know precisely what :) ! Wow! Thanks DB. BTW, roughly how much resource hit giving each process its own Py interpreter??
Marcelo Garbarino on May 31, 2020
Excellent course! Thank you!
sufuang on June 7, 2020
Nice & great presentation. It helps to understand multiprocessing, concurrent feature, and GIL block better. Is any way to get the codes and supporting material for this course? Thanks!
sroux53 on June 23, 2020
Excellent !
SUDHANSHU TIWARI on June 28, 2020
Nicely taught, even an intermediate like me can understand 👍👍
SEOTrafficHack DigitalSEO Marketing auto on July 19, 2020
Great practical application possibilities for one of the most efficient derived data types: namedtuples. For SEO tasks I built namedtuple classes to automate mapping of scraped data keywords with their attributes. This course is a great supplement on how to process, map, join, combine, and save time when processing data with multiple attributes.
Divyanshu Sharma on July 27, 2020
Can you explain how does multiprocessing.dummy compare to concurrent.futures.ThreadPoolExecutor?
Bartosz Zaczyński RP Team on Aug. 3, 2020
The concurrent.futures package came with Python 3.2, which was years after the multiprocessing.dummy. It was modeled after the Execution Framework from Java 5 and is now the preferred API for implementing thread pools in Python. That said, you still might want to use multiprocessing.dummy as an adapter layer for legacy code.
Ghani on Oct. 14, 2020
This functional programming course is really excellent! Although I need to revise it and chew it again, I learned plenty of ways I can make my code more efficient. Thanks Dan.
paulagm12 on Nov. 15, 2020
Great course! It is explained in a very clear way and I have learnt a lot of new and useful things to put into practise in my programming. I really appreciate all the effort put into this.
squeakyboots on April 27, 2021
Thanks so much for this course! I’m excited to try applying this to my own API calls when something might be running more slowly than I’d like.
MOSTA on June 4, 2021
Great contents and clear presentation. Now I have to do my own practicing.
samuelebright on Jan. 30, 2022
Thank you Dan for this course. I’m looking forward to storing data from Excel sheets in immutable data structures and then using some of the strategies from the videos to manipulate the data for use in my programs.
MarkYoung on Sept. 25, 2023
Great course. Number 1 takeaway for me was an answer as to why to keep functions (that will be parallelized) small and with no side effects. An open question I had is that if I don’t use map() to apply a function to an iterator, can that still be parallelized?
alexchwu on July 5, 2019
Awesome lesson and thanks for sharing