How Functional Programming Makes Parallel Processing Simple
In this lesson, you’ll see how functional programming makes parallel processing simple. If you can write your program in such a way that it uses a map operation to transform some input data into some output data, then it’s quite simple to parallelize it. You just need to import concurrent.futures and add two lines of code instead of a straight-up map() call, and your code is running in parallel! This can lead to huge improvements in speed.
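Here’s a minimal sketch of that pattern; transform() and the sample inputs below are placeholders standing in for whatever work your program actually does:

```python
import concurrent.futures

def transform(x):
    # Stand-in for some CPU-heavy work on a single input.
    return x * x

if __name__ == "__main__":
    inputs = range(10)

    # Sequential version: a straight-up map() call.
    sequential = list(map(transform, inputs))

    # Parallel version: the same mapping, now spread across a pool of
    # worker processes by the two extra lines the lesson mentions.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        parallel = list(executor.map(transform, inputs))

    assert sequential == parallel
```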
00:00 All right. Again, only scratching the surface in this video, but I hope it whetted your appetite for doing more parallel processing in Python and to also do it in a functional programming style, because I think one of the key advantages of functional programming is that it makes it very easy to parallelize your programs, right?
00:22
If you can write your program in a way where it’s using a map() operation to transform some input data into some output data, or, let’s say, if a part of your program can be written that way, it is extremely simple to parallelize this like crazy. As you’ve seen here, all you need is to import concurrent.futures and these two lines of code instead of the straight-up map() call that’s built into Python, and
00:50 your code is running in parallel. This can lead to huge speedups. This can be a really quick win for a program that’s I/O bound or that’s CPU bound.
01:00 So, highly recommended for any kind of number crunching that you do, or if it’s a super I/O-bound program, like most web scrapers are, where you’re waiting a long time for the web requests to finish so that you can parse out the data.
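As a rough sketch of that web scraper case (the URLs are made up and fetch() fakes a slow request with time.sleep() so the snippet runs offline):

```python
import concurrent.futures
import time

URLS = [f"https://example.com/page/{n}" for n in range(8)]

def fetch(url):
    time.sleep(1)  # stand-in for waiting on a real HTTP response
    return f"<html> from {url}"

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    pages = list(executor.map(fetch, URLS))
elapsed = time.perf_counter() - start

# The eight one-second "requests" overlap, so this finishes in roughly
# one second instead of eight.
print(f"Fetched {len(pages)} pages in {elapsed:.1f}s")
```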
01:13 This is extremely handy for those kinds of situations. I would encourage you to play with it and just get a better understanding of how this works and what the difference is between thread-based parallelism and process-based parallelism, and how the global interpreter lock works in Python. Have fun playing with parallel programming. All right, talk to you soon.
Pygator on Jan. 20, 2020
This is good and all for speeding up the code by parallelizing it. But I’m still so confused about the difference between “threads”, “pools”, “processes”, and “CPU cores”.
Dan Bader RP Team on Jan. 21, 2020
Check out our Python Concurrency & Parallel Programming Learning Path to build your understanding of these terms from the ground up.
I especially recommend our Speed Up Your Python Program With Concurrency tutorial and taking the associated quiz. Happy Pythoning! :)
breaves on March 30, 2020
@Pygator Let me try.
Your computer has 1 cpu but it can do up to 4 things at once. That’s 4 “cores.” Each core can be doing a totally different thing.
That thing is a “thread.” In this example, it runs one transform() per thread.
The Process ID (pid) is the identifier that your OS assigns to each thread. When it’s running normally you’re using just one core, that’s one pid. How did he know he was running 4 threads? Because the OS gave him 4 different PIDs. (In the old days of 1988 I worked on a computer that could do multiple threads per PID, and it was really difficult to assign one thread to one piece of code, and identify which code was running on which thread. I’m so glad we finally calmed down to running one process per thread).
A pool is a group of available threads. Maybe your computer has 16 cores but for some reason 10 of them are busy with something else. Then you have only 6 cores available, so you can allocate a pool of only 6. Well, you can try allocating 16 but you’re only going to get 6. In his example this happened: he tried to allocate a pool of 8 threads, but you can see he got only 4. This is because only 4 were available.
Clear as mud?
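One way to poke at these terms yourself is a small experiment like the one below (the pool size and inputs are arbitrary); with a ProcessPoolExecutor each worker is a separate OS process, so each one reports its own PID:

```python
import concurrent.futures
import os

def transform(x):
    # Return the input together with the PID of the worker that handled it.
    return x, os.getpid()

if __name__ == "__main__":
    print("CPU cores reported by the OS:", os.cpu_count())

    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(transform, range(8)))

    # Each worker process has its own PID, so at most 4 distinct values show up.
    print("Distinct worker PIDs:", {pid for _, pid in results})
```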
breaves on March 30, 2020
@Pygator but in the next lesson he’s going to prove me wrong! Looks like you can have multiple threads in one pid, but they can’t both be doing calculations. But they can be waiting on I/O, or on an event, which is how time.sleep() works. But if instead of time.sleep() you were counting from 1 to 100000, or doing something else with calculations, putting them all on a single PID would force it to wait and it would be the same as running on a single thread. More mud…
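A quick way to see that difference (the pool size and loop counts are arbitrary): four threads that sleep finish together in about a second, while four threads that count are serialized by the GIL and take roughly as long as doing the counting one after the other:

```python
import concurrent.futures
import time

def wait_a_second(_):
    time.sleep(1)  # the GIL is released while a thread sleeps or waits on I/O

def count_a_lot(_):
    total = 0
    for i in range(10_000_000):  # the GIL is held while running Python bytecode
        total += i
    return total

for label, work in [("sleeping", wait_a_second), ("counting", count_a_lot)]:
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        list(executor.map(work, range(4)))
    print(f"4 threads {label}: {time.perf_counter() - start:.1f}s")
```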
cellist on April 1, 2020
A process is a unit of computing capability that has its own memory space and to which the OS assigns a unique process ID, the PID. Processes can be run in parallel on a multi-CPU system or in a pseudo-parallel fashion on a single CPU if time slices are applied to them.
A process can have many threads executing in parallel as well (all with the same PID), but they are “lighter” because they share the same memory space and thus have less overhead in invocation and run-time management.
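A tiny sketch that makes this visible (worker() is just a placeholder): all the threads report the same PID because they live inside one process, while a process pool shows several different PIDs:

```python
import concurrent.futures
import os

def worker(_):
    return os.getpid()

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as threads:
        thread_pids = set(threads.map(worker, range(8)))

    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as processes:
        process_pids = set(processes.map(worker, range(8)))

    print("PIDs seen from threads:  ", thread_pids)    # a single PID
    print("PIDs seen from processes:", process_pids)   # up to four PIDs
```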
breaves on April 10, 2020
@cellist thanks. So it’s the closeness of the memory sharing that’s the key. Indeed, when you want to communicate data from one process to another you need to use IPC (inter-process communication). If you want to share memory between them you need to use mmap(). Sorry, I haven’t done that since 1993 or so, in C, so that’s why I’m taking the Python class on it, to see how it’s done in Python now, 27 years in the future.
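In current Python most of that plumbing is handled for you: concurrent.futures pickles arguments and results and ships them between processes, and if you really want shared memory there’s multiprocessing.shared_memory (Python 3.8+), the modern counterpart to hand-rolled mmap(). A minimal sketch, with an arbitrary buffer size and value:

```python
from multiprocessing import shared_memory

# The "parent" creates a named block of shared memory and writes into it.
shm = shared_memory.SharedMemory(create=True, size=4)
shm.buf[0] = 42

# Another process would attach to the very same bytes by name, e.g.
# shared_memory.SharedMemory(name=shm.name), and see the 42 immediately.
print(shm.name, shm.buf[0])

shm.close()
shm.unlink()  # free the block once everyone is done with it
```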
freddy-rgb on March 28, 2022
Maybe I’m greatly mistaken but it seems to me that in the Overview this video should come after “concurrent.futures vs multiprocessing”?
Tony Ngok on Feb. 19, 2024
I think in a previous video, you say that multithreading is ideal for I/O bound apps, while multiprocessing is ideal for CPU bound apps. Why do you say here that multiprocessing is good for both CPU and I/O bound apps?
Bartosz Zaczyński RP Team on Feb. 19, 2024
@Tony Ngok You can use either multithreading or multiprocessing for I/O-bound tasks; the two come with different trade-offs. Because of the global interpreter lock (GIL), threads in Python can’t take advantage of genuine parallelism, so for CPU-bound tasks multiprocessing remains your only option in most cases.
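To make that concrete, here’s a small benchmark sketch (crunch() is a hypothetical CPU-bound task and the numbers are arbitrary): the same work doesn’t speed up on a thread pool because of the GIL, but it does on a process pool:

```python
import concurrent.futures
import time

def crunch(_):
    # CPU-bound placeholder: pure-Python number crunching.
    return sum(i * i for i in range(5_000_000))

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as executor:
        list(executor.map(crunch, range(4)))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads can't run this in parallel because of the GIL...
    print(f"ThreadPoolExecutor:  {timed(concurrent.futures.ThreadPoolExecutor):.1f}s")
    # ...but separate processes can, so this should be noticeably faster.
    print(f"ProcessPoolExecutor: {timed(concurrent.futures.ProcessPoolExecutor):.1f}s")
```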