How to Use multiprocessing.Pool()

Functional Programming in Python Dan Bader 05:14

In this lesson, you’ll dive deeper into how you can use multiprocessing.Pool. It creates multiple Python processes in the background and spreads out your computations for you across multiple CPU cores so that they all happen in parallel without you needing to do anything.

You’ll import the os module in order to add some more logging to your transform() function so you can see what’s going on behind the scenes. You’ll use os.pid() to see which process is working on which record, so you’ll be able to see that the same processes are working on different data in different batches. You can even set the number of processes you want to be working at once.

00:00 Now, what is going on here? This is the magic of the multiprocessing.Pool, because what it does is it actually fans out, it actually creates multiple Python processes in the background, and it’s going to spread out this computation for us across these different CPU cores, so they’re all going to happen in parallel and we don’t have to do anything.

00:24 The result we get is exactly the same. The multiprocessing.Pool fans out and does all these computations for us, applies the transform() function, and then brings back the results and assembles this output data structure, here, so we get exactly the same result, here, which is pretty cool!

00:46 Now, there’s a couple more parameters we can tweak here, and I really want to make sure that you see what’s going on behind the scenes here. The first thing we’re going to do is we’re going to add some more logging to this transform() function so we can see what’s going on behind the scenes.

01:00 For that, I’m going to import the os module. Then here, we can print out the process number, or some identifier, for the

01:11 current process. We can do that with the os.getpid() (get process ID) function.

01:17 We can say f'Process'—the name of the process that is working on this record, and then we can say f'Process {os.getpid()} done processing record' […].

01:31 And now, if I run this again, we’re going to get some more extensive output here. So, each process that happens in parallel, it gets a unique identification number.

01:41 And we can see here how the multiprocessing.Pool spreads out our computations across multiple cores. So, what you can see here is that we have these four processes, you know, 17324 to 27,

01:58 and they’re working on stuff in parallel. And then, they’re being reused. So, in the second batch, the same processes—again—are working on other data. And we can influence this behavior, so, we can actually put a limit and say, “Well, I only want 1 process in this Pool here.” And when I run this again, you can see here, well, now we have a single process that’s doing all of the computations. And again, we’re going to end up with a seven-second runtime.

02:31 If you look at the log here, you can see exactly that it’s the same process processing each record one by one, and there’s no parallel computation happening. And now, if I crank this up, I can say, “Well, we want 2 processes.” Now, if we run this again, we have two processes working in parallel and they’re processing these records in parallel, and now we get a little bit of a speedup.

02:54 And, of course, I can go crazy here and I can actually say, okay, I want a process for each record here. The number of processes should be the number of records in this thing here, and that way we can process all of them in parallel and we can really cut down our time to complete to a second, which is about as long as it should take to process a single element.

03:21 And you know, of course, this is a bit of an academic example, or just a toy example, because—well, we’re just sleeping here, so there’s really nothing very interesting going on.

03:31 But if you imagine that this was a call that was waiting on I\O, or it was waiting for a website download to complete—if this was a web scraper of sorts—you could do all of these things in parallel and with with the multiprocessing.Pool you can really influence how these operations are performed across multiple CPUs,

03:52 or multiple CPU cores, and you have a lot of flexibility in configuring it. There’s other options here. For example, we could say, “Okay, I want 2 processes,” and there’s another setting that’s called maxtasksperchild (max tasks per child), and I can say, “Well, I want 2 processes in parallel and I want the process to restart after it has completed a number of tasks.” So,

04:18 if you run this, we’re going to get a slightly different output here. Again, if we look at the process ID, you can see here, okay, we’re starting out with 03, 04, and those are doing some processing.

04:30 And then, we’re getting two brand new processes that are processing the next elements, and then we’re getting two brand new processes, again, and so on.

04:38 With the multiprocessing.Pool you can really influence how it’s distributing your calculations across multiple CPU cores. Of course, you can also leave out these settings and what it’s going to do then—it’s going to spawn as many processes as you have CPU cores.

04:58 So, I’m on a dual-core with hyper-threading machine that Python sees as a quad-core CPU, and that means it’s going to spawn four separate processes to maximize the CPU that I have available.

Michal on Aug. 5, 2019

On my Windows machine, I had to move the top-level code into a main() function, otherwise I was getting this error.

Michal on Aug. 5, 2019

Update: Also, the script works fine after moving the top-level code into an if __name__ == '__main__' block, i.e. without defining a main() function.

But not as a plain top-level code that we can see in the video.

DS on Jan. 12, 2020

Is it not possible to parallelize reduce function because of the accumulator which needs to be shared across multiple processses? Is that the reason why multiprocessing.Pool doesn’t have a reduce function as opposed to map? Can somebody please let me know if my thought process is correct? Thanks in advance.

Dan Bader RP Team on Jan. 12, 2020

@Diwanshu:

Is it not possible to parallelize reduce function because of the accumulator which needs to be shared across multiple processses?

Yep I’d say that’s accurate. Parallelizing the reduce step might be possible depending on the exact use case (and you could do it by splitting up the work manually and taking care of it with another multiprocessing.Pool) but there’s no general implementation of a parallel reduce that the multiprocessing module provides.

somanathsankaran on March 30, 2020

I did some experiments with multi processing .Is the number of processes must be equal to no of cores in our machine ,But when I experimented with more no of process than number of cores the output came fast ,But I have a question is this way of giving more process than number of cores in machine not recommended

Sid on March 31, 2020

I’m experimenting with simple squaring function like this:

def fungsi_kuadrat(n):
    return n**2

a = tuple(range(5000))

waktu_mulai = time()
pool = multiprocessing.Pool()
dikuadratkan = tuple(pool.map(fungsi_kuadrat, a))
#dikuadratkan = tuple(map(fungsi_kuadrat, a))
waktu_selesai = time()
lama_eksekusi = waktu_selesai - waktu_mulai

print('waktu yg dibutuhkan {} detik'.format(lama_eksekusi))

Using conventional map it takes 0.0102119445801 secs, while multiprocessing pool map it needs 0.0106949806213 secs.

Why is multiprocessing.Pool() a bit slower in this case?

Jim Anderson RP Team on March 31, 2020

Sid - The slow down on the multiprocessing pool is probably due to a couple of things. Mainly, spinning up processes takes time, and passing data to and from these processes is relatively slow. The time it takes to pass n from one process to the worker process could well be longer than then time it takes to compute n**2. (I haven’t measured so i don’t know definitively)

If you make your function more compute intensive, you’ll likely see the multiprocessing pool do better.

Hope this helps. jima

Rombituon on April 1, 2020

Hi, i try running the code but got error like

PicklingError: Can't pickle <class '__main__.scientists'>: it's not the same object as __main__.scientists

do you have any idea?

Rombituon on April 1, 2020

Hi sorry, solved. it was because of my namedtuple

amicablezebra on April 17, 2020

multiprocessing is great, but be aware, if your transformation function fails, you have to code up a bunch more stuff to get access to the stacktrace…

Chris James on April 20, 2020

On Python 3.8, I needed to change the code to this

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        start = time.monotonic()
        result = pool.map(transform, scientists)
        end = time.monotonic()
        print(f"Time taken: {end - start:.2f}")
        pprint.pprint(result)

The pool needs to be setup before you can use it, the easiest way is to use the context manager, otherwise you will get the bootstrapping error on Mac/Linux too.

Trying to run the code in Jupyter Notebook will deadlock, not entirely sure why, Jupyter and multiprocessing don’t really work togther.

(time.monotonic() always gives you the correct result with this type of usage, even if the system clock gets screwed with inbetween readings. Useful for production code!)

Zarata on May 7, 2020

Well that was unexpected … the mid video set of the pool size to 7 led ~1 sec completion. Yet, later the cores available are shown to be 4. I’d been thinking the pool manages 1 thread per core, one process per thread, so despite the large pool size I thought the problem would have been core bound, i.e. the time would be ~2. So is the pool creating 7 threads, (I see there ARE seven PIDS), doling the 7 threads over the 4 cores, and the OS takes over managing multiple concurrent threads on some of the single cores? Or what? (there was a mention of the hardware supporting ‘hyperthreading’ … is that related to my conundrum?)

Varun Vaddiparty on May 11, 2020

Can someone clarify whats up with the number of processes being more than the number of cores. In that scenario how do all the processes run at the same time and finish in 1sec.

Zarata on May 11, 2020

I think it begins to get answered in a later tutorial. There are two different types of pools in the Futures package: Thread and Process. Futures bears some similarity with the multiprocessing package, and what is revealed there thus may have some relevance here. At least, it made me think further. I suggest: The transform() functions that are used here (and in the future Futures :) ) only contain a “sleep(1)”, so each is very similar as a threaded task, and each largely consists of a hold time doing nothing – the OS is pulling off the wait. Somehow the pool here in multiprocessing is cooperating with the OS and knows “launched a thread, it’s in hold, no CPU being used, I can launch another, …” so that eventually there are 7 threads out there. I can’t say how they are actually distributed across the cores, but they are all held and in wait. Then, after the 1 sec., the OS starts un-waiting the threads, all probably sequentially, but maybe divided between cores; regardless, all release at about the same time. Other than the “huge” wait, very little time is used. So, the threads clean up, and the total exec. is only ~1.00 sec.

Ben on Oct. 22, 2020

After moving everything into a main function and adding an

if __name__ == '__main__':
    main()

block at the bottom, I continued to get an error:

AttributeError: Can't pickle local object 'main.<locals>.transform'

The solution was to move the definition of the transform function, the collection.namedtuple and the place where I defined the scientists variable outside of the main() function.

baesjerker on Dec. 30, 2020

Can anyone please post a complete script to fix the AttributeError: Can’t pickle local.... ?

baesjerker on Dec. 30, 2020

Nevermind. I managed to fix it. Sorry about that.

Jack on Feb. 2, 2021

Great examples for multiprocessing.

Become a Member to join the conversation.