How to Create a multiprocessing.Pool() Object

In this lesson, you’ll create a multiprocessing.Pool object. This is an interface that you can use to run your transform() function on your input data in parallel, spread out over multiple CPU cores. The Pool instance has a map() method, so you can map() the transform() function over scientists.

Now, when you run your program, you’ll see that you get the same result, but you get it a lot faster. This happens because the records are processed in parallel, in two batches, instead of one after another. In the next lesson, you’ll keep working with multiprocessing.Pool().
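
To make the “two batches” remark concrete, here is a minimal sketch rather than the lesson’s actual code. It assumes four worker processes (the text above does not state the count) and that items are handed to the workers one at a time. The transform() here is a stand-in that doubles a number after a short sleep, and expected_batches() is a hypothetical helper, not part of multiprocessing.

```python
import math
import multiprocessing
import time


def transform(x):
    # Stand-in for the lesson's transform(): simulate slow work, then return a result.
    time.sleep(0.1)
    return x * 2


def expected_batches(n_items, n_workers):
    # Hypothetical helper: with one item per task, the pool needs
    # ceil(n_items / n_workers) rounds to get through all items.
    return math.ceil(n_items / n_workers)


if __name__ == '__main__':
    items = list(range(7))  # seven inputs, like the seven scientists
    n_workers = 4           # assumption: four worker processes

    start = time.time()
    with multiprocessing.Pool(processes=n_workers) as pool:
        result = pool.map(transform, items)
    end = time.time()

    print(result)
    print(f'Batches: {expected_batches(len(items), n_workers)}')
    print(f'Time to complete: {end - start:.2f}s')
```

With seven inputs and four workers, the helper gives two rounds, so the run should take roughly two sleep intervals plus process startup overhead, instead of seven.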

cdrr930725 on Dec. 1, 2019

When I run the code, my terminal starts going crazy, and I never get the desired output. My code is a replica of the lecture code. Please check the link below:

codeshare.io/2B4mxy

cdrr930725 on Dec. 1, 2019

I fixed it by putting all the code inside an if __name__ == '__main__': block.

linusblady on March 27, 2020

cdrr930725, thanks for the tip. With if __name__ == '__main__': it runs. If you use the standard IDLE, the output of the print() calls in the function will not be shown.

dorellaurent on April 8, 2020

Hello, I’m on Windows 7 and I work with IDLE. When I run the script, nothing is printed in the IDLE shell window. I tried it with the if __name__ == '__main__': part and the issue was the same…

renatoamreis on April 27, 2020

I have exactly the same problem as reported above. I also tried it with if __name__ == '__main__':

Dan Bader RP Team on April 27, 2020

Quick update on running these examples with IDLE (or presumably also other REPL environments):

You’ll probably run into issues if you don’t run the examples with python your_script_name.py from the command line, like I do in the video.

There are known issues with multiprocessing and IDLE (see this StackOverflow discussion for example)

norcal618 on May 28, 2020

I was able to get the multiprocessing code to run by putting some of it into a function, like so:

def run():
    start = time.time()
    pool = multiprocessing.Pool()
    result = pool.map(transform, scientists)
    end = time.time()
    print(f"\nTime to complete: {end - start:.2f}\n")
    pprint(result)

if __name__ == '__main__':
    run()

The remaining code is left as shown in the video.

Arif Zuhairi on Oct. 9, 2020

I think it’s because Dan runs the code on a Mac while we run it on Windows, and that’s why the terminal/command prompt goes crazy. Thank you for the fix.
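
A likely explanation for the platform difference, offered as a sketch rather than a diagnosis of any particular setup: multiprocessing can start child processes in different ways. Windows uses the spawn start method, which re-imports the main script in every child, and macOS has defaulted to spawn since Python 3.8, while Linux has traditionally defaulted to fork. Under spawn, module-level code that creates a Pool runs again in each child unless it sits behind the guard. The square() function below is a hypothetical stand-in for transform().

```python
import multiprocessing


def square(n):
    # Worker function defined at module level so child processes can import it.
    return n * n


def main():
    # Report how child processes are started on this platform:
    # 'spawn' re-imports this module in every child, 'fork' copies the parent.
    print(f'Start method: {multiprocessing.get_start_method()}')

    with multiprocessing.Pool() as pool:
        print(pool.map(square, [1, 2, 3]))


# Without this guard, a spawned child re-importing the module would try to
# create its own Pool while it is still starting up.
if __name__ == '__main__':
    main()
```

Run it with python your_script_name.py from the command line; the printed start method shows which behavior applies on your machine.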

Lucy on Oct. 13, 2020

Hi. I’m following your comments and fixes very closely, but I’m still getting this error when using multiprocessing:

pickle.PicklingError: Can't pickle <class '__main__.Scientist'>: it's not the same object as __main__.Scientist

I also tried norcal618’s code, but got the same error. I need to understand what is happening to continue learning and advancing. Thanks in advance for any help.

Daniel on April 12, 2021

Hi Lucy. I was getting an error similar to yours when working with Python 3.8 on macOS.

To solve it, you need to wrap almost all of the tutor’s code within an if __name__ == '__main__': clause.

The only thing you need to leave outside of that if __name__ ... clause is the line where we define the “Scientist” namedtuple.

It’s important to do so. Otherwise, you’ll get the pickling error.

Here’s a working script. Note that I used a with statement to wrap the multiprocessing.Pool() stage. It’s not mandatory to do that, but I like it better that way.

Daniel

PS: Here’s an explanation on why you need to put the namedtuple declaration outside of the if __name__ ... clause: stackoverflow.com/a/16377267/8909331

And if you have some experience programming, you might be able to follow this explanation: codefying.com/2019/05/04/dont-get-in-a-pickle-with-a-namedtuple/

import collections
import multiprocessing
import time
from pprint import pprint


Scientist = collections.namedtuple('Scientist', [
    'name',
    'field',
    'born',
    'nobel'
])


def transform(x):
    print(f'Processing record {x.name}')
    time.sleep(1)
    result = {'name': x.name, 'age': 2017 - x.born}
    print(f'Done processing {x.name}')
    return result


if __name__ == '__main__':
    scientists = (
        Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
        Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
        Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
        Scientist(name='Tu Youyou', field='chemistry', born=1930, nobel=True),
        Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True),
        Scientist(name='Vera Rubin', field='astronomy', born=1928, nobel=False),
        Scientist(name='Sally Ride', field='physics', born=1951, nobel=False),
    )

    pprint(scientists)
    print()

    start = time.time()

    with multiprocessing.Pool() as pool:
        result = pool.map(transform, scientists)

    end = time.time()

    print(f'\nTime to complete: {end - start:.2f}s\n')
    pprint(result)
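
The pickling requirement discussed above can be checked directly, because pool.map() pickles both the function and its arguments to send them to the worker processes. The following is a minimal sketch under that assumption: can_pickle() is a hypothetical helper, not part of the pickle module, and the Scientist here is trimmed to two fields for brevity.

```python
import collections
import pickle

# Defined at module level, under the same name as its typename, so that
# pickle can find the class again by looking up 'Scientist' in this module.
Scientist = collections.namedtuple('Scientist', ['name', 'born'])


def can_pickle(obj):
    # Hypothetical helper: True if pickle.dumps() accepts obj, False otherwise.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    ada = Scientist(name='Ada Lovelace', born=1815)
    # A module-level namedtuple instance should be picklable when this file
    # is run as a script.
    print(can_pickle(ada))
    # A lambda has no importable name, so pickle rejects it.
    print(can_pickle(lambda x: x))
```

Calling a check like this on transform and on one input record before creating the pool can surface a pickling problem in the parent process, where the traceback is usually easier to read.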

Anand on June 8, 2021

This never gets executed in a Jupyter notebook (interpreter: Python 3.x):

from immutable_data import scientists
import time
import multiprocessing


def transform(x):
    print(f"Processing record {x.name}")
    time.sleep(1)
    result = {"name": x.name, "age": 2021 - x.born}
    print(f"Done processing record {x.name}")
    return result


if __name__ == '__main__':
    start = time.time()
    pool = multiprocessing.Pool()
    pool.map(transform, scientists)
    end = time.time()
    print(f'\nTime to complete: {end - start:.2f}')

Any solution?

Dan Bader RP Team on June 9, 2021

@Anand: What happens when you run this in a standalone script? Looks like using multiprocessing from within a Jupyter notebook is generally bug-prone, e.g. see this thread here (plus related links): github.com/microsoft/vscode-jupyter/issues/941

Anand on June 9, 2021

@Dan: It worked perfectly outside Jupyter.

Processing record Ada Lovelace
Processing record Emmy Noether
Processing record Marie Curie
Processing record Tu Youyou
Done processing record Ada Lovelace
Processing record Ada Yonath
Done processing record Emmy Noether
Processing record Vera Rubin
Done processing record Marie Curie
Processing record Sally Ride
Done processing record Tu Youyou
Done processing record Ada Yonath
Done processing record Vera Rubin
Done processing record Sally Ride

Time to complete: 2.22

Thanks for the clarification.
