Multi-Processing Version

Christopher Trudeau

Speed Up Python With Concurrency Christopher Trudeau 06:44

00:00 In the previous lesson, I introduced you to event loops and coroutines using the asyncio library. In this lesson, I’m going to show you the multiprocessing library.

00:11 Both the threading library and the asyncio library operate inside of a single Python interpreter, and therefore are encumbered by the GIL.

00:19 If you’re doing I/O-bound concurrency, this typically isn’t a problem. You still get a speed-up, because so much of the time is spent waiting on network or disk access. Python has a library called multiprocessing that lets you spin up a separate interpreter process for each CPU.

00:36 This makes CPU-bound concurrency possible. Because each process gets its own instance of the interpreter, the GIL isn’t a problem: you get one GIL per process.
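As a minimal sketch (not code from the lesson), the hypothetical cpu_bound() function below does nothing but pure-Python arithmetic, which is exactly the kind of work threads can’t run in parallel under the GIL. Spread across a multiprocessing.Pool, each call can run in its own interpreter at the same time. The timings you see will depend on how many cores your machine has.

```python
import multiprocessing
import time


def cpu_bound(number):
    """Pure-Python arithmetic: it needs the GIL the whole time it runs,
    so threads inside one interpreter can't run it in parallel."""
    return sum(i * i for i in range(number))


def find_sums(numbers):
    # Each worker process has its own interpreter (and its own GIL),
    # so several cpu_bound() calls can execute at the same time.
    with multiprocessing.Pool() as pool:
        return pool.map(cpu_bound, numbers)


if __name__ == "__main__":
    numbers = [1_000_000 + x for x in range(8)]

    start = time.perf_counter()
    sequential = [cpu_bound(n) for n in numbers]
    middle = time.perf_counter()
    parallel = find_sums(numbers)
    end = time.perf_counter()

    assert sequential == parallel  # same answers either way
    print(f"sequential: {middle - start:.2f}s, pool: {end - middle:.2f}s")
```

The `if __name__ == "__main__":` guard matters here: on platforms where workers are started by re-importing the main module, unguarded pool creation would be re-run in every child.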

00:48 This is the multiprocessing variant of our download program. It looks remarkably similar to the threading example. Line 2 imports the multiprocessing library. Line 8 declares a global session. Because each process gets its own interpreter, each interpreter has its own memory space.

01:09 So when the download function runs inside a worker process, it needs a session of its own. That global session is created inside this function, which each process runs when it starts up.
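Here is a minimal, offline sketch of that per-process global pattern. FakeSession is a hypothetical stand-in for requests.Session so the example runs without a network, and set_global_session() is an assumed name for the initializer; only the overall shape (a module-level session that each worker fills in once) follows the lesson.

```python
import multiprocessing
import os

# One session per worker process; set_global_session() fills it in.
session = None


class FakeSession:
    """Hypothetical offline stand-in for requests.Session."""

    def __init__(self):
        self.owner_pid = os.getpid()  # process that created this session

    def get(self, url):
        # Pretend the response body is just the URL wrapped in tags.
        return f"<html>{url}</html>".encode()


def set_global_session():
    # Pool runs this once in each worker, inside that worker's own memory.
    global session
    if session is None:
        session = FakeSession()


def download_site(url):
    # Uses whichever session lives in the current process.
    content = session.get(url)
    return os.getpid(), session.owner_pid, len(content)


if __name__ == "__main__":
    sites = [f"https://example.com/{i}" for i in range(8)]
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        for pid, owner, size in pool.map(download_site, sites):
            assert pid == owner  # each worker built its own session
            print(f"process {pid} read {size} bytes")
```

The assertion in the loop makes the point of the design: a session is never shared across processes, because each one is created in the memory of the worker that uses it.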

01:19 The download_site() function is very similar to before: it uses the session to call .get() on the url and gets back the web response. Line 15 is slightly different.

01:29 This time, instead of printing a "J" or an "R", I’m going to print the number of the CPU that the code is running on, so you can see how the work is spread across them.
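A related way to see who did the work, sketched here with a hypothetical which_worker() function, is multiprocessing.current_process().name, which identifies the worker process that handled each item. Keep in mind that a pool worker is a process, not a core: the operating system decides which CPU it runs on at any given moment.

```python
import multiprocessing
import time


def which_worker(url):
    # In a pool worker the name looks something like "ForkPoolWorker-3";
    # the exact prefix depends on how the worker was started.
    name = multiprocessing.current_process().name
    time.sleep(0.01)  # stand-in for network latency
    print(f"{name}: finished {url}")
    return name


if __name__ == "__main__":
    sites = [f"https://example.com/{i}" for i in range(12)]
    with multiprocessing.Pool(processes=4) as pool:
        names = pool.map(which_worker, sites)
    print(sorted(set(names)))  # the distinct workers that did the work
```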

01:40 The download_all_sites() function starting on line 18 creates a pool. This time, instead of a thread pool, it’s a multiprocessing.Pool.

01:50 Unless you specify otherwise with the processes argument, you get one interpreter process per CPU in your computer. The initializer parameter names a function that each worker process calls once when it starts. In this case, that’s where I set up the requests.Session.

02:10 And just like with threading, the function that does the processing is mapped over the data in question. Pool.map() takes care of handing each site to one of the worker processes, spreading out the work. Let me just scroll down.
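Pool.map() behaves like the built-in map(), except the calls are spread across worker processes. In this minimal sketch, the hypothetical fetch() function sleeps for a random time so calls finish in an unpredictable order, yet the list that comes back is still in input order.

```python
import multiprocessing
import random
import time


def fetch(site_number):
    # Hypothetical stand-in for download_site(): random "latency" so that
    # calls finish in an unpredictable order.
    time.sleep(random.uniform(0, 0.05))
    return site_number * 10


if __name__ == "__main__":
    sites = list(range(20))
    with multiprocessing.Pool() as pool:
        results = pool.map(fetch, sites)
    # map() returns results in the same order as its input,
    # no matter which worker finished first.
    assert results == [s * 10 for s in sites]
    print(results)
```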

02:28 This should all look fairly familiar. Line 32 calls download_all_sites() and completely abstracts away any of the multiprocessing going on.

02:38 Time to see it in action.

02:44 And there you go: just under two seconds. Once again, that’s a significant speed-up over even the best synchronous time. Notice that there’s no pattern in the CPU numbers being printed to the screen.

02:57 Keep in mind, this isn’t about how the processes are scheduled, but about when each web server responds. The CPU number is printed once a page has finished downloading, so depending on connection speed and network latency, the output jumps back and forth between the CPUs. Because there are four interpreters in this case, four downloads are in progress at once.
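If you want results in the order they complete, rather than the order you submitted them, Pool.imap_unordered() yields each result as soon as its worker finishes. This minimal sketch uses a hypothetical fake_download() that just sleeps, so the iteration order follows the delays rather than the input list. Exact ordering isn’t guaranteed, since it still depends on when each worker starts.

```python
import multiprocessing
import time


def fake_download(delay):
    # Hypothetical stand-in: this "download" takes `delay` seconds.
    time.sleep(delay)
    return delay


if __name__ == "__main__":
    delays = [0.3, 0.1, 0.2, 0.0]
    with multiprocessing.Pool(processes=4) as pool:
        # imap_unordered() hands back each result as soon as it is ready,
        # so the order tracks completion time, not input order.
        for delay in pool.imap_unordered(fake_download, delays):
            print(f"finished the {delay}s download")
```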

03:20 Now, there’s still only one network card, and it can only do one thing at a time, so the peripheral is a bottleneck. The CPU-side parts of the work, though, run independently.

03:34 The key lines of code to using the multiprocessing library are the creation of the Pool and the mapping of the data. By default, the Pool creates one process per CPU in your computer.

03:46 Each process has its own memory space, and the initializer function is called once per process, inside that process’s own memory. In this example, there are still 160 sites to download, so as download_site() calls finish, the Pool hands the next piece of work to whichever CPU is currently idle. Strictly speaking, Pool.map() splits the sites into chunks and hands out a chunk at a time, but the idea is the same.

04:08 This could account for some of the numbers repeating. If CPU 4 is freed up while 3, 2, and 1 are still waiting on I/O, 4 gets the next download_site() call, and if that download happens to be quick, it might print its result before any of the other three processes finish downloading their sites. And once again, because each CPU gets its own instance of the interpreter, you no longer have the problem of the GIL.
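The chunking mentioned above is controlled by Pool.map()’s chunksize argument. The default_chunksize() helper below is a hypothetical re-creation of how current CPython picks a default (about four chunks per worker, rounded up); that is an implementation detail and may change between versions. Passing chunksize=1 gets closest to the "next site goes to the next idle worker" picture, at the cost of more messages between processes.

```python
import multiprocessing


def default_chunksize(n_items, n_workers):
    """Hypothetical re-creation of CPython's default for Pool.map():
    aim for roughly four chunks per worker, rounding up."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def which_worker(site):
    # Hypothetical stand-in for download_site(): report who handled the site.
    return multiprocessing.current_process().name


if __name__ == "__main__":
    print(default_chunksize(160, 4))  # 10: each hand-off carries 10 sites

    with multiprocessing.Pool(processes=4) as pool:
        # chunksize=1 hands out one site at a time.
        names = pool.map(which_worker, range(160), chunksize=1)
    print(len(names), sorted(set(names)))
```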

04:38 So if multiprocessing partially solves the GIL problem, why wouldn’t you just do this all the time? Well, first off, it requires a lot of overhead to create a process.

04:49 The implementation of a process happens at the operating system level, so you will also see behavior and scheduling differences between operating systems in your code. Because each process gets its own copy of the interpreter, it tends to require more memory than threading does as well.
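One concrete source of those operating-system differences is the start method Python uses to create worker processes. The default depends on the platform and Python version (Windows, for example, only supports "spawn"). This minimal sketch, with a hypothetical describe_start_methods() helper, reports what your platform offers and shows how to request "spawn" explicitly through get_context() so every OS behaves the same way.

```python
import multiprocessing


def describe_start_methods():
    # Start methods this platform supports, drawn from
    # "fork", "spawn", and "forkserver".
    available = multiprocessing.get_all_start_methods()
    # The method Pool and Process use unless told otherwise.
    current = multiprocessing.get_start_method()
    return current, available


if __name__ == "__main__":
    current, available = describe_start_methods()
    print(f"default: {current}, available: {available}")

    # A context pins the start method without changing the global default.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(abs, [-1, -2, 3]))  # [1, 2, 3]
```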

05:06 And not only does it require more memory, but you have to spin up the interpreter, so the initialization time of each process tends to be longer than that of a thread.

05:16 In fact, threads were introduced into operating systems as a lightweight way of getting around the overhead involved in processes. Because each process has its own interpreter and does not share memory with the others, communicating between processes must be done with explicit constructs.

05:35 The multiprocessing library comes with a few that can help you do that. Queue and Pipe are ways of sending data from one process to another, and the Value and Array constructs allow you to share memory between processes.
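Here is a minimal sketch of the two message-passing constructs. The producer() and echo() functions are hypothetical examples: the first sends items through a Queue followed by a None sentinel, and the second answers a message sent down one end of a Pipe. Reading from the queue before calling join() avoids a known deadlock when a child is still flushing queued data.

```python
import multiprocessing


def producer(queue, items):
    # Runs in a child process: send each item, then a sentinel.
    for item in items:
        queue.put(item)
    queue.put(None)  # sentinel meaning "no more data"


def echo(conn):
    # Runs in a child process: read one message, reply, and close.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(queue, ["a", "b", "c"]))
    p.start()
    received = []
    while (item := queue.get()) is not None:  # drain before join()
        received.append(item)
    p.join()
    print(received)  # ['a', 'b', 'c']

    parent_end, child_end = multiprocessing.Pipe()  # two-way by default
    p = multiprocessing.Process(target=echo, args=(child_end,))
    p.start()
    parent_end.send("hello")
    print(parent_end.recv())  # HELLO
    p.join()
```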

05:50 The multiprocessing library also includes locking mechanisms, such as Lock, to prevent race conditions when two or more processes are trying to access the same chunk of shared memory.
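A minimal sketch of shared memory plus locking, using hypothetical add_many() and square_in_place() helpers. By default, Value and Array come with a recursive lock, available through get_lock(). Holding it around counter.value += 1 turns the read-modify-write into one step, so increments from different processes can’t overwrite each other.

```python
import multiprocessing


def add_many(counter, times):
    # "+= 1" is a read followed by a write; holding the lock stops
    # another process from sneaking in between and losing an update.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def square_in_place(numbers):
    # Array is a fixed-size, typed block of shared memory.
    with numbers.get_lock():
        for i in range(len(numbers)):
            numbers[i] = numbers[i] ** 2


if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # a shared C int
    workers = [
        multiprocessing.Process(target=add_many, args=(counter, 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000

    numbers = multiprocessing.Array("i", [1, 2, 3])
    p = multiprocessing.Process(target=square_in_place, args=(numbers,))
    p.start()
    p.join()
    print(numbers[:])  # [1, 4, 9]
```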

06:01 You can generally create as many threads as you want. More is not always better, but it’s under your control. multiprocessing, on the other hand, is typically used with one process per CPU.

06:14 You can create more interpreters than CPUs, but for CPU-bound work it doesn’t really make sense: the operating system just ends up swapping them in and out, the overhead of those swaps is expensive, and you don’t gain any speed-up.
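If you do want to size a pool explicitly, a minimal sketch is to ask the operating system for the CPU count. The pool_size() and work() helpers are hypothetical. os.cpu_count() can return None when the count can’t be determined, so the sketch falls back to a single worker.

```python
import multiprocessing
import os


def pool_size():
    # One worker per CPU, in line with what Pool() does by default.
    # os.cpu_count() may return None, so fall back to 1.
    return os.cpu_count() or 1


def work(n):
    # Hypothetical CPU-bound task.
    return n * n


if __name__ == "__main__":
    size = pool_size()
    with multiprocessing.Pool(processes=size) as pool:
        print(size, pool.map(work, range(5)))
```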

06:29 threading and asyncio tend to help in I/O-bound situations, but not in CPU-bound ones. That’s where multiprocessing reigns. In the next lesson, I’ll show you the differences.
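One way to act on that rule of thumb is to use concurrent.futures. It offers ThreadPoolExecutor and ProcessPoolExecutor behind the same interface, so switching between threads and processes is a one-name change. In this minimal sketch, io_task(), cpu_task(), and run() are hypothetical helpers: sleeping stands in for waiting on I/O, and the arithmetic stands in for CPU-bound work.

```python
import concurrent.futures
import time


def io_task(delay):
    # Stand-in for waiting on a network or disk; sleeping releases the GIL.
    time.sleep(delay)
    return delay


def cpu_task(n):
    # Pure-Python arithmetic keeps the GIL busy.
    return sum(i * i for i in range(n))


def run(executor_cls, func, args):
    # Same code path for threads or processes; only the class differs.
    with executor_cls() as executor:
        return list(executor.map(func, args))


if __name__ == "__main__":
    # I/O-bound: threads are enough, because waiting doesn't need the GIL.
    print(run(concurrent.futures.ThreadPoolExecutor, io_task, [0.1] * 8))
    # CPU-bound: separate processes sidestep the GIL.
    print(run(concurrent.futures.ProcessPoolExecutor, cpu_task, [10**6] * 4))
```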
