Multiprocessing Testbed Overview
In this lesson, you’re going to build a little testbed program and use it to measure execution time with the time.time() function, so that you can compare the sequential and parallel implementations of the same algorithm.
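As a rough sketch of the timing approach described above, the snippet below takes one time.time() reading before the work and one after, then prints the difference. The function name simulated_work and the 0.2-second delay are placeholders chosen for illustration; they are not taken from the lesson's code.

```python
import time


def simulated_work(n):
    """Hypothetical stand-in for a slow processing step."""
    time.sleep(0.2)  # pretend this is I/O or number crunching
    return n * n


start = time.time()
# map() is lazy in Python 3, so tuple() forces all the work to happen here.
results = tuple(map(simulated_work, range(3)))
end = time.time()

elapsed = end - start
print(f'Results: {results}')
print(f'Time to complete: {elapsed:.2f}s')
```

The same two readings can be placed around any other implementation of the algorithm, which gives two numbers that can be compared directly.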
In the next lesson, you’ll see why you’d want to do all this. Because you wrote your code in a functional programming style, you can parallelize it fairly easily. There’s a parallel map construct that you can use. That way, you can run your processing steps in parallel.
00:00 So, what I’ve got here is a simple testbed that brings back our data structure. We’ve got this immutable data structure that represents a bunch of scientists: the fields they worked in, when they were born, their names, and whether or not they won the Nobel Prize at some point.
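For readers without the earlier videos at hand, here is a minimal sketch of what such an immutable data structure could look like. The field names (name, field, born, nobel) and the three sample records are assumptions based on the description above, not a copy of the code shown on screen.

```python
import collections

# Field names are assumptions inferred from the spoken description.
Scientist = collections.namedtuple(
    'Scientist', ['name', 'field', 'born', 'nobel']
)

# A tuple of namedtuples: neither the container nor its items
# can be modified in place.
scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)

print(scientists)

# Attempting to change a field raises AttributeError.
try:
    scientists[0].born = 1900
except AttributeError as exc:
    print(f'Immutable: {exc}')
```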
00:18 Then, we’ve got a simple transformation here, where we take each scientist and
00:25 then create a transformed output data structure that contains the scientist’s name and their age as of 2017. All of this is going to look familiar, so at this point, you probably want to check out the first video in the series on how and why we set up this data structure the way it looks here (and, of course, also here, that’s about the namedtuple), and then you probably will also want to watch the video on the map() function that showed you how we can take this scientists structure here and transform it into a new data
00:58 structure. All right, so, let’s run this example because I think that’s going to make it a little bit easier to follow. I saved this as parallel.py, and when I run this, you can see here, well, okay, we’re just printing out the input data structure (this scientists thing here) and it looks like this.
01:14 Then, we’re applying the transformation using the map() function and we’re printing out the result, and this is the result that we get, here.
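The transformation step described here can be sketched roughly as follows. The Scientist field names and the sample records are assumptions; the shape of the output (name plus age as of 2017) follows the description in the transcript.

```python
import collections

# Assumed record layout; only name and born are used below.
Scientist = collections.namedtuple(
    'Scientist', ['name', 'field', 'born', 'nobel']
)

scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)


def transform(x):
    """Return the scientist's name and their age as of 2017."""
    return {'name': x.name, 'age': 2017 - x.born}


# map() applies transform() to each record in turn; tuple() collects
# the results into a new immutable sequence.
result = tuple(map(transform, scientists))
print(result)
```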
01:23 We get all of the ages for these scientists and their names. Now, this transform() function here is really simple, right? We’re working with a relatively small amount of data here, and we’re working with a simple transformation that doesn’t take a lot of computation, so it will execute very fast.
01:43 But imagine if this was a more complex operation here: for example, if this needed to go out and fetch some data from the internet and then process it. So, if it was I/O bound, if it was waiting for I/O to complete, for that website fetch to complete.
01:57 Or, if this was doing some more extensive number crunching, then performing this map() operation would actually take quite a while, right? Like, if I run this right now, it’s instantaneous, but if we needed to do something more complicated here, this would actually take longer.
02:13 Just to simulate that, I’m going to make this transform() function just a
02:20 little bit slower. We’re just going to insert a time.sleep() call here and we’re just going to sleep for a second. And if I run this now, I can see here, this actually takes a little bit longer to process.
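A minimal sketch of the slowed-down version might look like this. The one-second sleep comes from the transcript; the record layout and sample data are assumptions carried over for illustration.

```python
import collections
import time

# Assumed record layout and sample data.
Scientist = collections.namedtuple(
    'Scientist', ['name', 'field', 'born', 'nobel']
)

scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)


def transform(x):
    """Same transformation as before, with an artificial delay."""
    time.sleep(1)  # simulate a slow I/O call or heavy computation
    return {'name': x.name, 'age': 2017 - x.born}


# Each record is handled one after another, so the delays add up.
result = tuple(map(transform, scientists))
print(result)
```

With three records and a one-second sleep per record, this run takes roughly three seconds, because the built-in map() processes the records one at a time.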
02:35 We’re still waiting for the results and we can make this a little bit more interesting because I want it to be a little bit more verbose. I want to be able to see what’s going on.
02:44 So I’m just going to say f'Processing record {x.name}' and that’s going to give us some output as this processing is happening. Then, we can say
03:00 f'Done processing record {x.name}'
03:07 and then, we’re going to return this result. So, you know, I’ve made this function here a little bit more complex, and also added some logging statements so that we can see what’s going on.
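Put together, the function described in the last few paragraphs could look roughly like the sketch below. The two f-strings and the one-second sleep are taken from the transcript; the use of print() for the logging lines, the record layout, and the sample data are assumptions.

```python
import collections
import time

# Assumed record layout and sample data.
Scientist = collections.namedtuple(
    'Scientist', ['name', 'field', 'born', 'nobel']
)

scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)


def transform(x):
    """Slow transformation that reports when it starts and finishes."""
    print(f'Processing record {x.name}')
    time.sleep(1)  # simulate slow work
    result = {'name': x.name, 'age': 2017 - x.born}
    print(f'Done processing record {x.name}')
    return result


result = tuple(map(transform, scientists))
print(result)
```

In this version each "Processing" line is followed by its matching "Done" line before the next record starts, since the built-in map() handles one record at a time.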
03:20 And now, I can run this and I can trace how the data is being processed, right? Basically, this is now telling me how it is running this transformation,
03:31 and I can see exactly what’s going on and I can see how these records are being processed in parallel.
Pygator on Jan. 20, 2020
At the end you say it’s being processed in parallel, but we haven’t used anything from the multiprocessing module, so it’s still running serially.
Dan Bader RP Team on Jan. 21, 2020
@Pygator: Thanks, what I meant to say there at the end was “once we bring in multiprocessing this will allow us to see how the elements are processed in parallel.”
This lesson just sets up the testbed so we can measure the speed improvements we’ll get from parallelizing this code. Check out the next lesson where we’ll actually bring in the multiprocessing module to execute these transformations concurrently.
juanC on April 4, 2020
I needed to “import time” to get the time.sleep(1) command to work (running Python 3.7). Is that normal?
juanC on April 4, 2020
(wish I could edit/delete my previous post) Ignore that, missed the import the first time. Sorry!
ericguo021 on June 5, 2020
Hello Dan, What’s the IDE you used for tutorial?
Dan Bader RP Team on June 5, 2020
I’m using an alternative Python REPL called bpython in my videos. You can learn more about it here: bpython-interpreter.org. If bpython is difficult to install, I can also recommend ptpython.
I’m running the REPL inside iTerm 2 on macOS, and the editor is Sublime Text.