CPU-Bound Workloads
In the previous lesson, I introduced you to the
multiprocessing library in Python. In this lesson, I’m going to show you the difference between I/O-bound and CPU-bound workloads.
So far, you’ve seen how to use the
multiprocessing library. All of the examples up until now have been I/O-bound. In an I/O-bound program, the vast majority of the time the program spends running is in waiting for input and output. Downloading content from the web is a perfect example.
00:31 The amount of latency involved in network transactions far outweighs the computation involved in downloading it. Not all software is I/O-bound. If you’re doing heavy mathematical computations, then most of the time is spent in computation.
The threading and asyncio libraries only see speed-up in I/O-bound cases, and that’s because they schedule computation while the other threads are waiting for the I/O.
00:59 You get speed-up because you’re not waiting for one piece of I/O at a time—you’re waiting for multiple pieces of I/O at a time. In the examples so far, you see about a seven times improvement against the best case scenario of a synchronous program when downloading 160 items from the internet.
You may have noticed that this speed-up is about the same across threading, asyncio, and even the multiprocessing version. If your program instead is doing something that requires a lot of work inside of the CPU rather than waiting on a peripheral, then threading and asyncio are not going to give you that kind of improvement. To get improvement in CPU-bound software, you need multiple CPUs working simultaneously. The
multiprocessing library does allow you to do this. In the next examples, I’ll show you the differences.
01:52 This is the synchronous version of a CPU-bound program. It’s a bit of a convoluted example. It calculates the sum of the squares between zero and a large number, and in this case, it does that 20 times for the numbers 5,000,000 through 5,000,020. Line 4 defines the function where the calculation of the sum of the squares happens.
Line 7 is the wrapper function that uses a loop to call the calculation once for each of the numbers being passed in. And then inside of the
__main__ section, line 12 defines a list—the numbers 5,000,000 through 5,000,020 to run this calculation on.
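The actual on-screen code isn’t reproduced here, but based on the narration (a calculate() function, a find_sums() wrapper loop, and a list of numbers in the __main__ section), a minimal sketch of the synchronous version might look like this. The timing code is an assumption added so the duration can be printed; expect the full run to take several seconds.

```python
import time


def calculate(limit):
    # The CPU-bound work: sum of the squares of the integers below `limit`.
    return sum(i * i for i in range(limit))


def find_sums(numbers):
    # Wrapper that runs the calculation once for each number, one after another.
    for number in numbers:
        calculate(number)


if __name__ == "__main__":
    # The numbers 5,000,000 and up, as described in the lesson.
    numbers = [5_000_000 + x for x in range(20)]
    start = time.time()
    find_sums(numbers)
    print(f"Duration: {time.time() - start:.1f} seconds")
```
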
02:32 Let’s see this in practice.
02:39 I sped that up so you didn’t have to sit there and watch it, but it took just under 9 and a half seconds. You may recall from the synchronous version of the download program that there was a large variation in how long it took the program to run. That was because you were dependent on the network, the response of the server, how many other people were hitting the server at the same time, and all sorts of things outside of your computer.
03:02 By contrast, CPU-bound programs tend to be consistent. Let me run this again so you can see it.
03:13 Again, I sped that up but it took about 9.4 seconds, so roughly the same amount of time. Because this program is spending all of its time exercising the CPU, the only thing that might slow it down is if the operating system is swapping it out for something else. It’s not having to wait on any I/O.
And now, here’s a threaded version of the software. The
calculate() method does the same thing, the sum of squares.
find_sums() has changed slightly. In this case, instead of just looping through all of the numbers, it creates a thread—one for each of the 20 numbers to be executed on. It then maps each number to the
calculate() method, so the numbers 5,000,000 through 5,000,020 will each get their own thread for the calculation. Let’s see what happens.
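The threaded listing isn’t shown on this page either, but a sketch matching the description (one thread per number, with each number mapped to calculate()) could use a ThreadPoolExecutor; the executor choice and timing code are assumptions, not necessarily the lesson’s exact code.

```python
import concurrent.futures
import time


def calculate(limit):
    # Same CPU-bound work as before: sum of squares below `limit`.
    return sum(i * i for i in range(limit))


def find_sums(numbers):
    # One worker thread per number; map() hands each number to calculate().
    # All threads share one interpreter, so the GIL serializes the actual math.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(numbers)) as executor:
        executor.map(calculate, numbers)


if __name__ == "__main__":
    numbers = [5_000_000 + x for x in range(20)]
    start = time.time()
    find_sums(numbers)
    print(f"Duration: {time.time() - start:.1f} seconds")
```
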
04:06 Well, that’s not good. Because this program really isn’t I/O-bound, once it’s loaded into memory, all it’s going to do is do calculations on the CPU. All of the speed-up before was by taking advantage of the latencies of going to the peripherals. That’s not happening in this case.
04:25 The reason this is taking longer is because there’s overhead in creating and managing the threads. In this case, you’re spending about half a second to manage the threads and getting no advantage whatsoever. To speed up something that’s CPU-bound, you’ll actually need more CPUs. Threads or async I/O—which are essentially the same kind of concept—will not give you any speed-up, and in fact can cost you time.
This is the
multiprocessing version. The code is essentially the same as the threading version, but lines 9 and 10 replace the threading
Pool with the multiprocessing Pool.
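A sketch of the multiprocessing version, assuming a multiprocessing.Pool in place of the thread pool (the exact on-screen code isn’t shown here). Pool() defaults to one worker process per CPU core, which is what lets the calculation run in parallel.

```python
import multiprocessing
import time


def calculate(limit):
    # Same CPU-bound work: sum of squares below `limit`.
    return sum(i * i for i in range(limit))


def find_sums(numbers):
    # A Pool of worker processes, one per CPU core by default.
    # Each process has its own interpreter, so the work truly runs in parallel.
    with multiprocessing.Pool() as pool:
        pool.map(calculate, numbers)


if __name__ == "__main__":
    numbers = [5_000_000 + x for x in range(20)]
    start = time.time()
    find_sums(numbers)
    print(f"Duration: {time.time() - start:.1f} seconds")
```
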
05:08 That’s significantly better. It’s a little better than a three times improvement over the synchronous version. Because this is CPU-bound, throwing more CPUs at it does speed things up. The computer that I’m running this on has four CPUs. Notice that it’s not four times better—it’s only three times better. Creating and managing the processes takes time.
05:31 There’s overhead involved in this. For this calculation, it’s about a 25% cost.
05:39 Donald Knuth said “Premature optimization is the root of all evil (or at least most of it) in programming.” What he means by this is: write your software first, figure out where the bottlenecks are, and then improve on those bottlenecks.
05:54 Most programmers do not actually have an intuitive grasp on where the bottlenecks in their software are. Every time I think I’ve known where it is, I’ve been wrong. You’re much better off writing your program first, measuring your program, and then figuring out how to improve it.
06:12 This leads to the question, “When do you use concurrency?” As you’ve seen, moving from synchronous to concurrent software has additional complications. There’s more code, you have to decide which type of concurrency you’re going to use, you have to worry about thread-safety and memory sharing. And then, of course, if you didn’t write the program right, you might end up with deadlocks or resource starvation.
06:37 So the first thing you really want to do is ask yourself, “Do you really need it?” There are certain domains where it makes an awful lot of sense. If you’re building a GUI, you probably want the GUI to be responsive, so having it on a different thread than the computation makes sense as a design.
06:53 If you’re doing a lot of data computation, then you want to take advantage of the fancy computer you bought and use all of its cores. I’m not saying don’t do concurrency—just make sure that you aren’t incurring the overhead of creating the more complicated code unless you actually need it.
07:10 Once you’ve concluded that you do need it, you need to decide on a model, and therefore a library to use. The easier question is I/O-bound versus CPU-bound, leading you to decide between multiprocessing on the one hand, or threads and async I/O on the other.
If your program is I/O-bound, then you should favor async I/O over threads if you can. It tends to be a little lighter-weight and a little faster. That being said, the libraries that you’re interacting with may not be compatible with
asyncio, so you may be left with threads. Next up, I’ll give you a course summary and point you at some further reading if you want to dig in deeper.