Concurrency

Christopher Trudeau

Speed Up Python With Concurrency Christopher Trudeau 13:39

Transcript
Discussion (2)

00:00 In the previous lesson, I talked about the degrees of latency in the different components of your computer. In this lesson, I’m going to show you how that’s taken advantage of to create concurrency.

00:11 Recall this diagram from the previous lesson. It’s common for a program to have to wait long periods of time for it to be able to do the next instruction.

00:21 It needs to wait on RAM, disk, and network. Time slicing is the idea of the operating system mixing other programs into these wait states. While you’re waiting for that network packet to come back from Prague, the computer can insert multiple other programs to take advantage of that time.

00:40 This time slicing is how a single processor computer can look like it’s doing multiple things at the same time.

00:48 Time slicing for multitasking can be thought of at three levels. The lowest level is not having any at all. The olden days of personal computers, the DOS operating system worked this way.

00:59 You could only run a single program at a time. Cooperative multitasking is when the program willingly gives up the CPU. If you know that you’re about to go out to the network and you’re not going to need the CPU for a while, you signal the operating system and the operating system can schedule a different program. In home computing, Windows 3.1 introduced this kind of multitasking.

01:21 If you happen to be old enough to remember using this operating system, you may recall that at certain times the program would just stop refreshing. That would be because it didn’t give up the CPU, but it was stuck in a wait state, so the UI was not being updated because the program had not told the operating system to put it into the wait state. As the name implies, cooperative multitasking requires all of the processes to cooperate with each other. In shared computing, or in the case of a program with a problem, this can be a challenge.

01:54 So preemptive multitasking was created. The operating system, in this case, interrupts the program whether or not it’s ready. The operating system is responsible for swapping between the different programs.

02:07 In most operating systems, this is done intelligently. If your program does do a network call, that triggers the operating system to take advantage of it and put the program into a wait state.

02:18 But it can also do it in the case where the program is being a hog and hasn’t given up the CPU.

02:24 Mainframes have done this from very early on, as has Unix. In the Windows world, NT and Windows 95 is where preemptive multitasking was first introduced.

02:36 On larger computers, you can have this time slicing as well as multiple processors. On each CPU, you can now have different programs running. In some cases, the same program may also be resident on those different CPUs.

02:51 In this diagram, programs 1, 4, and 5 are tied to a single CPU, but programs 2 and 3 are across the two CPUs. Part of the operating system’s responsibility is scheduling within the CPU and across the CPUs.

03:09 Not all types of software can take advantage of concurrency equally. The type of concurrency you have can change what kinds of models you would use when creating a concurrent program.

03:20 Trivial concurrency is that where the different parts of the program can be split apart without concerns for each other. This can usually be achieved when activities within the program are completely independent of each other. This is easiest if there’s no shared data between the different components.

03:38 Consider a web server. You can have multiple clients talking to the web server at a time because each one of those clients don’t need to be able to talk to each other.

03:47 There may be some contention at the disk level, but as you saw in the lesson on latency, a huge amount of processing can be done while a server’s waiting for the disk.

03:56 In the non-trivial case, you have to share data across your concurrent components. When you’re thinking about this situation, it’s useful to think of three steps in processing: input, computing, and output.

04:09 Concurrency is generally done by splitting up the compute portion, but that may mean that there’s coordination necessary for the input and output stages. You may also need coordination amongst the different compute nodes, depending on what kind of algorithm you’re running.

04:25 Using the input, compute, and output model, you can think of programs in three parts. A producer is a component that produces data. This might mean reading something off the disk.

04:37 The worker is the component that does the actual computation. And the consumer is the component that consumes the data or aggregates the output. Consider for a moment doing some image processing on a very large file.

04:52 The producer portion of your program would read the large image format, the worker portion of your program would do the actual filtering on the image, and the consumer component would consume the result from the worker and write the new image to the hard drive. Depending on the architecture of your program, these concepts might be mixed and matched. Using these three ideas, you can introduce different patterns in concurrency.

05:19 This simple pattern has a producer pushing data to a worker, which pushes data to a consumer. It might not be immediately evident where you can get concurrency here, but because producers and consumers are interacting with the disk, a lot of work can be done in the worker while producers and consumers are working independently. Thinking back to the image processing example, the producer might read thousands of bytes off the disk, hand them off to the worker.

05:47 The worker could then manipulate those thousands of bytes, creating a result, which the consumer would then write to the disk. Unless the worker is particularly computationally intensive, in all likelihood it will most of the time be waiting for the producer and consumer.

06:03 But as soon as the worker has been fed data, it can start working. The producer does not have to have read the entire image off of the disk. This can produce some concurrency.

06:15 A variation on this pattern uses multiple workers. Your producer reads information off of the disk, hands it off to a worker, which begins computation. The producer then reads more information off to the disk and hands it off to a second or third worker. The workers work independently, creating a result, and then the consumer is responsible for writing it back to disk.

06:37 This pattern is common with the processing of very large images. For the most part, an image can be broken up into components, and image processing can be done on those components independently.

06:49 This means the workers don’t have to talk to each other very often, giving a high degree of parallelism between the work they’re performing.

06:57 A variation on this pattern is for the producer to broadcast. In this case, each worker sees all of the data. It may not operate on all of the data, but as the data is exposed to it, it can continue to work.

07:11 These patterns can also be mixed and matched. You can have your producers broadcasting to workers, workers talking to other workers, and those workers consolidating information for the consumer.

07:23 Programming concurrent systems introduces all sorts of new challenges to your software. Some things to consider. You must make sure that you’ve got execution coordination—different processes have to be able to sync up with each other.

07:36 Think of the image processing example. Let’s say your image was a megabyte large and split up into 10,000-kilobyte chunks. Each worker could work on one of those 10-kilobyte chunks, but in all likelihood, the borders between those 10-kilobyte chunks might impact the calculation in the chunk of image next door. So although for the most part the workers can work independently, when they reach the borders of the portions they’re working on, they may need to speak with other workers.

08:05 Once you have multiple processes operating, the operating system or the programmer themselves has to determine how memory is allocated across those multiple processes. This is hard enough in a regular program, it just becomes more complicated when dealing with concurrency. Scheduling is about when to determine which processes are active.

08:25 Thankfully, operating systems have done a lot of work on this. For the most part when you’re writing concurrent programs, you just let the operating system do the scheduling. But of course, that also means you’re giving up control, so there may be situations where this isn’t ideal.

08:40 Typically, the reason you’re writing a concurrent application in the first place is to be able to get things done faster. And really what you’re talking about here is higher throughput—more work done per unit time. How you manage execution coordination, memory allocation, and scheduling can change the throughput on your system, and you might need to fine-tune how these things behave in order to actually see speed-up. Somewhat related to scheduling is also distribution.

09:08 How do you split up at the different concurrent pieces amongst the CPUs on your machine or between machines? Writing concurrent software introduces algorithmic challenges.

09:19 Deadlocks are when two or more components are waiting on each other. If A is expecting B to do something, and B is waiting for A to do something, nothing is ever going to happen. And finally, resource starvation may also be a problem.

09:33 The different components of your program may be fighting for memory, disk space, or access to the CPU.

09:41 Concurrency can be applied at different levels in Python. The simplest is at the operating system level. This means, as a programmer, you don’t have to really worry about it.

09:50 You can run Python in a window or in your IDE and let the operating system take care of the fact that other things are happening at the same time. If you have multiple processors available on your computer, you can use the multiprocessor library in Python to have code execute on those different processors.

10:09 Modern operating systems have a lighter-weight way of getting concurrency called threads. Each thread in a program can execute different code. Threads can then be scheduled to take advantage of the I/O-bound nature of your program.

10:22 In addition to having threads in the standard library, Python has another mechanism using coroutines called asyncio. The rest of this course describes the async I/O, thread, and multiprocessor standard libraries in Python and the differences between them. When considering concurrency inside of Python, you have to be aware of the GIL.

10:44 This is the Global Interpreter Lock. The GIL is a thread lock, or mutex, that makes sure that only one thread controls the interpreter at a time.

10:54 This limits the way multiple threads can operate inside of Python. When two concurrent parts of a program operate at the same time, you can get race conditions.

11:05 If you have this kind of race condition inside of memory allocation, you will get memory leaks. The GIL was introduced to solve this problem. One of the powers of Python is the ability to extend it using C. You can write code in C and plug it in underneath and then have it called by Python. Although this makes Python powerful, it also makes memory allocation challenging.

11:29 Now you have to deal with memory both at the interpreter level and at the C extension level. The GIL is particularly important when dealing with these kinds of interactions. If you do a quick Google on Python and GIL, you’ll find all sorts of opinions out there. It is seen by some as a limitation. Whatever you feel about it, it’s deeply nested inside of the interpreter, and removing it is not a simple thing.

11:54 The original inventor of Python, a gentlemen named Guido, has stated that the GIL should only be removed if new code does not decrease the performance of a single-threaded program.

12:05 This is one of the reasons it’s stuck around for such a long time. To be clear, the GIL is an implementation detail of the interpreter. CPython and PyPy both have it but Jython and IronPython do not use the GIL, so depending on your underlying interpreter, you may or may not run into this.

12:25 This course was recorded in the fall of 2020. At the time of recording, PEP 554 was being worked on. This PEP introduces Multiple Interpreters in the Stdlib. This concept is something called subinterpreters.

12:40 CPython already has this idea and it allows C extensions to kick off multiple Python interpreters at a time. Each subinterpreter has its own GIL, and so C extension level code is able to get concurrency without the challenges of the GIL involved.

12:57 The interpreters are essentially independent of each other. PEP 554 proposes exposing these interpreters in the Python standard library. This doesn’t fix the GIL, but would give programmers access to more tools that allow them to work around the GIL. And as changes happen around the GIL, they could be exposed to the programmers earlier inside of these subinterpreters, hopefully improving the ability to do concurrency in Python.

13:25 Although the GIL is a challenge, the I/O-bound nature of most programs means that you can still do concurrency in Python. In the next lesson, I’ll introduce you to the threading library, a good way to do just that.

ldr3tlogy on Dec. 10, 2020

Excellent summary, easy to follow the producer, worker, consumer analogy, and various combinations of each, architecting different concurrent algorithms, thereby improving throughput.

waveman on Nov. 19, 2023

Small niggle: cooperative multitasking was introduced to perosnal computing on Mac System 5, which predated Win 3.1 by 5 years

Become a Member to join the conversation.