Queues and Parallelism
In the lesson on stacks and the LIFO queue, I talked about the problem of thread safety and race conditions. Python supports several different ways of doing parallelism. Inside of the same process, you can use threads or asyncio. And between separate processes, there is the multiprocessing library.
00:31 Thread safety problems only occur in concurrent code that executes inside of a single process. By contrast, multiprocess code doesn’t have the thread-safety issue, but doesn’t have access to the same memory space.
For multithreaded queue code, you want the Queue object in the queue library. It is similar to the LifoQueue object discussed in a previous lesson in the section on stacks, and it supports the same methods, such as .get_nowait().
The order, though, is FIFO instead of LIFO. I’ll leave a demonstration to your imagination: you can go back and mentally edit everywhere it said LifoQueue, replacing it with the word Queue. Threads and multiple processes are both techniques for achieving concurrency, but with subtle differences. Threads are good for I/O-bound work.
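If you do want to see it on screen, here's a minimal sketch of that FIFO ordering (my own example, not code from the lesson):

```python
from queue import Queue

# A FIFO queue: items come out in the order they went in,
# unlike LifoQueue, which would hand back "c" first.
q = Queue()
for item in ["a", "b", "c"]:
    q.put(item)

first = q.get_nowait()
second = q.get_nowait()
```

Swapping Queue for LifoQueue here would make first come back as "c" instead of "a".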
01:30 A lot of computing time is spent waiting for I/O. Having one thread wait while another continues to compute is a great way of achieving speedup. Python has a master lock in the interpreter called the GIL, or Global Interpreter Lock.
01:44 This lock is there to protect the interpreter from race conditions, but the cost is that it severely limits CPU-bound parallelism. Hence, threads are only really useful for I/O-bound processing.
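To make the I/O-bound case concrete, here's a small illustrative sketch of my own, with time.sleep() standing in for a real I/O wait:

```python
import threading
import time

def fake_io(seconds):
    # time.sleep() releases the GIL, just as real I/O waits do,
    # so the other threads get to run in the meantime.
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# The four 0.2-second waits overlap, so the total comes in close
# to 0.2 seconds rather than the 0.8 seconds a serial loop would take.
```

If fake_io() were crunching numbers instead of sleeping, the GIL would serialize the threads and the speedup would disappear.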
02:09 Each process gets its own instance of the Python interpreter, which is a bit expensive. It has a startup cost and uses more memory. You’re essentially running two copies of Python. To add to the complications, these two copies of Python need special treatment to talk to each other.
One of the ways of communicating between two processes in Python is to use a Queue from the multiprocessing library. Conceptually, this is similar to the thread-safe queues but is geared for interprocess communication, or IPC to the cool kids.
A good practice in Python, or any language for that matter, is to assume that any library code is not thread-safe unless the documentation explicitly tells you that it is. Even the lowly print() function can be dangerous, so I’ve implemented lprint(), a version of print() that is atomic.
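The lesson doesn't show lprint() in this excerpt, but a minimal version might look like this (a guess at the implementation, not the lesson's exact code):

```python
from multiprocessing import Lock

def lprint(lock, *args):
    # Hold the lock for the whole print so that output from
    # another process can't interleave with ours mid-line.
    with lock:
        print(*args, flush=True)
```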
03:39 It takes a lock and makes sure that everything printed to the screen happens at once, before releasing the lock. Without this, you could potentially see both of the processes printing to the terminal at the same time, and your output could overlap. With that in hand, the first thing you’ll need is a method that can shout messages.
This one takes a lock and a multiprocessing queue as arguments. In order to insert some variety in the pacing of this contrived example, I’ve got the process randomly choosing between either sending a message or sending a message and taking a nap. The use of random.choice() here results in a one-in-four chance of taking a nap after sending a message. That’s one heck of a power nap. This whole process is wrapped up in a for loop and is done 10 times. Before exiting the function, shout() queues the message "done" to tell the other process that it is finished. Let me just scroll down here.
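Based on that description, shout() might look something like the following sketch; the printed text and the exact random.choice() arguments are my guesses:

```python
import random
import time

def shout(lock, queue):
    # Hypothetical reconstruction from the narration; the lesson's
    # version uses its lprint() helper for the atomic printing.
    for count in range(1, 11):
        queue.put(str(count))
        with lock:
            print(f"shouting {count}", flush=True)
        # random.choice() gives a one-in-four chance of a short nap
        if random.choice([True, False, False, False]):
            time.sleep(0.1)
    queue.put("done")  # tell the listener we're finished
```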
hear() is the companion method to shout(); it also takes a lock and a queue as arguments. The first thing that the hear() function does is receive a message from the queue. This is a blocking call. If there is nothing on the queue, then this process waits until there is. When the first message is received, you enter the while loop, and the atomic print is used to display the message. Then the loop blocks, waiting for the next message. Once it’s received, you go to the top of the while loop. If the message is "done", you’re finished, and you exit the hear() function. If it’s not, you repeat the process.
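Based on that description, a sketch of hear() might look like this (the printed text is my guess):

```python
def hear(lock, queue):
    # Hypothetical reconstruction from the narration.
    message = queue.get()  # blocks until something arrives
    while message != "done":
        with lock:
            print(f"heard {message}", flush=True)
        message = queue.get()  # block again, waiting for the next one
```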
shout() and hear() are functions that each run in a separate process. Let me show you how to tie all this together. The code inside of the if __name__ == "__main__" block will only get run if the script is run as an executable, not on import.
The execution section starts by creating a shared Lock and a shared Queue. Then, a Process object is created. The target argument specifies the function to run inside of the process, and the lock and queue are passed in as its arguments. For this first Process object, the target is the shout() function, and a second Process is created the same way with hear() as its target.
You don’t want the script to finish before the processes are done. Calling .join() on the Process objects causes the script to wait here until the processes are complete—in this case, when the shout() and hear() functions finish inside of their respective processes.
Just after the shout() process has queued its third message, the hear() process has won the lprint() lock and manages to print out what it heard. Both shout() and hear() are still spinning, trying to compete. shout(), for whatever reason, has won the lock again for the printing, so it’s still spitting out messages. It does this up through message "7", and then finally randomness works out and it decides to take a nap.
Within those 100 milliseconds, the hear() function is now free to keep using the lprint() lock, and so it consumes the message "7", and then "8", at which point it blocks.
Somewhere inside of that process, shout() has woken up. It queues the messages "9" and "10", then it queues "done", and it finishes. At this point, there’s no longer any contention for the print lock, so the hear() function takes in messages "9" and "10", prints them out, gets the "done" message, which I don’t print, and then it finishes.
Two things have affected the difference in the output here: one, the randomness of when to nap, and two, the lock contention. This time around, the randomness of the napping has meant that the hear() function seems to be getting more chances to get at its queue.