In the previous lesson, I introduced you to the
threading library in Python. In this lesson, I’m going to take a little side journey and show you what happens when race conditions complicate your concurrent program.
00:12 I have a little bit of a confession to make. I actually went to school for a lot of this work. My research in grad school was on concurrency and parallelism and distributed computing. And it’s been many, many years since I did that, and the one thing I still remember is concurrency is hard.
00:30 You should always choose carefully whether or not it’s the right thing to do. It introduces all sorts of complications. In an earlier lesson, I went through a list of these, including things like deadlocks and resource starvation, which can be quite complicated.
00:44 Race conditions are related to deadlock, in that the behavior of the program changes depending on the order of execution. This is particularly noticeable when two threads share the same memory and objects.
Locking mechanisms and
threading.local() in Python can help you, but you have to remember when to use them: when things need to be thread-local and when they don’t. A surprising number of things that are common tools in the programmer’s tool belt are not thread-safe.
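As a quick illustration of thread-local state, here’s a minimal sketch (the names worker and local_data are mine, not from the lesson). Each thread gets its own independent copy of any attribute set on a threading.local() object, so threads can’t clobber each other’s values:

```python
import threading

# Each thread sees its own private copy of attributes on this object.
local_data = threading.local()

def worker(value, results, lock):
    local_data.value = value  # private to this thread
    # Other code could run here without another thread overwriting value.
    with lock:
        results.append(local_data.value)

results = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(i, results, lock)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # each thread reported only its own value
```

Note that the results list itself still needs a lock, because it is shared; only local_data is per-thread.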
You’ve already seen an example of this: the
requests library itself is not thread-safe. If you’re doing multi-threaded programming with the
requests library, you need to make sure that you’ve got locks in place.
Even something as fundamental as
print() isn’t thread-safe. It doesn’t happen often, but it is possible for two different threads to print at the same time, and you end up with a print buffer partially populated from the first thread, populated by the second thread, and then populated by the first thread, which ends up being a mess on the screen.
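A common workaround is to funnel all printing through a single lock so that only one thread writes to the screen at a time. A minimal sketch (safe_print is a hypothetical helper of my own, not a standard function):

```python
import threading

print_lock = threading.Lock()

def safe_print(*args, **kwargs):
    # Serialize access to stdout so output from different
    # threads never interleaves mid-line.
    with print_lock:
        print(*args, **kwargs)

def worker(n):
    for i in range(3):
        safe_print(f"thread {n}: step {i}")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lines from the two threads may still appear in any order, but each line comes out whole.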
If you’ve ever done any server-side web programming, you may know that
logging is the way to go. Python’s logger is actually thread-safe, and so multiple threads in a web server don’t cause this problem.
02:05 Unfortunately, it does this through locking, which slows things down. And this is the compromise you always have to make with concurrency. The speed-up is there, but it’s often gained at the expense of thread safety.
02:28 The program I’m about to show you has a race condition in it. It’s also what we in the business call “a convoluted example.” You’ll never see code that actually looks like this, but it does show off the fact that the race condition can happen.
02:42 Race conditions are heisenbugs, the kinds of bugs that often disappear when you’re looking for them. When things are timed one way, the bug doesn’t happen. When they’re timed another way, it does.
The heart of this program is the function at line 10. The function, named
race(), takes a number of threads, and I’m going to call it starting with one thread and then a few more, and you’ll see how the result changes based on that.
The key part of this function is the messy line number 13. This list comprehension generates a list of numbers that alternate between
-1 and 1. There’ll be
1000 numbers in it, 500 of which are
-1 and 500 of which are 1.
The function that will be run in parallel on the threads starts at line 5, and this is
change_counter(). This uses a global variable called
counter, and all it does is add the number passed in to the global
counter
10000 times. What’s going to be passed in will be either a
-1 or a
1. If this code is run synchronously, the sum total of all those -1s and 1s should be
0, no matter how many times you do it. Line 15 defines the thread pool.
Since this is synchronous, the race condition doesn’t happen yet. A
-1 is added to the
counter 10,000 times, then
1 is added to the
counter 10,000 times, and it does that 998 more times, the end result being
0. Now let me try this again with a few more threads.
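The script itself isn’t reproduced in this transcript, so here is a sketch of what it might look like based on the description above. The names race() and change_counter() come from the lesson, but the exact layout, and therefore the line numbers mentioned in the video, may differ in the real file:

```python
import concurrent.futures

counter = 0

def change_counter(num):
    """Add num to the global counter 10,000 times."""
    global counter
    for _ in range(10_000):       # the "multiplier" loop discussed below
        counter += num            # read-modify-write: not atomic across threads

def race(num_threads):
    global counter
    counter = 0
    # 1000 numbers alternating between -1 and 1 (500 of each).
    numbers = [1 if i % 2 else -1 for i in range(1000)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        pool.map(change_counter, numbers)
    return counter

print(race(1))    # one worker runs the tasks sequentially, so this is always 0
print(race(10))   # race condition: often nonzero, and different on each run
```

With a single worker thread the tasks run one after another and the sum comes out to 0; with several workers the threads interleave and the total drifts.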
Mmm. That’s not
0. And this is the problem. From a mathematics standpoint, the reason this is failing is line 7. You can think of that
for loop as a multiplier. In the synchronous case,
-1 is multiplied by 10,000 and added, and then
1 is multiplied by 10,000 and added, and you end up with
0. In the threaded case, a thread gets only partway through that multiplication before another thread interleaves with it.
05:49 The interleaved additions don’t sum up the way the synchronous ones do: two threads can read the same value of the counter, each add their number to it, and then one write overwrites the other, losing an update. So you get a race condition. You get the wrong number. Not only is it the wrong number, it’s not even consistently the wrong number.
06:04 If I run this again, I get a different result. And this is because the scheduling changes how things are happening. The effect of the multiplier in line 7 is different this time through because the amount of time each thread runs is different, and you get a different number. Third time’s the charm.
06:26 Well, still not working. This is an exaggerated example. I’ve done it on purpose to show you what happens. Race conditions generally are far more subtle than this, and it would be far harder to find the problem if only one in a hundred executions actually triggered the bug, which is common with race conditions. It makes them nasty to find.
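One standard way to eliminate this particular race is to guard the read-modify-write on the counter with a threading.Lock. Here’s a sketch of that fix applied to a scaled-down version of the example (the counts are reduced from the lesson’s 1000 numbers and 10,000 additions so it runs quickly, and the lock name is my own):

```python
import concurrent.futures
import threading

counter = 0
counter_lock = threading.Lock()

def change_counter(num):
    global counter
    for _ in range(1_000):
        # Holding the lock makes the read-modify-write atomic with
        # respect to the other threads, eliminating the race.
        with counter_lock:
            counter += num

def race(num_threads):
    global counter
    counter = 0
    numbers = [1 if i % 2 else -1 for i in range(100)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        pool.map(change_counter, numbers)
    return counter

print(race(10))   # now 0 every time, at the cost of lock contention
```

This restores correctness but, as with the logger, you pay for it: every addition now waits its turn on the lock.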