Race Conditions and Deadlocks
00:00 In the previous lesson, I introduced you to the threading library in Python. In this lesson, I’m going to take a little side journey and show you what happens when race conditions complicate your concurrent program.
00:12 I have a little bit of a confession to make. I actually went to school for a lot of this work. My research in grad school was on concurrency, parallelism, and distributed computing. It’s been many, many years since I did that, and the one thing I still remember is that concurrency is hard.
00:30 You should always choose carefully whether or not it’s the right thing to do. It introduces all sorts of complications. In an earlier lesson, I went through a list of them, including things like deadlocks and resource starvation, which can be quite complicated.
00:44 Race conditions are related to deadlock, in that the behavior of the program changes depending on the order of execution. This is particularly noticeable when two threads share the same memory and objects.
00:56 The nature of threaded computation means you can’t predict what order the code is going to run in, and depending on how those shared objects are used, it can affect the outcome.
01:06 Locking mechanisms and threading.local() in Python can help you, but you have to remember when to use them: when things are local and when they don’t need to be local. A surprising number of things that are common tools in the programmer’s tool belt are not thread-safe.
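As a quick illustration of the thread-local side of this, here is a minimal sketch (the function and variable names are mine, not from the lesson): attributes set on a threading.local() object are visible only to the thread that set them, so there is nothing to race on.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()  # each thread gets its own set of attributes

def work(n):
    local.value = n     # private to the current thread, so no race here
    return local.value

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)  # [0, 1, 2, 3, 4, 5, 6, 7] -- no thread sees another's value
```

Because each thread reads back only the value it wrote itself, the results always match the inputs, no matter how the threads are scheduled.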
01:21 You’ve already seen the example of a requests.Session(). The requests library itself is not thread-safe. If you’re doing multi-threaded programming with the requests library, you need to make sure that you’ve got locks in place.
01:34 Even something as fundamental as print() isn’t thread-safe. It doesn’t happen often, but it is possible for two different threads to print at the same time, leaving the print buffer partially populated by the first thread, then by the second, and then by the first again, which ends up being a mess on the screen.
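A common workaround is to funnel all printing through one shared lock. Here is a minimal sketch of that idea (safe_print and the lock name are mine, not from the lesson):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

print_lock = threading.Lock()  # shared by every thread that wants to print

def safe_print(*args, **kwargs):
    # Only one thread at a time may write to stdout, so lines can't interleave.
    with print_lock:
        print(*args, **kwargs)

with ThreadPoolExecutor(max_workers=4) as pool:
    for n in range(8):
        pool.submit(safe_print, f"message {n}")
```

The order of the messages still depends on scheduling, but each line comes out whole.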
01:54 If you’ve ever done any server-side web programming, you may know that logging is the way to go. Python’s logger is actually thread-safe, so multiple threads in a web server don’t cause this problem.
02:05 Unfortunately, it does this through locking, which slows things down. And this is the compromise you always have to make with concurrency. The speed-up is there, but it’s often achieved at the expense of thread safety.
02:18 The safest thing to do is assume that the code you are using is not thread-safe, unless it explicitly says that it is.
02:28 The program I’m about to show you has a race condition in it. It’s also what we in the business call “a convoluted example.” You’ll never see code that actually looks like this, but it does show off the fact that the race condition can happen.
02:42 Race conditions are heisenbugs, the kinds of bugs that often disappear when you’re looking for them. If things are timed one way, the bug doesn’t happen; if they’re timed another way, it does.
02:53 And the timing is all related to how the operating system and Python schedule the threads and how long each thread is asked to execute. Race conditions can be brutal to try and track down.
03:07 The heart of this program is the function at line 10. The function, named race(), takes a number of threads. I’m going to call it starting with one thread and then a few more, and you’ll see how the result changes based on that.
03:21 The key part of this function is the messy line number 13. This list comprehension generates a list of numbers that alternate between -1 and 1. There’ll be 1000 numbers in it, 500 of which are -1 and 500 of which are 1.
03:39 The function that will be run in parallel on the threads starts at line 5, and this is change_counter(). It uses a global variable called counter, and all it does is add the number passed in to the global counter 10,000 times. What’s going to be passed in will be either a -1 or a 1. If this code is run synchronously, the sum total of the -1s and 1s should be 0, no matter how many times you do it. Line 15 defines the thread pool,
04:11 and the mapping happens between the data, which is the list that alternates between -1 and 1, and the change_counter() that adds that amount to the global.
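Putting those pieces together, the file being described looks roughly like this (a reconstruction from the line numbers mentioned in the lesson and the copy quoted in the comments below, so treat it as a sketch): change_counter() starts at line 5, the for loop is line 7, race() is line 10, the list comprehension is line 13, and the thread pool is line 15.

```python
from concurrent.futures import ThreadPoolExecutor

counter = 0

def change_counter(amount):
    global counter
    for _ in range(10000):
        counter += amount    # not atomic: read, add, and store can interleave

def race(num_threads):
    global counter
    counter = 0
    data = [-1 if x % 2 else 1 for x in range(1000)]

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(change_counter, data)
    print(counter)
```
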
04:22 The max_workers parameter of the thread pool indicates how many threads to run. If this is called with one thread, the program is synchronous.
04:31 If this is called with more than one thread, the program will be concurrent.
04:37 Let’s see this in practice. Import the function,
04:42 and I’m going to start by calling the function with 1 thread.
04:47 Since this is synchronous, the race condition doesn’t happen yet. A -1 is added to the counter 10,000 times, then 1 is added to the counter 10,000 times, and it does that 998 more times, the end result being 0. Now let me try this again with a few more threads.
05:09 Mmm. That’s not 0. And this is the problem. From a mathematics standpoint, the reason this is failing is line 7. You can think of that for loop like a multiplier. In the synchronous case, -1 is multiplied by 10,000 and added, then 1 is multiplied by 10,000 and added, and you end up with 0. In the threaded case, you get partway through that multiplication.
05:36 So instead of it being 10,000 times -1, it might be 500 times -1, and then the other case comes in. That might be 600 times 1, giving you a delta.
05:49 The sum total of the multiplication factors is not the same as the synchronous addition, so you get a race condition. You get the wrong number. Not only is it the wrong number, it’s not even consistently the wrong number.
06:04 If I run this again, I get a different result. And this is because the scheduling changes how things are happening. The effect of the multiplier in line 7 is different this time through because the amount of time each thread is run is different, and you get a different number. Third time’s the charm.
06:26 Well, still not working. This is an exaggerated example. I’ve done it on purpose to show you what happens. Race conditions generally are far more subtle than this, and it would be far harder to find the problem if one in a hundred executions actually caused the bug to happen, and that’s a common result in race conditions. It makes them nasty to find.
06:50 I’d like to tell you that asyncio solves the race condition problem. Unfortunately, it doesn’t, but it is another way of programming concurrency in Python and is of interest.
Christopher Trudeau RP Team on Jan. 5, 2021
Hi hauntarl,
The race condition is caused by a combination of two things: the variable counter being global and shared across all threads, and the fact that the change_counter() method can be interrupted at any time.
The crux of the matter is the assumption that “counter += amount” can’t be interrupted. If you put a semaphore lock around “counter += amount”, the code would likely work fine.
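For what it’s worth, here is a sketch of that fix using threading.Lock (a mutex rather than a semaphore, but the idea is the same; the lock name is mine):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()  # guards every access to the shared counter

def change_counter(amount):
    global counter
    for _ in range(10000):
        with counter_lock:       # the read-modify-write is now atomic
            counter += amount

def race(num_threads):
    global counter
    counter = 0
    data = [-1 if x % 2 else 1 for x in range(1000)]
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(change_counter, data)
    return counter

print(race(4))  # 0 every time, regardless of thread count
```

The trade-off is exactly the one described in the lesson: the lock restores correctness, but the threads now spend most of their time waiting for each other.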
I suspect what is happening is that “counter += amount” breaks down into several different instructions in the underlying virtual machine, and occasionally it is being interrupted midway through those instructions.
The code you added to try to show the problem happening is only recording the state outside of the problem area: after the increment/decrement is done. To truly instrument the problem you would have to be able to inspect the value of “counter” during the “counter += amount” statement.
The reason I used two large ranges (once in the “data” variable, the second in the for-loop in change_counter()) was to increase the likelihood of just such an interruption. If you drop those numbers down to lower values, you might have to run the program a large number of times before you run into a problem – this is what makes race conditions so brutal to catch in real life.
hauntarl on Jan. 6, 2021
Thanks for the wonderful explanation, I think I finally get it.
If we try to break down the statement counter += amount in a two-threaded system:

- Evaluation of the expression counter + amount (read operation):
  thread1: counter = 0, amount = 1, evaluation results in 1
  thread2: counter = 0, amount = -1, evaluation results in -1

As there is no mutual exclusion, both threads simultaneously read the same “counter” and try to add “amount” to it. This is the race condition: neither thread waits for the other to update the counter value after its evaluation, so one of them ends up using the old counter value, depending on which thread is able to perform its write operation first.

- Assignment operation counter = evaluated value (write operation):
  thread1: counter = 1
  thread2: counter = -1

Let us assume that thread1 is able to perform its write operation before thread2: the new counter value becomes 1. Soon after, thread2 performs its own write operation and writes -1 to counter. The counter finally ends up as -1 instead of 0. :)
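That lost-update interleaving can be acted out deterministically without any real threads, just by performing the read and write steps by hand (purely illustrative; the variable names are mine):

```python
counter = 0

# Step 1: both "threads" read counter before either writes back.
read_t1 = counter            # thread1 reads 0
read_t2 = counter            # thread2 also reads 0 (no mutual exclusion)
result_t1 = read_t1 + 1      # thread1 evaluates 0 + 1
result_t2 = read_t2 + (-1)   # thread2 evaluates 0 + (-1)

# Step 2: both write back; thread2's write lands last and wins.
counter = result_t1          # counter is now 1
counter = result_t2          # thread1's update is lost: counter is now -1

print(counter)  # -1 instead of the expected 0
```
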
Correct me if I am wrong and feel free to provide more input.
Christopher Trudeau RP Team on Jan. 6, 2021
Hi hauntarl,
Yep, you’ve got the right idea. The specifics are slightly different, but conceptually you’ve got it.
Remember that Python is an interpreted language. Statements in Python get translated into a series of steps called byte code, it is the byte code that actually gets run by the interpreter. Why do I bring this up? Because even things that look like single statements in Python aren’t necessarily atomic.
You can see the byte code that is run by the interpreter using the “compile()” built-in and the “dis” library:
>>> counter = 0
>>> amount = 1
>>> code = compile("counter += amount", "<string>", "exec")
>>> code
<code object <module> at 0x7fe81635ec90, file "<string>", line 1>
>>> import dis
>>> dis.disco(code)
1 0 LOAD_NAME 0 (counter)
2 LOAD_NAME 1 (amount)
4 INPLACE_ADD
6 STORE_NAME 0 (counter)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
Notice here that “counter += amount” turns into 6 separate byte code operations, and the thread interruption can happen during any of these steps.
The problem is the same as you described, just worse – there are five different places that this thing that looks like a single statement can get interrupted. This only gets worse for any larger chunk of code.
Satish on April 13, 2021
“The sum total of the multiplication factors is not the same as the synchronous addition, so you get a race condition.” This doesn’t intuitively explain why the race condition occurs, as already pointed out in earlier comments. Essentially:

- Multiple threads could be reading the same state of a particular variable (in this case counter).
- Subsequently, out of the evaluations of the expression counter + amount being performed by the multiple threads active at that point in time, only one thread effectively ends up updating counter (overwriting the values assigned by the other threads). This in turn could lead to either a ‘deficit’ or ‘surplus’ value of counter, depending on which threads (the ones adding -1 vs. those adding +1) dominated the ‘final’ assignment of counter in each group of currently active threads.
In the synchronous version, mathematical equilibrium is maintained because each evaluation of counter += amount forms the basis for the next evaluation of that statement. In the concurrent version that doesn’t happen: multiple threads could be using the same ‘base’ value to start their computation of counter + amount, so most of the iterations do not contribute towards maintaining the equilibrium of counter.
Of course, this is an oversimplified explanation and not entirely accurate (given the complexities already pointed out by the author in the comments).
hwoarang09 on May 30, 2022
I think Python uses the GIL, so only one thread executes at any one time. When the GIL is applied, a race condition shouldn’t occur, but it does… Why does a race condition occur? (Sorry for my English)
Bartosz Zaczyński RP Team on May 30, 2022
@hwoarang09 The Global Interpreter Lock (GIL) ensures that only one CPython instruction can work at any given time, effectively making it an atomic operation that can’t be interrupted by another thread. However, in many cases, you’ll want to perform a transaction comprised of more than one instruction, which should all run as an atomic operation. Without explicit locking and mutexes, you can still experience a race condition.
marcinszydlowski1984 on Aug. 3, 2022
However, the situation can be easily observed if the counter is stored or cached locally in the context of a specific thread and updated only at the end of its computations. Also, the larger the simulated delay, the greater the chance of encountering corrupted data.
Here are some modifications of the code.
The example from this video, only with the counter cached locally (using a local_counter variable):
# io_bound/race.py
from concurrent.futures import ThreadPoolExecutor

counter = 0

def change_counter(amount):
    global counter
    local_counter = counter
    for _ in range(10000):
        local_counter += amount
    counter = local_counter

def race(num_threads):
    global counter
    counter = 0
    data = [-1 if x % 2 else 1 for x in range(1000)]
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(change_counter, data)
    print(counter)
Next, a more advanced example with both a “manual” way and a “thread pool” way of creating and running threads; it can also simulate a delay, controlled by a parameter:
import argparse
import threading
import random
import time
from typing import List, Tuple
from concurrent.futures import ThreadPoolExecutor

counter = 0

def change_counter(data: Tuple[List[int], bool]):
    global counter
    for d in data[0]:
        local_counter = counter
        local_counter += d
        print(data[0], data[1])
        if data[1]:
            time.sleep(random.random())
        counter = local_counter
    print(f'{counter=} ({threading.current_thread().name})')

def prepare_data(num_threads: int) -> List[List[int]]:
    """
    Gets sample data for threads. Each thread has its own array of three
    integers multiplied by index + 1.

    Example:
        num_threads = 2, data = [[1, 2, 3], [2, 4, 6]]
        num_threads = 3, data = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

    :param num_threads: number of threads
    :return: array of 3-number arrays for each thread
    """
    data = [[]] * num_threads
    for n in range(num_threads):
        data[n] = [(i + 1) * (n + 1) for i in range(3)]
    print(data)
    return data

def race_threads_manually(num_threads: int, simulate_delay: bool):
    data = prepare_data(num_threads)

    # Creating and running threads
    T = [None] * num_threads
    for it in range(len(T)):
        T[it] = threading.Thread(
            target=change_counter,
            args=[(data[it], simulate_delay)],
            name=f'Thread {it + 1}',
        )
        T[it].start()

    # Waiting for all to finish
    for it in range(len(T)):
        T[it].join()
    print(f'{counter=}')

def race_thread_pool(num_threads: int, simulate_delay: bool):
    data = prepare_data(num_threads)
    args = [(data[d], simulate_delay) for d in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(change_counter, args)
    print(f'{counter=}')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--threads_number', type=int, help='Number of threads.', default=1)
    parser.add_argument('-t', '--simulation_type', choices=['loop', 'pool'], help='The way of running threads: by thread pool or in a loop.', default='pool')
    parser.add_argument('-d', '--simulate_delay', action='store_true', help='Simulates a random delay.', default=False)
    args = parser.parse_args()

    if args.simulation_type == 'pool':
        func = race_thread_pool
    else:
        func = race_threads_manually
    func(args.threads_number, args.simulate_delay)
Evangelos Kostopoulos on May 21, 2024
With Python 3.11.5 I am always getting the expected result of 0 (zero), no matter how many threads I am using. Am I missing something? Meaning, has anything changed recently?
race(1)
0
race(1000)
0
race(500)
0
Christopher Trudeau RP Team on May 21, 2024
Hi Evangelos,
Yes, I can confirm that there is no longer a race condition in Python 3.11. I dug around a bit, and it looks like the problem existed in 3.9 and went away in 3.10.
I used the dis module to look at the byte code created, and 3.8 and 3.10 produce different byte code. I also found a post on Reddit that talks about the fact that CPython introduced an optimization that changed which operations caused the GIL to be acquired and released.
For the sample code, the GIL isn’t being released as often, and as the GIL is held throughout the code, that ensures you no longer have a race condition.
Beyond the problem that this makes it a lousy example unless you’re using Python 3.9 or earlier, there is another complication: the language spec does not require the interpreter to behave this way. Some other optimization could cause this code to have a race condition in it again in a future version of Python.
Furthermore, this is CPython’s way of doing it; if you switched to a different interpreter like PyPy, you might see different behaviour. This is one of the reasons race conditions are so tricky to find and debug.
The post on Reddit is quite detailed, in case you’re interested:
And the original thread if you want to see what others said:
www.reddit.com/r/learnprogramming/comments/16mlz4h/race_condition_doesnt_happen_from_python_310/
Welcome to multi-threading, having fun yet? ;)
hauntarl on Jan. 5, 2021
I tried to keep track of each thread: what amount it is using and how many iterations it is doing with that amount.
From the above output, it is clearly visible that each thread does a different number of iterations for each amount value, but if we take the delta between the iterations done for amount = 1 and amount = -1, it comes out as zero. Yet the counter value comes out as some arbitrary number instead of 0.
I am not able to make sense of this output :(
Also, I am finding it difficult to walk through the program and understand exactly how the race condition is occurring.