Solving Race Conditions
00:00 In the previous lesson, I introduced you to threading and gave myself a strong urge to buy a baguette. In this lesson, I’ll be talking about when threads go bad: race conditions and how they can muck up your results.
00:12 Race conditions come about when two concurrent chunks of code try to operate on a single thing, resulting in the timing of the execution affecting the result.
00:22 If A runs before B, the result is different than if B runs before A, hence, a race between A and B. This concept isn’t actually unique to software and can happen in circuitry as well.
00:34 That’s actually where the term has been stolen from. Programs with race conditions are no longer deterministic: the luck of how the OS schedules things changes the result.
00:45 This means results tend to be inconsistent between runs. Race conditions aren’t solely the purview of threads. They can happen with any concurrent system, but since threads share the same memory inside of a process, they’re particularly prone to the problem.
01:01 To better understand race conditions, consider this chunk of code that’s updating a bank balance. Thread A reads the balance from a shared place, getting a value of $100.
01:15 It then increments the value in its local copy to $200 and writes it back to the shared location.
01:23 Next, Thread B gets scheduled. It reads the value, getting $200 in its local copy, increments it to $250, and then writes it back to the shared spot. With me?
01:38 Alright, now let’s mess with the order of this transaction and see what happens.
01:44 Back to $100, and like before we start with Thread A: it reads the $100, stores it locally, and increments it, but this time the OS kicks in and schedules Thread B.
01:58 Thread B also reads $100, stores it locally, and increments it. Then the naughty OS decides Thread B has had enough time and swaps back to Thread A. Thread A writes its incremented local value to the shared resource, and then Thread B gets scheduled, writing its local copy back to the shared resource and overwriting the previous balance.
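To make that second, unlucky schedule concrete, here is a hand-simulated sketch of it in Python. The interleaving is written out by hand rather than left to the OS, and the dollar amounts follow the example above.

```python
# A hand-simulated version of the unlucky schedule: the "threads" are just
# local variables, so the interleaving is explicit rather than up to the OS.
balance = 100        # the shared bank balance

local_a = balance    # Thread A reads $100 into its local copy
local_b = balance    # the OS switches: Thread B also reads $100

local_a += 100       # Thread A increments its local copy to $200
local_b += 50        # Thread B increments its local copy to $150

balance = local_a    # Thread A writes $200 back to the shared balance
balance = local_b    # Thread B writes $150 back, overwriting Thread A's update

print(balance)       # 150 -- Thread A's update is lost
```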
02:30 Two different execution schedules result in two different results. This is a race condition. Let’s see what one of these actually looks like in Python code.
02:43 I’m inside the appropriately named race.py with some code that has a race condition inside of it. This script loops over the numbers from zero to 999, and if a number is even, it adds one to a counter, and if it is odd, it subtracts one from the counter. In a single-threaded program, the end result should be zero.
03:02 Add, then subtract, add, then subtract in a loop and you should end up at your starting place.
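In code, that single-threaded baseline would look something like this sketch (not the actual script from the lesson):

```python
# Single-threaded baseline: alternating +1 and -1 a thousand times
# always brings the counter back to zero.
counter = 0
for num in range(1000):
    if num % 2 == 0:
        counter += 1   # even numbers add one
    else:
        counter -= 1   # odd numbers subtract one

print(counter)  # 0
```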
03:08 Before digging into the details here, a quick note: this code only experiences the adverse effects of a race condition in older versions of Python. Changes to how the GIL works mean the race condition doesn’t trigger in Python 3.10 or above.
03:22 That doesn’t mean there isn’t a race condition. It just means that under newer versions of Python you’re accidentally protected from it. In the future, the race condition effect will likely happen again as the GIL continues to change.
03:35 I’ll come back to more on that later. For now, just know that as you’re trying this code yourself, the answers will vary depending on which version of the CPython interpreter you are using.
03:46 There are several different ways of doing regular threads inside of Python. Here I’m using the ThreadPoolExecutor, which is a thread manager and one of the simplest ways of doing it.
03:57 You construct a manager, pass it a function, and then Python creates multiple threads each running that function. This counter is my little bit of shared memory, the equivalent of the bank balance that I just showed you, and here is where the race condition happens.
04:13 This function is what changes the counter. It takes an argument, which is how much to change the counter by, and then in a tight loop it does that 10,000 times.
04:22 I’m basically just trying to give the race condition lots of chances to happen. The race() function is what sets up the concurrency.
04:30 data is a list of ones and minus ones a thousand items long, the sum of which should be zero. This list gets iterated over, calling the change_counter() function with each value, and here’s that ThreadPoolExecutor I was talking about.
04:45 The max_workers argument tells it how many threads to actually create, and then inside its context block you call map() on the executor, giving it a reference to a function and a chunk of data to map. The data gets split up and passed into the threads you’ve created for parallel execution of the multiple instances of the change_counter() function. Down at the bottom, here, I simply print out our result.
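Pieced together from that walkthrough, race.py looks roughly like the sketch below. The structure follows the description in the lesson; details such as resetting the counter inside race() are my assumptions.

```python
# race.py -- a sketch reconstructed from the walkthrough; details like the
# reset inside race() are assumptions, not necessarily the lesson's exact file.
from concurrent.futures import ThreadPoolExecutor

counter = 0  # shared state: the equivalent of the bank balance


def change_counter(amount):
    """Apply `amount` to the shared counter in a tight loop, 10,000 times."""
    global counter
    for _ in range(10_000):
        counter += amount  # read-modify-write on shared memory -- not atomic


def race(num_threads):
    global counter
    counter = 0  # assumed reset so repeated calls start from scratch

    # 1,000 values: +1 for each even number, -1 for each odd one; the sum is 0
    data = [1 if num % 2 == 0 else -1 for num in range(1000)]

    # The executor creates the threads and maps change_counter over the data,
    # running multiple instances of it in parallel.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        executor.map(change_counter, data)

    print(counter)
```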
05:12 Let me go into the REPL and I’ll try this out.
05:17 Just as a reminder, I’m running this in Python 3.9. I’ll import the race() function
05:24 and call it using two threads.
05:28 Remember, the result should be zero. My math’s a little rusty, but I’m pretty sure that’s not zero. It ends in a zero. Does that count? Let’s try it again.
05:39 Still not zero. Hmm. Still not zero, just in the other direction this time. You definitely wouldn’t want this code in charge of your bank balance.
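For reference, the REPL session boils down to something like the following. The exact output is omitted because it varies from run to run, and race(2) assumes the thread count is the function's only argument.

```pycon
>>> from race import race
>>> race(2)   # under Python 3.9 this usually prints a non-zero value
>>> race(2)   # calling it again prints a different, still non-zero value
```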
05:49 The problem you’ve seen so far is that some operations need to be atomic. That means they can’t be interrupted. Going back to my bank balance example: if you could force the scheduler so that Thread A’s and Thread B’s three instructions got grouped together and couldn’t be split apart, the problem would be solved.
06:08 You do this with a lock, sometimes known as a mutex, short for mutual exclusion. A lock is a mechanism provided by the operating system or your programming language to create blocks of code that are atomic.
06:20 If Thread A locks the bank balance until it’s done and then releases the lock, then when Thread B tries to update the balance in the meantime, it’s made to wait its turn. This gives you atomicity on the code protected by the lock.
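The lesson hasn’t applied this fix to race.py yet, but as a sketch of the idea, guarding the counter update with Python’s threading.Lock would look something like this:

```python
# A sketch of the lock idea applied to the counter from race.py; the lesson
# hasn't shown this fix, so treat the details as illustrative.
import threading

counter = 0
counter_lock = threading.Lock()


def change_counter(amount):
    global counter
    for _ in range(10_000):
        with counter_lock:     # only one thread at a time can hold the lock
            counter += amount  # the read-modify-write is now atomic
```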
06:33 An operation like a function in a library is considered thread safe if multiple threads can interact with it and not suffer race conditions. Typically, this means there’s a lock somewhere ensuring atomic access.
06:46 Most operations in Python are not thread safe, so much so that you should assume the operations in your code aren’t thread safe rather than the other way around.
06:55 By now, you might have a guess as to what the GIL is for. You’ve seen race conditions and you’ve seen that locks are the answer. Next up, let’s parse the term Global Interpreter Lock to see just what it is locking and why.