
Reviewing Threads and Race Conditions

00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll give you a refresher on Python’s threading library and what a race condition looks like.

00:09 The code in this lesson will be the basis for the fixes introduced in later lessons.

00:15 Threads allow two chunks of code to appear to run at the same time. In reality, the operating system is swapping back and forth between them quickly. Since software spends a lot of its time waiting for things, this can give you a speedup: while one thread waits for data to come back from the network, the other can be working.

00:33 The key to this, though, is that it's the operating system that decides when to swap threads. Threads operate in the same memory space, so if you've got values that two or more threads are changing, you can have a problem.

00:46 Python is an interpreted language, and although the underlying bytecode instructions might be atomic, a single line of Python typically translates into multiple bytecode operations.
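To see this for yourself, the standard library's `dis` module can list the bytecode instructions a function compiles to. The `Account` class below is a hypothetical stand-in, not the lesson's code. The exact opcode names differ between Python versions (for example, Python 3.11 replaced `INPLACE_SUBTRACT` with `BINARY_OP`), but reading the attribute (`LOAD_ATTR`) and writing it back (`STORE_ATTR`) show up as separate instructions, which is where a thread swap can sneak in.

```python
import dis


class Account:
    """Hypothetical account class, used only to inspect its bytecode."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # One line of Python...
        self.balance -= amount


# ...but several bytecode instructions: load self.balance, load amount,
# subtract, then store the result back into self.balance.
ops = [instr.opname for instr in dis.get_instructions(Account.withdraw)]
print(ops)
```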

00:58 That means there’s a chance that a thread can be swapped out partway through a single line of Python. This can be dangerous and can result in a race condition.

01:08 A race condition is where the timing of operations affects the end result. That's the opposite of what you typically want in your code, which is determinism.

01:18 I’ll start out by showing you a threading example with a race condition in it. My bank charges me a monthly service fee. It feels way out of proportion for the services they provide, but that’s just banks being banks.

01:31 If I keep enough money in my account, they rebate the fee. So every month I see a charge and a refund. This means somewhere in the bank is a script that goes through and charges and refunds a whole bunch of accounts.

01:44 To do that quickly, you might want a multi-threaded program, which is what I’m going to show you next.

01:51 This is my approximation of that banking script. Since this course is all about dealing with race conditions, I want to make sure I can reliably cause a race condition.

02:01 Although the operating system can swap between threads at any time, if I add a delay into the code, I can increase the likelihood of this happening. To experiment with that a bit, this highlighted line grabs the first argument to the script and turns it into a float value, which will determine the delay that gets used.
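The highlighted line isn't reproduced in this transcript, so here's a minimal sketch of the idea. The `parse_delay()` helper and the fallback to zero when no argument is passed are assumptions for illustration, not necessarily how the lesson's script behaves.

```python
import sys


def parse_delay(argv):
    """Hypothetical helper: treat the first command-line argument as a
    delay in seconds, defaulting to 0.0 when none is given."""
    return float(argv[1]) if len(argv) > 1 else 0.0


if __name__ == "__main__":
    delay = parse_delay(sys.argv)
    print(f"Using a delay of {delay} seconds")
```

Running it as `python bank.py 0.01` (the script name is hypothetical) would give a delay of 0.01 seconds.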

02:20 My bank account object has two key methods, one for withdrawing and one for depositing. This code here is what changes the balance during the withdrawal. I’ve got a sleep in the middle of it to try to increase the chance of a thread swap.

02:34 Yes, you could combine the subtraction and storage lines here into one, but then it’s less likely for the swap to occur between them. I get that this is a little convoluted, but it makes the race condition happen consistently, which is good because that means later you can see whether you’ve removed the problem.

02:53 Same idea here for the deposit. Instead of subtracting from the balance, I add to it, and again there's a delay between the calculation and the update. Let me scroll down a bit.
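Here's a sketch of an account class built the way the lesson describes: compute the new balance, sleep, then store it. The class and attribute names, and the use of `Decimal` to avoid floating-point rounding on money, are assumptions. `time.sleep()` releases the global interpreter lock, so the other thread gets a chance to run in that gap.

```python
import time
from decimal import Decimal


class BankAccount:
    """Deliberately unsafe: the balance is read and written in separate
    steps with a sleep in between."""

    def __init__(self, balance, delay=0.0):
        self.balance = Decimal(balance)
        self.delay = delay  # seconds to sleep between read and write

    def withdraw(self, amount):
        new_balance = self.balance - Decimal(amount)  # read and compute
        time.sleep(self.delay)                        # widen the race window
        self.balance = new_balance                    # write back

    def deposit(self, amount):
        new_balance = self.balance + Decimal(amount)  # read and compute
        time.sleep(self.delay)                        # widen the race window
        self.balance = new_balance                    # write back
```

Used from a single thread, it behaves correctly: `BankAccount("1000.00").withdraw("14.95")` leaves a balance of `985.05`.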

03:04 To maximize the chance of a thread swap, I'm going to perform withdrawals and deposits on a number of different accounts. This function is a wrapper that applies the service fee to every account in the system, and this one is the reimbursement.

03:19 If there were no threading, the final balance in every account after these functions have run would be the same as before: fee charged, then fee reimbursed.
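A sketch of the two wrapper functions, run one after the other without threads, shows that every balance ends where it started. The names `charge_fees()` and `reimburse_fees()` are hypothetical, and the fee of 14.95 is taken from the results shown later in the lesson. The minimal `Account` class here is a stand-in without the delay.

```python
from decimal import Decimal

FEE = Decimal("14.95")  # fee amount inferred from the lesson's results


class Account:
    """Minimal stand-in for the lesson's account class (no delay)."""

    def __init__(self, balance):
        self.balance = Decimal(balance)

    def withdraw(self, amount):
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount


def charge_fees(accounts):
    """Hypothetical wrapper: charge the service fee on every account."""
    for account in accounts:
        account.withdraw(FEE)


def reimburse_fees(accounts):
    """Hypothetical wrapper: refund the service fee on every account."""
    for account in accounts:
        account.deposit(FEE)


accounts = [Account("1000.00") for _ in range(50)]
charge_fees(accounts)
reimburse_fees(accounts)
print(all(account.balance == Decimal("1000.00") for account in accounts))  # True
```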

03:31 Here I'm creating 50 accounts, each with a thousand dollars in it, and this is where the threading happens. There are a few different ways of creating threads, but I like the ThreadPoolExecutor.

03:42 I especially like that it can be used as a context manager in a with block, meaning you don't have to remember to clean your threads up manually. For this example, I'm going to use only two threads.

03:52 One is for charging the fees and one is for reimbursing them. For any given account, the reimbursement might happen before the charge, but let's ignore that for now.

04:04 The end result would still be the correct balance if there isn’t a race condition.
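Putting the pieces together, here's a sketch of the threaded version with two workers: one charging, one reimbursing. The names and the `run()` helper are assumptions, not the lesson's exact code. Leaving the `with` block waits for both threads, and calling `result()` on each future re-raises any exception a worker hit. Because each account sees exactly one withdrawal and one deposit, every final balance is 1000.00 (no overlap), 985.05 (deposit lost), or 1014.95 (withdrawal lost).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal

FEE = Decimal("14.95")  # fee amount inferred from the lesson's results


class BankAccount:
    def __init__(self, balance, delay):
        self.balance = Decimal(balance)
        self.delay = delay

    def withdraw(self, amount):
        new_balance = self.balance - amount
        time.sleep(self.delay)  # a thread swap here can lose an update
        self.balance = new_balance

    def deposit(self, amount):
        new_balance = self.balance + amount
        time.sleep(self.delay)  # a thread swap here can lose an update
        self.balance = new_balance


def charge_fees(accounts):
    for account in accounts:
        account.withdraw(FEE)


def reimburse_fees(accounts):
    for account in accounts:
        account.deposit(FEE)


def run(delay, count=50):
    """Hypothetical driver: charge and reimburse concurrently."""
    accounts = [BankAccount("1000.00", delay) for _ in range(count)]
    # Leaving the with block shuts the pool down and waits for both threads.
    with ThreadPoolExecutor(max_workers=2) as executor:
        charging = executor.submit(charge_fees, accounts)
        reimbursing = executor.submit(reimburse_fees, accounts)
    # result() re-raises any exception raised inside a worker thread.
    charging.result()
    reimbursing.result()
    return [account.balance for account in accounts]


if __name__ == "__main__":
    print(run(delay=0.01))
```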

04:09 The final step in the script is to print out the results. I use the itertools.batched() function, added in Python 3.12, to break my 50 accounts up into five batches of ten, and then print out the balance of each account in each batch.

04:21 I’m using the end argument to the print() call so that the line feed isn’t applied automatically, thus allowing my 10 batched accounts to print on a single line.

04:31 The final print() is there to force a line break after each batch.
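Here's a sketch of that printing step. The `print_balances()` name, the column width, and the `file` parameter (added so output can be captured) are assumptions. Since `itertools.batched()` only exists in Python 3.12 and later, the sketch falls back to an equivalent generator built on `islice()` on older versions.

```python
from decimal import Decimal

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        """Fallback: yield successive tuples of up to n items."""
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch


def print_balances(balances, per_line=10, file=None):
    """Hypothetical helper: print per_line balances on each line."""
    for batch in batched(balances, per_line):
        for balance in batch:
            # end=" " stops print() from adding a newline after each value.
            print(f"{balance:8.2f}", end=" ", file=file)
        print(file=file)  # end the line after each batch


if __name__ == "__main__":
    print_balances([Decimal("1000.00")] * 50)
```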

04:35 Let me open a shell and I’ll try this out.

04:40 I’ll start with a delay of zero. Looks good, right? All 50 balances are what they should be. Why’d this work so smoothly? Well, the default switch time between threads is small, but the amount of processing time to do the work here was even smaller.

04:56 That means that what probably happened was one thread ran and finished before the other even started, so there’s no swap, so there’s no problem.

05:05 By the way, if you want to know what your thread switch interval is, you can call the getswitchinterval() function in the sys module.

05:12 On my machine, it's five milliseconds, which is CPython's default. That doesn't sound like a lot, but you can do a whole bunch of computing in that little time on a modern machine.
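Checking the value takes one call. `sys.getswitchinterval()` returns the interval in seconds; per the Python docs, it sets the ideal duration of the timeslices given to concurrently running Python threads. `sys.setswitchinterval()` can change it, though that's outside the scope of this lesson.

```python
import sys

# The interpreter's thread switch interval, in seconds.
interval = sys.getswitchinterval()
print(f"Switch interval: {interval * 1000:.1f} ms")  # 5.0 ms unless changed
```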

05:21 Okay, let’s try this again, this time with a delay.

05:29 Waiting on the threads, and there are the results. Not so good this time around, huh? Not one of the values is correct. 10 of them are overcharged, and the other 40 got a nice surprise.

05:42 It’s like that card in Monopoly. Bank error in your favor, collect $14.95. Of course, if you’re the bank, that’s a problem. So why’d this happen?

05:56 Remember this line in the withdrawal? At this point in the code, the new balance has been calculated, but the stored balance hasn't been updated yet. Let's say this line runs and then, during the sleep call, the other thread wakes up.

06:05 That thread accesses the balance for its own calculation, and since the withdrawal hasn’t updated the balance, the deposit gets the original balance of a thousand dollars.

06:16 It then finishes updating the balance.

06:19 Then, when the thread swaps back, the new balance value here is still based on the calculation from the original $1,000. When this line runs, the balance gets overwritten without taking into account the change the deposit completed.

06:31 Since the deposit is already done, the final result reflects only the withdrawal, giving you $985.05. It's as if the deposit never happened. And of course, if the swap happens during the deposit instead, you end up with $1,014.95.
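The interleaving just described can be replayed by hand, without any threads, to confirm the arithmetic. The `lost_update()` function is a hypothetical illustration: `interrupted` names the operation that computes its new balance, gets swapped out, and writes last.

```python
from decimal import Decimal


def lost_update(balance, fee, interrupted):
    """Replay the race step by step. The interrupted operation computes
    from the original balance, the other operation finishes, and then the
    interrupted one overwrites the result."""
    if interrupted == "withdraw":
        pending = balance - fee  # withdrawal computes, then is swapped out
        balance = balance + fee  # deposit reads the stale balance and finishes
    else:
        pending = balance + fee  # deposit computes, then is swapped out
        balance = balance - fee  # withdrawal reads the stale balance and finishes
    balance = pending            # stale write clobbers the completed update
    return balance


print(lost_update(Decimal("1000.00"), Decimal("14.95"), "withdraw"))  # 985.05
print(lost_update(Decimal("1000.00"), Decimal("14.95"), "deposit"))   # 1014.95
```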

06:48 This is our race condition.

06:52 Now that you’ve seen the problem, in the next lesson, I’ll show you a possible solution.
