Memory Is a Shared Resource
00:00 Earlier in the course, I mentioned that computer memory is like a book filled with short stories. Different processes/authors will come along and write stories/data into that book. And then when a story is no longer relevant, it is erased by the garbage collector. There’s only one problem.
00:20 What happens if two stubborn authors come along and try to write their own separate stories on the same pages of the book at the exact same time? What happens if they both try to modify this shared resource?
00:33 Chaos is what happens. Neither story will be legible because they are both writing over each other. This is where data loss can occur. To put in perspective how big of an issue this is, think about this scenario.
00:47 You and your spouse share a checking account that has $1,000 in it. You go to the ATM to deposit another $1,000. While you’re depositing your money, your spouse is at a different ATM, attempting to withdraw $1,200 from the same account.
01:04 The banking system can’t deposit money into and withdraw money from the account at the same instant. It must perform one transaction at a time.
01:15 The problem is that, without any safety precautions in the code, the order in which the transactions occur is fundamentally unpredictable. That decision is made by the operating system scheduler, which gives each task small chunks of CPU time to perform its computation (or, in this case, transaction) when the computer is trying to do multiple things at once.
01:38 Ideally, the $1,000 is deposited first, bringing the balance to $2,000. Then, your spouse can withdraw $1,200 and the remaining balance will be $800. But what happens if the transactions occur in the opposite order? Well, first the banking system attempts to withdraw $1,200 from an account with $1,000 in it. That leaves the balance at -$200 and will likely trigger an overdraft flag, which may cause you to pay, let’s say, a $10 fee. Then, with a balance of -$210, your $1,000 deposit goes through, bringing your balance to $790.
02:20 That’s $10 less than before, all because of the overdraft.
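To check the arithmetic, here is a minimal sketch of the two orderings in Python. The `apply` function and the `OVERDRAFT_FEE` constant are hypothetical names made up for this illustration, and the fee rule (charge $10 whenever a withdrawal leaves the balance negative) is an assumption based on the scenario above, not how any real bank works.

```python
OVERDRAFT_FEE = 10  # the assumed $10 fee from the scenario


def apply(balance, transactions):
    """Apply signed amounts one at a time, charging a fee on any overdraft."""
    for amount in transactions:
        balance += amount
        # A withdrawal that leaves the account negative triggers the fee.
        if amount < 0 and balance < 0:
            balance -= OVERDRAFT_FEE
    return balance


deposit_first = apply(1000, [+1000, -1200])   # deposit, then withdraw
withdraw_first = apply(1000, [-1200, +1000])  # withdraw, then deposit

print(deposit_first)   # 800
print(withdraw_first)  # 790
```

The same two transactions give two different final balances, and the only thing that changed is the order they ran in.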
02:25 The problem here is that two processes—or people—tried to modify their shared resource at once. This is called a race condition. Due to the unpredictable nature of these bugs, they are some of the hardest to fix.
02:40 Race conditions often appear in multithreaded programs—that is, a process that spins up multiple threads of execution to try to do multiple things at once.
02:51 If thread 1 tries to access data in memory, just as thread 2 is freeing it, the program might crash. Here, the threads are like the authors from our book analogy.
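A race like this can be sketched in Python with the `threading` module. This is an illustration only: the `transact` function and the variable names are hypothetical, and the `threading.Barrier` is there purely to force the unlucky timing (both threads read the balance before either one writes) so that the bug shows up on every run instead of once in a while.

```python
import threading

balance = 1000
# Assumption for the demo: hold both threads until each has read the balance.
both_have_read = threading.Barrier(2)


def transact(amount):
    global balance
    snapshot = balance            # 1. read the shared value
    both_have_read.wait()         # 2. the other thread reads the same value
    balance = snapshot + amount   # 3. write back a result based on stale data


threads = [
    threading.Thread(target=transact, args=(amount,))
    for amount in (+1000, -1200)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Whichever thread writes last wins, so one transaction is silently lost.
print(balance)  # 2000 or -200, never the correct 800
```

Both threads start from the same $1,000 snapshot, so the second write overwrites the first. Which one survives depends on the scheduler.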
03:03 The best way to write multithreaded programs free of race conditions is to write thread-safe code. In thread-safe code, any shared resource that could potentially be accessed by multiple threads simultaneously is protected by what’s called a mutex.
03:18 Short for mutual exclusion, a mutex has the job of ensuring that only one thread has access to a shared resource at any given time.
03:29 One form of a mutex is a lock, which quite literally locks the shared resource. When one thread is accessing the shared resource in memory, that resource is said to be locked.
03:41 Any other thread is denied access until that thread is done with it. Once the first thread is done, it releases the lock, which can then be acquired by another thread that needs to access the resource.
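In Python, this acquire-and-release pattern can be sketched with `threading.Lock`. The `transact` function and the `balance_lock` name are hypothetical and chosen only for this example. The `with` statement acquires the lock on entry and releases it on exit, even if an exception is raised inside the block.

```python
import threading

balance = 1000
balance_lock = threading.Lock()  # guards every read and write of `balance`


def transact(amount):
    global balance
    # Only one thread at a time can be inside this block.
    # Any other thread waits here until the lock is released.
    with balance_lock:
        snapshot = balance
        balance = snapshot + amount


threads = [
    threading.Thread(target=transact, args=(amount,))
    for amount in (+1000, -1200)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(balance)  # 800
```

Because the read and the write happen while the lock is held, neither transaction can overwrite the other. The lock doesn’t decide which thread goes first, though. It only guarantees that they take turns.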
03:54 As you can imagine, writing thread-safe code can be hard. It comes up most often in low-level languages like C and C++, which are more likely to be used for multithreaded programming because of their speed and unrestricted access to computer memory. Multithreading is helpful when you’re trying to write a high-performance compiler and interpreter for a language, namely
04:20 CPython. After all, we expect our Python code to run as quickly and efficiently as it possibly can. CPython must also make sure any shared resources are thread-safe, or else your Python programs might start to exhibit unpredictable behavior. Understanding the basics of multithreading and its various challenges will help you understand some of the decisions that went into the development of CPython.
04:47 One of the most controversial is the Global Interpreter Lock, coming up next.