Introduction to Caching
00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll show you how caching works generally and a couple of different choices you can make in how you cache things.
00:12 Software often exhibits the principle of locality. Recently used things are likely to be used again. This works at many different levels in your architecture. Memory blocks that were used before are more likely to be used again than other memory blocks.
00:28 Stuff from the disk also often exhibits this behavior. If you have a complex calculation to do that is expensive to perform, there’s a chance you may need that value again.
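To make that idea concrete, here's a minimal sketch of caching an expensive calculation by hand with a plain dictionary. It's only an illustration of the principle; the slow_square function and the artificial delay are made up for this example.

```python
import time

_results = {}  # maps each input to a previously computed output


def slow_square(n):
    """Pretend this calculation is expensive to perform."""
    if n not in _results:
        time.sleep(1)        # stand-in for the costly work
        _results[n] = n * n  # remember the answer for next time
    return _results[n]


slow_square(5)  # slow the first time around
slow_square(5)  # instant: the value comes straight from the cache
```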
00:40 And this is quite common on the Web. Your browser has caches built in to avoid refetching things from the Internet. Consider a news site. Even if it is updated regularly, the logos on it aren’t likely to change frequently.
00:53 Keeping the logos in a cache means saving the transit time of downloading them again. Caching values means you can get at them again quickly, but that comes at the cost of space. This cost is usually in memory, but it can also be at the cost of disk space.
01:10 If the cost of computing something or fetching something is high, a cache will save you time if that value is ever reused. This is so common that your processor has this concept built in. Modern CPUs often have several layers of caches to reduce the number of trips to RAM.
01:28 Virtual memory in your operating system allows you to have the total memory use of all your running programs exceed the amount of RAM you have. How the pages of memory are swapped in and out of RAM uses the same kind of mechanisms that I’ll be showing you for your cache. And finally, you can take advantage of these concepts in your own code with the use of the right library. In Python, that’d be the functools library.
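As a quick preview of where the course is heading, the same kind of caching with functools looks roughly like this. The @lru_cache decorator is covered properly in a later lesson, and the slow_square function here is just a stand-in.

```python
from functools import lru_cache


@lru_cache(maxsize=3)  # keep at most three results around
def slow_square(n):
    return n * n  # imagine this were expensive to compute


slow_square(5)  # computed and cached
slow_square(5)  # served from the cache
print(slow_square.cache_info())  # reports hits, misses, maxsize, and currsize
```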
01:54 To better understand caches, consider a chunk of data. I’m going to name this chunk of data 5. You’ll see why in a second.
02:04 The 5 block has some friends, nine of them in fact. These blocks are big, big enough that only a few of them can be used at a time. Consider the base case of only being able to keep one of these blocks in memory at a time. When I need a block, I swap all of its contents into memory.
02:22 And if I want something else, I have to wipe out what is there to make room for another block, block 6 in this case. There aren’t a lot of choices to make if you can only fit one block into memory at a time. So let’s increase our RAM.
02:37 Now I’ve got room for three blocks. I’ll load the 2, then the 7, then the 0 block. Now I have a decision to make. If I want to load another block, something is going to have to go.
02:55 This decision in caching is called the caching policy. There are several different algorithms you can use to choose which of these blocks gets replaced.
03:06 Let’s look at several different kinds of caching policies. The first is called FIFO, short for First-In/First-Out. This means that whatever got put into the cache first is the thing that gets removed when the cache is full. LIFO, or Last-In/First-Out, means the last thing that got put in the cache is the first to go.
03:27 This mechanism favors whatever got loaded early on; those early blocks pretty much stay forever. LRU, or Least Recently Used, determines which item in the cache was used the longest ago. This will be where the rest of this course concentrates.
03:44 MRU, or Most Recently Used, is LRU’s cousin. And LFU, which just sounds rude, is Least Frequently Used, where you keep frequency counters on the blocks and kick out the one that has been accessed the least. FIFO is pretty cheap to implement, but you might be getting rid of something that is used a lot. LFU optimizes for use, but takes a lot of extra bookkeeping. Every time you kick something out, you have to scan through to find the lowest frequency or re-sort your list.
04:15 Both of these can be expensive operations. LRU is a compromise between the two, hence why it is one of the more common policies. Let’s see some of these in action to better understand the choices.
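To get a feel for that bookkeeping, here's a minimal LFU sketch of my own, not code from the course: every access bumps a counter, and every eviction scans all the counters for the smallest one.

```python
class LFUCache:
    """Minimal Least Frequently Used cache, just to show the bookkeeping."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> cached value
        self.counts = {}  # key -> number of accesses so far

    def get(self, key):
        if key in self.data:
            self.counts[key] += 1  # bookkeeping on every single hit
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # The expensive part: scan every entry for the lowest frequency.
            victim = min(self.counts, key=self.counts.get)
            del self.data[victim]
            del self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```

Compare that eviction scan with FIFO, where replacing a block costs nothing beyond moving a pointer.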
04:30 Once again, I’ve got three spots for blocks. This time around, I’m going to skip the block part and just show you the numbers, but the idea is the same. Along comes a 3.
04:40 The cache is empty, so the 3 gets inserted. Now a 5. There’s still space, so 5 gets filed away. Now another 3. This is called a cache hit. Block 3 can be returned from the cache, and the cache itself looks the same before and after a hit.
05:02 Remember, these numbers are really representing large chunks of data. You’d never just cache a single number. Now the 9 block, it takes up the last spot. In the immortal words of Tim Curry, “Antici-pation.” So what do we do? Here’s the 1 block, and the cache is full. This is called a miss.
05:27 Now you have to decide what gets kicked out. With the FIFO policy, you look at the first thing that was inserted, which in this case was the 3. The 3 goes away and is replaced by the 1 block.
05:41 Let’s do that again. Here comes a 4, another miss. FIFO says the 5 should be replaced,
05:52 and so it is. This cache can be implemented with an ordered list and a pointer. The pointer indicates what is to be replaced next, and after each replacement, it advances by one spot. When it’s pointing at the last spot, it wraps around to the first spot again.
06:09 This is what I meant when I said FIFO is relatively cheap. You really only need one additional pointer to implement this method. Note that although the 3 was used, it was still the first thing to go.
06:21 And that’s the weakness of this policy.
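If you're curious what that list-plus-pointer mechanism might look like in code, here's a small sketch of it, driven by the same sequence of blocks from the walkthrough. It's an illustration, not the implementation you'll see later in the course.

```python
class FIFOCache:
    """Fixed-size FIFO cache: an ordered list of slots plus one pointer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # cached blocks, in insertion order
        self.pointer = 0  # which slot gets replaced next

    def access(self, block):
        if block in self.slots:
            return "hit"  # a hit leaves the cache unchanged
        if len(self.slots) < self.capacity:
            self.slots.append(block)  # still room, just file it away
        else:
            self.slots[self.pointer] = block  # replace the oldest block...
            self.pointer = (self.pointer + 1) % self.capacity  # ...and wrap around
        return "miss"


cache = FIFOCache(3)
for block in [3, 5, 3, 9, 1, 4]:
    print(block, cache.access(block), cache.slots)
```

Just like in the walkthrough, the 3 is replaced first and the 5 second, even though the 3 was hit along the way.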
06:26 Let’s try it again with LIFO.
06:31 3, 5, hit, 9. So far, all the same. Along comes the 1. It’s a miss. LIFO says the last thing put in the cache is the first to come out. In this case, that means the 9.
06:51 LIFO feels weird to me, and I’ll show you why. Here’s the 4. Still a miss. What gets replaced? Well, the last thing I put in, which was the 1, so it gets replaced.
07:09 The concept of Last-In/First-Out is common in queuing theory, which is related to this idea, but I’ve never seen it implemented as a caching mechanism. A LIFO cache will fill up with the first few items, then constantly replace the last one. It’s really cheap to implement—just use the last spot in the cache once it is full—but you’re probably not going to have a good balance locality-wise with this choice. Let’s try something else. All right.
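For completeness, here's roughly what that looks like as code, again only a sketch of the behavior just described rather than anything you'd want to ship.

```python
class LIFOCache:
    """Fixed-size LIFO cache: once full, only the last slot ever changes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []

    def access(self, block):
        if block in self.slots:
            return "hit"
        if len(self.slots) < self.capacity:
            self.slots.append(block)  # fill up the cache in order
        else:
            self.slots[-1] = block  # always evict the most recently inserted block
        return "miss"


cache = LIFOCache(3)
for block in [3, 5, 3, 9, 1, 4]:
    print(block, cache.access(block), cache.slots)
```

The first two blocks never leave, and the last slot just keeps churning, which is exactly the imbalance described above.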
07:36 This is the one you came to see: LRU, Least Recently Used. Little bit of deja vu.
07:49 So far, so good. And now a choice needs to be made. Here comes the 1. That’s still a miss. You can draw a line from the 1 back through the history of accesses until you are three distinct items deep.
08:05 This is the LRU item, the least recently used. In this case, that’s the 5.
08:13 The power here is that the hit on the 3 block increases its locality, meaning it gets kept in this case. Let’s do another one. Here comes the 4. Still a miss. Draw the line.
08:29 And now it’s the 3’s time. Don’t cry for it. It had a good run. Because you have to track what was used least recently, the algorithm to implement this policy is more complicated than FIFO.
08:43 There is a memory cost and a slight access cost to this. The access cost is negligible in comparison to what you’re caching, or it should be. Otherwise, you shouldn’t bother caching. But that memory cost can add up.
08:57 You’ll see more about how this is done in code in a future lesson.
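Ahead of that lesson, here's one common way to sketch the idea in Python, using collections.OrderedDict to track access order. It's an illustration of the policy, not the code you'll see later.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: an OrderedDict keeps blocks in access order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # oldest access at the front, newest at the back

    def access(self, block):
        if block in self.slots:
            self.slots.move_to_end(block)  # a hit refreshes the block's recency
            return "hit"
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)  # evict the least recently used block
        self.slots[block] = True
        return "miss"


cache = LRUCache(3)
for block in [3, 5, 3, 9, 1, 4]:
    print(block, cache.access(block), list(cache.slots))
```

Run it on the sequence from the walkthrough and the 5 goes first, then the 3, matching the evictions above.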
09:02 I hear you might want some Python in your Python course. Next up, I’ll show you the @lru_cache decorator and how to use it.