More lru_cache Features
00:00
In the previous lesson, I showed you how to use the @lru_cache decorator. In this lesson, I’ll show you how to use its optional arguments to further control your caching needs.
00:12
The @lru_cache decorator takes a couple of extra arguments. maxsize specifies how many items are kept in the cache. It defaults to 128. And typed changes the behavior of the decorator, caching arguments of different types separately.
00:29
If your function can take either an integer or a float, typed means calls with 3 and calls with 3.0 get cached as separate entries instead of sharing one.
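As a quick sketch of how those two arguments get passed (the square() function here is a placeholder of my own, not something from the lesson):

from functools import lru_cache

# maxsize=3 keeps only the three most recently used results;
# typed=True stores square(3) and square(3.0) under separate keys.
@lru_cache(maxsize=3, typed=True)
def square(number):
    return number * number

square(3)    # miss: computed and cached under the int key
square(3.0)  # miss: cached separately because typed=True
square(3)    # hit: answered from the cache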
00:39 You might recall I made a snide comment about Europeans getting credit for writing something down. Well, let’s correct that. Let me introduce you to Pingala, an Indian mathematician who wrote down the concept we now call the Fibonacci sequence 1400 years before Fibonacci. Yep, you heard that right: 200 BC. In honor of Pingala, I’ve written a Fibonacci-like function that, instead of being based on the previous two values in the sequence, takes the previous seven values.
01:08
I’m just ramping up the exponent in our exponential calculation. The LRU decorator is now using the maxsize argument, which specifies how many keys are kept in the caching dictionary.
01:21 Instead of using the command-line argument to specify which number in the sequence to calculate, I’m now using it to specify the size of the cache. I’ve also added a bit more debug information, this time tracking the depth of the call in the stack.
01:35 This will be helpful when I start playing with cache sizes. You’ll be better able to see where the misses happen. Let me scroll down.
01:44
Since I’m using the command-line argument to mean cache size, the for loop is now hard-coded to calculate the seventh through eleventh values in the sequence. At the bottom here, I’m taking advantage of a feature built into the decorator.
01:57
Not only does the @lru_cache decorator decorate the function, it also adds a method to the function. Remember, even functions are objects in Python.
02:06 This method gives you info about the state of the cache. Let’s run this puppy.
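The file itself isn’t reproduced in this transcript, but based on the description, the script probably looks something like the sketch below. The function name, the hard-coded value of 1, and the debug formatting are my guesses rather than a copy of what’s on screen:

# pingala.py -- rough approximation of the script described above
import sys
import time
from functools import lru_cache

depth = 0  # guess: recursion depth tracked in a module-level counter

@lru_cache(maxsize=int(sys.argv[1]))   # cache size comes from the command line
def pingala(n):
    """Fibonacci-like sequence where each value sums the previous seven."""
    global depth
    depth += 1
    # this only prints on a cache miss -- hits never enter the function body
    print(f"{'  ' * depth}calculating pingala({n})")
    if n < 8:
        result = 1                     # the first seven values are hard-coded
    else:
        result = sum(pingala(n - i) for i in range(1, 8))
    depth -= 1
    return result

for n in range(7, 12):                 # the seventh through eleventh values
    start = time.perf_counter()
    value = pingala(n)
    elapsed = time.perf_counter() - start
    print(f"pingala({n}) = {value}  ({elapsed:.2e} s)")

# the decorator also attaches this method to the function it wraps
print(pingala.cache_info())

Running it as python pingala.py 10 matches the first run shown here.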
02:19 I’ve specified a cache size of ten items.
02:22
7 gets calculated. It’s a miss. For any number in the sequence under 8, a hard-coded value gets returned. 7’s value gets printed out.
02:32
That’s a result of 1, and it took 5.87 millionths of a second to run. Next, the eighth value gets calculated. That’s a miss. It then starts to recurse. The depth goes up for each call.
02:46
The result for the eighth number is a value of 7, which took about twice as long to calculate. But now the cache is primed. The ninth through eleventh values can rely on the cache, and speedy is their name.
03:00
The .cache_info() call at the bottom tells you there were 22 hits and 11 misses in the cache. It also tells you what size was used for the cache and how big it currently is.
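For reference, .cache_info() returns a named tuple, so that summary line looks roughly like this; the currsize value is my inference that eleven misses filled the ten-slot cache:

CacheInfo(hits=22, misses=11, maxsize=10, currsize=10)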
03:11 Let’s try this again with a smaller value for the cache size.
03:20
Lots more to unpack this time around. Let me scroll back up. The calculation for 7 was the same, as was 8, but 9 is a little different. Because the cache is limited to five values, you start to get misses.
03:35 What previously was a cache hit now requires more calculation.
03:44 This only gets worse as you move on.
03:51
Because I’m still only calculating low values, this is all still happening quickly, but you can see how it could get out of control. There were only 4 hits this time versus 57 misses, and the cache is full.
04:05
By using the .cache_info() method, you can fine-tune your caching needs. If you’re finding that your cache is eating up a lot of memory, turning the maxsize parameter down will help you, at the potential cost of some of that speed. I haven’t shown it here, but when playing around to write the course, I ran this same code without a cache. Even with small values being calculated, there was a four times speed-up gained by attaching the LRU.
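Both knobs are easy to experiment with. Here’s a small sketch, assuming the pingala() function from the earlier sketch; cache_clear() is the companion method for resetting the cache between timing runs:

info = pingala.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(f"hit rate: {hit_rate:.0%}, holding {info.currsize} of {info.maxsize} entries")

pingala.cache_clear()   # empty the cache so the next timing run starts cold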
04:34
How is all this built? Well, you can see for yourself if you’re really curious. The underlying decorator is actually written in both Python and C. If you dig around on your computer and find the functools library, you’ll find the lru_cache decorator, followed by an import of the same thing from the _functools library. That second import is the C version, which replaces the Python version when it’s available.
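The names below match recent CPython releases, where the symbol being swapped out is the internal cache wrapper rather than lru_cache itself, but the try/except hand-off is the pattern to recognize:

# simplified view of the end of the lru_cache section in functools.py
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...  # the pure-Python cache implementation lives here

try:
    # if the C accelerator module is present, its version replaces
    # the pure-Python wrapper defined above
    from _functools import _lru_cache_wrapper
except ImportError:
    pass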
04:58 Even if your system is using the C version, you can still have a look at how the Python version is implemented. The code is more complicated than you might first guess. For starters, it’s thread-safe.
05:09 That means two threads can call your function at the same time without getting into a race condition inside the cache. As I mentioned before, the underlying cache is stored as a dictionary.
05:20
This gives good performance when you’re trying to determine if a call is a hit or a miss. The function’s arguments are the keys to the dictionary. The arguments are mushed together into a sequence inside of a function logically enough named make_key().
05:36 That sequence is then used as the hash key in the cache dictionary. A circular doubly linked list is implemented to track what was least recently used. And this is built out of a list of lists.
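Here’s a heavily simplified sketch of the key-building side; the real _make_key() in functools is more elaborate, and the linked-list nodes it pairs with are four-element lists holding the previous node, next node, key, and result:

_KWD_MARK = object()   # separates positional from keyword arguments in the key

def make_key(args, kwargs):
    """Flatten a call's arguments into one hashable tuple."""
    key = args
    if kwargs:
        key += (_KWD_MARK,) + tuple(kwargs.items())
    return key

cache = {}
key = make_key((8,), {})   # the call pingala(8) becomes the key (8,)
cache[key] = 7             # the computed result, stored under that key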
05:50 The dictionary tracks the cached content, while the circular linked list tracks usage. Between the two, you get an LRU cache. Of course, all this fancy stuff takes up memory.
06:02 This is the ultimate trade-off of a cache: gain speed at the cost of RAM.
06:09 Two lessons from now, I’m going to show you how to implement an LRU cache that has time-based expiry of its contents. But in order to do that, you need to know how decorators work.
06:19 If you’re an old pro at decorators, skip the next lesson. If you need a refresher, that’s what comes up next.