LRU in Python
Your hardware and your operating system implement caches to help make your computing experience snappy, or at least snappier than it would be without a cache. To take advantage of the same concepts in your Python code, you can use the provided
@lru_cache decorator. As it is a decorator, it is used by wrapping a function that you already have.
00:43 I’m going to stick with the first case, as it means writing less code to demonstrate, but in any situation where the results take time and might be reused, that’s a situation where you might consider caching.
01:06 Big O notation represents a bounded runtime for a computation. The O is short for Ordnung, which means the order of approximation. My apologies to any German speakers in the audience for butchering that pronunciation.
01:32 Getting the first item from a list is usually O(1). The length of that list is irrelevant to the cost of accessing the first item. Linear time, or O(n), is a runtime that is proportionate to the amount of data.
01:47 Finding the largest value in an unordered list is O(n). You have to scan the whole list. Quadratic time is where things start to get messy. In this case, the runtime grows with the square of the amount of data.
02:11 It gets expensive really fast. The example I’m going to be playing with is the Fibonacci sequence. This goes from costing microseconds to calculate early on in the sequence to seven orders of magnitude slower than that by the thirty-fifth spot in the list.
02:31 In case you’re not familiar, let me briefly introduce you to the Fibonacci sequence. Although it sounds like it might be yummy smothered in an Alfredo sauce, the Fibonacci sequence is a numeric computation named after an Italian mathematician who documented the concept in 1202. As with a lot of things in history, it was discovered long before that, but the first European to write it down gets the credit. The sequence goes like this: you start with a 1, then another 1. Then you sum those two values to get 2.
03:03 I didn’t even have to use my fingers. The next item in the sequence is the sum of the previous two numbers. That would be 3. So then 5, then 8. And it goes on forever. Interestingly, this kind of geometric progression is common in nature and is related to spirals.
The number of seeds in each ring of a sunflower follows this kind of sequence. This algorithm is best tackled with recursion. Let’s go take a look. In the top window here, I have a program that prints out ten digits from the Fibonacci sequence, ending with the number passed in on the command line. Lines 3 through 7 are the recursive algorithm. If the number in the sequence is
1, then it returns
1. If the number is more than that, it is based on the two previous numbers in the sequence summed together, which here I’m using recursion to calculate. Lines 10 through 12 get the command-line argument and turn it into an integer, then calculate the range of numbers being requested.
0 to indicate the first number in the sequence because I’m a programmer, and lists start with the zeroth item. Lines 14 through 16 loop through the range, printing each of the numbers.
Python’s implementation of the LRU cache is provided as a decorator inside of the
functools library. All you have to do is wrap your function with the decorator, and you’re instantly cached. That’s it. It’s that easy. Let’s go play with it.
Same basic code, just a couple of new additions. I import the
lru_cache decorator from
functools. I wrap the
fibonacci() function with that decorator. And finally, I’ve added a little bit of debug through a
print() statement in the Fibonacci calculation. Before I do this, take a guess.
Pretty awesome, right? You’ve got the thirty-fifth number coming out at a speed similar to the first couple of numbers. To understand how this worked, think about the
print() statement that I added.
print() statement is only showing up once between the thirty-fourth and thirty-fifth. That is because of the cache. The decorator gets called before the function does, sees that it has a relevant value, and returns that value instead of calling the function. As each call to the Fibonacci function uses the two values before, you’re guaranteed to have them in the cache, which means once the cache is primed, you only need to do one computation per number.
09:03 If your function has an argument that can’t be used as a key in a dictionary, your function won’t work with the LRU decorator. Another minor thing: due to how the arguments are inserted into the dictionary, the LRU decorator considers the order of arguments important.
If you’re using two named arguments, but passing them in in a different order in a future call, you’ll get a cache miss. So
foo() called with
foo() called with
a=3, although programmatically the same thing, end up taking two distinct spots in the cache.
Become a Member to join the conversation.