Comparing Performance
00:00
In the previous lesson, I showed you that even sum() calls the underlying add method on an object. In this lesson, I'll talk about speed and algorithm selection.
00:10 When considering performance, there are two things you should think about: how fast something is and, the less obvious one, how much of a memory hog it is. For small amounts of data, memory often isn't a big deal, especially on modern computing equipment.
00:24 But for large amounts of data it can still be important. If you've got a petabyte, you probably don't wanna copy it to perform an operation. On the other hand, in-place modification of data can mean more complicated algorithms.
00:37 In every single example I’ve shown you so far, I’ve at least doubled the amount of data in memory. The result is a new list that has all the values in it that the original list contained.
00:48 Doing it this way made it easier to code and was non-destructive to the original list. But if you've got massive amounts of data, you might want to consider doing something differently.
00:58 It’s actually even worse than that. Some of the algorithms created multiple copies, but I’ll get into that in a second.
01:05 Not always, but a lot of the time faster tends to mean more memory intensive. Computing's always a trade-off. Whenever it comes to performance, the first thing you should always do is measure.
01:15 It isn’t always intuitive, so let’s go do that.
01:20
For the first time in this course, I'm not in flatten.py. This code calls the functions in flatten and measures their performance. Up high, a few constants to start. I wanna flatten something a bit bigger than the nested list I've used so far, so how about a 1000-item list of 1000-item lists?
01:39 That’s that size value here.
01:43 The timing tool I'm going to use measures in microseconds. As I'm going to be doing multiple iterations, this constant allows me to convert to a more readable millisecond value.
01:53 And as I just said, I’m going to run each function multiple times in order to get an average runtime. This value of 10 is the number of executions I’m going to do.
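In code, the setup described here might look something like this; the names SIZE, TO_MS, and NUM are guesses, not necessarily what the course file uses:

SIZE = 1000    # a 1000-item list of 1000-item lists
TO_MS = 1000   # factor for converting the raw timing value into milliseconds
NUM = 10       # how many times each flatten function gets executed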
02:03 Instead of using a list of lists of characters, this time I'm going full matrix and doing a list of lists of numbers. This is a funky-looking list comprehension that uses the range function and turns it into a list.
02:16 So it starts out as a list with a thousand values in it. Like with adding lists, multiplying lists is overloaded, creating duplicates. In this case, it'll concat together my list of a thousand items a thousand times, giving me a big old nested beast.
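As a sketch, building a matrix matching that description could look like this; the comprehension in the course file may be spelled differently:

row = [number for number in range(SIZE)]   # starts out as a list with a thousand values
matrix = [row] * SIZE                      # list multiplication glues a thousand copies into a nested beast

Multiplying [row] repeats references to the same inner list, which is fine here because the flatten functions only ever read from it.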
02:33 I’m gonna wanna sort the results to make them easier to read, so I’m gonna have to stick them in a dictionary first so I can sort them afterwards.
02:42
And now onto the meat of the timing algorithm. The dir() function returns a list of strings of the names of the things in an object. For a module object, that's the list of things inside the module.
02:56
By doing this to the flatten module, I'm going to get back all the names of functions as strings, as well as some extra stuff that I don't want. The loop iterates over all those names. As every one of my functions in flatten has a name beginning with flatten, here is where I can ignore the system stuff that comes back from dir().
03:16 If it doesn’t start with flatten, I’m not interested and I’ll skip to the next item.
03:21 And this is the actual timing code.
03:25
The timeit() function in the timeit module takes a string representing some Python to run. You also pass it some context and how many times you want to run the code.
03:35
The f-string here constructs a call to the flatten function, passing in the matrix object. Since the matrix object is in this Python file, it is considered part of the global collection, which I'm passing to timeit().
03:50
This makes the matrix available inside the code that timeit() is calling. timeit() calls this function ten times, that's NUM, and records an average response, which I'm then storing in the results dictionary.
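Inside that loop, the timing call might look roughly like this; the variable names are assumptions, but globals and number are real parameters of timeit.timeit:

from timeit import timeit

results = {}                              # created once, before the loop
code = f"{name}(matrix)"                  # e.g. "flatten_loop(matrix)"
total = timeit(code, globals=globals(), number=NUM)
results[name] = total / NUM               # timeit returns the total, so divide for a per-call average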
04:05
Once the results have been collected, I sort them. The items method on a dictionary returns tuples of key names and values. By passing a lambda to the key argument of sorted,
04:16
I’m telling sorted
to sort based on the second part of the key value tuple, which in this case is the timing result. This is actually my most common use of lambdas, changing what part of a data structure is used to sort a list of data structures.
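A sketch of that sort, assuming the timings live in a dictionary called results:

ordered = sorted(results.items(), key=lambda pair: pair[1])   # pair[0] is the name, pair[1] is the timing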
04:31
The result of sorted then gets iterated over, which is still an iterable of key-value pairs but now sorted by results. For each one of them, I print it out. I've done a little bit of fancy formatting here with my f-strings so that I've got a nice table of function names and their average execution times.
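The report loop might look something like this; the field widths here are guesses:

for name, duration in ordered:
    print(f"{name:<25} {duration * TO_MS:>10.2f} ms")   # pad the name, show the time in milliseconds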
04:51 Are you ready? Let’s try this out.
04:59 Some of these are coming back quick and some of them aren’t.
05:07 I wonder if I'd get sued if I whistled the Jeopardy theme song here.
05:16 It’ll get there eventually.
05:24 Spoiler alert, we have some performance problems.
05:32 And there you go: the results in sorted order. There was a pretty big difference between some of them, wasn't there? Note that the last four here all took around 900 milliseconds, and since each of these functions got called ten times, that's why there were roughly nine-second pauses in the response.
05:49
Hence the need to whistle Jeopardy. I've actually run this code a few times and the order changes a bit, but not the magnitudes. The concat, that's our plus-equals from the beginning, the extend, and the reduce using iconcat all come in around the same value of two to three milliseconds.
06:07 This makes sense when you think about it: at least with the plus-equals and extend, they're doing the exact same thing, and it turns out the iconcat operator function is essentially plus-equals.
06:17 So reduce does the work of doing the iteration, but the gluing of nested lists together is the same as those other two. That's why these three tend to perform at about the same level.
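As a sketch, the reduce-with-iconcat version amounts to something like this:

from functools import reduce
from operator import iconcat

flat = reduce(iconcat, matrix, [])   # iconcat(acc, row) behaves like acc += row, extending one list in place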
06:28
The call to chain is a bit slower, but remember that chain returns an iterator, which I then cast to a list. That conversion is what is slightly more expensive.
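For reference, the chain approach looks roughly like this; chain.from_iterable(matrix) would do the same job without unpacking the rows as arguments:

from itertools import chain

flat_iter = chain(*matrix)    # a lazy iterator over every value, no new list yet
flat_list = list(flat_iter)   # materializing it into a list is the extra cost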
06:38
If you only needed to process a flattened representation rather than store it as a list, chain would likely be about the same speed as the others, with the added benefit of not taking up extra memory. List comprehensions are pretty quick, but our comprehension is iterating over each value.
06:55
This is kind of like the append call in the flatten_loop version, which you can see is a shade slower than the comprehension itself. These are still relatively performant, but visiting each individual value is more expensive than simply gluing the lists together.
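Side by side, the two approaches look something like this; flatten_loop matches the name used in the lesson, while flatten_comprehension is just an illustrative name:

def flatten_comprehension(matrix):
    # the comprehension touches every single value
    return [value for row in matrix for value in row]

def flatten_loop(matrix):
    # same work, spelled out with an explicit append per value
    flat = []
    for row in matrix:
        for value in row:
            flat.append(value)
    return flat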
07:10 Alright, well that was the good news. The bad news is the four down here at the bottom. The reason for this is each of them is creating intermediate lists as you go along.
07:21
That means a lot of list copying, especially when you've got a thousand lists nested inside your list. The hint as to what's going on here can be found in the fact that the reduce using concat takes 878 milliseconds, whereas the reduce with iconcat only takes 2.9.
07:40
They’re both using reduce
, so it isn’t reduce
causing the problem.
07:45
The concat operator is returning a new value based on the two added together, while iconcat is adding to the existing value. It's a plus-equals, remember. For our lists,
07:57 that’s the difference between a thousand temporary lists getting created or just extending one. Those temporary lists are expensive and in this case, not only is the approach slower, but it would use more memory as well.
08:11 Remember when I said faster is often more memory hungry? Well, this is a counterexample, and why I said often rather than always, and why I hinted you should measure it.
08:21 Oh, and by the way, if you were curious about the NumPy version I showed in a previous lesson, I timed it off screen. Its performance was rather variable, ranging from four milliseconds to sixteen milliseconds.
08:32 I’m not sure exactly why that is. NumPy is known to do some special optimizations, so it’s possible my processor was just busy enough to get in the way. Either way, this is similar performance to our other middle of the pack algorithms and is obviously using some sort of in-place mechanism to avoid our worst case times like the bottom four here. In the final lesson, I’ll summarize the course.