Graphing Performance With matplotlib
00:00
Plotting Performance Using matplotlib. In the previous section of the course, you measured the performance of loops and generators for finding the first matching item in an iterable.
00:12
Now you’ll extend the script to use matplotlib to produce charts of the performance to allow deeper analysis of the two approaches. As always, it’s good practice when using any third-party package to install it into a virtual environment.
00:28 So here you can see one being created and activated on macOS or Linux …
00:58
Once you have a virtual environment created, the command to install matplotlib is the same regardless of the platform that you’re on.
01:16
The contents of the original script are copied to a new file, chart.py, and then the code to create the graphs is added. First, matplotlib’s pyplot module is imported with the traditional alias, plt.
01:42
The first plot will show the times for both the for loop and the generator.
01:58 The limits of the plot are set appropriately, and the axes and plot are labeled.
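As a rough sketch of how a chart like this might be built, the snippet below plots two timing series against the positions, then sets the limits, labels, and title. The sample numbers, the label text, and the plot_times() helper are all made up for illustration and are not taken from the course's chart.py. It also assumes matplotlib is already installed in your environment.

```python
import matplotlib.pyplot as plt  # plt is the conventional alias for pyplot

# Hypothetical sample data standing in for real timeit measurements.
positions = [0, 10, 20, 30, 40]
looping_times = [0.10, 0.22, 0.35, 0.47, 0.60]
generator_times = [0.25, 0.33, 0.42, 0.52, 0.61]


def plot_times(positions, looping_times, generator_times):
    """Draw both timing series on one set of axes and return the Axes."""
    fig, ax = plt.subplots()

    # One line per strategy, both plotted against the same positions.
    ax.plot(positions, looping_times, label="for loop")
    ax.plot(positions, generator_times, label="generator")

    # Limits chosen so both lines fit with a little headroom at the top.
    ax.set_xlim(0, max(positions))
    ax.set_ylim(0, max(looping_times + generator_times) * 1.1)

    # Illustrative labels and title, not the course's exact wording.
    ax.set_xlabel("Index of element to be found")
    ax.set_ylabel("Time to complete (s)")
    ax.set_title("Raw performance of loops and generators")
    ax.legend()
    return ax


ax = plot_times(positions, looping_times, generator_times)
```

Nothing appears on-screen until you call plt.show(), which opens the chart window and waits for you to close it.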
02:28
Finally, the plot is displayed. The second plot will show the ratio of performance between generators and loops. So the generator_ratio that was previously seen in interactive mode is calculated, along with the looping_ratio, which is always one.
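The two ratio lists could be computed along these lines. This is a minimal sketch that assumes the ratio is each generator time divided by the matching loop time, with the loop acting as the baseline. The timing values are invented for the example.

```python
# Hypothetical timings standing in for real measurements.
looping_times = [0.10, 0.20, 0.40, 0.80]
generator_times = [0.25, 0.22, 0.38, 0.76]

# Each generator time divided by the loop time at the same position.
# A value above 1 means the generator was slower, below 1 means faster.
generator_ratio = [
    gen / loop for gen, loop in zip(generator_times, looping_times)
]

# The loop is the baseline, so its ratio to itself is 1 at every position.
looping_ratio = [1 for _ in looping_times]

for loop_r, gen_r in zip(looping_ratio, generator_ratio):
    print(f"loop: {loop_r}  generator: {gen_r:.2f}")
```

Plotting looping_ratio gives a flat reference line at 1, which makes it easy to see where the generator line dips below or climbs above the baseline.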
02:54 Once more, the two lists are plotted against the positions, and the limits, labels, and title are all set.
03:33
Then the plot is displayed. Depending on the system you’re running and the values for TIMEIT_TIMES, LIST_SIZE, and POSITION_INCREMENT that you use, running the script can take a while, but it should produce one chart that shows the times plotted against each other. Additionally, after closing the first chart, you’ll get another chart that shows the ratio between the two strategies.
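To see why those three constants drive the running time, here's a minimal sketch of a measurement loop. The helper functions build_list(), find_match_loop(), and find_match_gen() are hypothetical stand-ins rather than the course's actual code, and the constants are set deliberately small so the sketch finishes quickly.

```python
from timeit import timeit

# Small values chosen for a fast demo; real runs would use larger ones.
TIMEIT_TIMES = 100
LIST_SIZE = 200
POSITION_INCREMENT = 50


def build_list(size, fill, value, at_position):
    """Hypothetical helper: a list of fill items with value at one index."""
    return [value if i == at_position else fill for i in range(size)]


def find_match_loop(iterable, target):
    """Find the first matching item with a plain for loop."""
    for item in iterable:
        if item == target:
            return item
    return None


def find_match_gen(iterable, target):
    """Find the first matching item with a generator expression."""
    return next((item for item in iterable if item == target), None)


positions = []
looping_times = []
generator_times = []

# One pair of measurements for every tested position of the target.
for position in range(0, LIST_SIZE, POSITION_INCREMENT):
    data = build_list(LIST_SIZE, "fill", "target", position)
    positions.append(position)
    looping_times.append(
        timeit(lambda: find_match_loop(data, "target"), number=TIMEIT_TIMES)
    )
    generator_times.append(
        timeit(lambda: find_match_gen(data, "target"), number=TIMEIT_TIMES)
    )

# Two strategies, each run TIMEIT_TIMES times at every tested position.
total_calls = 2 * len(positions) * TIMEIT_TIMES
print(f"{len(positions)} positions, {total_calls} timed calls in total")
```

The number of timed calls grows with TIMEIT_TIMES and with LIST_SIZE divided by POSITION_INCREMENT, and each call also scans further into a longer list, so raising any of these values makes the script take noticeably longer.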
04:01
This chart clearly illustrates that in this test, when the target item is near the beginning of the iterable, generators are far slower than for loops. However, once the element to find is at position a hundred or greater, generators beat the for loop quite consistently and by a fair margin.
04:18 You can interactively zoom in on the chart with the magnifying glass icon. The zoomed chart shows there’s a performance gain of around 5 or 6 percent. This may not be anything to write home about, but it’s also not negligible.
04:32
Whether it’s worth it for you depends on the specific data you’ll be using and how often you need to use it. A point of note is that for low values of TIMEIT_TIMES, you’ll often get spikes in the chart, as seen on-screen.
04:47 These are an inevitable side effect of testing on a computer that’s not dedicated to the task. If the computer needs to do something, then it will pause the Python process without hesitation, and this can inflate certain results.
05:02
If you repeat the test multiple times, then the spikes will appear in random locations. To smooth out the lines and decrease the effect, increase the value of TIMEIT_TIMES.
05:16
With these results, you can tentatively say that generators are faster than for loops, even though generators can be significantly slower when the item to find is in the first hundred elements of the iterable. When you’re dealing with small lists, the overall difference in terms of raw milliseconds lost isn’t much, yet for large iterables, where a 5 percent gain can mean minutes, it’s something to bear in mind.
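To put that 5 percent in perspective, here's a small worked calculation. The workload durations are invented for illustration, and the sketch assumes that a 5 percent gain means the total running time shrinks by 5 percent.

```python
def seconds_saved(total_seconds, gain_percent):
    """Return how many seconds a given percentage gain trims off a job."""
    return total_seconds * gain_percent / 100


# Hypothetical workloads: a 2-millisecond search and a 1-hour batch job.
small_job = seconds_saved(0.002, 5)
large_job = seconds_saved(3600, 5)

print(f"Small job saves {small_job * 1000:.1f} ms")
print(f"Large job saves {large_job / 60:.1f} minutes")
```

The same percentage that's invisible on a small list adds up to whole minutes once the job itself runs for an hour.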
05:43 This last chart shows the performance for very large intervals, with the increase in performance stabilizing at around 6 percent. Now that you’ve seen the performance of the two hard-coded solutions for finding the first match, in the next section of the course, you’ll take a look at a general reusable function which will allow you to do the same in more situations.