Exploring Practical Applications: Part 2
00:00 In the previous lesson, I showed you how to count letter frequency in a file and how to build an ASCII histogram. In this lesson, I’ll continue showing you some more practical applications of the Counter class.
00:13 The mode of a sequence is the value that appears most frequently. Consider the sequence one, two, two, three. The mode is two. That is the value that appears most often.
00:26 Let’s write some code to calculate the mode. Dr. Seuss, eat your heart out. The function in the top window here implements a mode() calculation. Line 5 shouldn’t be a surprise.
00:39 It creates a counter based on the data passed in. Line 6 gets the count of the most frequent item. Remember that .most_common() returns a list of tuples, so this throws away the most frequent item’s name using the underscore (_) convention and stores just the corresponding frequency value in top_count. Lines 9 through 11 then loop through the data, finding all the items whose frequency matches top_count.
01:05 This will ensure that if there are multiple items that share the top spot, they’ll end up in the results list. Let’s try this out. Importing it. Now some data.
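The on-screen file isn’t reproduced in this transcript, so here is a minimal sketch of what such a mode() function might look like. The line numbers mentioned in the narration refer to the video’s file, not to this sketch, and the sample data is made up for illustration.

```python
from collections import Counter


def mode(data):
    """Return a list of the most frequent value(s) in data.

    Assumes data is non-empty; an empty input would fail at the unpacking step.
    """
    counter = Counter(data)
    # most_common(1) returns [(item, count)]; discard the item, keep the count.
    [(_, top_count)] = counter.most_common(1)

    # Collect every item tied for the top count, not just the first one.
    results = []
    for item, count in counter.items():
        if count == top_count:
            results.append(item)
    return results


numbers = [1, 2, 2, 2, 3, 3, 4]  # hypothetical sample data
print(mode(numbers))             # [2]
print(mode(numbers + [3]))       # [2, 3]
```

Because the function returns a list, a tie for the top spot simply produces more than one entry.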
01:23 The number 2 shows up three times in this data, so it’s the mode. How about adding a 3 to have multiple modes? And there you go. Let’s try it with something other than a list of numbers.
01:47 And like previous examples, because the frequency calculation is done through a Counter object, anything that Counter takes, mode() can compute.
02:03 The mode here is both the apples and the oranges.
They’re not perfect, but file extensions are a good hint as to what kind of data can be found in a file.
02:15 The next practical application counts the frequency of different file types in a directory. Let me surprise you by starting out by importing our dear friend Counter.
02:28 And now the Path object from pathlib. If you haven’t used this module before, it is a more modern alternative to the os.path module.
02:39 It does all the same things, but it’s a bit easier to use and far easier to read.
02:50 One of the powers of pathlib is the ability to compose Path objects together using the slash operator (/). Normally I’m not a big fan of operator overloading.
02:59 I’ve seen too much code where it isn’t clear what the overloaded operator’s supposed to do, but here, paths use slash as separators, and it’s easy to ignore the fact that this is actually overriding division of all things. I’m on a Mac, which is a Unix-based operating system.
03:15 If you’re on Windows and coding along, you’ll have to modify this path to give you something that makes sense in your environment. For me, this joins my .home() directory path with the 'Downloads' directory underneath it.
03:28 Note that the Path object doesn’t verify that the path exists until you do something with it. So if you point this somewhere else and make a mistake, your error won’t be seen here. Okay?
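As a rough sketch of the path composition described here, assuming the code in the video uses Path.home() for the home directory (the transcript doesn’t show it verbatim). The misspelled directory name is hypothetical and only there to show that building a path checks nothing on disk.

```python
from pathlib import Path

# Path.home() is the current user's home directory; "/" appends a component.
downloads = Path.home() / "Downloads"
print(downloads)

# Constructing a Path does not touch the file system, so a typo goes unnoticed
# until you call something that actually reads the disk.
typo = Path.home() / "Downlaods"  # hypothetical misspelling, for illustration
print(typo.exists())              # asks the file system; usually False
```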
03:41 So the downloads variable points to a Path object, storing a reference to my 'Downloads' directory. Let’s do something with it.
03:53 The .iterdir() method on a Path object returns an iterator of all the files and directories directly inside the path; it doesn’t descend into subdirectories. If the path this Path object pointed to wasn’t valid, this is where it would fall over.
04:15 Here I’ve built a list comprehension that is looping through all of the entries. Those are the files and directories. For each entry found, it checks if it is a file. If it is, it will be used in the comprehension. This way, child directories don’t end up in the list.
04:31 Instead of storing the entry itself, I’m using another feature of the Path object, the .suffix property. This is the extension part of the filename. So to recap, extensions ends up being a list that contains the extension of each file found directly inside the downloads path.
04:56 And to get the frequency info, just create a counter based on that extensions list. For my downloads directory, I have forty-six .pdf files, nineteen .png files, and loads more.
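Putting those steps together might look something like the sketch below. The count_extensions() wrapper is a hypothetical helper added here for convenience; in the video the same steps are typed directly into the REPL.

```python
from collections import Counter
from pathlib import Path


def count_extensions(directory):
    """Count the suffixes of the files directly inside directory."""
    extensions = [
        entry.suffix                            # e.g. ".pdf"; "" when there is no extension
        for entry in Path(directory).iterdir()  # immediate children only
        if entry.is_file()                      # skip child directories
    ]
    return Counter(extensions)


if __name__ == "__main__":
    downloads = Path.home() / "Downloads"
    if downloads.is_dir():  # avoid an error if this directory doesn't exist
        print(count_extensions(downloads).most_common(3))
```

The counts you see will depend entirely on what is in your own directory.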
05:14 Of course, you can use .most_common() to get at the most frequent items. Quick thing to note: if you look more closely at the counter, you’ll see that it is case-sensitive. There is one PDF doc in here that’s all caps that isn’t included in those forty-six.
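To see the effect in isolation, here is a small sketch using made-up suffixes rather than a real directory. Counter keys are compared exactly, so differently cased strings are counted separately unless you normalize them first.

```python
from collections import Counter

suffixes = [".pdf", ".PDF", ".pdf", ".png"]  # made-up sample suffixes

exact = Counter(suffixes)
print(exact)   # Counter({'.pdf': 2, '.PDF': 1, '.png': 1})

# Lowercasing each suffix before counting merges the two spellings.
folded = Counter(suffix.lower() for suffix in suffixes)
print(folded)  # Counter({'.pdf': 3, '.png': 1})
```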
05:29 You could fix this by changing the list comprehension to call .lower() on the suffix.
A natural use of a Python counter is a shopping cart. The keys are the items being put in the cart, and the values are the number of the items being purchased. Let’s try this out in the REPL.
05:51 First off, I’ll create a dictionary with some prices.
06:02 See the comment? The reason I’m saying not to do this is because you should never use a float to store money. Floats are not precise and can get you into trouble.
06:11 If you’re coding with money, you should use the Decimal class from the decimal module instead. If you want to see what I mean about precision problems, open up a REPL and add 0.1 to 0.2.
06:22 You might be surprised at the result. Anyhow, caveat being entered.
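For reference, this is roughly what that experiment shows, alongside the same sum done with Decimal. It is a small illustration, not a full guide to handling currency.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Build Decimal values from strings so they don't inherit float rounding error.
total = Decimal("0.1") + Decimal("0.2")
print(total)                    # 0.3
print(total == Decimal("0.3"))  # True
```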
06:34 I’ve created a counter with the things I’m buying. One copy of the course, three copies of a book, and some wallpaper. I’m assuming it’s digital wallpaper. Otherwise this is a bit of a weird store.
06:47 Let’s look at some info about what’s in our cart.
06:55 Using the .items() method, I can iterate over the Counter object. Then I can look up the price for the product
07:10 and then calculate the subtotal for the product by multiplying the price by the number of items in the cart.
07:26 And that long f-string prints it all out. The 7.2f parts you see inside of the f-string are format specifiers indicating that the price and subtotal should each be at least seven characters wide with two decimal places shown.
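A sketch of the cart walkthrough might look like this. The prices, the wallpaper quantity, and the exact layout of the f-string are assumptions made for illustration, since the transcript doesn’t include the values typed in the video.

```python
from collections import Counter

# Don't do this in real code: floats keep the demo short, use Decimal for money.
prices = {"course": 97.99, "book": 54.99, "wallpaper": 4.99}  # made-up prices

# One course and three books as narrated; the wallpaper quantity is a guess.
cart = Counter(course=1, book=3, wallpaper=2)

lines = []
for product, units in cart.items():
    price = prices[product]
    subtotal = price * units
    # 7.2f: at least seven characters wide, two digits after the decimal point.
    lines.append(f"{product:10}: ${price:7.2f} x {units} = ${subtotal:7.2f}")

print("\n".join(lines))
```

The fixed-width fields are what make the dollar amounts line up in columns.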
07:41 And there you have it, the receipt for your cart. For a little homework, add the lines you’d need to print out a total as well. Next up, I’ll show you how to use the Counter class as a multiset implementation.