Exploring Practical Applications: Part 2

00:00 In the previous lesson, I showed you how to count letter frequency in a file and how to build an ASCII histogram. In this lesson, I’ll continue showing you some more practical applications of the Counter class.

00:13 The mode of a sequence is the value that appears most frequently. Consider the sequence one, two, two, three. The mode is two. That is the value that appears most often.

00:26 Let’s write some code to calculate the mode. Dr. Seuss, eat your heart out. The function in the top window here implements a mode() calculation. Line 5 shouldn’t be a surprise.

00:39 It creates a counter based on the data passed in. Line 6 gets the count of the most frequent item. Remember that .most_common() returns a list of tuples, so this throws away the most frequent item’s name using the underscore (_) convention and stores just the corresponding frequency value in top_count. Lines 9 through 11 then loop through the data, finding all the items whose frequency matches top_count.

01:05 This will ensure that if there are multiple items that share the top spot, they’ll end up in the results list. Let’s try this out. Importing it. Now some data.
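The on-screen code isn't reproduced in this transcript, so here is a minimal sketch of a function matching the description above. The exact layout, and therefore the line numbers mentioned, may differ from the original, and the empty-input guard is an addition of this sketch.

```python
from collections import Counter


def mode(data):
    """Return a list of the most frequent value(s) in data."""
    counter = Counter(data)
    if not counter:
        # Assumption of this sketch: empty input yields an empty list.
        return []
    # .most_common(1) returns [(item, count)]; keep only the count.
    _, top_count = counter.most_common(1)[0]
    # Collect every item that shares the top frequency.
    return [item for item, count in counter.items() if count == top_count]


print(mode([1, 2, 2, 3]))              # [2]
print(mode([1, 2, 2, 2, 3, 3, 3]))     # [2, 3]
```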

01:23 The number 2 shows up three times in this data, so it’s the mode. How about adding a 3 to have multiple modes? And there you go. Let’s try it with something other than a list of numbers.

01:47 And like previous examples, because the frequency calculation is done through a Counter object, anything that Counter takes, mode() can compute.
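As a point of comparison, and not the approach used in this lesson, the standard library's statistics.multimode() (Python 3.8+) also returns every value tied for the top spot and accepts any iterable of hashable items. The sample data below is made up for illustration.

```python
from statistics import multimode

# Hypothetical data: two fruits tie for the most occurrences.
fruit = ["apple", "orange", "apple", "banana", "orange"]
print(multimode(fruit))          # ['apple', 'orange']

# A string is an iterable of characters, so it works too.
print(multimode("mississippi"))  # ['i', 's']
```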

02:03 The mode here is both the apples and the oranges. They’re not perfect, but file extensions are a good hint as to what kind of data can be found in a file.

02:15 The next practical application counts the frequency of different file types in a directory. Let me surprise you by starting out by importing our dear friend Counter.

02:28 And now the Path object from pathlib. If you haven’t used this module before, it is a more modern alternative to the os.path module.

02:39 It does all the same things, but it’s a bit easier to use and far easier to read.

02:50 One of the powers of pathlib is the ability to compose Path objects together using the slash operator (/). Normally I’m not a big fan of operator overloading.

02:59 I’ve seen too much code where it isn’t clear what the overloaded operator’s supposed to do, but here, paths use slash as separators, and it’s easy to ignore the fact that this is actually overriding division of all things. I’m on a Mac, which is a Unix-based operating system.

03:15 If you’re on Windows and coding along, you’ll have to modify this path to give you something that makes sense in your environment. For me, this joins my .home directory path with the 'Downloads' directory underneath it.

03:28 Note that the Path object doesn’t verify that the path exists until you do something with it. So if you point this somewhere else and make a mistake, your error won’t be seen here. Okay?
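A small sketch of the path composition described above. The "Downloads" directory name comes from the lesson; the misspelled name is a hypothetical typo used only to show that building a Path doesn't touch the filesystem.

```python
from pathlib import Path

# Path.home() returns the current user's home directory as a Path object.
downloads = Path.home() / "Downloads"
print(downloads)

# Composing a path does not check that it exists, so a typo passes silently.
typo = Path.home() / "Downlods"  # hypothetical misspelling
print(typo.exists())  # most likely False, unless such a directory really exists
```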

03:41 So the downloads variable points to a Path object, storing a reference to my 'Downloads' directory. Let’s do something with it.

03:53 The .iterdir() method on a Path object returns an iterator of all the files and directories directly underneath the path. If the path this Path object pointed to wasn’t valid, this is where it would fall over.

04:15 Here I’ve built a list comprehension that loops through all of the entries. Those are the files and directories. For each entry found, it checks if it is a file. If it is, it will be used in the comprehension. This way, child directories don’t end up in the list.

04:31 Instead of storing the entry itself, I’m using another feature of the Path object, the .suffix property. This is the extension part of the filename. So to recap, extensions ends up being a list that contains the extension of each file found directly in the downloads directory. Because .iterdir() isn’t recursive, files inside child directories aren’t included.
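Here is a sketch of that comprehension. To keep it runnable anywhere, it scans a temporary directory with a few made-up files rather than a real downloads folder; the helper name file_suffixes() and the filenames are hypothetical.

```python
import tempfile
from pathlib import Path


def file_suffixes(directory):
    """Return the suffix of every file directly inside directory."""
    return [entry.suffix for entry in Path(directory).iterdir() if entry.is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.pdf").touch()
    (root / "photo.png").touch()
    (root / "notes").touch()                    # no extension, so suffix is ""
    (root / "subdir").mkdir()                   # directories are filtered out
    (root / "subdir" / "nested.txt").touch()    # not visited: .iterdir() is not recursive

    # .iterdir() yields entries in arbitrary order, so sort for a stable display.
    extensions = sorted(file_suffixes(root))

print(extensions)  # ['', '.pdf', '.png']
```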

04:56 And to get the frequency info, just create a counter based on that extensions list. For my downloads directory, I have forty-six .pdf files, nineteen .png files, and loads more.

05:14 Of course, you can use .most_common() to get at the most frequent items. Quick thing to note: if you look more closely at the counter, you’ll see that it is case-sensitive. There is one PDF doc in here with an all-caps extension that isn’t included in those forty-six.
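To illustrate the case issue, here is a short sketch using a made-up list of suffixes in place of a real directory listing. Lowercasing each suffix before counting is one way to fold the variants together.

```python
from collections import Counter

# Hypothetical suffixes, including one all-caps extension.
suffixes = [".pdf", ".PDF", ".png", ".pdf"]

# Counter keys are compared exactly, so ".pdf" and ".PDF" are counted separately.
print(Counter(suffixes))  # Counter({'.pdf': 2, '.PDF': 1, '.png': 1})

# Normalizing with .lower() merges them into one key.
folded = Counter(suffix.lower() for suffix in suffixes)
print(folded.most_common(1))  # [('.pdf', 3)]
```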

05:29 You could fix this by changing the list comprehension to call .lower() on the suffix. A natural use of a Python counter is a shopping cart. The keys are the items being put in the cart, and the values are the number of the items being purchased. Let’s try this out in the REPL.

05:51 First off, I’ll create a dictionary with some prices.

06:02 See the comment? The reason I’m saying not to do this is because you should never use a float to store money. Floats are not precise and can get you into trouble.

06:11 If you’re coding with money, you should use the Decimal class from the decimal module instead. If you want to see what I mean about precision problems, open up a REPL and add 0.1 to 0.2.

06:22 You might be surprised at the result. Anyhow, caveat being entered.
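For reference, this is what that experiment looks like, alongside the same sum done with Decimal. Constructing Decimal values from strings, as done here, avoids inheriting the float's rounding error.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
print(float_sum)         # 0.30000000000000004
print(float_sum == 0.3)  # False

# Decimal built from strings keeps the exact decimal values.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                    # 0.3
print(decimal_sum == Decimal("0.3"))  # True
```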

06:34 I’ve created a counter with the things I’m buying. One copy of the course, three copies of a book, and some wallpaper. I’m assuming it’s digital wallpaper. Otherwise this is a bit of a weird store.

06:47 Let’s look at some info about what’s in our cart.

06:55 Using the .items() method, I can iterate over the Counter object. Then I can look up the price for the product

07:10 and then calculate the subtotal for the product by multiplying the price by the number of items in the cart.

07:26 And that long f-string prints it all out. The 7.2fs you see inside of the f-string are format specifiers indicating that the price and subtotals should be seven characters wide with two decimal places shown.
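The REPL session isn't captured in this transcript, so the sketch below is a reconstruction under assumptions: the prices, the wallpaper quantity, and the exact layout of the f-string are all hypothetical. It keeps floats only to mirror the lesson, with the same warning attached.

```python
from collections import Counter

# Hypothetical prices. Don't do this in real code: use Decimal for money.
prices = {"course": 97.99, "book": 54.99, "wallpaper": 4.99}

# One course, three books, and an assumed two wallpapers.
cart = Counter(course=1, book=3, wallpaper=2)

lines = []
for product, units in cart.items():
    price = prices[product]
    subtotal = units * price
    # 7.2f: pad to seven characters wide and show two decimal places.
    lines.append(f"{product:<10}: ${price:7.2f} x {units} = ${subtotal:7.2f}")

print("\n".join(lines))
```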

07:41 And there you have it, the receipt for your cart. For a little homework, add the lines you’d need to print out a total as well. Next up, I’ll show you how to use the Counter class as a multiset implementation.
