Exploring Practical Applications: Part 1
In the previous lesson, I introduced you to the Counter class. In this lesson, I’ll show you a few practical uses of this class. Have you ever wondered how frequently a letter occurs in a block of text? Well, wonder no more. In this first practical application, I’ll show you how to use the Counter class to count the letters in a file.
00:21 The file I’ll demonstrate with is a text file containing Tim Peters’ The Zen of Python.
This function finds the frequency of letters in a text file. Line 5 creates a counter for tracking the letters. Line 6 opens the file using the filename argument for the function. Lines 7 through 11 are what count the actual letters. The counting is done by iterating through the lines of the file, using the for loop on line 7. For each line, a list comprehension is created.
Let’s work through that comprehension a bit at a time. The first thing here is what is being output into the list. In this case, it is the value contained in the variable called char. The for block inside of the comprehension tells you what is going to be inside of char. It’s going to iterate over the results of line.lower(). line is a string containing the line from the text file, and lower() is a method that returns an all-lowercase version of the string. The for block has an additional complexity. It has an inline if condition. The value of char is only going to be included in the comprehension if the result of char.isalpha() is true.
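To make the comprehension concrete, here is a minimal sketch applied to a single line. The sample string and the name line_letters are assumptions for illustration, not a copy of the on-screen code.

```python
# A sample string standing in for one line read from the text file (assumed input).
line = "Beautiful is better than ugly, 2 times over!\n"

# Lowercase the line, then keep each character only if it is alphabetic.
line_letters = [char for char in line.lower() if char.isalpha()]

# The digit, spaces, punctuation, and newline are all filtered out.
print("".join(line_letters))
```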
Putting that all together, the list will contain all the letters in a line after they’ve been transformed to lowercase, and only if they’re alphabetic characters. This way, punctuation and numbers won’t be included in the comprehension. Finally, line 11 is where the counting happens. The letter_counter object is updated. A new instance of Counter based on the list comprehension is passed in to update our letter_counter. Remember, the call to .update() will add to the existing values. The new counter counts the frequency of letters in line_letters, and the call to .update() will add the result to the current set of letter frequencies in letter_counter, the totals.
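The steps described so far can be sketched roughly as follows. This is a reconstruction from the narration, not the exact on-screen code: the function name count_letters is an assumption, and the line numbers mentioned in the lesson may not match this sketch.

```python
from collections import Counter


def count_letters(filename):
    """Return a Counter of lowercase letter frequencies in a text file."""
    letter_counter = Counter()
    with open(filename) as file:
        for line in file:
            # Lowercase the line and keep only alphabetic characters.
            line_letters = [
                char for char in line.lower() if char.isalpha()
            ]
            # update() adds these counts to the running totals
            # instead of replacing them.
            letter_counter.update(Counter(line_letters))
    return letter_counter
```

Since .update() accepts any iterable, passing line_letters to it directly should work as well; wrapping the list in a new Counter first simply follows the description above.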
02:29 Line 13 returns the end result. Let’s play with this. First off, I’ll import it.
02:41 Then I’ll call the function with a text file …
and here’s the result. Not surprisingly to any Scrabble players out there, the letter e is the most common … and not a single ten-point q in the whole thing. In case you wanted only the five most popular … you can call .most_common(), passing in an argument of 5.
A histogram is a bar chart that shows the frequency of data. The Counter class is all about frequency. So let’s build a chart-printing function.
If you want to build a simple ASCII bar chart, you’re going to need to evaluate the frequency of the things to be graphed. Of course, the Python Counter can help with this nicely. The key part of the function here is line 5, where our counter is created based on the data passed in. The .most_common() method is called directly and its result is stored.
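As a quick illustration of what .most_common() gives back, here is a small sketch. The input string is made-up sample data.

```python
from collections import Counter

# Made-up sample data for illustration.
counter = Counter("banana")

# With no argument, every element comes back as an (item, count) tuple,
# most frequent first.
all_items = counter.most_common()
print(all_items)  # [('a', 3), ('n', 2), ('b', 1)]

# With an argument n, only the n most frequent elements are returned.
top_two = counter.most_common(2)
print(top_two)  # [('a', 3), ('n', 2)]
```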
03:44 Remember that this returns a list of tuples ordered by frequency. The rest of the code is presentation. Line 6 is a dictionary comprehension. This dictionary will contain key-value pairs mapping each item to be graphed to the string of symbols that forms its ASCII bar in the bar chart. The symbol * frequency part creates a new string, which is the graph symbol, defaulting to the hash sign (#), repeated frequency times.
04:11 Multiplication of a string in Python gives you repetition. Note that the order of the items in the dictionary will be the order of the frequency of things being counted. As of Python 3.7, dictionaries preserve insertion order by default. They were implemented that way in CPython 3.6, but as of Python 3.7, it is part of the language spec. If you’re running a version before 3.6, or a 3.6 implementation other than CPython, you would need to implement this with an OrderedDict class instead.
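For interpreters where dictionary ordering isn’t guaranteed, the chart mapping could be built along these lines. This is only a sketch with made-up sample data, and the variable names are assumptions.

```python
from collections import Counter, OrderedDict

symbol = "#"

# Made-up sample data; most_common() returns (item, count) pairs,
# most frequent first.
pairs = Counter("banana").most_common()

# OrderedDict remembers insertion order explicitly, so the bars stay
# sorted by frequency even where plain dicts make no such promise.
chart = OrderedDict((item, symbol * frequency) for item, frequency in pairs)

print(list(chart.items()))  # [('a', '###'), ('n', '##'), ('b', '#')]
```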
You should also probably look into upgrading if you can. 3.6 has gone end-of-life. Line 7 finds the longest label in the graph so that all the labels can be padded to this length. Lines 8 through 10 loop through the chart dictionary, printing a padded version of the label and the resulting bar graph. Let’s see this in practice.
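A chart-printing function matching this description might look roughly like the sketch below. It is reconstructed from the narration, so the function name and most variable names are assumptions, and the line numbers mentioned above may not line up exactly. It also assumes the labels are strings and that data is not empty.

```python
from collections import Counter


def print_ascii_bar_chart(data, symbol="#"):
    """Print a horizontal ASCII bar chart of the frequencies in data."""
    # List of (item, count) tuples, most frequent first.
    counter = Counter(data).most_common()
    # Map each label to its bar: the symbol repeated count times.
    chart = {item: symbol * frequency for item, frequency in counter}
    # The longest label decides how much padding the others need.
    max_len = max(len(item) for item in chart)
    for item, bar in chart.items():
        padding = (max_len - len(item)) * " "
        print(f"{item}{padding} |{bar}")


print_ascii_bar_chart("banana")
```

Called on the string "banana", this prints one bar per letter, with a first, then n, then b.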
Because our function bases the frequency calculations on Counter, the data argument can be anything that Counter can take. Let’s graph some fruit.
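As a reminder of what Counter can take, here is a short sketch. The fruit names and numbers are made up for illustration and are not the data used in the video.

```python
from collections import Counter

# Made-up fruit data for illustration only.
from_iterable = Counter(["apple", "orange", "apple"])  # counts the items
from_mapping = Counter({"apple": 2, "orange": 1})      # uses given counts
from_keywords = Counter(apple=2, orange=1)             # same, via keywords

# All three forms describe the same frequencies.
print(from_iterable == from_mapping == from_keywords)  # True
```

Any of these forms could be passed as the data argument, as long as the counts are non-negative integers so that multiplying the symbol by them makes sense.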
Now I’ll pass the sales Counter into the graph function and change the graphing symbol for giggles.
Once again, a nice little ASCII histogram. For a little homework, see if you can add the ability to scale the graph. Add a max_width argument to the function and scale the bars so that they’re never longer than that maximum.
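If you want something to compare your attempt against, here is one possible approach, not the official solution. The function name and the choice of integer scaling are assumptions; the sketch also assumes string labels.

```python
from collections import Counter


def print_scaled_bar_chart(data, symbol="#", max_width=None):
    """Print an ASCII bar chart whose bars never exceed max_width symbols."""
    counter = Counter(data).most_common()
    if not counter:
        return  # nothing to draw
    # most_common() sorts by count, so the first entry has the longest bar.
    longest = counter[0][1]
    max_len = max(len(item) for item, _ in counter)
    for item, frequency in counter:
        if max_width is not None and longest > max_width:
            # Scale proportionally; integer division rounds down,
            # so the largest bar is exactly max_width symbols long.
            bar_len = frequency * max_width // longest
        else:
            bar_len = frequency
        print(f"{item:<{max_len}} |{symbol * bar_len}")
```

One trade-off of rounding down is that very small counts can shrink to an empty bar; rounding up instead would be a reasonable alternative.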
06:00 It should just be a few lines of code. Next up, some more practical applications.