Exploring Practical Applications: Part 1
00:00
In the previous lesson, I introduced you to the Counter class. In this lesson, I’ll show you a few practical uses of this class. Have you ever wondered how frequently a letter occurs in a block of text? Well, wonder no more. In this first practical application, I’ll show you how to use the Counter class to count the letters in a file.
00:21 The file I’ll demonstrate with is a text file containing Tim Peters’ The Zen of Python.
00:30
This function finds the frequency of letters in a text file. Line 5 creates a counter for tracking the letters. Line 6 opens the file named by the function’s filename argument. Lines 7 through 11 do the actual counting.
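The code from the video isn’t reproduced in this transcript, so here is a minimal sketch of a function like the one described. The name count_letters is hypothetical, the file is assumed to be UTF-8 text, and the sketch returns the Counter itself so that .most_common() can be called on the result. The line numbers in the narration refer to the video’s listing and won’t line up exactly with this sketch.

```python
from collections import Counter


def count_letters(filename):
    """Return a Counter of the lowercase letters in a text file.

    Hypothetical name; a sketch of the function described in the lesson.
    """
    letter_counter = Counter()  # running totals for the whole file
    with open(filename, encoding="utf-8") as infile:  # assumes UTF-8 text
        for line in infile:
            # Keep only alphabetic characters, normalized to lowercase
            line_letters = [char for char in line.lower() if char.isalpha()]
            # .update() adds this line's counts to the running totals
            letter_counter.update(Counter(line_letters))
    return letter_counter
```

Calling count_letters("zen.txt") (a hypothetical filename) gives you back a Counter, so you can index it by letter or rank it with .most_common().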
00:47
The counting is done by iterating through the lines of the file with the for loop on line 7. For each line, a list comprehension builds a list of that line’s letters.
00:58
Let’s work through that comprehension a bit at a time. The first part is the expression that gets output into the list. In this case, it is the value contained in the variable called char.
01:10
The for clause inside the comprehension tells you what char will hold. It iterates over the result of line.lower(). line is a string containing a line from the text file, and .lower() is a string method that returns an all-lowercase copy of the string.
01:29
This for clause has an additional complexity: an inline if condition. The value of char is only included in the list if char.isalpha() returns True.
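To see the comprehension on its own, here it is applied to a single sample line, the first line of The Zen of Python. The variable names mirror the narration; the sample string is just an illustrative input.

```python
line = "Beautiful is better than ugly.\n"

# Lowercase the line, then keep only the alphabetic characters
line_letters = [char for char in line.lower() if char.isalpha()]

print(line_letters[:9])  # ['b', 'e', 'a', 'u', 't', 'i', 'f', 'u', 'l']
```

The spaces, the period, and the trailing newline all fail .isalpha(), so they never make it into the list.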
01:43
Putting that all together, the list will contain every alphabetic character in the line, converted to lowercase. This way, punctuation, whitespace, and digits are left out. Finally, line 11 is where the counting happens: the letter_counter object is updated.
02:04
A new Counter instance, built from the result of the list comprehension, is passed to letter_counter.update(). Remember that .update() adds to the existing counts rather than replacing them.
02:16
The new counter tallies the letters from the current line, and the call to .update() adds those tallies to the running totals held in letter_counter.
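Here’s a small sketch of that accumulating behavior, using a couple of made-up lists in place of lines from a file:

```python
from collections import Counter

totals = Counter()

# Each call adds to the existing counts instead of overwriting them
totals.update(Counter(["a", "b", "a"]))
totals.update(Counter(["a", "c"]))

print(totals)  # Counter({'a': 3, 'b': 1, 'c': 1})

# .update() also accepts an iterable directly, so the inner Counter is optional
totals.update("ab")
print(totals["a"], totals["b"])  # 4 2
```

Wrapping the list in Counter() first, as the lesson does, makes the intent explicit; passing the list straight to .update() counts its elements the same way.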
02:29 Line 13 returns the end result. Let’s play with this. First off, I’ll import it.
02:41 Then I’ll call the function with a text file …
02:49
and here’s the result. Not surprisingly to any Scrabble players out there, the letter e is the most common letter … and there’s not a single ten-point q in the whole thing. In case you wanted only the five most popular …
03:08
you can call .most_common(), passing in an argument of 5.
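The lesson runs this on a text file; as a quick stand-in, here’s .most_common() on a short string, with and without the argument:

```python
from collections import Counter

letters = Counter("mississippi")

# With no argument: every (item, count) pair, most common first
print(letters.most_common())   # [('i', 4), ('s', 4), ('p', 2), ('m', 1)]

# With an argument n: only the top n pairs
print(letters.most_common(2))  # [('i', 4), ('s', 4)]
```

Items with equal counts, like i and s here, come out in the order they were first encountered.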
03:15
A histogram is a bar chart that shows how often each value occurs in a set of data. The Counter class is all about frequency, so let’s build a chart-printing function.
03:25
To build a simple ASCII bar chart, you first need to count how often each of the things being graphed occurs, and Python’s Counter handles that nicely.
03:35
The key part of the function is line 5, where the counter is created from the data passed in. The .most_common() method is called on it right away, and the result is stored.
03:44 Remember that this returns a list of (item, count) tuples, ordered from most to least common. The rest of the code is presentation. Line 6 is a dictionary comprehension. The resulting dictionary maps each item being graphed to its bar in the chart: a string of symbols whose length is that item’s count.
04:03
The symbol * frequency part creates a new string: the graph symbol, which defaults to the hash sign (#), repeated frequency times.
04:11 Multiplying a string by an integer in Python repeats it. Note that the items in the dictionary will be in order of frequency, because the comprehension inserts them in the order .most_common() returns them and dictionaries preserve insertion order. As of Python 3.7, that insertion-order guarantee is part of the language.
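Both points can be checked in a few lines. This sketch uses a short string as made-up input and builds a chart dictionary the same way the lesson describes:

```python
from collections import Counter

# Multiplying a string by an integer repeats it
print("#" * 5)  # #####

# Build the chart from .most_common(); the dict keeps that insertion order
ranked = Counter("banana").most_common()
chart = {item: "#" * frequency for item, frequency in ranked}
print(chart)  # {'a': '###', 'n': '##', 'b': '#'}
```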
04:26
Dictionaries were implemented that way in CPython 3.6, but only as of Python 3.7 is the ordering part of the language spec. If you’re running CPython older than 3.6, or another Python implementation older than 3.7, you would need to use the OrderedDict class instead.
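On those older versions, a minimal sketch of the same chart dictionary built with OrderedDict could look like this. It avoids f-strings and other newer syntax so it stays valid on older interpreters:

```python
from collections import Counter, OrderedDict

ranked = Counter("banana").most_common()

# OrderedDict remembers insertion order even on older Pythons,
# so the bars stay sorted by frequency
chart = OrderedDict((item, "#" * frequency) for item, frequency in ranked)
print(list(chart))  # ['a', 'n', 'b']
```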
04:40
You should also probably look into upgrading if you can, since Python 3.6 and everything before it have reached end-of-life. Line 7 finds the longest label in the graph so that all the labels can be padded to that length. Lines 8 through 10 loop through the chart dictionary, printing a padded version of each label followed by its bar. Let’s see this in practice.
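Since the listing isn’t in the transcript, here’s a minimal sketch that follows the steps just described: count and rank, build the chart dictionary, find the longest label, then print padded rows. The name print_ascii_bar_chart and the " |" separator are assumptions, and the sketch assumes the items are strings and the data isn’t empty (max() raises ValueError on an empty sequence).

```python
from collections import Counter


def print_ascii_bar_chart(data, symbol="#"):
    """Print a horizontal ASCII bar chart of how often each item occurs.

    Hypothetical name; a sketch of the function described in the lesson.
    """
    # Count the data and rank the items from most to least common
    counter = Counter(data).most_common()
    # Map each item to its bar: the symbol repeated once per occurrence
    chart = {category: symbol * frequency for category, frequency in counter}
    # Longest label, so every label can be padded to the same width
    max_len = max(len(category) for category in chart)
    for category, bar in chart.items():
        padding = (max_len - len(category)) * " "
        print(f"{category}{padding} |{bar}")


print_ascii_bar_chart("mississippi")
```

Because the dictionary was filled from .most_common(), the rows print from the tallest bar to the shortest.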
05:17
Because our function bases its frequency calculations on Counter, the data argument can be anything that Counter can take. Let’s graph some fruit.
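To make that concrete, here are three ways of building the same tally that Counter accepts: an iterable of items, a mapping of counts, and keyword arguments. The fruit names and counts are made up for illustration.

```python
from collections import Counter

from_iterable = Counter(["apple", "apple", "pear"])  # counts each element
from_mapping = Counter({"apple": 2, "pear": 1})      # uses the given counts
from_keywords = Counter(apple=2, pear=1)             # same, as keyword args

print(from_iterable == from_mapping == from_keywords)  # True
```

Keep in mind that a plain string is also an iterable, so Counter("apple") counts characters, not words.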
05:37
Now I’ll pass the sales Counter into the graph function, changing the graphing symbol for giggles.
05:47
Once again, a nice little ASCII histogram. For a little homework, see if you can add the ability to scale the graph. Add a max_width argument to the function and scale the bars so that they’re never longer than that maximum.
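If you’d like to compare notes after trying the homework, here’s one possible approach, not the course’s official solution. The name print_scaled_bar_chart is hypothetical, and scaling proportionally with round() while keeping at least one symbol per item is just one design choice. It assumes string items, non-empty data, and a max_width of at least 1.

```python
from collections import Counter


def print_scaled_bar_chart(data, symbol="#", max_width=None):
    """Print an ASCII bar chart whose bars never exceed max_width symbols.

    Hypothetical name; one possible answer to the lesson's homework.
    """
    counter = Counter(data).most_common()
    largest = counter[0][1]  # .most_common() puts the biggest count first
    chart = {}
    for category, frequency in counter:
        if max_width is not None and largest > max_width:
            # Scale proportionally; keep at least one symbol per item
            length = max(1, round(frequency * max_width / largest))
        else:
            length = frequency
        chart[category] = symbol * length
    max_len = max(len(category) for category in chart)
    for category, bar in chart.items():
        print(f"{category.ljust(max_len)} |{bar}")


print_scaled_bar_chart(["a"] * 20 + ["b"] * 10 + ["c"], max_width=10)
```

The largest count always maps to exactly max_width symbols, and every smaller count maps to a proportionally shorter bar. Without the max(1, ...) guard, a rare item could round down to an empty bar.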
06:00 It should just be a few lines of code. Next up, some more practical applications.