In this lesson, you’ll walk through a possible solution for ensuring consistent number formatting in your word count implementation. By the end of this lesson, your program will align numbers in columns across multiple lines of output, making it easier to read.
Revisit the Inconsistent Formatting
Previously, you refactored your wordcount.py script. Refactoring is all about changing the code without altering its behavior. The goals of refactoring typically include improving the code’s readability and maintainability, for example, by removing redundancy or choosing better abstractions. While your code undoubtedly looks better now, it still doesn’t satisfy the acceptance criteria for this task.
When you run the wordcount command, you may notice that the individual file counts aren’t always formatted consistently with one another:
$ wordcount small.txt medium.txt large.txt
1 1 7 small.txt
1 6 31 medium.txt
1 111 999 large.txt
3 118 1037 total
The column misalignment visible above occurs because you print each line as soon as you finish processing the corresponding file. Instead, you should wait until all files have been processed to determine the maximum number of digits across all the lines, so that you can align the columns properly.
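The idea of deferring output until every count is known can be sketched as follows. This is only a minimal illustration, not the lesson’s actual solution: the rows list and the format_rows() helper are hypothetical names, and the sample numbers are copied from the output above. It assumes a single shared width for all columns, derived from the widest number on any line.

```python
# Hypothetical rows of (lines, words, bytes, label), collected before printing.
rows = [
    (1, 1, 7, "small.txt"),
    (1, 6, 31, "medium.txt"),
    (1, 111, 999, "large.txt"),
    (3, 118, 1037, "total"),
]


def format_rows(rows):
    """Right-align every number to the widest number found in any row."""
    # Assumption: one shared width for all columns, based on the most digits.
    width = max(len(str(n)) for *numbers, _ in rows for n in numbers)
    return [
        " ".join(f"{n:>{width}}" for n in numbers) + f" {label}"
        for *numbers, label in rows
    ]


for line in format_rows(rows):
    print(line)
```

Because the width is computed only after all rows exist, the total line takes part in the calculation just like the individual files do.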
Sometimes, the individual file counts will happen to line up with each other, but you may still need to adjust the formatting when the total counts have more digits than any of the individual file counts:
$ wordcount file1.txt file2.txt
2 5 28 file1.txt
5 15 93 file2.txt
7 20 121 total
Finally, don’t forget about your two special cases of handling directories and missing files, which completely ignore the formatting by always printing hard-coded messages:
$ wordcount large.txt /tmp missing.txt
1 111 999 large.txt
0 0 0 /tmp/ (is a directory)
0 0 0 missing.txt (no such file or directory)
1 111 999 total
Next up, you’ll devise a mechanism for calculating the maximum width needed for each column across all lines of output.
Combine Counts With File Paths
Rather than printing the counts as you iterate over the paths, you could accumulate them in a list and print them all at once afterward. However, some of these counts will need to include an optional file path, so you really have to retain two pieces of information per file.
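One way to keep both pieces of information together is to pair each set of counts with an optional path and append the pair to a list. The sketch below is an assumption about how this might look, not the lesson’s own code: Counts, Row, and count() are hypothetical names, and in-memory bytes stand in for real file contents so that the example stays self-contained.

```python
from typing import NamedTuple, Optional


class Counts(NamedTuple):
    lines: int
    words: int
    bytes: int


class Row(NamedTuple):
    counts: Counts
    path: Optional[str] = None  # Assumption: None means there's no path to show


def count(data: bytes) -> Counts:
    """Count newline characters, whitespace-separated words, and bytes."""
    return Counts(data.count(b"\n"), len(data.split()), len(data))


# Hypothetical inputs standing in for files read from disk or standard input.
inputs = [("file1.txt", b"hello world\n"), (None, b"one two three\n")]

rows = []
for path, data in inputs:
    rows.append(Row(count(data), path))  # Accumulate instead of printing
```

With every row stored, the program can inspect all of the counts before printing anything, which is what the width calculation requires.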