Before diving into the solution for this task, take a moment to refactor your code to remove any duplication that may have crept in during the previous tasks of the coding challenge. This will improve the overall readability and maintainability.
Refactor Your Existing Code
You’ll start by identifying repetitive code fragments in your script:
src/wordcount.py
import sys
from pathlib import Path
def main():
if len(sys.argv) > 1:
paths = [Path(arg) for arg in sys.argv[1:]]
else:
paths = [Path("-")]
total_counts = [0, 0, 0]
for path in paths:
try:
if path.name == "-":
raw_text = sys.stdin.buffer.read()
else:
raw_text = path.read_bytes()
text = raw_text.decode("utf-8")
num_lines = text.count("\n")
num_words = len(text.split())
num_bytes = len(raw_text)
total_counts[0] += num_lines
total_counts[1] += num_words
total_counts[2] += num_bytes
max_digits = len(str(max(num_lines, num_words, num_bytes)))
output = (
f"{num_lines:>{max_digits}} "
f"{num_words:>{max_digits}} "
f"{num_bytes:>{max_digits}}"
)
if path.name != "-":
print(output, path)
else:
print(output)
except IsADirectoryError:
print(f"0 0 0 {path}/ (is a directory)")
except FileNotFoundError:
print(f"0 0 0 {path} (no such file or directory)")
if len(paths) > 1:
max_digits = len(str(max(total_counts)))
output = (
f"{total_counts[0]:>{max_digits}} "
f"{total_counts[1]:>{max_digits}} "
f"{total_counts[2]:>{max_digits}}"
)
print(output, "total")
The highlighted lines look nearly identical, indicating a potential area for extracting a common behavior responsible for formatting and printing the output. Think about how you can introduce a reusable component to encapsulate such behavior.
Remove Code Duplication
Notice that you have two sets of counts, but you represent them differently. The counts of the individual files are stored in local variables, while their running total is kept in a Python list. How about consolidating them into a common data type? Although Python doesn’t have a dedicated type to represent these, you can invent your own. Here’s an example:
src/wordcount.py
import sys
from pathlib import Path
from typing import NamedTuple
class Counts(NamedTuple):
lines: int
words: int
bytes: int
def main():
...
In this case, you define a typed variant of Python’s named tuple because it retains some of the sequence-like properties of a list. For example, you can still access its elements by index, although it’s more casual to access them by name.
Now that you have a uniform way of representing the counts, it’s time to find the maximum number of digits across the individual counts. You can do so by defining a property, which is a method that you can access like an attribute, in your Counts
class: