Ensure Consistent Number Formatting (Refactoring)

Before diving into the solution for this task, take a moment to refactor your code to remove any duplication that may have crept in during the previous tasks of the coding challenge. This will improve the overall readability and maintainability.

Refactor Your Existing Code

You’ll start by identifying repetitive code fragments in your script:

Python src/wordcount.py
import sys
from pathlib import Path

def main():
    if len(sys.argv) > 1:
        paths = [Path(arg) for arg in sys.argv[1:]]
    else:
        paths = [Path("-")]
    total_counts = [0, 0, 0]
    for path in paths:
        try:
            if path.name == "-":
                raw_text = sys.stdin.buffer.read()
            else:
                raw_text = path.read_bytes()
            text = raw_text.decode("utf-8")
            num_lines = text.count("\n")
            num_words = len(text.split())
            num_bytes = len(raw_text)
            total_counts[0] += num_lines
            total_counts[1] += num_words
            total_counts[2] += num_bytes
            max_digits = len(str(max(num_lines, num_words, num_bytes)))
            output = (
                f"{num_lines:>{max_digits}} "
                f"{num_words:>{max_digits}} "
                f"{num_bytes:>{max_digits}}"
            )
            if path.name != "-":
                print(output, path)
            else:
                print(output)
        except IsADirectoryError:
            print(f"0 0 0 {path}/ (is a directory)")
        except FileNotFoundError:
            print(f"0 0 0 {path} (no such file or directory)")
    if len(paths) > 1:
        max_digits = len(str(max(total_counts)))
        output = (
            f"{total_counts[0]:>{max_digits}} "
            f"{total_counts[1]:>{max_digits}} "
            f"{total_counts[2]:>{max_digits}}"
        )
        print(output, "total")

The highlighted lines look nearly identical, indicating a potential area for extracting a common behavior responsible for formatting and printing the output. Think about how you can introduce a reusable component to encapsulate such behavior.

Remove Code Duplication

Notice that you have two sets of counts, but you represent them differently. The counts of the individual files are stored in local variables, while their running total is kept in a Python list. How about consolidating them into a common data type? Although Python doesn’t have a dedicated type to represent these, you can invent your own. Here’s an example:

Python src/wordcount.py
import sys
from pathlib import Path
from typing import NamedTuple

class Counts(NamedTuple):
    lines: int
    words: int
    bytes: int

def main():
    ...

In this case, you define a typed variant of Python’s named tuple because it retains some of the sequence-like properties of a list. For example, you can still access its elements by index, although it’s more casual to access them by name.

Now that you have a uniform way of representing the counts, it’s time to find the maximum number of digits across the individual counts. You can do so by defining a property, which is a method that you can access like an attribute, in your Counts class:

Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Already a member? Sign-In

Locked learning resources

The full lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Already a member? Sign-In

Become a Member to join the conversation.