Read Data From Multiple Files (Solution)

In this lesson, you’ll learn how to extend your word count implementation to read text from multiple files, calculate the number of lines, words, and bytes for each file, and provide a total summary when multiple files are involved.

Evaluate Your Existing Code

Below, you’ll find your current implementation of the wordcount.py script, which you’re going to build upon to solve this task:

Python src/wordcount.py
import sys
from pathlib import Path

def main():
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "-")
    try:
        if path.name == "-":
            raw_text = sys.stdin.buffer.read()
        else:
            raw_text = path.read_bytes()
        text = raw_text.decode("utf-8")
        num_lines = text.count("\n")
        num_words = len(text.split())
        num_bytes = len(raw_text)
        max_digits = len(str(max(num_lines, num_words, num_bytes)))
        output = (
            f"{num_lines:>{max_digits}} "
            f"{num_words:>{max_digits}} "
            f"{num_bytes:>{max_digits}}"
        )
        if path.name != "-":
            print(output, path)
        else:
            print(output)
    except IsADirectoryError:
        print(f"0 0 0 {path}/ (is a directory)")
    except FileNotFoundError:
        print(f"0 0 0 {path} (no such file or directory)")

# Run main() only when the file is executed as a script
if __name__ == "__main__":
    main()

The script is already shaping up nicely, but it’s missing an important feature: the ability to read more than one file at once. You’ll need to modify this code so that it can handle multiple files provided as command-line arguments. That’s your main goal for this task.

Retrieve All Command-Line Arguments

So far, you’ve dealt with only a single Path object in your script. Now, you want to extend it so that you have a collection of paths based on the passed arguments. A common and Pythonic way to quickly grab all command-line arguments involves slicing the sys.argv list.

Take a look at the following short example to get a better idea of how this works:

Shell
$ echo 'import sys; print(sys.argv[1:])' > script.py

$ python script.py
[]

$ python script.py file1.txt
['file1.txt']

$ python script.py file1.txt 42 -
['file1.txt', '42', '-']

You echo another one-liner program into a local script.py file. The provided code snippet takes advantage of sequence slicing to return a shallow copy of the sys.argv list while skipping the first element. The syntax [1:] starts at the second element—which is at index one in the list—and includes all remaining elements, effectively excluding the script name from the argument list.
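
To connect the slicing idea back to the word count script, here is a minimal sketch of how the sliced arguments might be turned into a collection of Path objects. The helper name collect_paths() is hypothetical and not part of the original script, and falling back to "-" (standard input) when no arguments are given is an assumption that mirrors the default in the single-file version above:

```python
from pathlib import Path


def collect_paths(argv):
    """Return one Path per command-line argument after the script name.

    Hypothetical helper for illustration only. The fallback to "-"
    (standard input) is an assumption based on the single-file script.
    """
    args = argv[1:]  # Shallow copy of argv without the script name
    # An empty list is falsy, so "or" supplies the standard input default.
    return [Path(arg) for arg in args] or [Path("-")]


# Explicit lists stand in for sys.argv to keep the example reproducible.
no_args = collect_paths(["wordcount.py"])
three_args = collect_paths(["wordcount.py", "file1.txt", "42", "-"])

print([str(path) for path in no_args])     # ['-']
print([str(path) for path in three_args])  # ['file1.txt', '42', '-']
```

In a real script, you would pass sys.argv to such a helper. Because slicing creates a new list, sys.argv itself is left unchanged.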
