In this lesson, you’ll learn how to extend your word count implementation to read text from multiple files, calculate the number of lines, words, and bytes for each file, and provide a total summary when multiple files are involved.
Evaluate Your Existing Code
Below, you’ll find your current implementation of the wordcount.py script, which you’re going to build upon to solve this task:
src/wordcount.py
import sys
from pathlib import Path


def main():
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "-")
    try:
        if path.name == "-":
            raw_text = sys.stdin.buffer.read()
        else:
            raw_text = path.read_bytes()
        text = raw_text.decode("utf-8")
        num_lines = text.count("\n")
        num_words = len(text.split())
        num_bytes = len(raw_text)
        max_digits = len(str(max(num_lines, num_words, num_bytes)))
        output = (
            f"{num_lines:>{max_digits}} "
            f"{num_words:>{max_digits}} "
            f"{num_bytes:>{max_digits}}"
        )
        if path.name != "-":
            print(output, path)
        else:
            print(output)
    except IsADirectoryError:
        print(f"0 0 0 {path}/ (is a directory)")
    except FileNotFoundError:
        print(f"0 0 0 {path} (no such file or directory)")
The script is already shaping up nicely, but it’s missing an important feature: the ability to read more than one file at once. You’ll need to modify this code so that it can handle multiple files provided as command-line arguments. That’s your main goal for this task.
Retrieve All Command-Line Arguments
So far, you’ve dealt with only a single Path object in your script. Now, you want to extend it so that you have a collection of paths based on the passed arguments. A common and Pythonic way to quickly grab all command-line arguments involves slicing the sys.argv list.
Take a look at the following short example to get a better idea of how this works:
$ echo 'import sys; print(sys.argv[1:])' > script.py
$ python script.py
[]
$ python script.py file1.txt
['file1.txt']
$ python script.py file1.txt 42 -
['file1.txt', '42', '-']
You echo another one-liner program into a local script.py file. The provided code snippet takes advantage of sequence slicing to return a shallow copy of the sys.argv list while skipping the first element. The syntax [1:] starts at the second element, which is at index one in the list, and includes all remaining elements, effectively excluding the script name from the argument list.
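To connect the slice back to the goal of building a collection of paths, here is a minimal sketch of one way to turn the sliced arguments into Path objects. The helper name paths_from_args and the fake_argv list are hypothetical and used only for illustration. The fallback to "-" when no arguments are given is an assumption that mirrors the single-path default in the current script, not necessarily the approach the lesson takes:

```python
from pathlib import Path


def paths_from_args(argv):
    """Return a Path for every argument after the script name.

    Hypothetical helper for illustration only.
    """
    args = argv[1:]  # Slicing builds a new list; argv itself is untouched
    # Assumption: with no arguments, fall back to "-" (standard input),
    # mirroring the default used by the single-path version of the script.
    return [Path(arg) for arg in args] or [Path("-")]


# A stand-in for sys.argv so the example runs without real arguments
fake_argv = ["script.py", "file1.txt", "42", "-"]
paths = paths_from_args(fake_argv)
print([str(path) for path in paths])  # ['file1.txt', '42', '-']
```

Because the slice produces a new list, the original argument list still contains the script name afterward. In the real script, you would pass sys.argv instead of fake_argv.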