In this lesson, you’ll explore how to handle non-ASCII Unicode characters in your word count implementation. This involves understanding how Python represents multi-byte characters, so that your application counts them accurately.
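To see why multi-byte characters matter here, it may help to compare the number of characters in a string with the number of bytes in its encoded form. The snippet below is a small illustration, not part of the project code: the helper name `char_and_byte_counts` and the sample strings are made up for this example, and UTF-8 is assumed as the encoding.

```python
# Illustrative only: the helper name and sample strings are hypothetical.
# UTF-8 is assumed as the encoding.
samples = ["cafe", "café", "naïve", "日本語"]


def char_and_byte_counts(text):
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


for sample in samples:
    chars, utf8_bytes = char_and_byte_counts(sample)
    print(f"{sample!r}: {chars} characters, {utf8_bytes} bytes")
```

For pure ASCII text the two numbers agree, which is why a character count can pass for a byte count until non-ASCII input shows up.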
Review Your Current Implementation
Before you start, make sure that your current implementation can read input from standard input and count lines, words, and bytes for ASCII characters. Here’s a quick recap:
Python
src/wordcount.py
import sys

def main():
    text = sys.stdin.read()
    num_lines = text.count("\n")
    num_words = len(text.split())
    # len() counts characters, which matches the byte count only for ASCII text
    num_bytes = len(text)
    print(num_lines, num_words, num_bytes)
Your task now is to adapt this implementation to handle Unicode characters correctly.
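One possible direction, sketched here under a few assumptions, is to read raw bytes from standard input and decode them yourself, so that the byte count comes from the undecoded data. This is only a sketch, not necessarily the solution this lesson builds toward: the helper name `count` is hypothetical, and the input is assumed to be valid UTF-8 (invalid input would raise `UnicodeDecodeError`).

```python
import sys


def count(raw):
    """Return (lines, words, bytes) for a bytes object holding UTF-8 text.

    Hypothetical helper for illustration; assumes the input is valid UTF-8.
    """
    text = raw.decode("utf-8")
    num_lines = text.count("\n")
    num_words = len(text.split())  # str.split() also splits on Unicode whitespace
    num_bytes = len(raw)  # length of the raw data, not of the decoded string
    return num_lines, num_words, num_bytes


def main():
    # sys.stdin.buffer exposes standard input as a binary stream
    raw = sys.stdin.buffer.read()
    print(*count(raw))
```

Keeping the counting logic in a separate function that takes bytes makes it easy to check with literal input, for example `count("café\n".encode("utf-8"))`, without going through standard input.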