Handle Non-ASCII Unicode Characters (Solution)

In this lesson, you’ll explore one way to make your word count implementation handle non-ASCII Unicode characters. You’ll look at how Python represents multi-byte characters and make sure that your program counts them accurately.

Review Your Current Implementation

Before you start, make sure that your current implementation can read input from standard input and count lines, words, and bytes for ASCII characters. Here’s a quick recap:

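Python src/wordcount.py
import sys

def main():
    text = sys.stdin.read()  # Decoded text (str), not raw bytes
    num_lines = text.count("\n")
    num_words = len(text.split())
    # len() of a str counts code points, which matches the byte count only for ASCII input
    num_bytes = len(text)
    print(num_lines, num_words, num_bytes)

if __name__ == "__main__":
    main()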

Your task now is to adapt this implementation to handle Unicode characters correctly.
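One possible adaptation is sketched below. It is not necessarily the approach the full lesson takes. It assumes the input is UTF-8 encoded, and the helper name `count()` is hypothetical. The idea is to read raw bytes through `sys.stdin.buffer`, take the byte count from those bytes, and decode them to text only for the line and word counts:

```python
import sys


def count(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes) for raw input bytes."""
    text = data.decode("utf-8")  # Assumption: the input is UTF-8 encoded
    num_lines = text.count("\n")
    num_words = len(text.split())
    num_bytes = len(data)  # Length of the raw bytes, not of the decoded string
    return num_lines, num_words, num_bytes


def main() -> None:
    # sys.stdin.buffer is the binary stream underneath sys.stdin
    print(*count(sys.stdin.buffer.read()))


# Quick check with a string containing one two-byte character (é, U+00E9)
print(count("caf\u00e9\n".encode("utf-8")))  # (1, 1, 6)
```

To use this as a command-line script, you would call `main()` under an `if __name__ == "__main__":` guard instead of running the quick check on the last line.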

Understand Unicode and Byte Counts
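
As a rough illustration of why the two counts diverge, the snippet below compares the number of code points in a string with the number of bytes in its UTF-8 encoding. The helper name `sizes()` and the sample strings are made up for this example, and UTF-8 is an assumed encoding:

```python
def sizes(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for the given string."""
    return len(text), len(text.encode("utf-8"))


# "caf\u00e9" spells café with a precomposed é (U+00E9)
for sample in ["cafe", "caf\u00e9", "日本語", "🐍"]:
    num_chars, num_bytes = sizes(sample)
    print(f"{sample!r}: {num_chars} code points, {num_bytes} bytes")
```

Only the pure ASCII string has matching counts. In UTF-8, é takes two bytes, each of the three Japanese characters takes three, and the snake emoji takes four.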
