Congratulations on learning more about character encodings! In this lesson, you’ll cover a few caveats to remember when you’re working with encodings and see some resources you can check out to keep learning.
In this course, you learned about:
- Fundamental concepts of character encodings and numbering systems
- Integer, binary, octal, hex, str, and bytes literals in Python
- Differences between Unicode code points and UTF-8 encoding
- Python’s built-in functions related to character encoding and numbering systems
- Other encoding formats included in Python’s Standard Library
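As a quick recap of the points above, here's a minimal sketch that uses only Python built-ins. The sample values (the number 65 and the euro sign) are arbitrary choices for illustration, not examples taken from the course:

```python
# Integer literals in different bases all describe the same number.
assert 65 == 0b1000001 == 0o101 == 0x41

# Built-in functions convert between integers, characters, and base strings.
print(bin(65), oct(65), hex(65))    # 0b1000001 0o101 0x41
print(chr(65), ord("A"))            # A 65
print(int("41", base=16))           # 65

# A code point is an abstract number; UTF-8 is one way to serialize it.
euro = "€"
code_point = ord(euro)              # 8364, which is U+20AC
utf8_bytes = euro.encode("utf-8")   # b'\xe2\x82\xac'
print(len(euro), len(utf8_bytes))   # 1 3

# Another codec from the standard library gives different bytes
# for the same code point.
utf16_bytes = euro.encode("utf-16-le")
print(len(utf16_bytes))             # 2
```

The same single-character `str` becomes three bytes in UTF-8 and two bytes in UTF-16, which is why a code point and its encoded form shouldn't be treated as the same thing.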
It’s very important to know the encoding of any data you read. Decoding with the wrong encoding may raise an exception or, worse, succeed silently and give you the wrong content.
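To make this caveat concrete, here's a small sketch of both failure modes. The sample text and the choice of `ascii` and `latin-1` as the "wrong" codecs are assumptions made for illustration:

```python
# Bytes as a program that writes UTF-8 might produce them.
original = "café"
data = original.encode("utf-8")     # b'caf\xc3\xa9'

# Failure mode 1: the wrong codec raises an exception.
ascii_error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(f"ascii failed: {exc}")

# Failure mode 2: the wrong codec succeeds but yields garbled text.
wrong = data.decode("latin-1")
print(wrong)                        # cafÃ©

# Decoding with the codec that was used for encoding round-trips cleanly.
right = data.decode("utf-8")
print(right)                        # café
```

The second case is the more dangerous one because nothing signals the problem: `latin-1` maps every possible byte to some character, so decoding never fails even when the result is meaningless.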
Wikipedia has some useful pages:
- Unicode
- List of Unicode Characters
- Unicode Block
- Combining Diacritical Marks
- UTF-8
- ASCII
- Extended ASCII
- ISO/IEC 8859-1
- Windows-1252
- Digraph
- Orthographic ligature
You can also check out these resources:
- Python documentation: Unicode changes in Python 3
- Python documentation: Unicode how-to
- Python documentation: Supported encodings
- Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
- Kunststube: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
- Mozilla: A composite approach to language/encoding detection
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
Alain Rouleau on July 2, 2020
Very interesting, thanks!