encodings
The Python encodings package implements the codec lookup and name-normalization infrastructure used internally by Python’s codec system. It maps encoding name strings to their corresponding modules and provides the search function that powers codecs.lookup().
Here’s a quick look at its name-normalization behavior:
>>> import encodings
>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("UTF - 8")
'UTF_8'
>>> encodings.normalize_encoding("latin-1")
'latin_1'
Key Features
- Normalizes encoding names by collapsing runs of non-alphanumeric characters (except dots) into single underscores and stripping leading and trailing underscores
- Provides search_function(), which the codec registry calls when codecs.lookup() needs to resolve an encoding name
- Contains all standard codec implementations as individual submodules
- Caches resolved codecs for faster repeated lookups
- Registers codec aliases defined by each submodule’s getaliases() function
- Provides Windows code page lookups via win32_code_page_search_function() (Windows, Python 3.14+)
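To make the normalization and alias features more concrete, here is a minimal sketch that guesses which submodule would handle a given name. It is not the package's actual lookup code: guess_codec_module() is a hypothetical helper, and it assumes that lowercasing the normalized name and consulting the encodings.aliases.aliases dictionary roughly approximates what search_function() does internally.

```python
import encodings
import encodings.aliases


def guess_codec_module(name):
    """Hypothetical helper: guess the encodings submodule for a name."""
    # Collapse punctuation into underscores, then lowercase for comparison.
    normalized = encodings.normalize_encoding(name).lower()
    # Fall back to the normalized name itself when no alias is registered.
    target = encodings.aliases.aliases.get(normalized, normalized)
    return f"encodings.{target}"


print(guess_codec_module("UTF8"))     # encodings.utf_8
print(guess_codec_module("Latin-1"))  # encodings.latin_1
```

The helper only builds a module name. It doesn't import anything, so it can't confirm that the submodule actually exists.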
Frequently Used Classes and Functions
| Object | Type | Description |
|---|---|---|
| encodings.normalize_encoding() | Function | Collapses non-alphanumeric characters (except dots) in an encoding name into underscores |
| encodings.search_function() | Function | Looks up and returns a CodecInfo object for a normalized encoding name |
| encodings.CodecRegistryError | Exception | Raised when a module in the encodings package provides an invalid codec |
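Because CodecRegistryError is a subclass of LookupError, a handler can tell a broken codec module apart from a name that simply isn't known. The sketch below illustrates one way to do that. The describe_lookup() helper and its return strings are hypothetical and exist only for this example.

```python
import codecs
import encodings


def describe_lookup(name):
    """Hypothetical helper: classify the outcome of a codec lookup."""
    try:
        codecs.lookup(name)
    except encodings.CodecRegistryError as exc:
        # Must come first: it is a subclass of LookupError.
        return f"broken codec module: {exc}"
    except LookupError:
        return "unknown encoding"
    return "ok"


print(describe_lookup("ascii"))              # ok
print(describe_lookup("no-such-codec-xyz"))  # unknown encoding
```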
Examples
Normalizing encoding names to their canonical underscore-delimited form (dots are preserved, leading and trailing underscores are stripped):
>>> import encodings
>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("ISO 8859-1")
'ISO_8859_1'
>>> encodings.normalize_encoding("latin-1")
'latin_1'
Looking up a codec directly with search_function():
>>> import encodings
>>> info = encodings.search_function("utf-8")
>>> info.name
'utf-8'
>>> type(info)
<class 'codecs.CodecInfo'>
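When a name can't be resolved, search_function() returns None instead of raising, and codecs.lookup() is the layer that reports the failure as an exception. The short sketch below shows the difference. The encoding name used is made up for illustration and is assumed not to exist on your system.

```python
import codecs
import encodings

# search_function() signals "not found" by returning None...
print(encodings.search_function("no-such-codec-xyz"))  # None

# ...while codecs.lookup() reports the same miss as a LookupError.
try:
    codecs.lookup("no-such-codec-xyz")
except LookupError as exc:
    print(f"lookup failed: {exc}")
```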
Common Use Cases
The most common tasks for encodings include:
- Normalizing encoding name strings before codec lookups
- Implementing custom codec search functions that integrate with the registry
- Inspecting which codec module handles a given encoding
- Debugging encoding resolution issues in the codec registry
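As a rough illustration of a custom search function, the sketch below registers one with codecs.register() so that a made-up name resolves to the built-in UTF-8 codec. Both hypothetical_search() and the name "my_utf8" are assumptions invented for this example. A real search function would typically return its own CodecInfo object.

```python
import codecs


def hypothetical_search(name):
    """Hypothetical search function: serve one made-up encoding name."""
    # The registry passes the requested name lowercased.
    if name == "my_utf8":
        # Reuse the built-in UTF-8 codec rather than defining a new one.
        return codecs.lookup("utf-8")
    # Returning None lets the registry try the next search function.
    return None


codecs.register(hypothetical_search)

print(codecs.lookup("my_utf8").name)  # utf-8
print("héllo".encode("my_utf8"))      # b'h\xc3\xa9llo'
```

The registry consults search functions in registration order, so this one only runs for names that the search function from encodings couldn't resolve.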
Real-World Example
A developer building a codec allow-list can use normalize_encoding() to canonicalize user input before comparing it against a set of approved encodings:
codec_allow_list.py
import encodings
import codecs

ALLOWED = {"utf_8", "ascii", "latin_1"}

def safe_lookup(user_encoding):
    normalized = encodings.normalize_encoding(user_encoding).lower()
    if normalized not in ALLOWED:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return codecs.lookup(normalized)

info = safe_lookup("UTF - 8")
print(info.name)
Run it on the command line:
$ python codec_allow_list.py
utf-8
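The normalize_encoding() call turns variant spellings like 'UTF - 8' into 'UTF_8', and the .lower() call then produces the canonical form 'utf_8' so that the allow-list comparison works reliably. Note that normalize_encoding() doesn't resolve aliases, so a spelling such as 'latin1' stays 'latin1' and is rejected by this allow-list.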
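If alias spellings such as 'latin1' or 'UTF8' should also pass, one possible variation is to resolve the codec first and compare the name reported by the resulting CodecInfo object. The following is only a sketch under that assumption: safe_lookup_by_name() and the ALLOWED_NAMES set are hypothetical and aren't part of the example above.

```python
import codecs

# Canonical names as reported by CodecInfo.name (illustrative set).
ALLOWED_NAMES = {"utf-8", "ascii", "iso8859-1"}


def safe_lookup_by_name(user_encoding):
    """Hypothetical variant: allow-list on the resolved codec name."""
    try:
        info = codecs.lookup(user_encoding)
    except LookupError:
        raise ValueError(f"Unknown encoding: {user_encoding!r}") from None
    if info.name not in ALLOWED_NAMES:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return info


print(safe_lookup_by_name("latin1").name)  # iso8859-1
print(safe_lookup_by_name("UTF8").name)    # utf-8
```

The trade-off is that the codec is resolved before the allow-list check runs, so the codec module may be loaded even for names that are rejected afterward.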
By Leodanis Pozo Ramos • Updated April 9, 2026