`encodings`

The Python encodings package implements the codec lookup and name-normalization infrastructure used internally by Python’s codec system. It maps encoding name strings to their corresponding modules and provides the search function that powers codecs.lookup().

Here’s a quick look at its name-normalization behavior:

>>> import encodings

>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("UTF - 8")
'UTF_8'
>>> encodings.normalize_encoding("latin-1")
'latin_1'

Key Features

Normalizes encoding names by collapsing runs of non-alphanumeric characters (except dots) into single underscores and stripping leading and trailing underscores
Provides search_function(), which the codec registry calls when codecs.lookup() receives a name that is not already in the registry's own cache
Contains all standard codec implementations as individual submodules
Caches resolved codecs for faster repeated lookups
Registers codec aliases defined by each submodule’s getaliases() function
Provides Windows code page lookups via win32_code_page_search_function() (Windows, Python 3.14+)
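
The normalization and alias behavior listed above can be checked with a short sketch. The sample strings are illustrative inputs chosen for this example; they do not come from the package itself:

```python
import codecs
import encodings

# Leading and trailing punctuation is dropped, and a run of
# punctuation between characters becomes a single underscore.
stripped = encodings.normalize_encoding("__utf--8__")

# Dots are treated like alphanumeric characters and kept as-is.
dotted = encodings.normalize_encoding("iso-8859.1")

# Different spellings and aliases resolve to the same codec.
names = {codecs.lookup(s).name for s in ("utf8", "UTF-8", "utf_8")}

print(stripped)  # utf_8
print(dotted)    # iso_8859.1
print(names)     # {'utf-8'}
```

Note that normalize_encoding() only rewrites punctuation. Resolving an alias such as "utf8" to its codec happens during the lookup itself.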

Frequently Used Classes and Functions

Object	Type	Description
`encodings.normalize_encoding()`	Function	Collapses non-alphanumeric characters (except dots) in an encoding name into underscores
`encodings.search_function()`	Function	Looks up an encoding name and returns a `CodecInfo` object, or `None` if no codec module matches
`encodings.CodecRegistryError`	Exception	Raised when a module in the `encodings` package provides an invalid codec
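
As a small sketch of how these objects relate to error handling: CodecRegistryError derives from both LookupError and SystemError, so a handler for LookupError covers it as well as the ordinary "unknown encoding" case. The describe() helper and the made-up encoding name below are hypothetical and exist only for illustration:

```python
import codecs
import encodings

# CodecRegistryError inherits from LookupError and SystemError.
is_lookup_error = issubclass(encodings.CodecRegistryError, LookupError)
is_system_error = issubclass(encodings.CodecRegistryError, SystemError)


def describe(name):
    """Return the codec's name, or a short message if lookup fails."""
    try:
        return codecs.lookup(name).name
    except LookupError as exc:
        # Catches unknown encodings and CodecRegistryError alike.
        return f"lookup failed: {type(exc).__name__}"


known = describe("latin-1")
unknown = describe("no-such-codec-xyz")  # hypothetical, unregistered name
print(is_lookup_error, is_system_error, known, unknown)
```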

Examples

Normalizing encoding names to an underscore-delimited form (dots and letter case are preserved; leading and trailing punctuation is dropped):

>>> import encodings

>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("ISO 8859-1")
'ISO_8859_1'
>>> encodings.normalize_encoding("latin-1")
'latin_1'

Looking up a codec with search_function():

>>> info = encodings.search_function("utf-8")
>>> info.name
'utf-8'
>>> type(info)
<class 'codecs.CodecInfo'>
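
A slightly fuller sketch of the same lookup, shown as a script, illustrates two further points: the returned CodecInfo carries usable encoder and decoder callables, and an unrecognized name yields None instead of raising. The name "no_such_codec_xyz" is a made-up value assumed not to match any codec module:

```python
import encodings

info = encodings.search_function("utf_8")

# The stateless encoder returns (encoded bytes, characters consumed).
encoded, consumed = info.encode("héllo")

# The stateless decoder returns (decoded text, bytes consumed).
decoded, _ = info.decode(encoded)

# No exception for an unknown name: the search function returns None,
# and codecs.lookup() is what turns that into a LookupError.
missing = encodings.search_function("no_such_codec_xyz")

print(encoded, consumed, decoded, missing)
```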

Common Use Cases

The most common tasks for encodings include:

Normalizing encoding name strings before codec lookups
Implementing custom codec search functions that integrate with the registry
Inspecting which codec module handles a given encoding
Debugging encoding resolution issues in the codec registry
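
As a minimal sketch of the custom search function use case, the following registers a function with codecs.register() that exposes UTF-8 under a second name. The name "demoutf8" and the function demo_search() are hypothetical and chosen only for this illustration; the CodecInfo fields are copied from the standard UTF-8 codec:

```python
import codecs


def demo_search(name):
    """Hypothetical search function mapping 'demoutf8' to UTF-8."""
    # The registry passes the requested name in lowercase.
    if name != "demoutf8":
        return None  # Let other search functions handle the name.
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name="demoutf8",
        encode=utf8.encode,
        decode=utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=utf8.streamreader,
        streamwriter=utf8.streamwriter,
    )


# Added after the built-in search function, so it is consulted only
# for names that the encodings package does not recognize.
codecs.register(demo_search)

data = "café".encode("demoutf8")
text = data.decode("DemoUTF8")  # lowercased before demo_search sees it
print(data, text)
```

Returning None for unrecognized names matters: it signals the registry to keep trying the remaining search functions.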

Real-World Example

A developer building a codec allow-list can use normalize_encoding() to canonicalize user input before comparing it against a set of approved encodings:

import encodings
import codecs

ALLOWED = {"utf_8", "ascii", "latin_1"}

def safe_lookup(user_encoding):
    normalized = encodings.normalize_encoding(user_encoding).lower()
    if normalized not in ALLOWED:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return codecs.lookup(normalized)

info = safe_lookup("UTF - 8")
print(info.name)

Run it on the command line:

$ python codec_allow_list.py
utf-8

The normalize_encoding() call turns variant spellings like 'UTF - 8' into 'UTF_8', and the .lower() call then yields 'utf_8', so the allow-list comparison works regardless of punctuation or letter case.
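
One limitation of comparing normalized strings is that aliases such as "utf8" or "latin1" normalize to themselves and would be rejected. A possible variation, sketched here under the assumption that resolving the codec before checking it is acceptable, compares the resolved CodecInfo.name instead. The names ALLOWED_CANONICAL and resolve_allowed() are hypothetical:

```python
import codecs

# Build the allow-list from resolved codec names, so every alias of an
# approved encoding maps to the same entry.
ALLOWED_CANONICAL = {
    codecs.lookup(name).name for name in ("utf-8", "ascii", "latin-1")
}


def resolve_allowed(user_encoding):
    """Return the CodecInfo for an approved encoding, else raise ValueError."""
    try:
        info = codecs.lookup(user_encoding)
    except LookupError:
        raise ValueError(f"Unknown encoding: {user_encoding!r}") from None
    if info.name not in ALLOWED_CANONICAL:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return info


print(resolve_allowed("utf8").name)
print(resolve_allowed("Latin1").name)
```

The trade-off is that codecs.lookup() runs on the user-supplied name before the check, which can import the matching codec module. The string-based version above avoids that by rejecting unapproved names first.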

Tutorial

Unicode & Character Encodings in Python: A Painless Guide

In this tutorial, you'll get a Python-centric introduction to character encodings and Unicode. Handling character encodings and numbering systems can at times seem painful and complicated, but this guide is here to help with easy-to-follow Python examples.


For additional information on related topics, take a look at the following resources: