encodings

The Python encodings package implements the codec lookup and name-normalization infrastructure used internally by Python’s codec system. It maps encoding name strings to their corresponding modules and provides the search function that powers codecs.lookup().

Here’s a quick look at its name-normalization behavior:

Python
>>> import encodings

>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("UTF - 8")
'UTF_8'
>>> encodings.normalize_encoding("latin-1")
'latin_1'

Key Features

  • Normalizes encoding names by collapsing runs of non-alphanumeric characters (except dots) into single underscores and stripping leading and trailing underscores
  • Provides search_function(), which the codec registry calls when codecs.lookup() can't answer a request from the registry's own cache
  • Contains all standard codec implementations as individual submodules
  • Caches resolved codecs for faster repeated lookups
  • Registers codec aliases returned by a submodule’s optional getaliases() function
  • Provides Windows code page lookups via win32_code_page_search_function() (Windows, Python 3.14+)
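The caching and alias features can be observed from the interpreter. The following is a minimal sketch, assuming a standard CPython build. That repeated calls return the very same object is an implementation detail of the cache rather than a documented guarantee. Names are passed in lowercase because search_function() itself doesn't lowercase them:

```python
import encodings

# Repeated lookups of the same name are served from the package's cache,
# so CPython hands back the same CodecInfo object (implementation detail).
first = encodings.search_function("utf-8")
second = encodings.search_function("utf-8")
print(first is second)

# Alias spellings resolve to the same codec as the module's own name.
resolved = {
    name: encodings.search_function(name).name
    for name in ("latin-1", "latin1", "iso-8859-1", "l1")
}
for name, codec_name in resolved.items():
    print(f"{name} -> {codec_name}")
```

All four spellings should report the same codec name, because the alias table maps each of them to the latin_1 submodule.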

Frequently Used Classes and Functions

Object | Type | Description
encodings.normalize_encoding() | Function | Collapses runs of non-alphanumeric characters (except dots) in an encoding name into single underscores
encodings.search_function() | Function | Normalizes an encoding name, imports the matching codec module, and returns its CodecInfo object, or None if no codec is found
encodings.CodecRegistryError | Exception | Raised when a module in the encodings package provides an invalid codec

Examples

Normalizing encoding names to their canonical underscore-delimited form (dots are preserved, leading and trailing underscores are stripped):

Python
>>> import encodings

>>> encodings.normalize_encoding("utf-8")
'utf_8'
>>> encodings.normalize_encoding("ISO 8859-1")
'ISO_8859_1'
>>> encodings.normalize_encoding("latin-1")
'latin_1'
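The edge cases mentioned above (preserved dots, stripped leading and trailing separators, preserved case) can be checked with a short script. This is a sketch only. The sample strings are made up for illustration and aren't real encoding names, apart from the UTF-8 spellings:

```python
import encodings

# Illustrative inputs: padding, repeated separators, and a dotted name.
samples = ["  utf-8  ", "__utf__8__", "my.codec-name", "UTF - 8"]
normalized = {raw: encodings.normalize_encoding(raw) for raw in samples}

for raw, norm in normalized.items():
    print(f"{raw!r} -> {norm!r}")

# Case is preserved, so lowercase explicitly before comparing names.
matches = encodings.normalize_encoding("UTF-8").lower() == "utf_8"
print(matches)
```

Because the function leaves case untouched, code that compares normalized names usually pairs it with .lower().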

Looking up a codec via the search_function() function:

Python
>>> import encodings

>>> info = encodings.search_function("utf-8")
>>> info.name
'utf-8'
>>> type(info)
<class 'codecs.CodecInfo'>
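When no codec module matches, search_function() returns None instead of raising an exception. The sketch below shows that, along with one way to see which submodule handles a name. The codec_module_name() helper is hypothetical and only approximates the alias step that search_function() performs internally:

```python
import encodings
import encodings.aliases
import importlib


def codec_module_name(name):
    """Hypothetical helper: guess the encodings submodule for a name.

    Simplified approximation of the alias step in search_function().
    """
    norm = encodings.normalize_encoding(name).lower()
    # Fall back to the normalized name when no alias is registered.
    return encodings.aliases.aliases.get(norm, norm)


missing = encodings.search_function("no-such-codec")
print(missing)

latin_module = codec_module_name("Latin1")
print(latin_module)

# Import the submodule to inspect it directly.
module = importlib.import_module("encodings." + codec_module_name("UTF8"))
print(module.__name__)
```

The helper doesn't check that the submodule exists, so importlib.import_module() raises ModuleNotFoundError for names that don't map to a real codec module.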

Common Use Cases

The most common tasks for encodings include:

  • Normalizing encoding name strings before codec lookups
  • Implementing custom codec search functions that integrate with the registry
  • Inspecting which codec module handles a given encoding
  • Debugging encoding resolution issues in the codec registry
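As an illustration of the second task, a custom search function can be registered with codecs.register(). The sketch below is a minimal example under stated assumptions. The name "myutf8" and the demo_search() function are invented for illustration, and the codec simply reuses the built-in UTF-8 machinery:

```python
import codecs


def demo_search(name):
    """Hypothetical search function: serve 'myutf8' with the UTF-8 codec."""
    if name != "myutf8":
        return None  # Let other search functions handle the request.
    base = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name="myutf8",
        encode=base.encode,
        decode=base.decode,
        incrementalencoder=base.incrementalencoder,
        incrementaldecoder=base.incrementaldecoder,
        streamreader=base.streamreader,
        streamwriter=base.streamwriter,
    )


codecs.register(demo_search)

encoded = "héllo".encode("myutf8")
print(encoded == "héllo".encode("utf-8"))

# codecs.lookup() lowercases the name before calling search functions.
info = codecs.lookup("MyUTF8")
print(info.name)
```

Search functions are tried in registration order, so the built-in encodings.search_function() sees the name first and returns None before demo_search() is consulted.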

Real-World Example

A developer building a codec allow-list can use normalize_encoding() to canonicalize user input before comparing it against a set of approved encodings:

Python codec_allow_list.py
import encodings
import codecs

ALLOWED = {"utf_8", "ascii", "latin_1"}

def safe_lookup(user_encoding):
    normalized = encodings.normalize_encoding(user_encoding).lower()
    if normalized not in ALLOWED:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return codecs.lookup(normalized)

info = safe_lookup("UTF - 8")
print(info.name)

Run it on the command line:

Shell
$ python codec_allow_list.py
utf-8

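The normalize_encoding() call turns variant spellings like 'UTF - 8' into 'UTF_8', and the .lower() call then yields the canonical form 'utf_8' so that the allow-list comparison works reliably. Because normalize_encoding() preserves case, the lowercasing step is required.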
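One limitation of comparing normalized strings is that aliases such as 'latin1' or 'ISO-8859-1' are rejected even though they name an approved codec. A possible variant, sketched here as an alternative rather than a replacement, compares the canonical name reported by CodecInfo.name instead. The function name safe_lookup_by_codec() is an assumption made for this example:

```python
import codecs

# Canonical codec names, as reported by CodecInfo.name, for the approved set.
ALLOWED_CODECS = {
    codecs.lookup(name).name for name in ("utf-8", "ascii", "latin-1")
}


def safe_lookup_by_codec(user_encoding):
    """Resolve the name first, then check the codec it resolves to."""
    try:
        info = codecs.lookup(user_encoding)
    except LookupError:
        raise ValueError(f"Unknown encoding: {user_encoding!r}") from None
    if info.name not in ALLOWED_CODECS:
        raise ValueError(f"Encoding not allowed: {user_encoding!r}")
    return info


print(safe_lookup_by_codec("ISO-8859-1").name)
print(safe_lookup_by_codec("utf8").name)
```

The trade-off is that the registry resolves the user-supplied name before the allow-list check runs, which imports the matching codec module and consults any registered search functions.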

For additional information on related topics, take a look at the following resource:

  • Unicode & Character Encodings in Python: A Painless Guide (tutorial): A Python-centric introduction to character encodings and Unicode. Handling character encodings and numbering systems can seem painful and complicated at times, but the guide walks through them with easy-to-follow Python examples.


By Leodanis Pozo Ramos • Updated April 9, 2026