codecs

The Python codecs module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry. It supports text encodings, text-to-text codecs, and bytes-to-bytes codecs.

Here’s a quick look at encoding and decoding text:

Python
>>> import codecs

>>> codecs.encode("Hello, World!", "utf-8")
b'Hello, World!'
>>> codecs.decode(b"Hello, World!", "utf-8")
'Hello, World!'

Key Features

  • Provides encode() and decode() functions with configurable error handling strategies
  • Maintains a searchable registry of built-in and custom codecs
  • Supports text encodings, text-to-text transforms, and bytes-to-bytes transforms
  • Offers incremental encoders and decoders suitable for streaming data
  • Provides stream-oriented reader and writer classes for encoded files and network streams
  • Includes byte order mark (BOM) constants, such as codecs.BOM_UTF8, for detecting and writing BOMs
  • Allows registration of custom codecs and custom error handlers
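The three codec families listed above can be told apart by their input and output types. The following is a minimal sketch, assuming the standard "rot13" (text-to-text) and "hex" (bytes-to-bytes) codecs that ship with CPython; the variable names are only illustrative:

```python
import codecs

# Text-to-text transform: ROT13 takes a str and returns a str.
rot13_text = codecs.encode("Hello, World!", "rot13")

# Bytes-to-bytes transform: hex takes bytes and returns bytes.
hex_bytes = codecs.encode(b"Hello", "hex")

# Decoding with the same codec reverses the transform.
round_trip = codecs.decode(hex_bytes, "hex")

print(rot13_text)  # Uryyb, Jbeyq!
print(hex_bytes)   # b'48656c6c6f'
print(round_trip)  # b'Hello'
```

These transforms are only reachable through codecs.encode() and codecs.decode(), because str.encode() and bytes.decode() are restricted to text encodings.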

Frequently Used Classes and Functions

| Object | Type | Description |
|---|---|---|
| codecs.encode() | Function | Encodes an object using a named registered codec |
| codecs.decode() | Function | Decodes an object using a named registered codec |
| codecs.lookup() | Function | Returns a CodecInfo object for a named encoding |
| codecs.register() | Function | Registers a custom codec search function with the registry |
| codecs.register_error() | Function | Registers a named error handling function for use during encoding or decoding |
| codecs.iterencode() | Function | Incrementally encodes strings from an iterator using a named codec |
| codecs.iterdecode() | Function | Incrementally decodes bytes from an iterator using a named codec |
| codecs.IncrementalEncoder | Class | Base class for building stateful incremental encoders |
| codecs.IncrementalDecoder | Class | Base class for building stateful incremental decoders |
| codecs.StreamReader | Class | Base class for reading and decoding data from a binary stream |
| codecs.StreamWriter | Class | Base class for encoding and writing data to a binary stream |
| codecs.BOM_UTF8 | Constant | UTF-8 byte order mark (b'\xef\xbb\xbf'), used to signal UTF-8 encoding at the start of a byte stream |
| codecs.BOM_UTF16 | Constant | UTF-16 BOM in native byte order; also available as BOM_UTF16_BE and BOM_UTF16_LE for explicit endianness |
| codecs.BOM_UTF32 | Constant | UTF-32 BOM in native byte order; also available as BOM_UTF32_BE and BOM_UTF32_LE for explicit endianness |
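To illustrate codecs.register_error(), here is a minimal sketch of a custom error handler. The handler name "demo_question_marks" and the function question_mark_handler are hypothetical and chosen only for this example. A handler receives the UnicodeError instance and returns a replacement together with the position at which processing should resume:

```python
import codecs

def question_mark_handler(exc):
    """Replace each unencodable character with '?' (hypothetical handler)."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start and exc.end delimit the offending slice of the input.
        replacement = "?" * (exc.end - exc.start)
        return replacement, exc.end
    # This sketch only handles encoding errors; re-raise anything else.
    raise exc

# Register the handler under an illustrative name.
codecs.register_error("demo_question_marks", question_mark_handler)

encoded = "Café ☕".encode("ascii", errors="demo_question_marks")
print(encoded)  # b'Caf? ?'
```

Once registered, the name can be passed as the errors argument anywhere a built-in strategy such as "replace" is accepted.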

Examples

Decoding bytes that contain a non-ASCII byte with the ASCII codec, using different error strategies:

Python
>>> import codecs

>>> data = "Caf\u00e9".encode("latin-1")
>>> codecs.decode(data, "ascii", errors="ignore")
'Caf'
>>> codecs.decode(data, "ascii", errors="replace")
'Caf\ufffd'
>>> codecs.decode(data, "ascii", errors="backslashreplace")
'Caf\\xe9'

Inspecting a codec’s metadata with the codecs.lookup() function:

Python
>>> import codecs

>>> info = codecs.lookup("utf-8")
>>> info.name
'utf-8'
>>> info.incrementalencoder
<class 'encodings.utf_8.IncrementalEncoder'>
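Because codecs.lookup() normalizes aliases and raises LookupError for unknown names, it can also be used to validate an encoding name before use. A minimal sketch, assuming a hypothetical helper called canonical_name():

```python
import codecs

def canonical_name(encoding):
    """Return the registry's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(encoding).name
    except LookupError:
        # Raised when no registered search function recognizes the name.
        return None

print(canonical_name("UTF8"))           # utf-8
print(canonical_name("latin1"))         # iso8859-1
print(canonical_name("no-such-codec"))  # None
```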

Using codecs.iterdecode() to decode a stream of byte chunks incrementally:

Python
>>> import codecs

>>> chunks = [b"Hell", b"o, ", b"W\xc3\xb6", b"rld!"]
>>> decoder = codecs.iterdecode(iter(chunks), "utf-8")
>>> list(decoder)
['Hell', 'o, ', 'Wö', 'rld!']
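In the example above, every chunk happens to end on a character boundary. The sketch below shows what the underlying incremental decoder does when a multi-byte UTF-8 sequence is split across two chunks. It uses codecs.getincrementaldecoder(), and the chunk contents are made up for illustration:

```python
import codecs

# The two bytes of "ö" (b"\xc3\xb6") are split across the chunks.
chunks = [b"W\xc3", b"\xb6rld"]

decoder = codecs.getincrementaldecoder("utf-8")()

# The decoder buffers the incomplete byte until the next chunk arrives.
parts = [decoder.decode(chunk) for chunk in chunks]

# final=True flushes the decoder and raises if bytes are left dangling.
parts.append(decoder.decode(b"", final=True))

text = "".join(parts)
print(parts)  # ['W', 'örld', '']
print(text)   # Wörld
```

Calling bytes.decode() on each chunk separately would raise UnicodeDecodeError on the first chunk, which is why a stateful decoder is needed for streaming input.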

Common Use Cases

The most common use cases for codecs include:

  • Encoding text to bytes for storage in files or transmission over a network
  • Decoding bytes from legacy systems that use non-UTF-8 encodings such as Latin-1 or Windows-1252
  • Streaming large files in a specific encoding without loading them fully into memory
  • Registering custom codecs for domain-specific or proprietary data formats
  • Applying named error handlers to control behavior when encountering unencodable or undecodable characters
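When reading bytes from legacy systems, a leading BOM can hint at the encoding. The following is a minimal sketch of BOM sniffing with the codecs constants, assuming a hypothetical helper called detect_bom(). It only inspects the BOM and does not attempt any other form of encoding detection:

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data):
    """Return (encoding, payload without BOM), or (None, data) if no BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data

encoding, payload = detect_bom(codecs.BOM_UTF8 + "Café".encode("utf-8"))
print(encoding)                  # utf-8
print(payload.decode(encoding))  # Café
```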

Real-World Example

A script can use codecs.iterdecode() to incrementally transcode a Latin-1 encoded log file to UTF-8 without loading the entire file into memory:

Python transcode_log.py
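import codecs

def transcode_to_utf8(source_path, dest_path):
    # Open both files in binary mode so all encoding work is explicit.
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        # Iterating over src yields one line of bytes at a time.
        reader = codecs.iterdecode(src, "latin-1")
        for line in reader:
            # Each decoded line is a str; re-encode it as UTF-8.
            dst.write(line.encode("utf-8"))

transcode_to_utf8("legacy_log.txt", "modern_log.txt")
print("Transcoding complete.")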

Run it:

Shell
$ python transcode_log.py
Transcoding complete.

Iterating over the binary file object yields one line of bytes at a time, so the iterdecode() call decodes the file line by line. Each line is converted from Latin-1 to a Python string, which is then re-encoded as UTF-8 and written to the output file.
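The script above relies on iterating over the source file. When the input has very long lines or no newlines at all, reading fixed-size blocks may be preferable. The following is a minimal sketch of that variant, assuming a hypothetical function transcode_stream() that works on any pair of binary file objects. It uses an incremental decoder so that multi-byte source encodings are also handled correctly at block boundaries:

```python
import codecs
import io

def transcode_stream(src, dst, source_encoding, chunk_size=4096):
    """Copy src to dst, converting from source_encoding to UTF-8."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush any buffered bytes and surface truncated input as an error.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))

# In-memory streams stand in for real files in this demonstration.
src = io.BytesIO("Café au lait\n".encode("latin-1"))
dst = io.BytesIO()
transcode_stream(src, dst, "latin-1", chunk_size=4)
result = dst.getvalue()
print(result.decode("utf-8"), end="")
```

The small chunk_size here is only meant to force several reads; a real script would likely keep a larger default.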

Tutorial

Unicode & Character Encodings in Python: A Painless Guide

In this tutorial, you'll get a Python-centric introduction to character encodings and Unicode. Handling character encodings and numbering systems can at times seem painful and complicated, but this guide is here to help with easy-to-follow Python examples.

By Leodanis Pozo Ramos • Updated March 20, 2026