codecs
The Python codecs module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry. It supports text encodings, text-to-text codecs, and bytes-to-bytes codecs.
Here’s a quick look at encoding and decoding text:
>>> import codecs
>>> codecs.encode("Hello, World!", "utf-8")
b'Hello, World!'
>>> codecs.decode(b"Hello, World!", "utf-8")
'Hello, World!'
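The same two functions also cover the non-text codecs mentioned above. The following is a minimal sketch, assuming the standard `hex` (bytes-to-bytes) and `rot13` (text-to-text) codecs that ship with CPython; the variable names are illustrative only:

```python
import codecs

# Bytes-to-bytes: hex-encode raw bytes, then reverse the transform.
hex_bytes = codecs.encode(b"hello", "hex")
restored = codecs.decode(hex_bytes, "hex")
print(hex_bytes)  # b'68656c6c6f'
print(restored)   # b'hello'

# Text-to-text: ROT13 maps a str to another str.
rotated = codecs.encode("Hello", "rot13")
print(rotated)    # Uryyb
```

These transforms are only reachable through codecs.encode() and codecs.decode(), because str.encode() always returns bytes and bytes.decode() always returns str.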
Key Features
- Provides encode() and decode() functions with configurable error handling strategies
- Maintains a searchable registry of built-in and custom codecs
- Supports text encodings, text-to-text transforms, and bytes-to-bytes transforms
- Offers incremental encoders and decoders suitable for streaming data
- Provides stream-oriented reader and writer classes for encoded files and network streams
- Includes byte order mark (BOM) constants, such as codecs.BOM_UTF8, for detecting and writing BOMs
- Allows registration of custom codecs and custom error handlers
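As an illustration of the BOM constants, here is a small sketch of detecting and removing a UTF-8 BOM by hand. The helper name `strip_utf8_bom` is hypothetical and not part of the module:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

raw = codecs.BOM_UTF8 + "héllo".encode("utf-8")
text = strip_utf8_bom(raw).decode("utf-8")
print(text)  # héllo

# The built-in "utf-8-sig" codec skips a leading BOM while decoding.
same_text = codecs.decode(raw, "utf-8-sig")
print(same_text == text)  # True
```

In practice the `utf-8-sig` codec is usually the simpler choice. The manual check is mainly useful when you need to know whether a BOM was present.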
Frequently Used Classes and Functions
| Object | Type | Description |
|---|---|---|
| codecs.encode() | Function | Encodes an object using a named registered codec |
| codecs.decode() | Function | Decodes an object using a named registered codec |
| codecs.lookup() | Function | Returns a CodecInfo object for a named encoding |
| codecs.register() | Function | Registers a custom codec search function with the registry |
| codecs.register_error() | Function | Registers a named error handling function for use during encoding or decoding |
| codecs.iterencode() | Function | Incrementally encodes strings from an iterator using a named codec |
| codecs.iterdecode() | Function | Incrementally decodes bytes from an iterator using a named codec |
| codecs.IncrementalEncoder | Class | Base class for building stateful incremental encoders |
| codecs.IncrementalDecoder | Class | Base class for building stateful incremental decoders |
| codecs.StreamReader | Class | Base class for reading and decoding data from a binary stream |
| codecs.StreamWriter | Class | Base class for encoding and writing data to a binary stream |
| codecs.BOM_UTF8 | Constant | UTF-8 byte order mark (b'\xef\xbb\xbf'), used to signal UTF-8 encoding at the start of a byte stream |
| codecs.BOM_UTF16 | Constant | UTF-16 BOM in native byte order; also available as BOM_UTF16_BE and BOM_UTF16_LE for explicit endianness |
| codecs.BOM_UTF32 | Constant | UTF-32 BOM in native byte order; also available as BOM_UTF32_BE and BOM_UTF32_LE for explicit endianness |
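To make codecs.register_error() more concrete, the following sketch registers a custom handler. The handler name `codepoint_tags` and the function `codepoint_tag_handler` are made up for illustration. A handler receives the UnicodeError instance and returns a replacement plus the position where encoding should resume:

```python
import codecs

def codepoint_tag_handler(error):
    """Hypothetical handler: replace unencodable characters with <U+XXXX> tags."""
    if isinstance(error, UnicodeEncodeError):
        bad_chars = error.object[error.start:error.end]
        replacement = "".join(f"<U+{ord(ch):04X}>" for ch in bad_chars)
        # Resume encoding right after the characters that failed.
        return replacement, error.end
    # This sketch only supports encoding; re-raise anything else.
    raise error

codecs.register_error("codepoint_tags", codepoint_tag_handler)

encoded = codecs.encode("Café ☕", "ascii", errors="codepoint_tags")
print(encoded)  # b'Caf<U+00E9> <U+2615>'
```

Once registered, the name can be passed as the errors argument anywhere a built-in strategy such as "ignore" or "replace" is accepted.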
Examples
Decoding bytes that contain a character that can’t be represented in ASCII, using different error strategies:
>>> import codecs
>>> data = "Caf\u00e9".encode("latin-1")
>>> codecs.decode(data, "ascii", errors="ignore")
'Caf'
>>> codecs.decode(data, "ascii", errors="replace")
'Caf\ufffd'
>>> codecs.decode(data, "ascii", errors="backslashreplace")
'Caf\\xe9'
Inspecting a codec’s metadata with the codecs.lookup() function:
>>> import codecs
>>> info = codecs.lookup("utf-8")
>>> info.name
'utf-8'
>>> info.incrementalencoder
<class 'encodings.utf_8.IncrementalEncoder'>
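Beyond metadata, the CodecInfo object exposes the codec's stateless encode and decode callables. A short sketch, with illustrative variable names, showing that they return a tuple of the result and the amount of input consumed, and that lookup() resolves aliases to a canonical name:

```python
import codecs

info = codecs.lookup("utf-8")

# Stateless encode: returns (bytes, number of characters consumed).
encoded, chars_consumed = info.encode("Wörld")
print(encoded, chars_consumed)  # b'W\xc3\xb6rld' 5

# Stateless decode: returns (str, number of bytes consumed).
decoded, bytes_consumed = info.decode(encoded)
print(decoded, bytes_consumed)  # Wörld 6

# Aliases resolve to the same canonical codec name.
print(codecs.lookup("UTF8").name)     # utf-8
print(codecs.lookup("latin-1").name)  # iso8859-1
```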
Using codecs.iterdecode() to decode a stream of byte chunks incrementally:
>>> import codecs
>>> chunks = [b"Hell", b"o, ", b"W\xc3\xb6", b"rld!"]
>>> decoder = codecs.iterdecode(iter(chunks), "utf-8")
>>> list(decoder)
['Hell', 'o, ', 'Wö', 'rld!']
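The chunks above happen to split on character boundaries. The sketch below, using made-up chunk data, shows why the incremental approach matters when a multi-byte character is split across two chunks. Decoding each chunk independently fails, while codecs.iterdecode() buffers the incomplete byte until the rest arrives:

```python
import codecs

# The two bytes of "ö" (b"\xc3\xb6") are split across the chunks.
split_chunks = [b"W\xc3", b"\xb6rld"]

try:
    naive = [chunk.decode("utf-8") for chunk in split_chunks]
except UnicodeDecodeError:
    naive = None  # The first chunk ends in the middle of a character.

pieces = list(codecs.iterdecode(iter(split_chunks), "utf-8"))
print(naive)             # None
print(pieces)            # ['W', 'örld']
print("".join(pieces))   # Wörld
```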
Common Use Cases
The most common use cases for the codecs module include:
- Encoding text to bytes for storage in files or transmission over a network
- Decoding bytes from legacy systems that use non-UTF-8 encodings such as Latin-1 or Windows-1252
- Streaming large files in a specific encoding without loading them fully into memory
- Registering custom codecs for domain-specific or proprietary data formats
- Applying named error handlers to control behavior when encountering unencodable or undecodable characters
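As a sketch of the legacy-decoding use case, the example below decodes the same bytes as Windows-1252 and as Latin-1. The sample bytes are invented for illustration. The two encodings agree on most values but differ in the 0x80 to 0x9F range, so choosing the wrong one silently produces control characters instead of raising an error:

```python
import codecs

# Hypothetical bytes produced by a Windows-1252 system.
raw = b"\x80100 caf\xe9"

as_cp1252 = codecs.decode(raw, "cp1252")
as_latin1 = codecs.decode(raw, "latin-1")

print(as_cp1252)        # €100 café
print(repr(as_latin1))  # '\x80100 café'

# Once decoded correctly, the text can be stored as UTF-8.
utf8_bytes = codecs.encode(as_cp1252, "utf-8")
```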
Real-World Example
A script can use codecs.iterdecode() to incrementally transcode a Latin-1 encoded log file to UTF-8 without loading the entire file into memory:
transcode_log.py
import codecs

def transcode_to_utf8(source_path, dest_path):
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        reader = codecs.iterdecode(src, "latin-1")
        for line in reader:
            dst.write(line.encode("utf-8"))

transcode_to_utf8("legacy_log.txt", "modern_log.txt")
print("Transcoding complete.")
Run it:
$ python transcode_log.py
Transcoding complete.
Because iterating over a binary file yields one line of bytes at a time, the iterdecode() call processes the file line by line, converting each segment from Latin-1 to a Python string, which is then re-encoded as UTF-8 and written to the output file.
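If the input has very long lines or no line breaks at all, fixed-size reads give tighter control over memory use. The following is an alternative sketch, not the script above. It assumes a hypothetical helper named `transcode_stream` and uses in-memory streams in place of real files:

```python
import codecs
import io

def transcode_stream(src, dst, source_encoding="latin-1", chunk_size=4096):
    """Copy src to dst as UTF-8, reading fixed-size byte chunks."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush any bytes the decoder is still holding back.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))

# In-memory stand-ins for the source and destination files.
src = io.BytesIO(("Café\n" * 3).encode("latin-1"))
dst = io.BytesIO()
transcode_stream(src, dst, chunk_size=4)
print(dst.getvalue().decode("utf-8"))
```

The incremental decoder keeps state between calls, so the same function would also work for multi-byte source encodings where a chunk boundary can fall inside a character.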
By Leodanis Pozo Ramos • Updated March 20, 2026