Python 3.11 is getting closer to its final release, which will happen in October 2022. The new version is currently going through beta testing, and you can install it yourself to preview and test some of the new features, including support for reading TOML with the new tomllib module.
TOML is a configuration file format that’s getting more and more popular in the Python ecosystem. This is driven by the adoption of pyproject.toml as the central configuration file in Python packaging. Other important tools, like Black, mypy, and pytest, also use TOML for their configuration.
In this tutorial, you’ll:
- Install Python 3.11 beta on your computer, next to your current Python installations
- Get familiar with the basics of the TOML format
- Read TOML files with the new tomllib module
- Write TOML with third-party libraries and learn why this functionality isn’t included in tomllib
- Explore Python 3.11’s new typing features, including the Self and LiteralString types as well as variadic generics
There are many other new features and improvements coming in Python 3.11. Check out what’s new in the changelog for an up-to-date list, and read other Python 3.11 previews on Real Python to learn about other features.
Python 3.11 Beta
A new version of Python is released in October each year. The code is developed and tested over a seventeen-month period before the release date. New features are implemented during the alpha phase. For Python 3.11, seven alpha releases were made between October 2021 and April 2022.
The first beta release of Python 3.11 happened in the early hours of May 8, 2022. Each such pre-release is coordinated by a release manager—currently Pablo Galindo Salgado—and ties together hundreds of commits from Python’s core developers and other volunteers.
This release also marked the feature freeze for the new version. In other words, no new features will be added to Python 3.11 that aren’t already present in Python 3.11.0b1. Instead, the time between the feature freeze and the release date—October 3, 2022—is used to test and solidify the code.
About once a month during the beta phase, Python’s core developers release a new beta version to continue showing off the new features, testing them, and getting early feedback. Currently, the latest beta version of Python 3.11 is 3.11.0b3, released on June 1, 2022.
Note: This tutorial uses the third beta version of Python 3.11. You might experience small differences if you use a later version. However, tomllib builds on a mature library, and you can expect that what you learn in this tutorial will stay the same through the beta phase and in the final release of Python 3.11.
If you’re maintaining your own Python package, then the beta phase is an important period when you should start testing your package with the new version. Together with the community, the core developers want to find and fix as many bugs as possible before the final release.
Cool New Features
Some of the highlights of Python 3.11 include:
- Enhanced error messages, which help you more effectively debug your code
- Task and exception groups, which streamline the use of asynchronous code and allow programs to raise and handle multiple exceptions at the same time
- TOML support, which allows you to parse TOML documents using the standard library
- Static typing improvements, which let you annotate your code more precisely
- Optimizations, which promise to make Python 3.11 significantly faster than previous versions
There’s a lot to look forward to in Python 3.11! You can already read about the enhanced error messages and task and exception groups in earlier Python 3.11 preview articles. For a comprehensive overview, check out Python 3.11: Cool New Features for You to Try.
In this tutorial, you’ll focus on how you can use the new tomllib library to read and parse TOML files. You’ll also get a short peek at some of the typing improvements that’ll be shipping with Python 3.11.
Installation
To play with the code examples in this tutorial, you’ll need to install a version of Python 3.11 onto your system. In this subsection, you’ll learn about a few different ways to do this: using Docker, using pyenv, or installing from source. Pick the one that works best for you and your system.
Note: Beta versions are previews of upcoming features. While most features will work well, you shouldn’t depend on any Python 3.11 beta version in production or anywhere else where potential bugs will have serious consequences.
If you have access to Docker on your system, then you can download the latest version of Python 3.11 by pulling and running the python:3.11-rc-slim Docker image:
$ docker pull python:3.11-rc-slim
3.11-rc-slim: Pulling from library/python
[...]
docker.io/library/python:3.11-rc-slim
$ docker run -it --rm python:3.11-rc-slim
This drops you into a Python 3.11 REPL. Check out Run Python Versions in Docker for more information about working with Python through Docker, including how to run scripts.
The pyenv tool is great for managing different versions of Python on your system, and you can use it to install Python 3.11 beta if you like. It comes in two different versions, one for Windows and one for Linux and macOS. The commands below are for the Linux and macOS version:
Use pyenv install --list to check which versions of Python 3.11 are available. Then, install the latest one:
$ pyenv install 3.11.0b3
Downloading Python-3.11.0b3.tar.xz...
[...]
The installation may take a few minutes. Once your new beta version is installed, you can create a virtual environment based on it where you can play with the new features.
You can also install Python from one of the pre-release versions available on python.org. Choose the latest pre-release and scroll down to the Files section at the bottom of the page. Download and install the file corresponding to your system. See Python 3 Installation & Setup Guide for more information.
Most of the examples in this tutorial rely on new features, so you should run them with your Python 3.11 executable. Exactly how you run the executable depends on how you installed it. If you need help, then have a look at the relevant tutorial on Docker, pyenv, virtual environments, or installing from source.
tomllib TOML Parser in Python 3.11
Python is a mature language. The first public version of Python was released in 1991, more than thirty years ago. A lot of Python’s distinct features, including explicit exception handling, the reliance on whitespace, and rich data structures like lists and dictionaries, were present even in the early days.
One feature lacking in the first versions of Python, though, was a convenient way to share community packages and modules. That’s not so surprising. In fact, Python was invented at about the same time as the World Wide Web. At the end of 1991, only twelve web servers existed worldwide, and none of them were dedicated to distributing Python code.
Over time, both Python and the Internet got more popular. Several initiatives aimed to allow sharing of Python code. These features evolved organically and led to Python’s somewhat chaotic relationship to packaging.
This has been addressed through several Packaging PEPs (Python Enhancement Proposals) over the last couple of decades, and the situation has improved considerably for both library maintainers and end users.
One challenge was that building packages relied on executing a setup.py file, but there was no mechanism for knowing which dependencies that file relied on. This created a kind of chicken-and-egg problem where you’d need to run setup.py to discover how you can run setup.py.
In practice, pip, Python’s package manager, assumed that it should use Setuptools to build packages and that Setuptools was available on your computer. This made it harder to use alternative build systems like Flit and Poetry.
To resolve the situation, PEP 518 introduced the pyproject.toml configuration file, which specifies Python project build dependencies. PEP 518 was accepted in 2016. At the time, TOML was still a fairly new format, and there was no built-in support for parsing TOML in Python or its standard library.
As the TOML format has matured and the use of the pyproject.toml file has settled in, Python 3.11 adds support for parsing TOML files. In this section, you’ll learn more about what the TOML format is, how you can use the new tomllib to parse TOML documents, and why tomllib doesn’t support writing TOML files.
Learn Basic TOML
Tom Preston-Werner first announced Tom’s Obvious, Minimal Language, commonly known as TOML, and released version 0.1.0 of its specification in 2013. From the beginning, the aim of TOML has been to provide a “minimal configuration file format that’s easy to read due to obvious semantics,” in the words of the TOML specification. The stable version 1.0.0 of the TOML specification was released in January 2021.
A TOML file is a UTF-8 encoded, case-sensitive text file. The main building blocks in TOML are key-value pairs, where the key is separated from the value by an equal sign (=):
version = 3.11
In this minimal TOML document, version is a key with the corresponding value 3.11. Values have types in TOML, and 3.11 is interpreted as a floating-point number. Other basic types that you may take advantage of are strings, Booleans, integer numbers, and dates:
version = 3.11
release_manager = "Pablo Galindo Salgado"
is_beta = true
beta_release = 3
release_date = 2022-06-01
This example shows most of these types. The syntax is similar to Python’s syntax, except for having lowercase Booleans and a special date literal. In their basic form, TOML key-value pairs resemble Python variable assignments, so they should look familiar. For more details on these and other similarities, check out the TOML Documentation.
At its core, a TOML document is a collection of key-value pairs. You can add some structure to these pairs by wrapping them in arrays and tables. An array is a list of values, similar to a Python list. A table is a nested collection of key-value pairs, similar to a Python dict.
You use square brackets to wrap the elements of an array. A table is started by a [key] line naming the table:
[python]
version = 3.11
release_manager = "Pablo Galindo Salgado"
is_beta = true
beta_release = 3
release_date = 2022-06-01
peps = [657, 654, 678, 680, 673, 675, 646, 659]
[toml]
version = 1.0
release_date = 2021-01-12
This TOML document can be represented as follows in Python:
{
    "python": {
        "version": 3.11,
        "release_manager": "Pablo Galindo Salgado",
        "is_beta": True,
        "beta_release": 3,
        "release_date": datetime.date(2022, 6, 1),
        "peps": [657, 654, 678, 680, 673, 675, 646, 659],
    },
    "toml": {
        "version": 1.0,
        "release_date": datetime.date(2021, 1, 12),
    },
}
The [python] table in TOML is represented in Python by a "python" key in the dictionary, which points to a nested dictionary containing all the key-value pairs in that TOML table. TOML tables can be arbitrarily nested, and a TOML document can contain several TOML tables.
This wraps up your short introduction to TOML syntax. Although TOML by design has a fairly minimal syntax, there are some details that you haven’t covered here. To dive deeper, check out Python and TOML: New Best Friends or the TOML specification.
In addition to its syntax, you should consider how you interpret values in a TOML file. TOML documents are usually used for configuration. Ultimately, some other application uses the information from a TOML document. That application therefore has some expectation about the content of the TOML file. The implication of this is that a TOML document can have two different kinds of errors:
- Syntax error: The TOML document isn’t valid TOML. The TOML parser usually catches this.
- Schema error: The TOML document is valid TOML, but its structure isn’t what the application expects. The application itself must handle this.
The TOML specification doesn’t currently include a schema language that can be used to validate the structure of TOML documents, although several proposals exist. Such a schema would check that a given TOML document includes the correct tables, keys, and value types for a given use case.
As an example of an informal schema, PEP 517 and PEP 518 say that a pyproject.toml file should define the build-system table, which must include the keys requires and build-backend. Furthermore, the value of requires must be an array of strings, while the value of build-backend must be a string. The following is an example of a TOML document fulfilling this schema:
# pyproject.toml
[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"
This example follows the requirements of PEP 517 and PEP 518. However, that validation is typically done by the build front-end.
Note: If you want to learn more about building your own packages in Python, check out How to Publish an Open-Source Python Package to PyPI.
You can check this validation yourself. Create the following erroneous pyproject.toml file:
# pyproject.toml
[build-system]
requires = "setuptools>=61.0.0"
backend = "setuptools.build_meta"
This is valid TOML, so the file can be read by any TOML parser. However, it’s not a valid build-system table according to the requirements in the PEPs. To confirm this, install build, which is a PEP 517-compliant build front-end, and perform a build based on your pyproject.toml file:
(venv) $ python -m pip install build
(venv) $ python -m build
ERROR Failed to validate `build-system` in pyproject.toml:
`requires` must be an array of strings
The error message points out that requires must be an array of strings, as specified in PEP 518. Play with other versions of your pyproject.toml file and note which other validations build does for you. You may need to implement similar validations in your own applications.
So far, you’ve seen a few examples of TOML documents, but you haven’t explored how you can use them in your own projects. In the next subsection, you’ll learn how you can use the new tomllib module in the standard library to read and parse TOML files in Python 3.11.
Read TOML With tomllib
Python 3.11 comes with a new module in the standard library named tomllib. You can use tomllib to read and parse any TOML v1.0 compliant document. In this subsection, you’ll learn how you can load TOML directly from files and from strings that contain TOML documents.
PEP 680 describes tomllib and some of the process that led to TOML support being added to the standard library. Two deciding factors for the inclusion of tomllib in Python 3.11 were the central role that pyproject.toml plays in the Python packaging ecosystem and the TOML specification’s reaching version 1.0 in early 2021.
The implementation of tomllib is more or less lifted straight from tomli by Taneli Hukkinen, who’s also one of the co-authors of PEP 680.
The tomllib module is quite simple in that it only contains two functions:
- load() reads TOML documents from files.
- loads() reads TOML documents from strings.
You’ll first see how you can use tomllib to read the following pyproject.toml file, which is a simplified version of the same file in the tomli project:
# pyproject.toml
[build-system]
requires = ["flit_core>=3.2.0,<4"]
build-backend = "flit_core.buildapi"
[project]
name = "tomli"
version = "2.0.1" # DO NOT EDIT THIS LINE MANUALLY. LET bump2version DO IT
description = "A lil' TOML parser"
requires-python = ">=3.7"
readme = "README.md"
keywords = ["toml"]
[project.urls]
"Homepage" = "https://github.com/hukkin/tomli"
"PyPI" = "https://pypi.org/project/tomli"
Copy this document and save it in a file named pyproject.toml on your local file system. You can now start a REPL session in order to explore Python 3.11’s TOML support:
>>> import tomllib
>>> with open("pyproject.toml", mode="rb") as fp:
...     tomllib.load(fp)
...
{'build-system': {'requires': ['flit_core>=3.2.0,<4'],
                  'build-backend': 'flit_core.buildapi'},
 'project': {'name': 'tomli',
             'version': '2.0.1',
             'description': "A lil' TOML parser",
             'requires-python': '>=3.7',
             'readme': 'README.md',
             'keywords': ['toml'],
             'urls': {'Homepage': 'https://github.com/hukkin/tomli',
                      'PyPI': 'https://pypi.org/project/tomli'}}}
You use load() to read and parse a TOML file by passing a file object to the function. Note that the file object must be a binary stream. One way to ensure this is to use open() with mode="rb", where the b indicates binary mode.
Note: According to PEP 680, the file must be opened in binary mode so that tomllib can ensure that the UTF-8 encoding is handled correctly on all systems.
Compare the original TOML document with the resulting Python data structure. The document is represented by a Python dictionary where all the keys are strings, and different tables in TOML are represented as nested dictionaries. Observe that the comment about version in the original file is ignored and not part of the result.
You can use loads() to load a TOML document that’s already represented as a string. The following example parses the TOML document from the previous subsection:
>>> import tomllib
>>> document = """
... [python]
... version = 3.11
... release_manager = "Pablo Galindo Salgado"
... is_beta = true
... beta_release = 3
... release_date = 2022-06-01
... peps = [657, 654, 678, 680, 673, 675, 646, 659]
...
... [toml]
... version = 1.0
... release_date = 2021-01-12
... """
>>> tomllib.loads(document)
{'python': {'version': 3.11,
            'release_manager': 'Pablo Galindo Salgado',
            'is_beta': True,
            'beta_release': 3,
            'release_date': datetime.date(2022, 6, 1),
            'peps': [657, 654, 678, 680, 673, 675, 646, 659]},
 'toml': {'version': 1.0,
          'release_date': datetime.date(2021, 1, 12)}}
Similarly to load(), loads() returns a dictionary. In general, the representation is based on basic Python types: str, float, int, bool, as well as dictionaries, lists, and datetime objects. The tomllib documentation includes a conversion table that shows how TOML types are represented in Python.
If you prefer, then you can use loads() to read TOML from files by combining it with pathlib:
>>> import pathlib
>>> import tomllib
>>> path = pathlib.Path("pyproject.toml")
>>> with path.open(mode="rb") as fp:
...     from_load = tomllib.load(fp)
...
>>> from_loads = tomllib.loads(path.read_text(encoding="utf-8"))
>>> from_load == from_loads
True
In this example, you load pyproject.toml using both load() and loads(). You then confirm that the Python representation is the same regardless of how you load the file.
Both load() and loads() accept one optional parameter: parse_float. This allows you to take control over how floating-point numbers are parsed and represented in Python. By default, they’re parsed and stored as float objects, which in most Python implementations are 64-bit with about 16 decimal digits of precision.
One alternative, if you need to work with more precise numbers, is to use decimal.Decimal instead:
>>> import tomllib
>>> from decimal import Decimal
>>> document = """
... small = 0.12345678901234567890
... large = 9999.12345678901234567890
... """
>>> tomllib.loads(document)
{'small': 0.12345678901234568,
 'large': 9999.123456789011}
>>> tomllib.loads(document, parse_float=Decimal)
{'small': Decimal('0.12345678901234567890'),
 'large': Decimal('9999.12345678901234567890')}
Here you load a TOML document with two key-value pairs. By default, you lose a bit of precision when using load() or loads(). By using the Decimal class, you keep the precision in your input.
As noted, the tomllib module is adapted from the popular tomli module. If you want to use TOML and tomllib on codebases that need to support older versions of Python, then you can fall back on tomli. To do so, add the following line in your requirements file:
tomli >= 1.1.0 ; python_version < "3.11"
This will install tomli when used on Python versions before 3.11. In your source code, you can then use tomllib or tomli as appropriate with the following import:
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib
This code will import tomllib on Python 3.11 and later. If tomllib isn’t available, then tomli is imported and aliased to the tomllib name.
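A common alternative is to branch on sys.version_info instead of catching ModuleNotFoundError. Static type checkers like mypy evaluate version comparisons, so they know which import applies to the Python version they’re checking against. This sketch assumes tomli is installed wherever Python is older than 3.11:

```python
import sys

# Type checkers understand sys.version_info comparisons, so each branch
# is only analyzed for the Python versions it applies to
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # Requires tomli on Python < 3.11

config = tomllib.loads("answer = 42")
print(config["answer"])
```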
You’ve seen how to use tomllib to read TOML documents. You may wonder how you can write TOML files. It turns out that you can’t write TOML with tomllib. Read on to learn why, and to see some of the alternatives.
Write TOML
Similar existing libraries like json and pickle include both load() and dump() functions, where the latter is used to write data. The dump() function, as well as the corresponding dumps(), is deliberately left out of tomllib.
According to PEP 680 and the discussion around it, this has been done for a handful of reasons:
- The main motivation for including tomllib in the standard library is to be able to read TOML files used in the ecosystem.
- The TOML format is designed to be a human-friendly configuration format, so many TOML files are written manually.
- The TOML format isn’t designed to be a data serialization format like JSON or pickle, so being fully consistent with the json and pickle APIs isn’t necessary.
- TOML documents may contain comments and formatting that should be preserved when written to file. This isn’t compatible with representing TOML as basic Python types.
- There are different opinions about how to lay out and format TOML files.
- None of the core developers expressed interest in maintaining a write API for tomllib.
Once something is added to the standard library, it becomes hard to change or remove because someone’s relying on it. This is a good thing, as it means that Python stays mostly backward compatible: few Python programs that run on Python 3.10 will stop working on Python 3.11.
Another consequence is that the core team is conservative about adding new features. Support for writing TOML documents can be added later if it becomes clear that there’s a real demand for it.
This doesn’t leave you empty-handed, though. There are several third-party TOML writers available. The tomllib documentation mentions two packages:
- tomli-w is, as the name implies, a sibling of tomli that can write TOML documents. It’s a simple module without many options to control the output.
- tomlkit is a powerful package for working with TOML documents, and it supports both reading and writing. It preserves comments, indentation, and other whitespace. TOML Kit is developed for and used by Poetry.
Depending on your use case, one of those packages will probably fulfill your TOML writing needs.
If you don’t want to add an external dependency just to write a TOML file, then you can also try to roll your own writer. The following code shows an incomplete TOML writer. It doesn’t support all the features of TOML v1.0, but it supports enough to write the pyproject.toml example that you saw earlier:
# tomllib_w.py
from datetime import date

def dumps(toml_dict, table=""):
    document = []
    # Write plain key-value pairs before sub-tables: in TOML, every key
    # that follows a [table] header belongs to that table
    items = sorted(toml_dict.items(), key=lambda item: isinstance(item[1], dict))
    for key, value in items:
        match value:
            case dict():
                table_key = f"{table}.{key}" if table else key
                document.append(
                    f"\n[{table_key}]\n{dumps(value, table=table_key)}"
                )
            case _:
                document.append(f"{key} = {_dumps_value(value)}")
    return "\n".join(document)

def _dumps_value(value):
    match value:
        case bool():  # Must come before int(), since bool subclasses int
            return "true" if value else "false"
        case float() | int():
            return str(value)
        case str():
            return f'"{value}"'
        case date():
            return value.isoformat()
        case list():
            return f"[{', '.join(_dumps_value(v) for v in value)}]"
        case _:
            raise TypeError(
                f"{type(value).__name__} {value!r} is not supported"
            )
The dumps() function accepts a dictionary representing a TOML document. It converts the dictionary to a string by looping over the key-value pairs in the dictionary. You’ll have a closer look at the details soon. First, you should check that the code works. Open a REPL and import dumps():
>>> from tomllib_w import dumps
>>> print(dumps({"version": 3.11, "module": "tomllib_w", "stdlib": False}))
version = 3.11
module = "tomllib_w"
stdlib = false
You write a simple dictionary with different types of values. They’re correctly written as TOML types: numbers are plain, strings are surrounded by double quotes, and Booleans are lowercase.
Look back at the code. Most of the serialization to TOML types happens in the helper function, _dumps_value(). It uses structural pattern matching to construct different kinds of TOML strings based on the type of value.
The main dumps() function works with dictionaries. It loops over each key-value pair. If the value is another dictionary, then it constructs a TOML table by adding a table header and then calling itself recursively to handle the key-value pairs inside of the table. If the value isn’t a dictionary, then _dumps_value() is used to correctly convert the key-value pair to TOML.
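As noted, this writer doesn’t support the full TOML specification. For example, it doesn’t support all date and time types that are available in TOML, or nested structures like inline tables and arrays of tables. There are also some edge cases in string handling that aren’t supported: strings are written between double quotes without any escaping, so a value containing a double quote, a backslash, or a newline produces invalid TOML. However, it’s enough for many applications.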
You can, for example, try to load and then dump the pyproject.toml file that you worked with earlier:
>>> import tomllib
>>> from tomllib_w import dumps
>>> with open("pyproject.toml", mode="rb") as fp:
...     pyproject = tomllib.load(fp)
...
>>> print(dumps(pyproject))
[build-system]
requires = ["flit_core>=3.2.0,<4"]
build-backend = "flit_core.buildapi"
[project]
name = "tomli"
version = "2.0.1"
description = "A lil' TOML parser"
requires-python = ">=3.7"
readme = "README.md"
keywords = ["toml"]
[project.urls]
Homepage = "https://github.com/hukkin/tomli"
PyPI = "https://pypi.org/project/tomli"
Here, you first read pyproject.toml with tomllib. Then you use your own tomllib_w module to write the TOML document back to the console.
You may expand on tomllib_w if you need better support for writing TOML documents. However, in most cases you should rely on one of the existing packages, like tomli_w or tomlkit, instead.
While you’re not getting support for writing TOML files in Python 3.11, the included TOML parser will be useful for many projects. Going forward, you can use TOML for your configuration files, knowing that you’ll have first-class support for reading them in Python.
Other New Features
TOML support is certainly a cause for celebration, but there are several smaller improvements arriving in Python 3.11 as well. One area that has seen such incremental change over a long time is Python’s type checking landscape.
PEP 484 introduced type hints. They’ve been available since Python 3.5, and every new Python version adds capabilities to the static type system. Łukasz Langa talked about type checking in his keynote at the PyCon US 2022 conference.
There are several new typing-related PEPs accepted for Python 3.11. You’ll shortly learn more about the Self type, the LiteralString type, and variadic generics.
Note: Type checking enhancements are a bit special, because they depend on both your Python version and the version of your type checking tool. Some of the new Python 3.11 type system features are supported in the latest beta version, but aren’t yet implemented in all the type checkers.
For example, you can monitor the status of mypy’s support for the new features on their GitHub page.
There are even a few new typing-related features that won’t be covered below. PEP 681 adds the @dataclass_transform decorator, which can label classes with semantics similar to data classes. Additionally, PEP 655 lets you mark required and optional fields in typed dictionaries.
Self Type
PEP 673 introduces a new Self type that dynamically refers to the current class. This is useful when you implement a class with methods that return instances of the class. Consider the following partial implementation of a two-dimensional point represented by polar coordinates:
# polar_point.py
import math
from dataclasses import dataclass

@dataclass
class PolarPoint:
    r: float
    φ: float

    @classmethod
    def from_xy(cls, x, y):
        return cls(r=math.hypot(x, y), φ=math.atan2(y, x))
You add the .from_xy() constructor so that you can conveniently create PolarPoint instances from their corresponding Cartesian coordinates.
Note: The attribute names .r and .φ are deliberately chosen to resemble the mathematical symbols used in formulas.
In general, it’s recommended to use longer and more descriptive names for your attributes. However, sometimes following the conventions of your problem domain can be useful as well. Feel free to replace .r with .radius and .φ with .phi or .angle if you prefer.
Python source code is encoded in UTF-8 by default. Still, identifiers like variables and attributes can’t use every Unicode character. For example, you must stay away from emojis in your variable and attribute names.
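If you’re unsure whether a name is allowed, str.isidentifier() checks a string against the language’s rules for identifiers, so you can test a candidate before using it. The list of names below is just for illustration:

```python
# Candidate names; the emoji and the leading digit aren't valid identifiers
candidates = ["r", "radius", "φ", "phi", "🐍", "2d"]

for name in candidates:
    print(f"{name!r}: {name.isidentifier()}")
```

Note that isidentifier() doesn’t reject keywords like class. Use keyword.iskeyword() if you need that check as well.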
You can use your new class as follows:
>>> from polar_point import PolarPoint
>>> point = PolarPoint.from_xy(3, 4)
>>> point
PolarPoint(r=5.0, φ=0.9272952180016122)
>>> from math import cos
>>> point.r * cos(point.φ)
3.0000000000000004
Here, you first create a point representing the Cartesian point (3, 4). In polar coordinates, this point is represented by the radius r = 5.0 and the angle φ ≈ 0.927. You can convert back to the Cartesian x coordinate with the formula x = r * cos(φ).
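For reference, here are the standard conversions behind .from_xy(), worked through for the point (3, 4). The REPL result 3.0000000000000004 differs from the exact value 3 only because of floating-point rounding:

```latex
\begin{aligned}
r       &= \sqrt{x^2 + y^2} = \sqrt{3^2 + 4^2} = 5 \\
\varphi &= \operatorname{atan2}(y, x) = \operatorname{atan2}(4, 3) \approx 0.927 \\
x       &= r\cos\varphi = 5 \cdot \tfrac{3}{5} = 3 \\
y       &= r\sin\varphi = 5 \cdot \tfrac{4}{5} = 4
\end{aligned}
```

Unlike a plain arctangent of y/x, atan2(y, x) takes the signs of both coordinates into account, so φ lands in the correct quadrant.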
Now, you want to add type hints to .from_xy(). It returns a PolarPoint object. However, you can’t directly use PolarPoint as an annotation at this point, because that class hasn’t been fully defined yet. Instead, you can use "PolarPoint" with quotation marks or add a PEP 563 future import that postpones the evaluation of annotations.
Both of these workarounds have their drawbacks, and the current recommendation is to use a TypeVar instead. This approach works even in subclasses, but it’s cumbersome and error-prone.
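For comparison, here’s roughly what the TypeVar approach looks like. The name TPolarPoint is an arbitrary choice. Its bound restricts it to PolarPoint and its subclasses, and annotating cls ties the return type to whichever class .from_xy() is called on:

```python
import math
from dataclasses import dataclass
from typing import TypeVar

# Bound to PolarPoint via a string, since the class isn't defined yet
TPolarPoint = TypeVar("TPolarPoint", bound="PolarPoint")

@dataclass
class PolarPoint:
    r: float
    φ: float

    @classmethod
    def from_xy(cls: type[TPolarPoint], x: float, y: float) -> TPolarPoint:
        return cls(r=math.hypot(x, y), φ=math.atan2(y, x))

point = PolarPoint.from_xy(3, 4)
print(point)
```

You have to declare the type variable separately and repeat it in two annotations, which is exactly the kind of bookkeeping that’s easy to get wrong.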
With the new Self type, you can add type hints to your class as follows:
import math
from dataclasses import dataclass
from typing import Self

@dataclass
class PolarPoint:
    r: float
    φ: float

    @classmethod
    def from_xy(cls, x: float, y: float) -> Self:
        return cls(r=math.hypot(x, y), φ=math.atan2(y, x))
The annotation -> Self indicates that .from_xy() will return an instance of the current class. This will also work correctly if you create a subclass of PolarPoint.
Having the Self type in your toolbox will make it more convenient to add static typing to projects using classes and object-oriented features like inheritance.
Arbitrary Literal String Type
Another new type coming with Python 3.11 is LiteralString. While the name may remind you of Literal, which was added in Python 3.8, the main use case of LiteralString is a bit different. To understand the motivation for adding it to the type system, first take a step back and think about strings.
In general, Python doesn’t care how you construct strings:
>>> s1 = "Python"
>>> s2 = "".join(["P", "y", "t", "h", "o", "n"])
>>> s3 = input()
Python
>>> s1 == s2 == s3
True
In this example, you create the string "Python" in three different ways. First, you specify it as a literal string. Next, you join a list of six single-character strings to form the string "Python". Finally, you read the string from user input using input().
The final test shows that the value of each string is the same. In most applications, you don’t need to care about how a particular string is constructed. However, there are times when you need to be careful, in particular when working with user input.
SQL injection attacks against databases are unfortunately common. The Java Log4j vulnerability similarly exploited the logging system to execute arbitrary code.
Return to the example above. While the values of s1 and s3 happen to be the same, your trust in those two strings should be quite different. Say that you need to construct a SQL statement that reads information about a user from a database:
>>> def get_user_sql(user_id):
...     return f"SELECT * FROM users WHERE user_id = '{user_id}'"
...
>>> user_id = "Bobby"
>>> get_user_sql(user_id)
"SELECT * FROM users WHERE user_id = 'Bobby'"
>>> user_id = input()
Robert'; DROP TABLE users; --
>>> get_user_sql(user_id)
"SELECT * FROM users WHERE user_id = 'Robert'; DROP TABLE users; --'"
This is an adaptation of a classic SQL injection example. A malicious user can exploit the ability to write arbitrary SQL code to wreak havoc. If the last SQL statement were executed, then it would delete the users table.
There are many mechanisms to defend against these kinds of attacks. PEP 675 adds one more to the list by adding a new type to the typing module. LiteralString is a special kind of string type that only accepts strings defined literally in your code, or strings built entirely from such literals.
You can use LiteralString to mark functions that would be vulnerable to user-controlled strings. For example, you can annotate a function that executes SQL queries as follows:
from typing import LiteralString

def execute_sql(query: LiteralString):
    # Run the query against your database here
    ...
A type checker will pay special attention to the type of values passed as query in this function. All of the following strings are allowed as arguments to execute_sql():
>>> execute_sql("SELECT * FROM users")
>>> table = "users"
>>> execute_sql("SELECT * FROM " + table)
>>> execute_sql(f"SELECT * FROM {table}")
The last two examples are okay because query is built from literal strings. A string is only recognized as a LiteralString if all parts of the string are defined literally. For instance, the following call won’t pass the type check:
>>> user_input = input()
users
>>> execute_sql("SELECT * FROM " + user_input)
Even though the value of user_input happens to be the same as the value of table from earlier, the type checker will report an error here. Users control the value of user_input and can potentially change it to something that’s unsafe for your application. If you flag these kinds of vulnerable functions with LiteralString, type checkers will help you keep track of situations where you need to be extra careful.
Variadic Generic Types
A generic type specifies a type parametrized with other types, for example a list of strings or a tuple consisting of an integer, a string, and another integer. Python uses square brackets to parametrize generics. You write the two examples as list[str] and tuple[int, str, int], respectively.
A variadic entity is one that accepts a variable number of arguments. For example, print() is a variadic function in Python:
>>> print("abc", 123, "def")
abc 123 def
You can define your own variadic functions by using *args and **kwargs to capture multiple positional and keyword arguments.
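Here's a small sketch of such a function. The name describe() is hypothetical and chosen only for illustration. Inside the function, args is a tuple of the extra positional arguments and kwargs is a dictionary of the extra keyword arguments:

```python
def describe(*args: object, **kwargs: object) -> str:
    # args collects any number of positional arguments into a tuple,
    # kwargs collects any number of keyword arguments into a dict
    positional = ", ".join(repr(arg) for arg in args)
    keyword = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"positional: [{positional}], keyword: [{keyword}]"


print(describe("abc", 123, sep="-"))
# positional: ['abc', 123], keyword: [sep='-']
```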
You can use typing.Generic if you want to specify that your own class is generic. Here’s an example of a vector, also known as a one-dimensional array:
# vector.py
from typing import Generic, TypeVar

T = TypeVar("T")

class Vector(Generic[T]):
    ...
The type variable T is used as a stand-in for any type. You can use Vector in a type annotation as follows:
>>> from vector import Vector
>>> position: Vector[float]
In this particular example, T will be float. To make your code clearer and more type safe, you can also use type aliases or even dedicated derived types:
>>> from typing import NewType
>>> from vector import Vector
>>> Coordinate = NewType("Coordinate", float)
>>> Coordinate(3.11)
3.11
>>> type(Coordinate(3.11))
<class 'float'>
>>> position: Vector[Coordinate]
Here, Coordinate behaves like a float at runtime, but static type checkers will distinguish between a Coordinate and a plain float.
Now, say that you create a more general array class that can handle a variable number of dimensions. Until now, there’s been no good way to specify such variadic generics.
PEP 646 introduces typing.TypeVarTuple to handle this use case. These type variable tuples are essentially an arbitrary number of type variables wrapped in a tuple. You can use them to define an array with an arbitrary number of dimensions:
# ndarray.py
from typing import Generic, TypeVarTuple

Ts = TypeVarTuple("Ts")

class Array(Generic[*Ts]):
    ...
Note the use of the unpacking operator (*). This is a necessary part of the syntax and indicates that Ts represents a variable number of types.
Note: You can import TypeVarTuple from typing_extensions on Python versions prior to 3.11. However, the *Ts syntax won’t work on those versions. As an equivalent alternative, you can use typing_extensions.Unpack and write Unpack[Ts].
You can use NewType to label the dimensions in the array or Literal to specify an exact shape:
>>> from typing import Literal, NewType
>>> from ndarray import Array
>>> Height = NewType("Height", int)
>>> Width = NewType("Width", int)
>>> Channels = NewType("Channels", int)
>>> image: Array[Height, Width, Channels]
>>> video_frame: Array[Literal[1920], Literal[1080], Literal[3]]
You annotate image as being a three-dimensional array with the dimensions labeled as Height, Width, and Channels. You don’t specify the size of any of these dimensions. The second example, video_frame, is annotated with literal values. In practice, this means that video_frame must be an array with the specific shape 1920 × 1080 × 3.
The main motivation for variadic generics is typing arrays like you’ve seen in the examples above. However, there are also other use cases. NumPy and other array libraries plan to implement variadic generics once the tooling is in place.
Conclusion
In this tutorial, you’ve learned about some of the new features that you can play with in Python 3.11. While the final release happens in October 2022, you can already download a beta release and try out the new features. Here, you’ve explored the new tomllib module and gotten more familiar with the TOML format along the way.
You’ve done the following:
- Installed Python 3.11 beta on your computer, next to your current Python installations
- Read TOML files with the new tomllib module
- Written TOML with third-party libraries and created your own function to write a subset of TOML
- Explored Python 3.11’s new typing features, including the Self and LiteralString types as well as variadic generics
Are you already using TOML in your projects? Try out the new TOML parser and share your experiences in the comments below.