Comparing TOML and Python Types
00:00
Comparing TOML Types and Python Types. In the previous section, you loaded some TOML documents and learned how tomllib represents, for example, a TOML string as a Python string and a TOML array as a Python list.
00:13 The TOML specification doesn’t explicitly define how Python should represent TOML objects as that’s outside of its scope, but the TOML specification mentions some requirements on its own types.
00:25
For example, a TOML file must be a valid UTF-8 encoded Unicode document, arbitrary 64-bit signed integers should be accepted and handled losslessly, and floats should be implemented as IEEE 754 binary64 values.
00:43
In general, TOML’s requirements match well with Python’s implementation of the corresponding types. Python usually defaults to using UTF-8 when handling files, and a Python float follows IEEE 754.
00:55
Python’s int class implements arbitrary precision integers, which handle the required range and much larger numbers as well. For libraries such as tomli and tomllib, the mapping between TOML’s data types and Python data types is quite natural.
01:11
The table seen on screen is found in the documentation of tomllib. All the Python data types are either built-in or part of datetime in the standard library.
01:21
Once again, it’s not a requirement that TOML types must map to native Python types, but it’s a convenience that tomli and tomllib have chosen to implement.
01:32 But using only standard types is also a limitation. In practice, you can then only represent values and not other information encoded in the TOML document, such as comments or indentation.
01:44 Your Python representation also doesn’t differentiate between values defined inside a regular table or an inline table. In many use cases, this information is irrelevant, so nothing is lost, but sometimes it is important.
01:58
For example, if you’re trying to insert a table into an existing TOML document, then you don’t want all the comments to disappear. You’ll learn about tomlkit later.
02:09 It represents TOML types as custom Python objects that retain the information necessary to restore the complete TOML document.
02:17
The load() and loads() functions have one parameter that you can use to customize the TOML parsing. You can supply an argument to parse_float to specify how floating-point numbers should be parsed.
02:29 The default implementation fulfills the requirement of using 64-bit floats, which will usually be precise to about 16 significant digits. But if you have an application that relies on very precise numbers, 16 digits may not be enough.
02:44 As an example, consider the concept of Julian days used in astronomy. This is a representation of a timestamp as a number counting the number of days since the beginning of the Julian period, which is more than 6,700 years ago.
02:58 You can see an example of this on screen.
03:03 Astronomers sometimes need to work with very small timescales, such as nanoseconds or even picoseconds. To represent a time of day to nanosecond precision, you’d need about 14 digits after the decimal point in a fractional number.
03:16 You can see another example of this on screen. Numbers like this, which are both large in value and precise to many decimal places, aren’t well represented as floats.
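The digit counts can be checked with a little arithmetic. This is a rough sketch: the Julian day number used is an illustrative assumption, and sys.float_info.dig reports the conservative number of decimal digits a float is guaranteed to preserve, which is slightly below the "about 16" mentioned earlier.

```python
import math
import sys

NANOSECONDS_PER_DAY = 86_400 * 10**9

# Decimal places needed so that one step is no larger than a nanosecond.
fraction_digits = math.ceil(math.log10(NANOSECONDS_PER_DAY))

# Digits in the whole-day part of a present-day Julian date
# (2_459_772 is an assumed, illustrative day number).
integer_digits = len(str(2_459_772))

needed = integer_digits + fraction_digits
available = sys.float_info.dig

print(f"digits after the decimal point: {fraction_digits}")
print(f"significant digits needed:      {needed}")
print(f"digits a float guarantees:      {available}")
```

The whole-day part alone uses up seven of the available digits, which is why the fractional part can't keep nanosecond resolution.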
03:28
Let’s take a look at how much precision you do lose if you read this with tomllib. You first use tomllib to parse the Julian date, pick out the value, and name it ts.
03:42
You can see that the value of ts has been truncated by several decimal places. To figure out how bad the effect of the truncation is, you calculate the number of seconds represented by the fractional part of ts and compare it to 7,260.
04:01
An integer Julian date represents noon on some day. 2:01 PM is two hours and one minute after noon, and two hours and one minute equals 7,260 seconds. So seconds - 7,260 shows how big of an error is introduced by the parsing.
04:20 In this case, the timestamp is about 10 microseconds off the mark. That might not sound like much, but in many astronomical applications, signals travel at the speed of light.
04:30 In that case, 10 microseconds may cause an error of about three kilometers. One common solution to this is to not store very precise timestamps as Julian dates.
04:40
Instead, many variants with more inherent precision exist, but you can also fix your example by using Python’s Decimal class, which provides arbitrary precision decimal numbers.
04:53 Go back to the REPL and redo the previous example.
05:25 Now, the small error that’s left comes from the original representation and is about 19 picoseconds, which translates to sub-centimeter errors at the speed of light.
05:36
You can use Decimal when you know that you require precise floating-point numbers. In more specific use cases, you may also store your data as strings and parse the strings in your application after you’ve read the TOML file.
05:50 So far, you’ve seen how you can read TOML files with Python. But in the next section of the course, you’ll see how you can incorporate a configuration file into your own projects.