Tools and Challenges
00:00 In the previous lesson, I showed you PyYAML’s serialization and deserialization features in greater detail. This lesson is a hodgepodge of practical advice around tools and problems you might encounter.
00:13 This course has mainly shown you how to use PyYAML to deal with YAML docs in Python. The default installation is actually a pure-Python implementation. PyYAML is also available as a wrapper to LibYAML, a C-based library.
This can give you much higher performance, but it has a separate installation process. You can’t just pip install it. See the docs if this is something that interests you.
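If your PyYAML install does include the LibYAML bindings, one common pattern is to try the C-based loader first and fall back to the pure-Python one. Here’s a minimal sketch of that. load_config and PreferredSafeLoader are hypothetical names, while CSafeLoader, SafeLoader, and yaml.__with_libyaml__ come from PyYAML itself.

```python
import yaml

# PyYAML only exposes the C-based loaders when it was built against LibYAML.
# Without the bindings, this import raises ImportError, so we fall back to
# the pure-Python SafeLoader instead.
try:
    from yaml import CSafeLoader as PreferredSafeLoader
except ImportError:
    from yaml import SafeLoader as PreferredSafeLoader


def load_config(text):
    """Hypothetical helper: safely parse YAML with the fastest available loader."""
    return yaml.load(text, Loader=PreferredSafeLoader)


print("LibYAML bindings available:", yaml.__with_libyaml__)
print(load_config("name: demo\nretries: 3"))  # {'name': 'demo', 'retries': 3}
```

Either way you still get a safe loader; the C version just does the work faster.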
Although PyYAML is still the most popular library out there, it might be argued that using it is a bad idea, seeing as it’s based on YAML 1.1. If you want to move to something based on YAML 1.2, ruamel.yaml is just such a beast. There’s also StrictYAML, which implements a subset of the YAML spec, skipping many of the problems I’m going to mention shortly.
You can find yamllint, a linter for YAML, and the yq command for working with YAML from the command line. There’s also a compiled tool with the same name that’s faster, but of course you can’t pip install that. shyaml is an alternative to the Python version of yq.
01:22 The JSON formatter site has YAML capabilities with pages for parsing, formatting, and validating YAML if you want to use online tools instead.
01:35 I touched on this briefly before when showing you the trouble with timestamps, but YAML’s casting can be tricky. YAML 1.2 has fixed some of these challenges, but YAML 1.1 is still very much out there.
01:47 “The yaml document from hell” is a great article by Ruud van Asseldonk. He has constructed a YAML document with content that triggers a lot of the casting problems.
01:57 You can read the article, or you can watch me blow up a YAML parser using Ruud’s Satanic doc right here.
02:05 I’ve split Ruud’s doc into two parts: the part that parses problematically and the part that, well, doesn’t. The first part is at the top here. Let me load this, and then I’ll talk about the surprises.
And there are the results. Let’s start with the port_mapping spec. Because YAML doesn’t require quotes for strings, you might think, “Hey, I can just create port maps.” Well, that works for ports 80 and 443, but note what happens with the SSH port.
02:39 This is that timestamp problem I spoke about earlier. YAML 1.1’s base-60 support is turning 22:22 into an integer instead of a string. Let me scroll the document down a little bit to get the next surprise.
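Here’s a small sketch, using PyYAML’s safe_load() and a trimmed-down copy of the port mappings, that reproduces the SSH surprise and the quoting fix:

```python
import yaml

# Unquoted port mappings, as in the "document from hell"
doc = """
port_mapping:
  - 80:80
  - 443:443
  - 22:22
"""

data = yaml.safe_load(doc)
# 22:22 matches YAML 1.1's base-60 integer pattern: 22 * 60 + 22 = 1342
print(data["port_mapping"])  # ['80:80', '443:443', 1342]

# Quoting forces every entry to stay a string
fixed = yaml.safe_load('port_mapping: ["80:80", "443:443", "22:22"]')
print(fixed["port_mapping"])  # ['80:80', '443:443', '22:22']
```

The 80 and 443 entries only survive because their digits after the colon don’t fit the base-60 pattern.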
This is a sequence of country codes. That’s good, right? Turns out YAML doesn’t like Norway. The country code for Norway is no, which YAML 1.1 treats as the boolean false.
I live in Ontario, Canada. The short form for the province is “on,” and in YAML 1.1, on means true. So a list of Canadian provinces has the same problem.
Heaven help anybody in Norway if one of their counties is named Ontario. In the flush_cache block, you can see that the same kind of problem even extends to the keys.
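The country codes, the flush_cache key, and a list of provinces all run into the same YAML 1.1 boolean rules. This sketch uses a trimmed-down version of Ruud’s doc plus a made-up provinces list (not from the article) to show what PyYAML’s safe_load() hands back:

```python
import yaml

doc = """
geoblock_regions:
  - dk
  - fi
  - is
  - no
  - se
provinces:
  - qc
  - on
  - bc
flush_cache:
  on: [push, memory_pressure]
  priority: background
"""

data = yaml.safe_load(doc)
print(data["geoblock_regions"])  # ['dk', 'fi', 'is', False, 'se']
print(data["provinces"])         # ['qc', True, 'bc']
print(data["flush_cache"])       # {True: ['push', 'memory_pressure'], 'priority': 'background'}
```

Quoting the values, as in "no" or "on", keeps them as strings.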
The key on becomes True in the Python dictionary. Scrolling down a little more … and the last one, for version numbers, might seem obvious, but it’s an easy enough mistake to make.
<number>.<number> is a float, so if you’re plugging away with major-minor-patch format for your version numbers and accidentally drop a .0 patch part, you’re not going to get a string like the other versions. You’re going to get a float.
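A quick check of the version-number case, again with PyYAML’s safe_load() and a trimmed copy of the list. The 3.10 line is an extra illustration, not part of Ruud’s doc, of why a float is a poor fit for a version:

```python
import yaml

doc = """
allow_postgres_versions:
  - 9.5.25
  - 9.6.24
  - 10.23
  - 12.13
"""

versions = yaml.safe_load(doc)["allow_postgres_versions"]
print([type(v).__name__ for v in versions])  # ['str', 'str', 'float', 'float']

# Floats also lose information: the trailing zero in 3.10 disappears
print(yaml.safe_load("python: 3.10"))    # {'python': 3.1}
print(yaml.safe_load('python: "3.10"'))  # {'python': '3.10'}
```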
04:00 All right, that was fun. Time for one more.
04:06 The doc at the top here shows some filename patterns. Once again, the feature of not having to quote your strings is going to be a problem. Let me parse this baby.
And kaboom: since the * character is reserved to mean an alias and the ! character introduces a tag, the last three items in the sequence are going to cause most parsers to barf.
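To see which form survives, this sketch wraps safe_load() in a small hypothetical helper, parses(), that reports whether PyYAML accepts a document. The exact exception raised can differ between parser versions, so it only checks for PyYAML’s base YAMLError:

```python
import yaml


def parses(text):
    """Hypothetical helper: True if PyYAML's safe loader accepts the text."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return False
    return True


unquoted = """
serve:
  - /robots.txt
  - /favicon.ico
  - *.html
  - *.png
  - !.git
"""

quoted = """
serve:
  - /robots.txt
  - /favicon.ico
  - "*.html"
  - "*.png"
  - "!.git"
"""

print(parses(unquoted))                 # False: "*" starts an alias
print(yaml.safe_load(quoted)["serve"])  # quoted patterns come back as plain strings
```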
04:35 All right. You know that part of a DVD or Blu-ray at the beginning where they tell you the opinions are those of the actors in the special features and not the production company?
04:44 Yeah. Well, this is that part, and I’m sure the Internet has places that refute everything I’m going to say here, so understand that this is my take on all this.
04:53 So, when should you use YAML? Well, the snarky answer is don’t. That answer’s not quite fair, so to be more specific: don’t seek it out for yourself. If you’re working with data that’s already in YAML, great, don’t reinvent the wheel. If you’re in the DevOps space, you’re going to come across it, and that’s fine.
But if you’re writing your own program and looking for a data format for configuration or storage, I personally wouldn’t choose YAML. This goes double for the !!python tags in PyYAML.
05:21 I get what they’re trying to achieve here from an extensibility standpoint, but if you’re using them, well, pretty much only your project is going to be able to use that YAML.
The only programs that’ll be able to fully read a YAML file with !!python tags in it are Python programs using PyYAML. Other parsers might ignore the tags or raise an error.
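Here’s a sketch of how those tags sneak in, assuming PyYAML 5.1 or later, where unsafe_load() exists. Dumping a tuple with the default dumper produces a !!python/tuple tag, and even PyYAML’s own safe_load() refuses it:

```python
import yaml
from yaml.constructor import ConstructorError

data = {"point": (1, 2)}

# The default Dumper writes Python-specific tags for types like tuple
python_text = yaml.dump(data)
print(python_text)  # contains "!!python/tuple"

# Only PyYAML's unsafe loader rebuilds the tuple from that tag
print(yaml.unsafe_load(python_text))  # {'point': (1, 2)}

# safe_load refuses the unknown tag instead of guessing
try:
    yaml.safe_load(python_text)
    refused = False
except ConstructorError:
    refused = True
print("safe_load refused the tag:", refused)  # True

# safe_dump writes a plain sequence that any YAML parser can read
portable_text = yaml.safe_dump(data)
print(yaml.safe_load(portable_text))  # {'point': [1, 2]}
```

The trade-off is that safe_dump() gives you back a list rather than a tuple, but the file stays readable outside of Python.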
05:43 If you’re going down this road, personally, I would just use TOML instead. It’s not quite as popular as YAML, since it’s newer, but it supports much the same data types and hierarchical structure and doesn’t have any of the weirdness that comes with quoteless strings.
05:58 It’s supported by many different programming languages, and a TOML parser, the read-only tomllib module, has been part of the standard library since Python 3.11. I joked earlier about the irony of a Python programmer kvetching about whitespace being important, but I do see these as two different things.
06:12 I don’t find that I need to copy and paste code very often, and if you do, most IDEs will handle the indentation for you. Data, on the other hand, gets ferried around all the time, and making indentation significant for what is and isn’t nested is asking for trouble.
06:29 If TOML isn’t your thing, or if you’re storing data to be used by other programming languages, JSON is pretty much the de facto standard. It isn’t quite as readable as YAML, but it isn’t bad.
06:40 It tends to be far faster to parse, and JavaScript, the second most popular language on the planet, uses it natively. Or maybe this week it’s the most popular, or the third. It depends on who you ask, anyway.
06:50 One of the strongest arguments for this is the spec itself. YAML 1.2 is a superset of JSON, so by definition, it has to be more complex.
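You can see the superset relationship in practice by feeding the same JSON text to both the json module and PyYAML. One hedge: PyYAML follows YAML 1.1, which predates the 1.2 superset guarantee, so the agreement below holds for a simple document like this one but isn’t promised for every JSON file:

```python
import json

import yaml

# A small JSON document reusing some of the risky-looking values from earlier
json_text = '{"port_mapping": ["22:22"], "regions": ["no"], "enabled": true, "retries": null}'

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(json_text)  # PyYAML reads this as a YAML flow mapping

print(from_json)               # {'port_mapping': ['22:22'], 'regions': ['no'], 'enabled': True, 'retries': None}
print(from_json == from_yaml)  # True
```

Because JSON always quotes its strings, the values that tripped up YAML earlier come through untouched.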
07:01 Somebody remind me to find out if the comments section can be turned off for this lesson, huh? Okay, well, I’ll try not to trip as I get down off my soapbox. Next up, I’ll summarize the course and point you at other content you might find interesting.