Introduction to YAML
00:44 It also comes with a mechanism for hooking into your language’s specific types as well. The YAML specification has gone through multiple iterations. The most up-to-date is 1.2, but 1.1 is still frequently encountered. In fact, the most popular Python library for YAML is still based on the 1.1 spec.
This is a sample YAML file that I borrowed from Wikipedia. Like a dictionary, it’s mostly made up of key-value pairs. The first line has receipt as the key and Oz-Ware Purchase Invoice as the value. In this case, that’s a string.
You can nest the dictionaries, like with the customer item here. In this case, the key in the parent is customer, and the value is another dictionary containing the key-value pairs for the first name, Dorothy, and family_name: Gale. YAML also supports arrays, also known as sequences. It has a couple of different formats for them, and the one shown here uses dashes. Under the items key is a sequence of two dictionaries, each dictionary being one of the parts from the warehouse. This is just a taste.
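As a rough sketch of what that structure looks like once it is loaded into Python, the nested dictionaries and the sequence map onto plain dict and list objects. Only receipt, customer, Dorothy, family_name, Gale, and items come from the sample described above. The first_name key and the fields inside each item are placeholders assumed for illustration.

```python
# Hand-written Python equivalent of the YAML structure described above.
# Assumption: "first_name" and the item fields are illustrative placeholders.
invoice = {
    "receipt": "Oz-Ware Purchase Invoice",  # top-level key with a string value
    "customer": {                            # nested dictionary
        "first_name": "Dorothy",
        "family_name": "Gale",
    },
    "items": [                               # sequence of two dictionaries
        {"part": "first part", "quantity": 1},
        {"part": "second part", "quantity": 1},
    ],
}

print(type(invoice["customer"]).__name__)
print(len(invoice["items"]))
print(invoice["customer"]["family_name"])
```

Each dash in the YAML sequence starts a new list element, and each level of indentation becomes one more level of nesting.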
02:06 I mentioned in the overview that YAML is often used to solve the same kinds of problems as XML and JSON. Let’s compare YAML to XML, JSON, and the newer kid on the block, TOML. First, I’ll talk about adoption and support. That’s a fancy way of saying popularity.
02:23 XML and JSON are pretty much the gold standards here. In fact, I’d argue that JSON is even more so, at least from the popularity viewpoint. That being said, YAML is still quite common, and there are plenty of solid libraries out there for using it. TOML may be the newer kid on the block, but it has a lot of cross-language support and has become part of Python’s standard library as of Python 3.11.
02:56 YAML and TOML win out on readability. Both are friendly on the eyes, using white space to build documents that are human-readable. Those little red asterisks (askeri?) on YAML and JSON are partially because of variations.
03:11 JSON is actually pretty readable, until you remove all the spaces from it and shrink it down to send it over the wire. Likewise, basic YAML is very readable, but there are some things you can do that make it problematic.
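To make the point about shrinking JSON concrete, here is a small sketch using the standard library's json module. The sample data is made up for illustration. The same object is written once with indentation and once with the optional whitespace stripped.

```python
import json

# Made-up sample data, used only for illustration.
data = {"customer": {"first_name": "Dorothy", "family_name": "Gale"}, "paid": True}

# Indented output is easy to scan by eye.
readable = json.dumps(data, indent=2)

# Compact separators strip the optional whitespace, as is often done
# before sending JSON over the network.
compact = json.dumps(data, separators=(",", ":"))

print(readable)
print(compact)

# Both strings decode to the same object; only the whitespace differs.
assert json.loads(readable) == json.loads(compact)
```

The compact form is shorter, but the structure is much harder to follow once the line breaks and indentation are gone.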
03:22 I’ll talk about some of those things in a later lesson. Efficiency is kind of hard to measure. It’s more dependent on the library doing the work than the standard. Generally, a simpler standard is going to be faster to parse, and also, the more popular the standard, the more likely someone has optimized the heck out of the parser.
03:43 JSON tends to win this space due to being the simplest and most popular. That said, though, it honestly doesn’t matter all that often. As these formats are mostly used for doing simple data storage, the bottleneck in a program isn’t typically parsing a file.
03:58 Last of all is verbosity, and this is where XML loses. XML, and its cousin HTML, drive me crazy. The opening and closing tags are repetitive, the tags have angle brackets around them, and it’s too much typing.
04:11 JSON is the least verbose, but so much so that it impacts its readability. Mismatched braces can be hard to notice because everything is a brace. Note I haven’t given any of these five stars. If you want a truly small file, you shouldn’t be using text. Of course, binary means losing the human-readability thing and it kind of sucks that everything is a trade-off, but that’s what makes writing software interesting.
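The verbosity gap can be seen with a quick sketch that writes the same tiny record as XML and as compact JSON using only the standard library. The record and its field names are assumptions chosen for illustration, and the comparison says nothing about larger or differently shaped documents.

```python
import json
import xml.etree.ElementTree as ET

# Made-up record, used only for illustration.
record = {"first_name": "Dorothy", "family_name": "Gale"}

# XML form: every field needs an opening and a closing tag.
root = ET.Element("customer")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Compact JSON form of the same record, wrapped in the same outer key.
json_text = json.dumps({"customer": record}, separators=(",", ":"))

print(len(xml_text), xml_text)
print(len(json_text), json_text)
```

For this record, the XML string comes out longer because each field name appears twice, once in the opening tag and once in the closing tag.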
04:37 The title of this slide sounds like a reality show. Anyhow, YAML gets used a lot in the DevOps space. Lots of container and cloud configuration tools use YAML, and often YAML exclusively. I’m not quite sure why this is, but I suspect some early version of a tool did it, and others just copied.
04:57 In addition to the DevOps space, the OpenAPI specification has defined a way of generating REST APIs using a YAML doc. I’ll talk in a later lesson about whether you should choose YAML for your project, but a big factor in that question is whether you’re operating in a space where YAML is necessary.
05:23 As I mentioned in the overview, reading and writing YAML isn’t built into the Python standard library. There are a bunch of third-party libraries for dealing with it, though, and the most popular of them is called PyYAML.
05:35 The challenge with PyYAML is it’s based on the 1.1 spec for YAML, which means there are a couple things that you can’t do and a couple of foot-guns that were removed in YAML 1.2 that are still pointed at your toes.
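One commonly cited example of a YAML 1.1 foot-gun, offered here as an illustration and not necessarily one of the cases covered later in the course, is that unquoted words such as NO are read as booleans. The sketch below assumes PyYAML is installed, and the document contents are made up.

```python
import yaml  # PyYAML, which follows the YAML 1.1 rules

# Made-up document: the same two letters, unquoted and quoted.
document = """
country: NO
answer: "NO"
"""

data = yaml.safe_load(document)

# Under YAML 1.1 rules the unquoted NO is read as a boolean,
# while the quoted one stays a string.
print(data)
```

Quoting the value is the usual way to keep it a string.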
Like with most third-party libraries, you can install PyYAML using pip. This would be the point in the course where I remind you to use a virtual environment when installing anything. There, you’ve been reminded. For the next few lessons, all the examples are going to follow a pattern.
06:05 I’ll show you some YAML, and then I’ll show you the Python object that results from that YAML. This takes only a few lines of code, but since I’m going to be doing it a lot, I’m going to put that in a utility function.
This is sweetpotato.py. As I said, not a lot of code here. PyYAML’s module is called yaml, which I’m importing here on line 2. As I want to pretty-print a Python object to the screen, I’m going to need the pprint() function from the standard library.
I’m not a huge fan of this. In fact, in my own code, I almost always convert to JSON and then use the json library to pretty-print, but that doesn’t work for all Python objects, so you’re stuck with the weird indentation that the core developers have defined arbitrarily as pretty. Beholder, see beauty, eye of. Anyhow, the show_spud() function takes the name of a file and opens it.
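For readers following along without the video, here is a reconstructed sketch of what sweetpotato.py might look like, laid out so that the import and the context manager fall on the line numbers mentioned in the lesson. The call to yaml.safe_load() and the return statement are assumptions, since the transcript does not say which loader the file uses or whether the function returns anything.

```python
# sweetpotato.py (reconstructed sketch, not the course's exact file)
import yaml
from pprint import pprint

def show_spud(filename):
    with open(filename, "rb") as file:  # read-binary mode
        data = yaml.safe_load(file)  # assumption: the source names no loader call
    pprint(data)
    return data  # added only so the sketch is easy to check
```

Calling show_spud() with the path to a YAML file prints the resulting Python object.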
Line 6 is a context manager to open the file. Note that it uses read-binary ("rb") mode. That has to do with the fact that YAML, although text, can come in more than one UTF encoding, so the file is opened in binary and the library is left to work out the encoding from the bytes.
The top window here has a really simple YAML file inside of it. Notice the hierarchical structure. Each of the yellow labels is the key in a key-value pair. In some cases, the value is the key-value pair below it, and in the case of the name keys, the value is a string. This results in nested dictionaries in Python.
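The mapping from indentation to nested dictionaries can be sketched as follows, assuming PyYAML is installed. The document below is a hypothetical stand-in for the file shown on screen. The key names root and sibling and the string values are made up.

```python
import yaml  # PyYAML

# Hypothetical stand-in for the on-screen file: indentation sets the nesting.
document = """
root:
  parent:
    child:
      name: first
    sibling:
      name: second
"""

data = yaml.safe_load(document)

print(data)
print(data["root"]["parent"]["child"]["name"])
```

Each indented block becomes the dictionary value of the key above it, so reaching a name takes one lookup per level.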
That nested dict has one key-value pair, the key 'parent' and another nested dict. Going all Russian dolls here. Inside of 'parent', you get two key-value pairs, one called 'child', the other called