Introduction to YAML
00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll show you the basic structure of a YAML file and show you how to read it using the PyYAML library.
00:11 YAML is a text-based data serialization format. It originally stood for Yet Another Markup Language, but later versions renamed it to the recursive acronym YAML Ain’t Markup Language.
00:24 So right out of the gate, it can’t quite decide whether it’s markup or not. Yeah, it’s going to be that kind of course.
00:32 It is text-based and is usually UTF-8 encoded. It has a hierarchical nature, which translates into nested dictionaries quite well, and handles the major data types you’re accustomed to.
00:44 It also comes with a mechanism for hooking into your language’s specific types as well. The YAML specification has gone through multiple iterations. The most up-to-date is 1.2, but 1.1 is still frequently encountered. In fact, the most popular Python library for YAML is still based on the 1.1 spec.
01:04 This causes some problems. I’ll point those out as you go along.
01:11 This is a sample YAML file that I borrowed from Wikipedia. Like a dictionary, it’s mostly made up of key-value pairs. The first line has receipt as key and Oz-Ware Purchase Invoice as the value. In this case, that’s a string.
01:27 You can nest the dictionaries, like with the customer item here. In this case, the key in the parent is customer, and the value is another dictionary containing the key-value pairs first_name: Dorothy and family_name: Gale. YAML also supports arrays, also known as sequences. It has a couple of different formats for them, and the one shown here uses dashes. Under the items key is a sequence of two dictionaries, each dictionary describing a part from the warehouse. This is just a taste.
02:00 Each of these parts and many others will be explained as you proceed through the course.
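A minimal sketch of how that sample maps onto Python objects, assuming PyYAML is installed. The YAML here is a trimmed-down version of the invoice: the receipt and customer entries follow the description above, while the part_no and quantity fields inside items are illustrative stand-ins for the part details.

```python
import yaml  # provided by the third-party PyYAML package

# Trimmed-down invoice; the fields inside each item are illustrative.
INVOICE = """\
receipt: Oz-Ware Purchase Invoice
customer:
  first_name: Dorothy
  family_name: Gale
items:
  - part_no: A4786
    quantity: 4
  - part_no: E1628
    quantity: 1
"""

invoice = yaml.safe_load(INVOICE)

# A top-level key-value pair with a plain string value
print(invoice["receipt"])
# A nested mapping becomes a nested dict
print(invoice["customer"]["family_name"])
# Dash-prefixed entries become a list, here a list of two dicts
print(len(invoice["items"]))
```

The mapping is direct: YAML mappings become dicts, sequences become lists, and scalars become strings or numbers.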
02:06 I mentioned in the overview that YAML is often used to solve the same kinds of problems as XML and JSON. Let’s compare YAML to XML, JSON, and the newer kid on the block, TOML. First, I’ll talk about adoption and support. That’s a fancy way of saying popularity.
02:23 XML and JSON are pretty much the gold standards here. In fact, I’d argue that JSON is even more so, at least from the popularity viewpoint. That being said, YAML is still quite common, and there are plenty of solid libraries out there for using it. TOML may be the newer kid on the block, but it has a lot of cross-language support and has become part of Python’s standard library as of Python 3.11.
02:45 TOML’s design is very YAML-esque. They pretty much took the good parts of YAML and INI files and threw out the bad parts and called it a standard. Second is readability.
02:56 YAML and TOML win out here. Both are friendly on the eyes, using whitespace to build documents that are human-readable. Those little red asterisks (askeri?) on YAML and JSON are partially because of variations.
03:11 JSON is actually pretty readable, until you remove all the spaces from it and shrink it down to send it over the wire. Likewise, basic YAML is very readable, but there are some things you can do that make it problematic.
03:22 I’ll talk about some of those things in a later lesson. Efficiency is kind of hard to measure. It’s more dependent on the library doing the work than the standard. Generally, a simpler standard is going to be faster to parse, and also, the more popular the standard, the more likely someone has optimized the heck out of the parser.
03:43 JSON tends to win this space due to being simplest and most popular. That said, though, it honestly doesn’t matter all that often. As these formats are mostly used for doing simple data storage, the bottleneck in a program isn’t typically parsing a file.
03:58 Last of all is verbosity, and this is where XML loses. XML, and its cousin HTML, drive me crazy. The opening and closing tags are repetitive, the tags have angle brackets around them, and it all adds up to too much typing.
04:11 JSON is the least verbose, but so much so that it impacts its readability. Mismatched braces can be hard to notice because everything is a brace. Note I haven’t given any of these five stars. If you want a truly small file, you shouldn’t be using text. Of course, binary means losing the human-readability thing and it kind of sucks that everything is a trade-off, but that’s what makes writing software interesting.
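To make the readability and verbosity trade-off concrete, here is a small sketch using only the standard library's json module. The data is a made-up fragment based on the customer entry from the earlier sample; it shows the same document in an indented form and in the compact form typically sent over the wire.

```python
import json

data = {"customer": {"first_name": "Dorothy", "family_name": "Gale"}}

# Indented form: easy on the eyes, but more characters
pretty = json.dumps(data, indent=2)

# Compact form: all optional whitespace removed
compact = json.dumps(data, separators=(",", ":"))

print(pretty)
print(compact)
print(len(pretty), len(compact))
```

Both strings decode to the same object. Only the human pays for the missing whitespace.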
04:37 The title of this slide sounds like a reality show. Anyhow, YAML gets used a lot in the DevOps space. Lots of container and cloud configuration tools use YAML, and often YAML exclusively. I’m not quite sure why this is, but I suspect some early version of a tool did it, and others just copied.
04:57 In addition to the DevOps space, the OpenAPI specification has defined a way of generating REST APIs using a YAML doc. I’ll talk in a later lesson about whether you should choose YAML for your project, but a big factor in that question is whether you’re operating in a space where YAML is necessary.
05:15 If you’re writing code that interacts with the tools on this slide, writing Python that speaks YAML is probably for you.
05:23 As I mentioned in the overview, reading and writing YAML isn’t built into the Python standard library. There are a bunch of third-party libraries for dealing with it, though, the most popular of which is called PyYAML.
05:35 The challenge with PyYAML is it’s based on the 1.1 spec for YAML, which means there are a couple of things that you can’t do and a couple of foot-guns that were removed in YAML 1.2 that are still pointed at your toes.
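As one illustration of the kind of foot-gun meant here (chosen for this sketch, not taken from the lesson itself): under YAML 1.1 rules, an unquoted no is read as a Boolean rather than a string. The snippet below assumes PyYAML is installed, and the country and answer keys are made up.

```python
import yaml  # PyYAML, which follows the YAML 1.1 rules

DOC = """\
country: no
answer: "no"
"""

data = yaml.safe_load(DOC)

# The unquoted value is interpreted as a Boolean; the quoted one stays a string
print(data)
```

Quoting the value is the usual way to keep it a string.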
05:49 Like with most third-party libraries, you can install PyYAML using pip. This would be the point in the course where I remind you to use a virtual environment when installing anything. There, you’ve been reminded. For the next few lessons, all the examples are going to follow a pattern.
06:05 I’ll show you some YAML, and then I’ll show you the Python object that results from that YAML. This takes only a few lines of code, but since I’m going to be doing it a lot, I’m going to put that in a utility function.
06:17 Seeing as I couldn’t resist the tuber-based pun, I’ll be writing show_spud(), a function that reads a YAML file and pretty-prints the resulting Python object to the screen.
06:27 Let’s go look at my yam-inspired function and parse your first YAML document.
06:34 This is sweetpotato.py. As I said, not a lot of code here. PyYAML’s module is called yaml, which I’m importing here on line 2. As I want to pretty-print a Python object to the screen, I’m going to need the pprint() function from the pprint module.
06:51 I’m not a huge fan of this. In fact, in my own code, I almost always convert to JSON and then use the json library to pretty-print, but that doesn’t work for all Python objects, so you’re stuck with the weird indentation that the core developers have defined arbitrarily as pretty. Beholder, see beauty, eye of. Anyhow, the show_spud() function takes the name of a file and opens it.
07:15 Line 6 is a context manager to open the file. Note that it uses read-binary ("rb") mode. That has to do with the fact that YAML, although text, can use UTF encodings that Python won’t detect on its own when opening a file in text mode, so the file is opened in binary.
07:29 The PyYAML module does handle these, though. More on this later.
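A small sketch of why handing PyYAML raw bytes is convenient, assuming PyYAML is installed. The document text is made up. As far as I know, PyYAML checks for a UTF-16 byte order mark and otherwise treats the bytes as UTF-8, so the same call covers both encodings.

```python
import yaml  # PyYAML

# The same small document, encoded two different ways
text = "name: Pommes de terre sautées\n"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # Python prepends a byte order mark

# Given bytes, PyYAML works out the encoding before parsing
from_utf8 = yaml.safe_load(utf8_bytes)
from_utf16 = yaml.safe_load(utf16_bytes)

print(from_utf8)
print(from_utf16)
```

A file object opened with "rb" yields the same kind of bytes, which is why the utility function can leave the decoding to the library.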
07:34 Line 7 is the key thing. This is what you came here for: the PyYAML safe_load() function. I pass it the file handle, and it returns a Python object based on the YAML file.
07:46 And finally, I use pprint() to make it prettier. Not pretty, but prettier. This function’s going to get used over and over in this course. You don’t have to memorize it.
07:56 Just understand that anytime you see show_spud(), it takes a filename and prints out a Python object composed from the YAML in that file. Got it?
08:05 Good. Let’s go try this out.
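Pieced together from the walkthrough above, the utility might look roughly like this. It is a reconstruction, not the on-screen file, so the layout and line numbers may not match what the lesson shows. The variable names f and data are assumptions.

```python
# sweetpotato.py (reconstructed sketch)
import yaml
from pprint import pprint


def show_spud(filename):
    """Read a YAML file and pretty-print the resulting Python object."""
    # Binary mode leaves the encoding detection to PyYAML
    with open(filename, "rb") as f:
        data = yaml.safe_load(f)

    pprint(data)
```

Calling show_spud() with the path to a YAML file prints whatever Python object safe_load() built from it.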
08:11 The top window here has a really simple YAML file inside of it. Notice the hierarchical structure. Each of the yellow labels is the key in a key-value pair. In some cases, the value is the key-value pair below it, and in the case of the name keys, the value is a string. This results in nested dictionaries in Python.
08:31 Let’s parse this baby. Importing my sweet, sweet potato …
08:41 and calling show_spud() on this file.
08:49 As I said, nested dictionaries. The whole document is represented as a dict, with the first key-value pair being 'grandparent' and a nested dict.
08:59 That nested dict has one key-value pair, the key 'parent' and another nested dict. Going all Russian dolls here. Inside of 'parent', you get two key-value pairs, one called 'child', the other called 'sibling'.
09:13 Each of these has a dictionary with a name key and a string value inside of it. See what I mean about prettiness? Why can’t pprint() display this the way Python Black would if this were code?
09:24 All right, I’ll get off that topic now.
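For comparison, here is a sketch of the same round trip with both pretty-printing options side by side, assuming PyYAML is installed. The YAML is a stand-in for the file in the demo: the key structure follows the description above, but the name values are made up.

```python
import json
from pprint import pprint

import yaml  # PyYAML

# Stand-in for the demo file; the name values are hypothetical.
FAMILY = """\
grandparent:
  parent:
    child:
      name: Russet
    sibling:
      name: Yukon
"""

family = yaml.safe_load(FAMILY)

# pprint's idea of pretty
pprint(family)

# The JSON route: only works when every value is JSON-serializable
print(json.dumps(family, indent=4))
```

The JSON route gives one key per line with consistent indentation, but it raises an error on values that json can't serialize, which is why the utility function sticks with pprint().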
09:29 There you go. You’ve parsed your first YAML document. In the next lesson, I’ll show you all the different data types in YAML and how they look in a doc.
09:37 If I’m feeling creative, there may be even another yam-based pun. That’d be sweet. All right, I’ll stop now.