Advanced YAML Syntax

YAML: Python's Missing Battery Christopher Trudeau 09:10

00:00 In the previous lesson, I showed you the data types supported by YAML. In this lesson, I’ll cover some of the more advanced syntax features, like inheritance.

00:10 Remember early on when I put an asterisk beside the readability aspect of YAML? Well, this is where you’ll be learning some features that, if used, may impact that readability. That’s not to say you shouldn’t use them, but that your file will go from a mostly self-explanatory hierarchy of key-value pairs to one requiring the reader to know the more esoteric YAML features.

00:31 YAML supports reuse inside of the YAML file. An anchor is a way to declare that something can be reused, and you define one with an ampersand (&) followed by a name.

00:43 You reference an anchored segment using an alias. Aliases use the name you created with the anchor, but prefixed with an asterisk (*) instead of an ampersand. For you C programmers in the audience, yep, they use pointer semantics. I’ll open the REPL and show you an example.

01:01 The top window contains a YAML file. In the first child chunk here, I have declared an anchor named push-up by placing an ampersand before the name. I then do that again for the second sequence of muscles, naming it squat, and then again for the third group—you get the idea. Let me scroll down here.

01:20 I reference the anchors by using the same name but the asterisk symbol instead. The monday hash will now contain the push-up and squat segments.

01:29 Let’s see the corresponding Python. Importing sweet, sweet potato … and reading the file.

01:41 Notice that in the schedule dict, the nested monday dict contains a list of lists. The nested lists are those that were declared using the anchors.

01:50 The same goes for tuesday and wednesday as well. As you might have guessed by my warning at the top of the lesson, I’m not sure how I feel about this feature. On one hand, it reduces typing and possibly errors by removing repetition. On the other hand, it feels like you’ve gone from an understandable text format to a programming language. A strong use of this kind of feature is in DevOps configuration.

02:12 You often end up with a fair amount of repetition. If you’re configuring multiple S3 buckets, being able to inherit from your base configuration means each of the buckets will be consistently set up, so there are places where you might want to take advantage of this.
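As a minimal sketch of the anchors and aliases described above, the following uses PyYAML's safe_load() on an inline document. The exercise names, muscle lists, and key layout are made up for illustration and are not the file shown in the video.

```python
import yaml

# Hypothetical workout data; names and layout are illustrative only.
WORKOUT_YAML = """
exercises:
  - &push-up
    - chest
    - triceps
  - &squat
    - quads
    - glutes
schedule:
  monday:
    - *push-up
    - *squat
  tuesday:
    - *squat
"""

workout = yaml.safe_load(WORKOUT_YAML)
monday_plan = workout["schedule"]["monday"]

# Each alias is replaced by the anchored sequence, giving a list of lists.
print(monday_plan)

# PyYAML hands back the same Python object for an anchor and its aliases.
print(monday_plan[0] is workout["exercises"][0])
```

Because the alias and the anchor end up as the same object in Python, mutating one after loading would also change the other, which is the pointer-like behavior mentioned earlier.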

02:31 YAML allows you to anchor and alias within the same node, making a recursive structure. In theory, this can be a powerful mechanism. In practice, a lot of parsers don’t deal with it well, and it can blow things up. Again, back to the “moving from text to code” sort of argument I made before, you should be careful with this.
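Here is a small sketch of the recursive case. It assumes PyYAML, which does cope with a self-referencing node; other parsers may not. The anchor name a and the list contents are arbitrary.

```python
import yaml

# The alias *a appears inside the very node that &a anchors,
# so the loaded list contains a reference to itself.
loop = yaml.safe_load("&a [first, *a]")

print(loop[0])          # the ordinary first item
print(loop[1] is loop)  # the second item is the list itself
```

Anything that walks this structure naively, such as a hand-written recursive printer or a JSON serializer, can loop forever or raise an error, which is one reason to treat the feature with care.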

02:52 In addition to aliases, you can reuse attributes through a merge. Merges are a different way of reusing anchored content. Instead of the asterisk on its own, you use << (double-less-than) as a key, with the alias as its value. An alias puts the whole chunk as a value in your doc, whereas a merge allows the reuse of key-value pairs as a set.

03:13 An example will help you see the difference.

03:17 The YAML file in the top window here contains both aliases and merges. The location and person hashes have been marked with anchors. Inside of the student hash, I use an alias for the person, meaning the value that goes with the name is the entire chunk of data.

03:34 The << doesn’t assign location, but uses the attributes from location in place in the student. Let’s look at the Python to see what I mean. Importing …

03:50 And there’s the file. Look at the student dict. The name attribute points to a nested dict containing the values from person.

03:59 There is no location dict though. Instead, the two attributes from location get merged into the student. This gives you two different ways of reusing YAML in your YAML file.
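The difference between an alias and a merge can be sketched as follows, assuming PyYAML's safe_load(). The keys and values (city, country, first, last, grade) are invented for illustration and are not the file from the video.

```python
import yaml

# Hypothetical data: two anchored mappings reused in two different ways.
STUDENT_YAML = """
location: &location
  city: Springfield
  country: Nowhereland
person: &person
  first: Sam
  last: Example
student:
  name: *person
  <<: *location
  grade: 10
"""

record = yaml.safe_load(STUDENT_YAML)
student = record["student"]

# The alias nests the whole person mapping under the "name" key.
print(student["name"])

# The merge copies location's key-value pairs directly into student,
# so there is no "location" key inside it.
print(sorted(student))
```

Keys written directly in the mapping sit alongside the merged ones, so student ends up with city, country, grade, and name at the same level.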

04:15 YAML supports multiline text. Indentation is what indicates continuation. This means that a newline in your YAML isn’t necessarily a new line in your data, so to insert an actual new line, you use a blank line.

04:31 Alternatively, you can use the pipe operator (|) to do something like a verbatim block in your text. This can be useful for embedding scripts or other content where the new lines are significant.

04:43 You can also use the > (greater-than) operator to change how the indentation processing works. In this case, a double-indent also means a new line.

04:52 Let’s go play with some multiline text.

04:57 Like before, YAML in the top. There are three chunks of text in the doc, all of which do multiline differently. The first one, which I’ve named multi-line, uses indentation as continuation.

05:09 The value of multi-line will result in two lines in Python. The blank line between the text is what differentiates the first line from the second. The script value uses the pipe operator.

05:22 Everything indented underneath the script is converted as shown. As the name of this example implies, this often gets used for embedding scripts in YAML.

05:31 This happens a lot in DevOps stuff and might be why YAML is so popular in that space. Doing the same thing in JSON could be painful. Let me scroll down a bit. The last value here is the folded value.

05:45 The > symbol changes how continuation works. Now either a blank line or another indent will be treated as a new line. I’m not sure why this one exists, seeing as you can do the same with a blank line, but it does. Let’s see the resulting Python. Import … reading it …

06:08 Let me just scroll back up to the top here and do the same for the code. When you look at the multi-line key in the Python, you’ll notice that it has one newline character inside of it that’s caused by the blank line in the multiline YAML.

06:24 The script value ends up exactly like the YAML content, which is, like I said before, useful if you’re trying to put scripts in sort of verbatim.

06:33 Let me just scroll this down … and the top … and here you can see how both a second indent and a blank line result in a newline. Note that the indented line is indented in the result.

06:47 There’s white space in front of it, which really just brings me back to the question of why you’d want to do this. Personally, I would stick with the multiline style at the top.
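The three multiline styles can be compared side by side with a short sketch, assuming PyYAML's safe_load(). The key names mirror the ones mentioned above, but the text content is invented for illustration.

```python
import yaml

# Hypothetical document with a plain, a literal (|), and a folded (>) scalar.
TEXT_YAML = """
multi-line: This is the first line
  and it continues here.

  This starts a second line.
script: |
  echo "hello"
  echo "world"
folded: >
  These two lines
  are joined by a space.

  A blank line gives a newline.
    A more-indented line keeps its break and indent.
"""

text = yaml.safe_load(TEXT_YAML)

# repr() makes the embedded newline characters visible.
for key in ("multi-line", "script", "folded"):
    print(key, "->", repr(text[key]))
```

In the plain style only the blank line survives as a newline. The literal style keeps every line break as written, and the folded style additionally keeps the break and the leading spaces of the more-indented line.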

06:57 In addition to bringing you overly opinionated Python courses, I’m also one of the folks who curates the PyCoders newsletter. The tool we use is YAML-based.

07:06 The generator and the publisher are two different tools, so part of the process is copying and pasting the YAML from one to the other. I’d say about a third of the time, I miss a space when I grab the doc to copy, and that results in bad YAML being pasted.

07:21 I am aware of the irony of a Python programmer complaining about white space being important, but this can be problematic if you’re doing a lot of work with YAML.

07:34 YAML supports storing multiple documents in a single file. To do this, you separate the docs with a triple dash (---). You can optionally mark the end of a doc using triple dots (...).

07:45 When dealing with these kinds of files in PyYAML, you use safe_load_all() instead of safe_load(). This function returns an iterator over the parsed documents instead of a single object.

07:56 Let’s go look at an example.

08:00 The YAML on the top window is one file, but three documents. The three are separated by the triple dashes. Notice the second document uses the optional triple dots to indicate the end.

08:12 I can’t use show_spud() this time, so let me write some Python using the safe_load_all() function.

08:32 Instead of getting a single result back, safe_load_all() gives a document iterator, which I’m handling inside of a for loop. For each of those documents,

08:44 I’ll print it out. I’m going to skip the pretty-printing, just going to spit it out quickly … And there are the results. Three docs means three different dicts.
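A minimal sketch of the multi-document case, assuming PyYAML's safe_load_all(). The inline document and its name keys are placeholders rather than the file from the video.

```python
import yaml

# Hypothetical file content: three documents separated by ---,
# with the second one closed by the optional ... marker.
MULTI_DOC_YAML = """
---
name: first
---
name: second
...
---
name: third
"""

# safe_load_all() yields one Python object per document, lazily.
for document in yaml.safe_load_all(MULTI_DOC_YAML):
    print(document)
```

Since the return value is an iterator, wrap it in list() if you need to index into the documents or loop over them more than once.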

08:58 So far you’ve been using the safe_load() function to parse YAML. This is actually a shortcut. PyYAML has multiple ways of loading and writing YAML, and in the next lesson, I’ll show you the differences.
