Data Structures in YAML
00:00 In the previous lesson, I showed you what a YAML document looks like and how to read it into your program using PyYAML. In this lesson, I'll go into more detail about YAML docs and the data structures found within them.
The example I parsed in the previous lesson is in the block at the top here. YAML is rather flexible when it comes to structure, and it also allows inlining. Since the structure maps to hashes (known as dicts in Python), it kind of makes sense that you can inline the structure using curly brackets ({}).
00:30 The block at the bottom here is equivalent to the one at the top, just written on fewer lines. It even uses the same curly brackets as Python, which makes it easy to remember what they mean.
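To check the equivalence yourself, here is a minimal sketch, assuming PyYAML is installed. The keys and values are made up for illustration rather than taken from the lesson's example; the point is only that the block-style and inline versions load into the same Python dict.

```python
import yaml

# Block style: one key per line, nesting shown by indentation.
block_doc = """
name: Pat
languages:
  primary: Python
  secondary: Go
"""

# Inline (flow) style: the same structure written with curly brackets.
inline_doc = "{name: Pat, languages: {primary: Python, secondary: Go}}"

block_data = yaml.safe_load(block_doc)
inline_data = yaml.safe_load(inline_doc)

print(block_data)
print(block_data == inline_data)  # True
```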
00:46 Strings in YAML are, well, to be blunt, confusing. I get what they were attempting, trying to keep it simple and not require quotes, but the result has a bunch of edge cases that are hard to remember.
Strings can be unquoted, quoted using single quotes, or quoted using double quotes, and each of these behaves subtly differently. An unquoted string is treated literally: if you put a \n in it, the backslash is kept as an ordinary character. You won't get a newline; you'll get a backslash followed by an n, which Python displays as \\n.
01:29 Inside single quotes, rather than using the backslash as an escape character, as every language on the planet since C has, YAML decided to be different and uses two single quotes to indicate a quote. Yeah, you heard me right.
If there are two of them in a row, that isn't the end of the string but a literal single-quote character. This decision baffles me. Double-quoted strings are more C-like, or Python-like, if you want to talk that way. Inside double quotes, a single quote is just a single quote. You don't have to double them up, and a \n actually means a newline.
02:06 All these choices for strings get complicated. If you’re coming from almost any programming language, it’s going to seem messy. In a moment, I’ll show you some examples, but this is one of those areas where YAML makes me a bit uncomfortable.
For the unquoted string, the single quote is a single quote, and the \n comes through as \\n: in Python, the backslash is kept as a literal character rather than becoming a newline. For the single-quoted string, you see the single quote being used to escape itself. There are two in the YAML but only one in Python, and, as with the unquoted version, the backslash in \n is kept literally.
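Here is a small sketch of the three string styles side by side, assuming PyYAML is installed. The sample text is made up. A Python raw string is used for the YAML source so the backslash in \n reaches the parser exactly as you would type it in a .yaml file.

```python
import yaml

# r"""...""" keeps "\n" as two characters (backslash + n) in the YAML source.
doc = r"""
unquoted: It's a \n test
single: 'It''s a \n test'
double: "It's a \n test"
"""

data = yaml.safe_load(doc)

for key, value in data.items():
    # repr() shows "\\n" for a literal backslash + n and "\n" for a real newline.
    print(f"{key:9} {value!r}")
```

The unquoted and single-quoted values both end up holding a literal backslash followed by an n, while only the double-quoted value contains an actual newline character.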
YAML supports integers in decimal, binary, hex, and octal. YAML 1.2 uses the 0o notation for octal numbers, while YAML 1.1 uses a leading zero. In addition to integers, you can also get floats, including special values for infinity and Not a Number.
05:48 Dates can also be a little tricky. The year-month-day format is handled nicely, but adding the time can occasionally be problematic. YAML handles a few more date and timestamp variations than I have here.
If you’re doing a lot of timestamp work, you’ll want to look the details up. Note that
false are all keywords in YAML, and each of these can be written in lowercase, uppercase, or capitalized form. YAML isn't very picky, though arbitrary mixed case like tRuE is not recognized.
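A sketch of the date and keyword behavior described above, assuming PyYAML, which implements YAML 1.1. The key names and values are made up for illustration. Note that PyYAML only recognizes a date with a time when the time includes seconds; without them, the value stays a string.

```python
import datetime

import yaml

doc = """
date_only: 2020-01-15
with_time: 2020-01-15 10:30:00
no_seconds: 2020-01-15 10:30
lower: true
capitalized: False
upper: TRUE
yes_word: yes
mixed: tRuE
"""

data = yaml.safe_load(doc)

for key, value in data.items():
    print(f"{key:12} {type(value).__name__:9} {value!r}")

# A bare date becomes a datetime.date; a date plus hh:mm:ss becomes a datetime.datetime.
print(type(data["date_only"]) is datetime.date)
```

In YAML 1.1, yes is also a boolean keyword, which is why it comes back as True here.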
10 is decimal ten. 0b10 is binary, giving you 2 in decimal. 0x10 is hex, giving you 16 decimal, and 010 in YAML 1.1 is octal, giving you decimal 8. 0o10 is YAML 1.2 notation, so PyYAML, which follows YAML 1.1, sees this as a string.
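The integer cases above can be reproduced with a short sketch, assuming PyYAML. The key names are made up; the values are the ones from the lesson.

```python
import yaml

doc = """
decimal: 10
binary: 0b10
hexadecimal: 0x10
octal_1_1: 010
octal_1_2: 0o10
"""

data = yaml.safe_load(doc)

for key, value in data.items():
    # The type column makes it obvious that 0o10 was not treated as a number.
    print(f"{key:12} {type(value).__name__:4} {value!r}")
```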
You need to be very aware of which YAML version your parser uses and make sure your file uses the same one. Okay, on to some floats, using both numbers with decimal points and exponents, as well as infinity and good old nan. I wonder how he's doing.
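A sketch of the float forms, assuming PyYAML. The key names are made up. One PyYAML quirk worth knowing: its YAML 1.1 float pattern expects a decimal point and a signed exponent, so a value like 1e3 is loaded as a string rather than a float.

```python
import math

import yaml

doc = """
pi: 3.14159
avogadro: 6.022e+23
no_sign_exponent: 1e3
infinity: .inf
negative_infinity: -.inf
not_a_number: .nan
"""

data = yaml.safe_load(doc)

for key, value in data.items():
    print(f"{key:18} {type(value).__name__:5} {value!r}")

# NaN never compares equal to itself, so check it with math.isnan().
print(math.isnan(data["not_a_number"]))
```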
Scrolling down a little more … I must have been in a morbid mood when I wrote this example.
trinity is the first test of the atomic bomb. Notice the subtle difference between
When I showed you this sample YAML document in the previous lesson, I mentioned that YAML supports sequences, also known as arrays. There are two different ways of writing these: inline, using the Python-friendly square brackets ([]), or with dashes (-), kind of like a bullet list in a document.
08:25 Both of these produce the same result. Note that you can either put leading spaces in front of those dashes or leave them out. The YAML documents I normally use tend to indent them, and I think that's clearer, as the list belongs to the hash entry created by the key, but it works without them too.
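A minimal sketch, assuming PyYAML, showing the inline list and both dash styles (indented and not indented) loading to the same Python list. The color names are made up.

```python
import yaml

doc = """
inline: [red, green, blue]
indented:
  - red
  - green
  - blue
not_indented:
- red
- green
- blue
"""

data = yaml.safe_load(doc)

print(data["inline"])
# All three spellings produce the same Python list.
print(data["inline"] == data["indented"] == data["not_indented"])  # True
```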
YAML 1.1 has the additional bit of fun of supporting base-60 values. The original committee must have had some ancient Mayan members. 2012 forever! Anyhow, base-60 is denoted using a colon (:), which can create some surprises.
22:22 is base-60, turning into 1342 in Python. With a leading zero, like I've done here, which to me looks like the way military time writes 24-hour time, the value becomes a string. Without the leading zero, it's base-60. Without the leading zero but with hours, minutes, and seconds, it looks like a timestamp, but Python turns it into, wait for it, not a datetime object but an integer counting the number of seconds since midnight.
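The base-60 surprises can be reproduced with a sketch like this, assuming PyYAML, which follows YAML 1.1 here. The key names and the 09:30 value are made up for illustration; quoting the value is one way to keep it a string.

```python
import yaml

doc = """
hours_minutes: 22:22
leading_zero: 09:30
hours_minutes_seconds: 22:22:22
quoted: "22:22"
"""

data = yaml.safe_load(doc)

for key, value in data.items():
    print(f"{key:22} {type(value).__name__:4} {value!r}")

# 22:22 is read as 22 * 60 + 22, and 22:22:22 as (22 * 60 + 22) * 60 + 22.
print(data["hours_minutes"] == 22 * 60 + 22)
```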
I'll talk about some of the PyYAML-specific tags in a later lesson. For now, let's look at a couple of standard YAML tags.
!!float forces the number to be a float. Even though I didn't put the zero here, Python will see it as a float.
!!str forces a string. If I want 22:22 to be a string rather than base-60, this is how I do it. There's even
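A sketch of the two standard tags in action, assuming PyYAML. The key names are made up. The second part shows that the standard string tag is spelled !!str; an unrecognized tag such as !!string has no constructor in PyYAML's safe loader, so loading fails with an error.

```python
import yaml

doc = """
forced_float: !!float 3
forced_str: !!str 22:22
untagged: 22:22
"""

data = yaml.safe_load(doc)
print(data)

# An unknown tag is rejected by safe_load rather than silently ignored.
rejected = False
try:
    yaml.safe_load("bad: !!string 22:22")
except yaml.YAMLError as err:
    rejected = True
    print("rejected:", type(err).__name__)
```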