The <format_spec> Component

Formatting Python Strings Liam Pulsifer 09:19

00:00 Next up, I’m going to talk about the last part of the replacement field specification, which is the format specification. And as I mentioned earlier, this is the most complex of the three parts of this replacement field, and it’s also the most flexible and the most useful when you’re actually trying to format your data in interesting ways.

00:21 You’ve already seen the three-part specification for replacement fields, <name>, <conversion>, <format_spec>—and all of these are optional, which is denoted by the brackets.

00:30 But what might be a little intimidating is that the <format_spec> itself has a detailed sub specification, including <fill>, <align>, <sign>, a couple of special characters, <width>, <group>, <prec> (precision), and <type>.
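As a rough illustration of how these pieces line up in one spec, here is a small sketch. The value 1234.5 and the chosen options are assumptions made for this example, not something shown in the lesson.

```python
# Hypothetical value; each part of the spec is labeled below.
#   *   fill      >   align     +   sign
#   12  width     ,   group     .2  precision     f  type
combined = "{:*>+12,.2f}".format(1234.5)
print(combined)  # ***+1,234.50
```

Every part is optional, so most real specs use only two or three of these at once.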

00:42 And I wouldn’t blame you if all of that is a little intimidating, because there’s a lot of different fields that you can use here. But I’m going to let you in on a little secret here, which is that pretty much no one in the Python world, no matter how accomplished, knows all of these fields backwards and forwards and knows what they all do at all times and keeps them in their brain. Really what’s important here is that you know generally what each of these does and you know a couple of useful examples for how to use them. The rest, you can always look up in the documentation.

01:11 So I’m going to show you the table of what each of these is and what it does and go through it briefly, and then I’ll give you a couple of nice little recipes that you can use in your code. They make use of these fields but aren’t a thorough tour of every possibility they bring to the table. So let’s take a look at these format specifiers, and I promise they’ll get less intimidating as I go along.

01:34 <fill> is pretty simple. It specifies how to fill the extra space if you have a value that doesn’t occupy the entire field width. So in this case, the field width is 6 but if you had a one-character item that you’re trying to print, it would be padded with this x character—just the literal character x.

01:51 <align> does a similar thing, so when you have values that don’t occupy the entire width, it says “How do you justify those values?” So in this case, this greater-than symbol (>) means right-justified, but you can use a less-than symbol (<) for left-justified, and a caret (^) for centered. So that’s pretty simple.
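A minimal sketch of <fill> and <align> working together with a width of 6. The one-character string "a" is a made-up value for illustration.

```python
# Pattern: fill character, then alignment character, then width.
right = "{:x>6}".format("a")   # pad on the left, value on the right
left = "{:x<6}".format("a")    # value on the left, pad on the right
center = "{:x^6}".format("a")  # value in the middle

print(right)   # xxxxxa
print(left)    # axxxxx
print(center)  # xxaxxx
```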

02:08 The <sign> controls whether a leading sign is included for numeric values. So in this case, this plus sign (+) says “Include a plus sign for positive integers or positive numeric values and a negative sign for negative numeric values.” The minus sign (-) says “Just include a minus for negative values, but nothing for positive.” And there are a couple of other options too.
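The <sign> options can be compared side by side. The values 5 and -5 are arbitrary examples; the space option shown last is one of the other choices mentioned above.

```python
plus_pos = "{:+}".format(5)    # sign always shown
plus_neg = "{:+}".format(-5)
minus_pos = "{:-}".format(5)   # sign only for negatives (the default)
minus_neg = "{:-}".format(-5)
space_pos = "{: }".format(5)   # leading space for positives

print(plus_pos, plus_neg)    # +5 -5
print(minus_pos, minus_neg)  # 5 -5
print(repr(space_pos))       # ' 5'
```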

02:29 This pound sign or hashtag (#) gives an alternate output form for certain types. So in this case, you can use a #x to get hexadecimal output with a 0x prefix for integer input.

02:43 The 0 causes values to be padded on the left with zeros instead of spaces, so it’s pretty much just a different default for <fill>. The <width> specifies the minimum width of the output—so in this case 6, and you can see, I paired that with the <fill> and the <align> characters, and that’s often something that you’ll do. The <group> character just specifies how to group three-digit clusters for numeric output.

03:06 So if I have the number 1000, then this indicates that a comma (,) should separate the 1 from the 000.

03:13 You can also use an underscore (_) .
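Here is a short sketch of the 0 flag, <width>, and <group> on their own. The numbers 42 and 1000 are illustrative values.

```python
zero_padded = "{:06}".format(42)   # zeros fill the width of 6
space_padded = "{:6}".format(42)   # numbers are right-aligned by default
with_comma = "{:,}".format(1000)
with_underscore = "{:_}".format(1000)

print(zero_padded)         # 000042
print(repr(space_padded))  # '    42'
print(with_comma)          # 1,000
print(with_underscore)     # 1_000
```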

03:15 The .<prec> (precision) specifier says “How many digits after the decimal point do I show?” As you can imagine, that’s rather useful. You can also use it for the maximum output width for string types, so you can just truncate strings as need be.
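The two uses of precision can be sketched like this. Both input values are assumptions chosen for the example.

```python
# For floats, precision is the number of digits after the decimal point.
rounded = "{:.2f}".format(3.14159)

# For strings, precision is the maximum number of characters kept.
truncated = "{:.3}".format("Python")

print(rounded)    # 3.14
print(truncated)  # Pyt
```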

03:29 And then this final <type> specifier can represent the output in different types. This b means binary, there’s an x for hexadecimal, and there are all sorts of different specifiers that you can use to print in different types, and that can be really useful as well. Okay, so now that I’ve gone through all of those ad nauseam, let’s go into the REPL and I’ll show you a couple of useful little recipes that I like to use when I’m using these kinds of format specifiers.
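A few of the <type> codes in action, as a quick sketch. The inputs are arbitrary, and e and % are two additional type codes from the documentation rather than ones named in the lesson.

```python
as_binary = "{:b}".format(10)
as_hex = "{:x}".format(255)
as_exponent = "{:e}".format(1234.5)
as_percent = "{:%}".format(0.25)

print(as_binary)    # 1010
print(as_hex)       # ff
print(as_exponent)  # 1.234500e+03
print(as_percent)   # 25.000000%
```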

03:56 Okay, so something you might want to do with string formatting is you might want to print out a group of numbers but all to the same precision, because generally when you work with some kind of scientific computation, you want to have the same precision for all of your numbers because that’s the level of precision your measurement instruments might have.

04:13 Now, this is pretty simple with these format specifiers. You can just say for num in nums: print out with a replacement field, and let’s say to 2 decimal places, floating point numbers, and then format the number according to that specification. So as you can see, that works pretty nicely, prints out each number to two decimal places as a floating point number. So even these things like 3, which is an integer, will be converted to floats and given two decimal places, even though those will just be zeros. So that’s pretty nice. Now, if I want to, maybe let’s redefine nums and say something like [1, 300, 4832] and then some more big numbers. If you have numbers like these, and you want to print them out to a console, for example—maybe you’re doing some kind of logging program or something—you generally want them to have the same width as one another.
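The same-precision loop from the start of this passage might look like the sketch below. The contents of nums are hypothetical, since the transcript doesn't list the original values.

```python
# Hypothetical measurements; note that 3 is an int.
nums = [3, 2.5, 1.23456]

formatted = ["{:.2f}".format(num) for num in nums]
for text in formatted:
    print(text)
# 3.00
# 2.50
# 1.23
```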

05:09 One way to do that is to just pad on the left with zeros as needed. I’ll do a little bit of cheating here and I’ll say—okay, the longest number here is eight digits, right?

05:21 So I could say something like this. for num in nums: let’s print out

05:29 a formatted string here with a <width> of 8 padded on the left with zeros. And that should do quite nicely to format each of these numbers.

05:40 So as you can see, all of these are now the same width, even though their values are dramatically different in orders of magnitude. So you can specify, “Pad on the left with 0 and then also use a minimum width of 8,” so that each of these has at least a width of 8.
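A sketch of the zero-padding recipe. The first three numbers come from the lesson; the last two are made up here to stand in for the "more big numbers", with the longest being eight digits.

```python
# 1, 300, 4832 are from the lesson; the rest are hypothetical.
nums = [1, 300, 4832, 1234567, 12345678]

padded = ["{:08}".format(num) for num in nums]
for text in padded:
    print(text)
# 00000001
# 00000300
# 00004832
# 01234567
# 12345678
```

Because the width is only a minimum, a nine-digit number would still print in full and simply be wider than the others.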

05:57 You could also pad with different characters. So I could pad with let’s do a tilde (~), you know—this little squiggly line. But if I do that, I’m going to have to use a justify character so that this knows where to actually put the tildes.

06:11 Do I put them on the right of the number, on the left of the number, et cetera. So if I print this out now, it will use right-justified tildes for this. But if I switch this to a left-justify,

06:24 take a look at what happens: then the tildes are on the right. And finally, I can actually center it using this caret character (^), this little upwards arrow, and it will put them on both sides and make this as even as it possibly can be. When the number of padding characters needed is odd, it can’t be perfectly even on both sides, so the extra character goes on the right.
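The three tilde alignments can be compared with a single value. The number 300 and the width of 8 are used here as an illustration.

```python
num = 300  # three digits, so five tildes are needed to reach width 8

tilde_right = "{:~>8}".format(num)
tilde_left = "{:~<8}".format(num)
tilde_center = "{:~^8}".format(num)

print(tilde_right)   # ~~~~~300
print(tilde_left)    # 300~~~~~
print(tilde_center)  # ~~300~~~
```

In the centered case the five tildes split as two on the left and three on the right.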

06:47 Another thing you might want to do is print out numbers in different base formats—bases like hexadecimal or octal—and luckily that’s pretty easy to do with this pound character (#).

07:00 I can say something like #x for hexadecimal—and of course, this is all in the documentation. And if I do that, I will get the hexadecimal equivalent string representation of this number here.

07:14 And you can do the same thing with octal, so an o instead of an x. And of course you can use d for decimal, though you won’t really need to do this if you’re already passing in a decimal number. But just for completeness’ sake.
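A small sketch of the pound sign with different bases. The value 255 is an assumption for this example.

```python
num = 255  # hypothetical input

hex_text = "{:#x}".format(num)
octal_text = "{:#o}".format(num)
binary_text = "{:#b}".format(num)
decimal_text = "{:#d}".format(num)  # no prefix is added for decimal

print(hex_text)      # 0xff
print(octal_text)    # 0o377
print(binary_text)   # 0b11111111
print(decimal_text)  # 255
```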

07:30 And you can also specify the padding, so you can make the value zero-padded, and you can specify a minimum width, just like you can with other input formats.

07:42 But realize that this is a slightly different format that’s specific to this pound sign paradigm. So you might expect, for example, to put the <width> after the x, but it actually goes before it, between the pound sign and the x. So that’s just yet another example of how this specification is really detailed and can be really confusing if you’re not careful about using it, so remember to always browse the documentation when you have a question.
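To make the ordering concrete: the 0 flag and the width sit between the pound sign and the type character. This sketch uses 255 and a width of 10 as assumed values, and also shows that putting the width after the x is rejected.

```python
value = 255  # hypothetical input

# Correct order: '#', then '0', then width, then type.
padded_hex = "{:#010x}".format(value)
print(padded_hex)  # 0x000000ff

# Width after the type character is not a valid spec.
try:
    "{:#x10}".format(value)
    misplaced_ok = True
except ValueError as err:
    misplaced_ok = False
    print("rejected:", err)
```

Note that the 0x prefix counts toward the width of 10.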

08:02 Go look for online communities because there are people who will be happy to answer your questions about this sort of thing. Now, one other thing I want to address is you might notice that I’ve shown a lot of examples with numbers, and you might wonder “Why aren’t there as many formatting options aimed at strings?” Well, two answers to that. The first is that a lot of these will work for strings.

08:20 So the padding with tildes, for example, with some kind of minimum width and alignment—that will work perfectly fine for strings too. But the longer and more careful answer is that strings have a lot of formatting built into them already. You can call methods on strings, you can truncate them, you can join them together with other strings—you can do all sorts of stuff directly with native string libraries.
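As a sketch of both points, the tilde padding works on a string, and plain string methods can produce similar results. The word "Python" and the width of 10 are example choices.

```python
word = "Python"  # hypothetical input

# The same fill, align, and width options apply to strings.
via_format = "{:~^10}".format(word)

# Built-in string tools cover a lot of the same ground.
via_method = word.center(10, "~")
shortened = word[:3]
joined = "-".join(["a", "b", "c"])

print(via_format)  # ~~Python~~
print(via_method)  # ~~Python~~
print(shortened)   # Pyt
print(joined)      # a-b-c
```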

08:41 So a lot of the formatting muscle of the string .format() method comes in when you’re trying to format things like numbers or bytes or data that doesn’t have a nice, easy string representation already. And things like strings and dictionaries and so on often already do have that nice string representation.

08:59 So, there are some nice, useful cookbook examples of things you can do with this format specifier. Next, I’m going to talk about nested replacement fields, which are a way to actually dynamically generate some of these formatting options here using other Python variables.
