Using namedtuple

Writing Clean, Pythonic Code With namedtuple Christopher Trudeau 09:47

00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll show you how to create and use a named tuple. Named tuples are an extension to the built-in tuple type.

00:10 Everything you can do with the base data type can be done with the named variant. In addition to that, a named tuple treats its members as attributes, giving you the ability to name the parts of the tuple and access them using dot notation.

00:24 To create a named tuple, you use a factory function. This factory is found inside of the collections module and is called, understandably enough, namedtuple.

00:34 The factory takes two required arguments. The first is a string giving the factory the name the created class will use, and the second is a specifier for the names of the attributes in the tuple.

00:46 This argument has several different formats. You can give it a string where each field name is separated by a space, a string where each field name is separated by commas, or any kind of iterable where each item in the iterable is the name of a field. For example, you can give it a tuple or a list with the attribute names inside of it.

01:07 Let’s go play with named tuples in the REPL. Consider a regular old tuple. Here I’ve created a tuple that represents a point in a grid where the X value is 2, and the Y value is 4. When I show the contents of the tuple, I see the 2 and 4 in parentheses.

01:29 I can access the pieces of the tuple using index notation. With the square brackets here, I’ve accessed the first item in the tuple. Like all sequences in Python, tuples are zero indexed, so the first position is denoted as 0. Tuples are immutable.

01:45 Trying to change part of the tuple results in an error. You’ve probably seen all this before. Let’s take a look at the named variation.
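The REPL input is not captured in the transcript. A minimal sketch of what this part of the session might look like, assuming the variable is called `point` (the name mentioned later in the lesson) and using 3 as an arbitrary replacement value:

```python
# A regular tuple for a grid point; the values 2 and 4 come from the lesson.
point = (2, 4)
print(point)     # (2, 4)
print(point[0])  # 2, because indexing starts at 0

# Tuples are immutable, so item assignment raises a TypeError.
try:
    point[0] = 3  # 3 is an arbitrary illustrative value
except TypeError as error:
    print(f"TypeError: {error}")
```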

02:00 Named tuples are found in the collections module and created with a factory. Here I’ve imported the factory function. Now I’ll create a named tuple class.

02:09 Note this is creating the class, not an instance object.

02:17 The first argument tells the factory that the underlying class will be named capital-P Point. The second argument tells the factory that I want two attributes in the class, the first named .x and the second .y. Note that I’ve named the class using a capital letter to be consistent with Python style guidelines about class declaration.

02:36 If I examine the class in the REPL, it shows me that this is a class in the __main__ module with the name Point. With the class in place, I can create an instance object the same way I would with any other class.
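A sketch of the class creation described here. The space-separated field string `"x y"` is an assumption based on the later remark that the first Point example used a string with spaces:

```python
from collections import namedtuple

# First argument: the name of the generated class.
# Second argument: the field names, here as a space-separated string.
Point = namedtuple("Point", "x y")

print(Point)                     # in the REPL this shows <class '__main__.Point'>
print(Point.__name__)            # Point
print(issubclass(Point, tuple))  # True: the generated class extends tuple
```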

02:52 I’ve overwritten my regular tuple, also named point, with a named tuple instead. Looking at the value in the REPL … and it looks like a class instance, which it is.

03:06 The result of the type() function is the same as the class a few lines above. It is still a tuple though. The whole point of a named tuple is you can access the attributes of the tuple using the names given to the factory.

03:22 There’s the first value, from the field called .x, and the second value in a field called .y. And since it is a tuple, you can still use positional access as well.
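The instance creation and access steps above could be reproduced roughly as follows. This is a sketch, not the exact on-screen session. The values 2 and 4 are carried over from the earlier regular tuple:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")

# Create an instance the same way as with any other class.
point = Point(2, 4)
print(point)                     # Point(x=2, y=4)
print(type(point) is Point)      # True
print(isinstance(point, tuple))  # True: it is still a tuple

# Access by field name with dot notation...
print(point.x, point.y)          # 2 4
# ...or by position, like a regular tuple.
print(point[0], point[1])        # 2 4
```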

03:39 Like I said, it’s still a tuple, and like a tuple, it is immutable. Instead of a TypeError, you get an AttributeError, but either way, setting a value isn’t allowed. There’s a subtle thing here.
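A sketch of the failed assignment just described. The new value 100 is arbitrary and only for illustration:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
point = Point(2, 4)

# Assigning to a field of a named tuple raises AttributeError,
# whereas item assignment on a plain tuple raises TypeError.
try:
    point.x = 100  # 100 is an arbitrary illustrative value
except AttributeError as error:
    print(f"AttributeError: {error}")

print(point)  # Point(x=2, y=4), unchanged
```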

03:50 The named tuple itself is immutable, but it can store mutable objects. This can feel a little weird. Let me show you a couple of examples of what I mean.

04:00 I’m going to create a new named tuple that represents the name of a person and the name of their kids.

04:11 Capital-P Person is now a class with two attributes, .name and .kids. I intend to store a list of names in the .kids value, and that’s where things get a little funky. Kids can do that to you.

04:28 Meet john. John is my first instance of Person. John has two kids, Sally and Tim. Since this is a named tuple, I can get at John’s name through the .name attribute.

04:41 I should say name at least one more time in that sentence. This attribute is immutable. I’m not allowed to change it. John is John. If I want Bob, I need a new tuple. Likewise if I want to change the kids.

05:03 .kids is an attribute, just like .name, and trying to assign something else to it fails the same way it does for the name. But .kids is a list, which is a mutable object.

05:18 That means I can do things to it. As long as I don’t attempt to change what .kids is referencing, I’m good. John can have more kids. In fact, anything you do to a list you can do to John’s kids, including removing all the items.

05:34 You just can’t point the .kids attribute at a new list. This can be a bit confusing in practice. I generally try to avoid it when I’m coding. Although it is legal, it feels counterintuitive to me and might be surprising to someone who is new to named tuples. Before moving on, let’s spend a bit more time with the factory and learn the different ways you can specify the attributes of a named tuple.
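The Person example above can be sketched as follows. The names John, Sally, Tim, and Bob come from the lesson. The field string `"name kids"`, the replacement list, and the extra child "Tina" are assumptions made for illustration:

```python
from collections import namedtuple

Person = namedtuple("Person", "name kids")
john = Person("John", ["Sally", "Tim"])
print(john.name)  # John

# Rebinding either attribute fails, because the tuple itself is immutable.
try:
    john.name = "Bob"
except AttributeError as error:
    print(f"Cannot rebind .name: {error}")

try:
    john.kids = ["Alice"]  # hypothetical replacement list
except AttributeError as error:
    print(f"Cannot rebind .kids: {error}")

# Mutating the list that .kids references is allowed.
john.kids.append("Tina")  # "Tina" is a hypothetical extra name
print(john.kids)          # ['Sally', 'Tim', 'Tina']

john.kids.clear()         # even removing every item works
print(john)               # Person(name='John', kids=[])
```

If this behavior is unwanted, one option is to store an immutable container such as a tuple in the field instead of a list.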

06:03 You’ve seen this before. I’ve created a point with .x and .y.

06:10 And there’s an instance. Instead of using a string with spaces to specify the attributes, you can use an iterable.

06:26 This results in the same thing. I kind of like this version. It feels clearer to me. If you like the string style but prefer to use comma-separated values,

06:41 you can do that instead. This variant can be handy if you’re processing CSV files whose first line is the column names in the data. I’ll be playing with CSV files in a later lesson in the course.

07:06 The list example I showed you before is actually a specific case of the general rule. Anything iterable will work. Here I’m using a for loop to iterate over the string.

07:16 Iterating on a string returns the letters, so I get two fields, one named x and the other y. This is the least clear of the variations, but it shows that if you can provide an iterable of attribute names, you can create a named tuple.
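The four ways of specifying field names walked through above might look like this. The variable names are hypothetical, and the last form assumes the "for loop" on screen was a generator expression over the string `"xy"`:

```python
from collections import namedtuple

# 1. Space-separated string.
PointFromSpaces = namedtuple("Point", "x y")

# 2. List (or any other iterable) of names.
PointFromList = namedtuple("Point", ["x", "y"])

# 3. Comma-separated string.
PointFromCommas = namedtuple("Point", "x, y")

# 4. Generator expression; iterating over "xy" yields "x" and then "y".
PointFromGenerator = namedtuple("Point", (field for field in "xy"))

# Every variant produces an equivalent class.
for cls in (PointFromSpaces, PointFromList, PointFromCommas, PointFromGenerator):
    print(cls(2, 4))  # Point(x=2, y=4)
```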

07:30 I’ll be doing that later, in the CSV example I was talking about. There are some restrictions on attribute names.

07:44 You can’t name an attribute anything starting with an underscore (_). This is a safety feature. There are utility methods and fields on the class that might cause name clashes, so to keep the attributes distinct from the utilities, all the utilities begin with an underscore.

07:59 I’ll show you these a little bit later.

08:08 You also can’t name any of the attributes a Python keyword. This restriction actually applies to classes in general. namedtuple here is just keeping you compliant with Python syntax.
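A short sketch of both restrictions. The specific invalid names `_x` and `class` are illustrative choices, not necessarily the ones used in the video. In both cases the factory raises a ValueError:

```python
from collections import namedtuple

rejected = []

# "_x" starts with an underscore; "class" is a Python keyword.
for field_names in ("_x y", "x class"):
    try:
        namedtuple("Point", field_names)
    except ValueError as error:
        rejected.append(field_names)
        print(f"{field_names!r} rejected: {error}")

print(len(rejected))  # 2: both specifications were refused
```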

08:19 The final thing I’m going to show you in this lesson is a few variations on how to name your arguments when creating an instance. Let me just reestablish our go-to Point class here.

08:33 A little deja vu to remind you of one way of creating an instance. Here I used position to specify the arguments. Like with any other class constructor, you can name the arguments explicitly instead.

08:49 The results are the same as before. Because of this, you can also use the **kwargs mechanism to create instances.

09:04 I’ve created a dictionary here with the keys being the names of the arguments to Point and the values being my data. I can then use the **kwargs mechanism to create an instance based on the dictionary arguments.

09:20 And I get the same kind of data as a result. This particular mechanism can be handy if you’re dealing with data from outside sources, like parsing some JSON.
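The three ways of passing arguments described above, sketched with hypothetical variable names. The JSON string at the end is made up to illustrate the "outside sources" case and is not from the video:

```python
import json
from collections import namedtuple

Point = namedtuple("Point", "x y")

# Positional arguments.
by_position = Point(2, 4)

# Keyword arguments.
by_keyword = Point(x=2, y=4)

# Dictionary unpacked with **; the keys must match the field names.
data = {"x": 2, "y": 4}
from_dict = Point(**data)

# The same idea applied to parsed JSON (hypothetical payload).
from_json = Point(**json.loads('{"x": 2, "y": 4}'))

print(by_position)                                        # Point(x=2, y=4)
print(by_position == by_keyword == from_dict == from_json)  # True
```

Note that a dictionary with a missing or extra key raises a TypeError, just as it would for any other callable.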

09:34 I’ve shown you how to create namedtuple classes using the factory found in collections. The factory has optional arguments that give you even more control over your class. In the next lesson, I’ll show you those.
