
Data Classes, NamedTuple, and Structs

Christopher Trudeau

Records and Sets: Selecting the Ideal Data Structure Christopher Trudeau 10:14

Transcript
Discussion (2)

00:00 In the previous lesson, I talked about using dictionaries, tuples, and classes to represent your records. In this lesson, I’m going to talk about the data class shortcut, the NamedTuple object, and structs.

00:14 Python 3.7 introduced a new module called dataclasses that provides the @dataclass decorator. It's a way of writing class code that requires less boilerplate than I showed you in the previous lesson. It's particularly handy if you're trying to create plain old data objects that don't need methods to go along with them.

00:33 It also generates a more useful default .__repr__() method than regular classes have, so you don't necessarily have to override that one either. Once again, less boilerplate to write. And finally, fields are declared with type annotations, so if you're using a type checker such as mypy, you can get a higher degree of type safety.

00:53 To create a data class, you need the @dataclass decorator. You get that by importing it from the dataclasses library.

01:05 Now use the decorator to decorate a class, and you don’t have to write as much code.

01:18 And that’s it! I’ve created a data object called Plane. It has two fields, one for color and one for the number of engines. And because it’s a data class, I don’t have to create the constructor and I’ve got annotations on what types are expected in each one of those fields.
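Here's a minimal sketch of what that Plane definition might look like. The field names color and engines are assumptions based on the description, not necessarily the exact names used on screen.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Plane:
    # Each annotated class attribute becomes a field (field names assumed)
    color: str
    engines: int


# The decorator generated __init__; its parameters follow declaration order
print(inspect.signature(Plane))

f16 = Plane("gray", 1)
print(f16.color, f16.engines)  # gray 1
```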

01:34 I create a new instance of this kind of object the same way I would with a class, giving the fields to the constructor in the same order as the declaration.

01:45 I can examine f16 and see the generated repr. That's much better than the obscure <__main__.Plane object at 0x...> you'd get from a plain class. It tells you the class name, the fields, and their contents. And because this is an object, I can get at the fields individually.
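To make the comparison concrete, this sketch puts the generated repr next to the default one from an ordinary class. PlainPlane is a hypothetical class written only for contrast, and the field names are assumed; the memory address in the plain repr varies from run to run.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    color: str
    engines: int


class PlainPlane:
    # An ordinary class with a hand-written constructor, for comparison only
    def __init__(self, color, engines):
        self.color = color
        self.engines = engines


f16 = Plane("gray", 1)
print(repr(f16))                    # Plane(color='gray', engines=1)
print(repr(PlainPlane("gray", 1)))  # something like <__main__.PlainPlane object at 0x...>
print(f16.engines)                  # 1
```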

02:06 I can change my mind about how to spell the word 'grey' and directly manipulate the field… and that’s not a problem. And just like classes in the previous lesson, I can also create new fields on the fly.
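A sketch of those two operations, reusing the assumed Plane fields. max_speed is a hypothetical attribute name standing in for the on-the-fly field. Note that the generated repr only lists declared fields, so the new attribute doesn't show up there.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    color: str
    engines: int


f16 = Plane("gray", 1)

# Regular data classes are mutable, so fields can be reassigned
f16.color = "grey"

# Instances also accept brand-new attributes (hypothetical name, Mach number)
f16.max_speed = 1.6

print(f16.color, f16.max_speed)  # grey 1.6
print(f16)                       # Plane(color='grey', engines=1)
```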

02:23 The F-16 had a max speed of Mach 1.6. Although the format of data classes lends itself to using type annotations, do note that this isn’t enforced by Python itself.

02:36 You have to use an extra tool like mypy to check this kind of thing as a separate static analysis step before you run your code. Let me show you how this can be broken.

02:50 I have successfully passed in the string "one" instead of the number 1 for the number of engines inside of the MiG-23. Essentially, the int hint is completely ignored.
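Here's a minimal reproduction, with the assumed Plane fields and a hypothetical color value. It runs without complaint; a static checker such as mypy would flag the second argument as a str where an int is expected.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    color: str
    engines: int  # the hint says int...


# ...but nothing checks it at runtime, so a string goes straight through
mig23 = Plane("silver", "one")
print(type(mig23.engines))  # <class 'str'>
```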

03:05 This is no different from a typical class constructor if you aren't using the casting trick that I showed you in the previous lesson.

You may be familiar with the namedtuple() factory function in the collections module. Python 3.6 added another way of defining named tuples: subclassing NamedTuple from the typing module, with the fields declared using class syntax and annotations.

03:29 The end result is essentially the same, but because NamedTuple is a class, you define your record through inheritance instead of calling the namedtuple() function. This makes the code look an awful lot like the data classes I just showed you. And as with data classes, the annotations are there for tools to use, but Python itself doesn't enforce them. You need a third-party tool.

03:52 Here’s an example of using the inheritance. You import the NamedTuple object, then have your class inherit from that object, specifying the fields.
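A sketch of the class-based form, again with assumed field names, alongside the roughly equivalent collections.namedtuple() call. The assignment attempt shows the immutability: named tuple fields can't be reassigned.

```python
from collections import namedtuple
from typing import NamedTuple


class Plane(NamedTuple):
    # Fields are declared with annotations, just like a data class
    color: str
    engines: int


f16 = Plane("gray", 1)
print(f16)          # Plane(color='gray', engines=1)
print(f16.engines)  # 1

# Named tuples are immutable: assigning to a field raises AttributeError
try:
    f16.color = "grey"
except AttributeError as err:
    print("AttributeError:", err)

# Roughly equivalent factory-function version (no type information)
PlaneTuple = namedtuple("PlaneTuple", ["color", "engines"])
print(PlaneTuple("gray", 1) == f16)  # True: both compare as plain tuples
```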

04:02 Very, very similar to the idea of a data class, but this time it's an immutable tuple instead. Another type of record structure is available in the struct module.

04:15 Python has always integrated closely with the C language, and the struct module gives you a way of converting back and forth between Python values and C structs represented as bytes objects. This is particularly useful if you're handling structured binary data, such as file formats or network packets, or exchanging data with a C extension.

04:31 A Struct object describes the layout of the data using a format string, a mini-language similar to the one used for formatting strings. You can specify the byte order (big-endian versus little-endian), the size and alignment of the data, and what kind of data is expected: chars, shorts, unsigned ints, et cetera.
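The prefix character at the start of a format string controls byte order, size, and alignment. This sketch uses struct.calcsize() to show the difference; the unprefixed result depends on your platform, so no value is given for it.

```python
import struct

# No prefix: native byte order, native sizes, native alignment (platform-dependent)
print(struct.calcsize("HHLL"))

# "<" little-endian, ">" big-endian, "!" network order (big-endian);
# all three use standard sizes (H = 2 bytes, L = 4 bytes) and no padding
for prefix in "<>!":
    print(prefix, struct.calcsize(prefix + "HHLL"))  # 12 for each prefix
```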

04:52 To demonstrate the Struct class, I'm going to build a TCP packet header. First off, I need to import Struct from the struct module.

05:06 I'll use the first four fields of a TCP header (the real header continues with more fields, such as flags and a checksum). The first 16 bits are an unsigned short, which is the source port. The next 16 bits are another unsigned short, which is the destination port.

05:19 The third field is 32 bits long, an unsigned long, and holds the sequence number. The fourth field is the acknowledgment number, once again an unsigned long. Let me define a TCP header object based on this information.

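05:36 The capital "H" in the format string indicates an unsigned short, and the capital "L" indicates an unsigned long, so "HHLL" describes the 16-bit source port, 16-bit destination port, 32-bit sequence number, and 32-bit acknowledgment number. One caveat: with no prefix character, struct uses the platform's native sizes and alignment, and on many 64-bit systems a native unsigned long is 64 bits. Prefixing the format with "<", ">", or "!" switches to standard sizes, where "L" is always 32 bits.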

05:53 Now I can use the .pack() method to pack some values into a bytes object laid out according to this structure.

06:06 I've passed in four integers, each written in hexadecimal. That's not necessary, but it's common practice when you're dealing with binary data. The TCPHeader object's .pack() method packs them together in the format specified when the object was constructed. If I look at the header, you'll see an awful lot of binary content here.
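A sketch of the pack step using the values from this lesson. It uses "<HHLL" rather than the bare "HHLL" so the output is the same on any machine: little-endian like the demo, with standard 32-bit L fields and no alignment padding. That prefix is a substitution, not what the demo used.

```python
from struct import Struct

# "<" pins little-endian order and standard sizes (H = 2 bytes, L = 4 bytes)
TCPHeader = Struct("<HHLL")

# Source port, destination port, sequence number, acknowledgment number
header = TCPHeader.pack(0x1234, 0x0050, 0x1A, 0x1B)

print(type(header))  # <class 'bytes'>
print(len(header))   # 12
print(header)        # b'4\x12P\x00\x1a\x00\x00\x00\x1b\x00\x00\x00'
```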

06:30 If you're not used to byte ordering, this can be a little confusing to look at. When Python displays a bytes object, any byte that maps to a printable ASCII character is shown as that character, and everything else is shown as a \x escape. 0x1234 is split into two 8-bit bytes, 0x34 and 0x12, and here 0x34 comes first. 0x34 maps to the ASCII digit '4'. 0x12 isn't a printable ASCII character, so it's shown as \x12.

07:00 Different kinds of processors store multi-byte values in memory either most significant byte first (big-endian) or least significant byte first (little-endian). Because I didn't specify a byte order, struct uses the machine's native order, which here is little-endian. That's why the 34 from 0x1234 is packed first and the 12 second. Similarly, the next character is the 'P': 0x0050 is stored as 0x50, which maps to the ASCII capital letter 'P', followed by a 0x00 byte, and so on through the rest of the data. Because I gave very small values for the third and fourth fields, most of their bytes are zero, so the data contains an awful lot of null bytes.
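To see both byte orders side by side, this sketch packs the same values with "<" (little-endian) and "!" (network order, which is big-endian and is how TCP puts these fields on the wire), then prints the bytes as hex. bytes.hex() with a separator argument requires Python 3.8 or later.

```python
import struct
import sys

values = (0x1234, 0x0050, 0x1A, 0x1B)

little = struct.pack("<HHLL", *values)
big = struct.pack("!HHLL", *values)  # "!" = network byte order (big-endian)

print(little.hex(" "))  # 34 12 50 00 1a 00 00 00 1b 00 00 00
print(big.hex(" "))     # 12 34 00 50 00 00 00 1a 00 00 00 1b

# The byte order this machine uses natively (little on x86-64, for example)
print(sys.byteorder)
```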

07:39 Corresponding to the .pack() method, there is also an .unpack() method. I can pass the header, which is the binary data, to TCPHeader.unpack() and take a look at what comes back.

07:52 This returns a tuple, with each position in the tuple corresponding to a field in the header definition. The REPL shows integers in decimal, so 0x1234 is displayed as 4660, 0x0050 as 80, and 0x1a and 0x1b as 26 and 27, respectively.
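A round-trip sketch, again using "<HHLL" as a portable stand-in for the demo's native format. Note that .unpack() is called on the Struct, with the bytes as its argument.

```python
from struct import Struct

TCPHeader = Struct("<HHLL")  # "<" prefix substituted for portability
header = TCPHeader.pack(0x1234, 0x0050, 0x1A, 0x1B)

# .unpack() belongs to the Struct; the bytes object is its argument
fields = TCPHeader.unpack(header)
print(fields)                    # (4660, 80, 26, 27)
print([hex(f) for f in fields])  # ['0x1234', '0x50', '0x1a', '0x1b']
```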

08:17 In a larger program, you might have some records stored in dictionaries and some in classes. It can be frustrating to remember which is which for the object you're using, and that matters because the syntax for getting at an object's attributes is different from the syntax for getting at a dictionary's keys. To help with this problem, Python 3.3 added SimpleNamespace to the types module.

08:41 It works much like a dictionary, except you use dot notation to access its entries as attributes. Like some of the other mechanisms that I've shown you, it also provides a decent .__repr__() method if you're going to print it out. Let me start by importing it.

09:00 And now I’ll use the SimpleNamespace object to construct a new record.

09:12 By passing keyword arguments to the constructor, I've told SimpleNamespace which attributes to create and what their values are.
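A minimal sketch of building such a record. Only the name jack_pine comes from the lesson; the color and height_m attribute names and their initial values are assumptions.

```python
from types import SimpleNamespace

# Keyword arguments become attributes (names and values here are assumed)
jack_pine = SimpleNamespace(color="green", height_m=20)

print(jack_pine)        # namespace(color='green', height_m=20)
print(jack_pine.color)  # green

# The attributes live in an ordinary dict, which vars() exposes
print(vars(jack_pine))  # {'color': 'green', 'height_m': 20}
```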

09:22 Let’s take a look at what’s inside of jack_pine. I can access individual fields inside of jack_pine.

09:31 I can use assignment to change the value.

09:39 Now it’s 'light green'. And like a normal dictionary or class, I can add attributes on the fly.

09:49 Unlike the .__repr__() methods of the classes and data classes that I've shown you, the namespace's .__repr__() is smart enough to include this new field.
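This sketch reproduces the comparison: an attribute added to a SimpleNamespace appears in its repr, while one added to a data class instance doesn't. The cones attribute and the Tree class are hypothetical. Since Python 3.9 the namespace repr lists attributes in insertion order; earlier versions sort them alphabetically.

```python
from dataclasses import dataclass
from types import SimpleNamespace

jack_pine = SimpleNamespace(color="green", height_m=20)
jack_pine.color = "light green"  # assignment changes an existing attribute
jack_pine.cones = True           # new attribute added on the fly (hypothetical)
print(jack_pine)  # Python 3.9+: namespace(color='light green', height_m=20, cones=True)


@dataclass
class Tree:  # hypothetical data class for comparison
    color: str
    height_m: int


pine = Tree("green", 20)
pine.cones = True
print(pine)  # Tree(color='green', height_m=20): the extra attribute is not shown
```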

10:02 You’ve had quite a tour of different types of records and ways of storing them in Python. Next up, I’ll talk about how to choose between them and give you some references if you wish to do further investigation.

fairchild on Oct. 23, 2021

A couple months ago I was working on a script that involved character encoding. As part of it, I tried teaching myself re little and big endianness. At the time I only was able to grasp just to the point where it was practical for the project I was working on at the time. Your discussion re here explained it much better than I had previously. Thank you.

Christopher Trudeau RP Team on Oct. 23, 2021

Glad you found it useful @fairchild. Funny timing on your comment, I’m working on a course on binary and bitwise operators which has a lesson on endianness right now :)
