Data Classes, NamedTuple, and Structs
In the previous lesson, I talked about using dictionaries, tuples, and classes to represent your records. In this lesson, I’m going to talk about the data class shortcut, the NamedTuple object, and structs.
Python 3.7 introduced a new module called dataclasses that contains the dataclass decorator. This is a new way of writing class code that requires less boilerplate than I showed you in the previous lesson. It’s particularly handy if you’re trying to create plain old data objects that don’t need methods to go along with them.
It also provides a better default .__repr__() method than regular classes, so you don’t necessarily have to override this one either. Once again, less boilerplate to write. And finally, it supports type annotations, so if you’re using a type checker, you can get a higher degree of type safety.
And that’s it! I’ve created a data object called Plane. It has two fields, one for color and one for the number of engines. And because it’s a data class, I don’t have to create the constructor and I’ve got annotations on what types are expected in each one of those fields.
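The definition itself is not captured in the transcript, so here is a minimal sketch of what a data class like this could look like. The field names color and engines and the values passed in are assumptions based on the description above.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    # Field names are assumed from the narration: a color and an engine count.
    color: str
    engines: int


# The decorator generates __init__() for you, so no constructor is written.
f16 = Plane("gray", 1)
```

The annotations document the expected types, but nothing here checks them at runtime.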
I can examine the f16 and see the built-in repr. That’s much better than the obscure 0x hex something or other. That’s kind of useful. It tells you the class name, the fields, and their contents. Because this is an object, I can get at the fields individually.
I can change my mind about how to spell the word 'grey' and directly manipulate the field… and that’s not a problem. And just like classes in the previous lesson, I can also create new fields on the fly.
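Those steps can be sketched roughly as follows. The Plane definition is repeated so the example runs on its own, and the extra seats attribute is a hypothetical name chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    color: str
    engines: int


f16 = Plane("gray", 1)
print(f16)          # Plane(color='gray', engines=1)
print(f16.engines)  # 1

# Fields are ordinary attributes, so they can be reassigned.
f16.color = "grey"

# A new attribute can be attached on the fly (hypothetical name).
# It is not a declared field, so the generated repr does not show it.
f16.seats = 1
print(f16)          # Plane(color='grey', engines=1)
```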
This is no different than the typical class constructor if you aren’t using the casting trick that I showed you in the previous lesson.
You may be familiar with the namedtuple() factory function inside of the collections library. Well, Python 3.6 added another way of using named tuples, and that’s inside of the typing library: the NamedTuple class.
The end result is essentially the same, but because it’s a class, you can use inheritance instead of calling the namedtuple() function. This makes the code look an awful lot like the dataclass that I just showed you. Similar to those data classes, there’s support for annotations inside of the class body but no enforcement from Python itself. You need a third-party tool.
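Here is a minimal sketch of the inheritance style, reusing the assumed Plane fields from earlier. The class and field names are illustrative and not taken from the lesson's code.

```python
from typing import NamedTuple


class Plane(NamedTuple):
    # Annotations are recorded, but Python does not enforce them at runtime.
    color: str
    engines: int


f16 = Plane("grey", 1)
print(f16)        # Plane(color='grey', engines=1)
print(f16.color)  # grey
print(f16[1])     # 1, because it is still a tuple underneath

# Wrong types are accepted without complaint; a type checker would flag this.
odd = Plane(3, "two")

# Unlike the data class, a named tuple is immutable.
try:
    f16.color = "gray"
except AttributeError as exc:
    print("cannot reassign:", exc)
```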
Python has always integrated closely with the C language, and the struct library gives you a way of converting back and forth between Python types and C values. This is particularly useful if you’re managing structured binary data or if you’re calling a Python extension.
The Struct object defines what types are being used based on a little mini language similar to that used in formatting strings. You can specify the size and alignment of the data, big-endian versus little-endian, and what kind of data is expected there: chars, shorts, unsigneds, et cetera.
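To get a feel for that mini language, here is a small sketch using struct.calcsize(). The format "cHL" (one char, one unsigned short, one unsigned long) is an arbitrary example of mine, not one from the lesson.

```python
import struct

# A leading "<" means little-endian and ">" means big-endian. Both use
# standard sizes with no padding: c = 1 byte, H = 2 bytes, L = 4 bytes.
standard_size = struct.calcsize("<cHL")
print(standard_size)             # 7
print(struct.calcsize(">cHL"))   # 7

# With no prefix, struct uses the machine's native byte order, sizes, and
# alignment, so the result depends on the platform.
native_size = struct.calcsize("cHL")
print(native_size)
```

The native size is usually larger than the standard one because the compiler's alignment rules insert padding bytes between fields.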
The third field is 32 bits long, an unsigned long, and is the sequence count. And the fourth field is for acknowledgement, once again an unsigned long. Let me define a TCP header object based on this information.
The "H" in the string here indicates an unsigned short and the capital "L" is an unsigned long. "HHLL" is the 16-bit source, 16-bit destination, 32-bit sequence, and 32-bit acknowledgement fields.
I’ve passed in four integers, each in hexadecimal format (not necessary, but a common practice when you’re dealing with binary data). The .pack() method takes these and packs them together in the format specified inside of the TCPHeader object when it was constructed. If I look at the header, you’ll see an awful lot of binary content here.
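A rough sketch of defining and packing the header follows. Two things here are assumptions: the "<" prefix, which I add so the output is the same on every platform (the lesson's format string has no prefix), and the values 0x1 and 0x2 for the sequence and acknowledgement fields, which the transcript only describes as very small.

```python
import struct

# "<" forces little-endian with standard sizes (an assumption for
# reproducibility; plain "HHLL" would use native sizes and alignment).
TCPHeader = struct.Struct("<HHLL")

# Source port, destination port, sequence, acknowledgement.
header = TCPHeader.pack(0x1234, 0x0050, 0x1, 0x2)
print(header)       # b'4\x12P\x00\x01\x00\x00\x00\x02\x00\x00\x00'
print(len(header))  # 12
```

With the prefix-free native format, the packed result may be longer on some platforms because of alignment padding and a wider unsigned long.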
If you’re not used to byte ordering, this can be a little confusing to look at. When Python displays binary data, if a byte maps to a printable ASCII character, then Python will show that character.
0x1234 (hex 1234) gets broken down into two 8-bit bytes, with the 34 first. That 0x34 (hex 34) maps to the ASCII digit '4'. 0x12 (hex 12) is not a printable character inside of ASCII, so the escape \x12 is shown here.
Different kinds of computer processors store data in memory in big-endian (most significant byte first) or little-endian (least significant byte first) order. Because I haven’t specified otherwise, this one uses my machine’s native order, which is little-endian, hence the 34 from the 0x1234 is packed first into this binary data, and the 12 second. Similarly, the next character is the 'P': the low byte of 0x0050 maps to the ASCII capital letter 'P', and it goes on through the rest of the data. Because I gave very small values for the third and fourth fields, there’s an awful lot of null padding inside of this data.
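To see the effect of byte order in isolation, this sketch packs the same 16-bit value both ways. The explicit "<" and ">" prefixes are my addition for comparison; the lesson relies on the native default.

```python
import struct
import sys

value = 0x1234

little = struct.pack("<H", value)  # least significant byte first
big = struct.pack(">H", value)     # most significant byte first

print(little)  # b'4\x12'
print(big)     # b'\x124'

# 0x34 and 0x50 are printable ASCII, which is why they show up as characters.
print(chr(0x34), chr(0x50))  # 4 P

# The native order of the machine running this code.
print(sys.byteorder)
```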
This returns a tuple with each position in the tuple corresponding to the field in the header that was defined. The Python REPL always prints out integers in decimal, so 0x1234 gets converted into 4660 and 0x0050 gets converted into 80.
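Going the other direction can be sketched with the Struct object's .unpack() method. As before, the "<" prefix and the last two field values are assumptions made so the example is reproducible.

```python
import struct

TCPHeader = struct.Struct("<HHLL")
packed = TCPHeader.pack(0x1234, 0x0050, 0x1, 0x2)

# unpack() reverses pack() and returns one tuple entry per field.
fields = TCPHeader.unpack(packed)
print(fields)          # (4660, 80, 1, 2)

# hex() recovers the hexadecimal spelling of a value.
print(hex(fields[0]))  # 0x1234
```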
In a larger program, you might have some records in dictionaries and some records in classes. It can sometimes be frustrating to remember which is which in the object that you’re using, and this is important because the syntax of getting at the fields of a class is different than the syntax of getting at the keys in a dictionary. In order to help with this problem, Python 3.3 added the object called SimpleNamespace, inside of the types library.
This is like a dictionary where you’re allowed to use dot notation to access the keys as attributes. Like some of the other mechanisms that I’ve shown you, it also provides a decent .__repr__() method if you’re going to print it out. Let me start by importing it.
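A minimal sketch of how that might look, again using the assumed plane fields; the seats attribute is a hypothetical extra added for illustration.

```python
from types import SimpleNamespace

# Keyword arguments become attributes.
f16 = SimpleNamespace(color="grey", engines=1)
print(f16)        # namespace(color='grey', engines=1)
print(f16.color)  # grey

# Attributes can be changed, added, and removed freely.
f16.engines = 2
f16.seats = 1
del f16.seats

# vars() exposes the underlying dictionary of attributes.
print(vars(f16))  # {'color': 'grey', 'engines': 2}
```

Note that square-bracket lookup such as f16["color"] does not work here; only dot notation does.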
You’ve had quite a tour of different types of records and ways of storing them in Python. Next up, I’ll talk about how to choose between them and give you some references if you wish to do further investigation.