Comparing Data Structures
00:00 In the previous lesson, I showed you some common uses of named tuples and how they can make your code easier to read. This lesson contains a big showdown: named tuples against all comers, and why you might choose them over other data structures.
00:14 Let’s get ready to rumble!
00:17 Python comes with many data structures out of the box. So how do you know which to choose? The first thing to consider is the principle of least surprise. This means use what other programmers would expect you to use.
00:30 If you need an ordered sequence of items, you could use an ordered dictionary, but unless you’re actually doing random access by key, that isn’t your best choice.
00:39 This principle goes far beyond data structure choice and is a good general philosophy in coding. My favorite example: don’t put side effects in your getters.
00:47 Nobody expects it, and this isn’t the Happy Birthday kind of surprise, more the murdering clown in the closet kind of surprise. So in short, do what’s expected. The entire previous lesson was on readability, so I’ve probably already convinced you that that’s important. Tuples, named or otherwise, are immutable.
01:08 Whether you need mutability is an important factor in your data structure choice. If you’re doing a lot of work on data, the memory footprint of a data structure might become important.
01:19 And likewise its speed characteristics. I’m going to use these last four characteristics to compare some data structures to named tuples. First off, dictionaries.
01:32 These two pieces of code accomplish similar things. In both cases, I get objects with named fields. So what are some of the differences that would affect your choice? Well, dictionaries are mutable, whereas named tuples are not.
01:47 Depending on what you want to do with your data, you may want either situation. Generally speaking, if you’re not going to change your objects, using an immutable class is better, because if you accidentally try to change one, it will raise an exception. If you used a mutable object instead, even though you weren’t planning on changing it, an accidental change might go undetected for a long time.
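Here is a minimal sketch of that difference. The Person type and its name and age fields are assumptions made for illustration and are not taken from the lesson's slides.

```python
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
Person = namedtuple("Person", ["name", "age"])

jane_dict = {"name": "Jane", "age": 25}
jane_tuple = Person(name="Jane", age=25)

# A dictionary accepts the change silently, even if it was a mistake.
jane_dict["age"] = 26

# A named tuple refuses the change and raises AttributeError.
mutation_error = None
try:
    jane_tuple.age = 26
except AttributeError as error:
    mutation_error = error

# To get a modified copy on purpose, build a new named tuple.
older_jane = jane_tuple._replace(age=26)
```

The original jane_tuple is untouched after all of this, which is the safety net the immutable option gives you.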
02:10 Dictionaries don’t support dot notation. I’ve never understood why. There are even classes out there that allow this behavior, but dictionaries don’t. Named tuples require you to import a module, whereas dictionaries are built in. And finally, named tuples require less memory than an equivalent dictionary. For small objects, a named tuple can be less than half the size of a corresponding dict.
02:36 If you’re using a lot of them in your code, you might want to take advantage of this.
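One rough way to check the access style and the size difference yourself is sketched below. The Person type is hypothetical, and sys.getsizeof() measures only the container itself, not the objects it refers to, so treat the numbers as an approximation that varies by Python version.

```python
import sys
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
Person = namedtuple("Person", ["name", "age"])

jane_dict = {"name": "Jane", "age": 25}
jane_tuple = Person("Jane", 25)

# Named tuples use dot notation; dictionaries use key lookup.
same_name = jane_tuple.name == jane_dict["name"]

# Shallow sizes of the two containers, in bytes.
dict_size = sys.getsizeof(jane_dict)
tuple_size = sys.getsizeof(jane_tuple)
print(f"dict: {dict_size} bytes, named tuple: {tuple_size} bytes")
```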
02:42 Python 3.7 added the concept of a data class. This is a restricted version of a class that contains data fields. It has a lot in common with a named tuple.
02:53 You build a data class using the @dataclass decorator on your regular class structure. Data classes are mutable by default, but you can freeze them with an extra argument to the decorator.
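As a short sketch, the extra argument is frozen=True. The class names and fields below are assumptions for illustration only.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class Person:
    """Hypothetical mutable data class."""
    name: str
    age: int


@dataclass(frozen=True)
class FrozenPerson:
    """Hypothetical frozen data class: assignment is blocked."""
    name: str
    age: int


jane = Person("Jane", 25)
jane.age = 26  # Allowed, because data classes are mutable by default.

frozen_jane = FrozenPerson("Jane", 25)
frozen_error = None
try:
    frozen_jane.age = 26
except FrozenInstanceError as error:
    # FrozenInstanceError is a subclass of AttributeError.
    frozen_error = error
```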
03:06 Like named tuples, data classes use dot notation for attribute access. But unlike named tuples, the fields are not iterable. You could implement your own version of .__iter__() to make it so, but that requires extra work.
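One possible way to do that extra work is sketched here. It is an illustration under assumed class and field names, not the only approach; dataclasses.astuple() is shown as an alternative that needs no custom method.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class PlainPerson:
    """Hypothetical data class without iteration support."""
    name: str
    age: int


@dataclass
class Person:
    """Hypothetical data class that yields its field values in order."""
    name: str
    age: int

    def __iter__(self):
        # Walk the declared fields and yield each value.
        return (getattr(self, field.name) for field in fields(self))


# Without .__iter__(), unpacking a data class fails.
plain_error = None
try:
    name, age = PlainPerson("Jane", 25)
except TypeError as error:
    plain_error = error

# With .__iter__(), it unpacks like a named tuple would.
name, age = Person("Jane", 25)

# astuple() converts any data class instance to a regular tuple.
as_tuple = astuple(PlainPerson("Jane", 25))
```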
03:20 Data classes generally are pretty close in size to dictionaries and so are bigger than a named tuple. And performance-wise, the cost of creating these two kinds of objects is about equivalent.
03:35 In case having one thing named named tuple wasn’t enough, Python has another one. Sometimes I wonder whether the core maintainers get together and just giggle over this end of Python.
03:45 The NamedTuple class found in the typing module was added in Python 3.5, and it allows you to create a named tuple using data class–like semantics.
03:55 Instead of using a decorator like you do with a data class, you inherit from NamedTuple, but like the data class, you specify the attributes to be found in the class. Once this is done, you can use this class as a named tuple. There’s our trusty jane again.
04:10 She’s much younger this time around.
04:14 Although the naming is a bit confusing, it was done this way because the result actually is a named tuple. It’s just a different way of creating the class.
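A minimal sketch of the inheritance style follows. The field names and the age value are assumptions, since the transcript doesn't include the on-screen code.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """Hypothetical named tuple declared with class syntax."""
    name: str
    age: int


jane = Person("Jane", 5)

# The result really is a tuple, with the usual named tuple helpers.
is_tuple = isinstance(jane, tuple)
field_names = Person._fields

# Dot notation and unpacking both work.
name, age = jane
```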
04:22 This means that the memory and performance are more or less equal to the factory method that you’ve been using so far in the course.
04:32 The final showdown is named tuples against the original tuple. Family squabbles are always the worst. Named tuples are tuples. You, of course, get the dot notation and the exact same memory footprint.
04:45 The performance characteristics are different, though. Creating a regular old tuple is about three times faster than the named variety. So if you’re doing a lot of data processing, you might want to stick with regular tuples. If you need to, you can always convert them to the named variety using the ._make() class method at the point in your code where performance isn’t as pressing.
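The sketch below shows the conversion with ._make() and one rough way to time creation yourself. The Point type and the sample rows are hypothetical, and the measured ratio will vary with your Python version and machine, so no particular figure is claimed here.

```python
from collections import namedtuple
from timeit import timeit

# Hypothetical record type and data, used only for this illustration.
Point = namedtuple("Point", ["x", "y"])
raw_rows = [(1, 2), (3, 4), (5, 6)]

# Do the heavy processing with plain tuples, then convert when
# readability matters more than speed.
points = [Point._make(row) for row in raw_rows]

# Rough creation-cost comparison; both statements look up x and y
# the same way so the difference comes from the constructor.
names = {"Point": Point, "x": 1, "y": 2}
tuple_time = timeit("(x, y)", globals=names, number=100_000)
named_time = timeit("Point(x, y)", globals=names, number=100_000)
print(f"tuple: {tuple_time:.4f}s, named tuple: {named_time:.4f}s")
```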
05:09 Named tuples are classes, and just like classes, they can be subclassed. I’ll show you how in the next lesson.