Dictionaries, Tuples, and Classes

Christopher Trudeau

Records and Sets: Selecting the Ideal Data Structure (09:03)

00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’m going to start the first section on records, structs, and data transfer objects, starting with dictionaries, tuples, and classes.

00:14 The purpose of record data structures is to group together fields that are related. This is sometimes known as a struct or a data object. Some programming languages like C actually have struct as a keyword to do this kind of grouping. Frequently when you’re dealing with databases and ORMs, or object-relational mappers, these kinds of records map directly to the contents of a table in the database. This makes sense because whether you’re talking about a record or a class or a dictionary or a table in the database, you’re usually trying to group together fields that serve the same purpose.

00:49 If I’m describing a person and I need to know their first name, their last name, and their address, then there’s a table or an object named Person and it has fields for the first name, last name, and address.

01:00 All of these things can be considered records and, really, just different ways of storing those records—either temporarily in memory or permanently on disk, like in the use of a database.

01:11 A quick and convenient mechanism for creating this kind of relationship in Python is to use the built-in dict type as a dictionary, grouping fields together for a record. Consider this dictionary.

01:25 The car1 object has three fields: the color, the mileage, and a Boolean dictating whether or not it’s automatic. Fundamentally, this is a record.
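
The dictionary itself is not reproduced in the transcript. A minimal sketch of what it might look like follows; the three field names come from the lesson, while the values are assumptions chosen for illustration.

```python
# Hypothetical values: the transcript names the fields but not their contents.
car1 = {
    "color": "red",
    "mileage": 25.0,
    "automatic": True,
}

# Fields are read back by key.
print(car1["color"])
print(sorted(car1))
```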

01:35 There are pros and cons of using a dictionary. Because it’s a built-in type for Python, it’s actually been quite optimized and it’s a convenient way of quickly putting something together. The fields themselves are dynamic, and that can be a bit of a problem. That means there’s no type checking.

01:51 If I put in "color" : 3.14, it’ll allow it. There’s no field name checking. This in itself can also be a problem. If you want to add a field that isn’t supposed to be there, it lets you!

02:04 There’s also no way of indicating mandatory fields. I could create another dictionary for car2, forget the color, and then have my program throw an exception when I go to use the color of car2. And finally, spelling errors can cause tricky-to-find bugs. Personally, I come from one of those countries that spells color differently than what’s in that dictionary above.

02:26 It would be easy for me to introduce a bug by accidentally spelling 'colour' with a 'u' and causing the code not to find the actual 'color' field.

02:35 These kinds of bugs usually aren’t found until runtime and typically are found because of a KeyError being thrown. The dictionary isn’t the only built-in type that allows you to group fields together.
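
The dictionary pitfalls listed above can be sketched as follows. The field values, the extra "windshield" key, and the lookup() helper are hypothetical and exist only for this illustration.

```python
car1 = {"color": "red", "mileage": 25.0, "automatic": True}

# No type checking: a float is accepted where a color name was intended.
car1["color"] = 3.14

# No field name checking: an unexpected key is added silently.
car1["windshield"] = "broken"

# No mandatory fields: car2 is created without a "color" key.
car2 = {"mileage": 40.0, "automatic": False}


def lookup(record, field):
    """Return the field's value, or a description of the KeyError raised."""
    try:
        return record[field]
    except KeyError as exc:
        return f"KeyError: {exc}"


# Both problems only surface at runtime, when the key is actually used.
missing = lookup(car2, "color")
misspelled = lookup(car1, "colour")
print(missing)
print(misspelled)
```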

02:48 If you need an immutable record, a tuple can work. Each position in the tuple would correspond to a field in the record. You have to be careful with this, though.

02:58 Two tuples of the same length don’t necessarily have the same fields. My first tuple could have 'color' and 'number', and my second tuple could have 'number' and then 'color'—and those aren’t going to work together.
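
A small sketch of the ordering problem, using made-up values and a hypothetical color_of() helper that assumes the color sits at position 0:

```python
# Two records of the same length whose positions mean different things.
first = ("red", 7)    # (color, number)
second = (7, "red")   # (number, color)


def color_of(record):
    """Assumes the color is stored at position 0."""
    return record[0]


print(color_of(first))   # the color, as intended
print(color_of(second))  # the number, silently wrong

# Tuples are immutable, so a field cannot be reassigned in place.
try:
    first[0] = "blue"
    reassigned = True
except TypeError:
    reassigned = False
print(reassigned)
```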

03:10 The collections module has a factory function called namedtuple that creates a new kind of tuple class. This allows you to create tuples where each of the positions in the tuple corresponds to a named field.

03:22 This makes attribute access cleaner and more obvious in your code. So if you need an immutable record, the namedtuple is one way of tackling this problem.
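
As a rough illustration, here is how collections.namedtuple could be used for the car record. The type name Car and the field values are assumptions carried over from the dictionary example, not code shown in the lesson.

```python
from collections import namedtuple

# The factory returns a new tuple subclass with named positions.
Car = namedtuple("Car", ["color", "mileage", "automatic"])

car1 = Car(color="red", mileage=25.0, automatic=True)

# Fields can be read by name or by position.
print(car1.color)
print(car1[0])
print(car1)

# Like any tuple, the record is immutable.
try:
    car1.color = "blue"
    immutable = False
except AttributeError:
    immutable = True
print(immutable)
```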

03:33 The object-oriented aspects of Python allow you to create classes to group things together. Using a class is a little more formal than a dictionary, but it still has some of the problems the dictionary has. For example, there is no way to prevent the addition of fields for a class, just like there isn’t in the dictionary.

03:49 The default .__repr__() method of a class is pretty useless and doesn’t give you very much information, so you need to override it if you’re going to write a good class. That’s more work you have to do if you’re going to use a class as a record. Classes do support the @property decorator, so you can create the concept of a read-only value.

04:08 So if that’s important to you, this is a feature that you can’t do with a dictionary. So there’s more control here, but a lot more work that you have to do to get it going. Typically, classes are only used if you’re going to include business logic in methods.

04:22 They don’t tend to get used as plain data objects. Let me show you an example.

04:29 car.py defines the Car class. Inside of the constructor, you can see the three fields that I intend to use: .color, ._mileage, and .automatic. Notice that I’m casting each of the fields into the type that I’m expecting to come in from the constructor.

04:47 By doing this, I’m guaranteeing a certain degree of type safety. It’s not perfect, but it’s better than nothing at all. I don’t guarantee that what you pass in for the .color field is actually a color, but I can guarantee later on that when I go to use it, it’ll be a string.

05:04 I’m doing something else with the ._mileage field as well, and that’s the leading underscore (_). Python has no concept of public or private members of a class, but by convention, anything with a leading underscore isn’t meant to be publicly exposed. By putting the underscore here, I’m indicating to other programmers that I don’t intend for others to be using this field directly.

05:27 And the reason I’ve done this is because this class supports different ways of getting at the fuel economy of the car. Inside of the .mpg property, it uses the ._mileage field directly. Inside of the .km_per_liter property, I convert mileage into kilometers per liter.

05:47 Both of these are ways of measuring fuel efficiency, and by using this underscore on ._mileage, I’m indicating to other programmers that they should be using the .mpg or .km_per_liter properties instead. Finally in this class, I’m overriding the .__repr__() method.

06:04 The default one for a class gives you very little information. Instead, I want to print out information about the class itself. The convention in Python is to have your .__repr__() method return a string that, if it were run in the REPL, would create the object that you are currently using.

06:21 I’m constructing a new Car, with the .color, ._mileage, and .automatic values of the current object.
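
The contents of car.py are not included in the transcript. The following is a reconstruction from the description above, not the author's file: the attribute and property names come from the lesson, while the conversion constants (which assume US gallons) and the exact format of the repr string are assumptions.

```python
# Assumed conversion constants (US gallon); not taken from the lesson.
KM_PER_MILE = 1.609344
LITERS_PER_US_GALLON = 3.785411784


class Car:
    def __init__(self, color, mileage, automatic):
        # Cast each argument to the type the class expects to work with.
        self.color = str(color)
        self._mileage = float(mileage)  # leading underscore: non-public by convention
        self.automatic = bool(automatic)

    @property
    def mpg(self):
        """Fuel economy in miles per gallon (read-only)."""
        return self._mileage

    @property
    def km_per_liter(self):
        """Fuel economy converted to kilometers per liter (read-only)."""
        return self._mileage * KM_PER_MILE / LITERS_PER_US_GALLON

    def __repr__(self):
        # Return a string that would recreate this object if evaluated.
        return f"Car({self.color!r}, {self._mileage!r}, {self.automatic!r})"


car1 = Car("red", 25, True)
print(repr(car1))
print(car1.mpg)
print(car1.km_per_liter)
```

Because neither property defines a setter, assigning to car1.mpg raises an AttributeError, which is what makes the value read-only.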

06:30 Let me show you the use of this inside of the REPL. First, I’ll import it.

06:37 Now I’ll create a Car.

06:42 And if I examine the object, I’ll see what comes back from the .__repr__() method. There it is—our red car that gets 25 miles per gallon and is automatic.

06:54 The typecasting mechanism that I used inside of the constructor gives me a certain amount of type safety. Let me show you this in practice.

07:05 The second field is expecting a float. When I pass in a string, that string can’t be cast to a float and so I get a ValueError. Python is a dynamic language and provides no mechanism for preventing the addition of new fields.
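
The failed cast can be sketched with a trimmed-down, hypothetical version of the class that keeps only the constructor. The argument values are made up.

```python
class Car:
    """Trimmed-down stand-in for the class described in the lesson."""

    def __init__(self, color, mileage, automatic):
        self.color = str(color)
        self._mileage = float(mileage)
        self.automatic = bool(automatic)


# A string that is not a number cannot be cast to float.
try:
    Car("red", "quite a lot", True)
    outcome = "created"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)

# The casts are lenient rather than strict: a numeric string is accepted,
# and any non-empty string is truthy, including "False".
car2 = Car("red", "25", "False")
print(car2._mileage, car2.automatic)
```

The second case is one reason this approach gives only partial type safety: the cast guarantees the stored type, not that the input was sensible.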

07:19 You can do this simply by adding a field to the object.

07:27 Notice this has no effect on the repr. The repr prints out just those fields defined in .__repr__(). I can, though, get at the field.
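
A short sketch of that behavior, again using a hypothetical stand-in for the class. The added field name windshield is made up for the example.

```python
class Car:
    """Stand-in for the lesson's class, reduced to constructor and repr."""

    def __init__(self, color, mileage, automatic):
        self.color = str(color)
        self._mileage = float(mileage)
        self.automatic = bool(automatic)

    def __repr__(self):
        return f"Car({self.color!r}, {self._mileage!r}, {self.automatic!r})"


car1 = Car("red", 25, True)

# Nothing stops a new attribute from being attached to the instance.
car1.windshield = "broken"

# The repr only reports the fields it was written to report...
print(repr(car1))
# ...but the new attribute is still there.
print(car1.windshield)
```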

07:41 Now let’s look at those properties. First, miles per gallon.

07:46 And second, kilometers per liter.

07:53 Just like the example I gave in the dictionaries, spelling errors can cause you problems as well. If I attempt to directly manipulate the ._mileage field

08:03 but forget the underscore, it has absolutely no effect on the .mpg. Now car1 has both ._mileage and .mileage as fields.
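
The missing-underscore mistake can be reproduced with a reduced, hypothetical version of the class that keeps only the mileage field and the .mpg property:

```python
class Car:
    """Stand-in for the lesson's class, reduced to the mileage field."""

    def __init__(self, mileage):
        self._mileage = float(mileage)

    @property
    def mpg(self):
        return self._mileage


car1 = Car(25)

# Typo: the underscore is missing, so this creates a brand-new attribute.
car1.mileage = 40.0

# The property still reads the original, untouched field.
print(car1.mpg)

# The instance now carries both names.
print(vars(car1))
```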

08:15 This is one of the downsides of Python being a dynamic language. You can shoot yourself in the foot doing these kinds of things rather easily.

08:24 Finally, to emphasize the importance of overriding that .__repr__() method, let me create a quick class to show you what you would get if I hadn’t.

08:38 There’s the class Bike.

08:42 I’ve created a Bike, and that’s the output of the default .__repr__(). Not particularly helpful.
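
For comparison, a minimal sketch of a class with no .__repr__() override. The empty body is an assumption, since the lesson's Bike definition is not shown in the transcript.

```python
class Bike:
    pass


bike = Bike()

# The default repr shows only the class name and a memory address,
# which changes from run to run.
default_repr = repr(bike)
print(default_repr)
```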

08:52 In the next lesson, I’ll show you the data classes shortcut that allows you to remove a bunch of the boilerplate I just demonstrated, the NamedTuple object, and structs.

Alain Rouleau on March 28, 2021

Christopher, always enjoy your videos. I’m from Canada as well so I fully understand your comment about color and colour. But, and not to be picky, whenever I hear people refer to the __init__ method as the so-called “constructor” it drives me crazy.

The __init__ method is NOT a constructor. This whole notion of the __init__ method being a constructor started years ago and, no doubt, came from programmers who migrated over from Java, C++, etc. But is still perpetuated to this day. Just like those programmers who are using getters & setters in Python. Wrong! And not to state the obvious but as you showed at the end of your video there’s no need to have an __init__ method to “construct” an object. You actually constructed a “Bike” without using __init__. Talk about proof.

Plus, just the word “init” tells you all you need to know. It’s __init__ and not __construct__. Values of fields are simply “initialized” and that’s it. Btw, Python does actually have a constructor but __init__ is not it.

A better way to think of the __init__ method is to ask yourself if you want your car to be painted “Red” at the factory? Or do you want to paint it “Red” yourself later on at home? Either way the car still gets “constructed” at the factory.

I know you know this and, like I said, maybe I’m just being picky. But it can create confusion or problems for people who are new to programming. Words are important.

Christopher Trudeau RP Team on March 28, 2021

Hi Alain,

Interesting. Honestly it isn’t something I’ve ever thought about, even when I’ve used it in other languages. I’ve not really thought about the literal meaning of the term, just thought of the constructor as the place where you do initialization :)

As you don’t have to return anything from init, you’re right it isn’t one.

I’ll try to remember this in the future, but don’t make any promises. When you’ve been coding for 30 years, some habits are just ingrained.
