Class Internals
00:00
In the previous lesson, I introduced you to multiple inheritance. In this lesson, I’ll show you different bits about class internals and how you can use them to make more powerful code. In part one of this multi-part course, I showed you how the @property
and @.setter
decorators work to let you write code that makes methods behave like attributes.
00:20 This is part of something called the descriptor protocol. A protocol in Python is a loose collection of functions—or in this case, methods—that act as a promise.
00:30 If you implement their interface, Python will give you some functionality. Many things that seem like magic are actually implemented as protocols underneath, meaning you can write your own classes that behave the same way.
00:43 For example, both iteration and sequence functionality are built on protocols. The descriptor protocol is the one that makes methods act like attributes.
00:53
The @property
and @.setter
decorators are the simpler version of the descriptor protocol. They abstract away a deeper mechanism that uses dunder methods.
01:02
.__set_name__()
, .__get__()
, and .set()
are all used to implement the same features those decorators do, and you can override these methods and define your own behavior.
01:13 Lets go look at an example.
01:17 If you ever needed an integer that had a more positive outlook on life, well, let’s build one. One that doesn’t allow negative values. They clog up your aura.
01:27
You can write a class that abstracts a positive integer using the descriptor protocol. First off, .__set_name__()
is called when a descriptor is instantiated, and it’s passed the name of the variable it is being assigned to. For example, if you read a line of code that says radius = PositiveInteger
, instantiating a PositiveInteger
object and storing it in a reference called radius
, the name radius
gets passed into .__set_name__()
.
01:54
What you typically do with that value is store it for later use. I’m sticking it in ._name
, logically enough.
02:03
The core of what you do with the descriptor protocol is get and set values. .__get__()
is the get part. This method is past the object that the descriptor is attached to and the class of the descriptor itself.
02:16
That’d be the radius
and the Circle
class in the example I just mentioned. I’m not really sure why they bother passing in the class as you can always get it through self.__class__
, but it’s part of the protocol, so it has to be in the signature.
02:32
All the getter for PositiveInteger
needs to do is return the associated value. The value is on the instance object under the name associated with this descriptor.
02:43
.__dict__
is the dictionary associated with all Python objects where Python stores the objects attributes. It feels like there’s a lot going on in this line, but all it’s doing is getting the value from the associated instance object.
02:59
There are situations where this won’t work, but I’ll come back to that edge case in a later lesson. All right, you’ve got the getter. Now the setter. This signature takes the instance object associated with the value and the new value to set it to. The purpose of PositiveInteger
is to make sure that it can only store positive integer. This line enforces that.
03:23
Once the error checking’s out of the way, now I need to actually assign the value. This uses the same .__dict__
mechanism as the getter, but this time assigning the value to that dictionary.
03:36 Let me scroll down, and you’ll see how this gets used.
03:40
Circles are, so yesterday. Let’s define an Ellipse
. The ellipse has two radius-like things: a width and a height. Mathematically, there are a half dozen ways to specify an ellipse, but width and height are commonly used in drawing libraries, so let’s stick with that. Instead of specifying focal points, I want both my width and my height to be positive integers, so I use our newly minted class.
04:05
This is a little tricky, .width
and .height
here are class attributes, which means the instantiation of PositiveInteger
is associated with the Ellipse
’s class, not the objects.
04:18
This is how you register a descriptor. Inside Ellipse
’s .__init__()
, When I go to assign a value, the descriptor on the class is getting invoked, but because the descriptor’s code actually stores the contents on the object, the end result is a value associated with the object, not the class.
04:37 My brain hurts a little when I think about this. You don’t really need to understand this. When you use descriptors, you can treat it like magic. If you build a descriptor and assign it in the class, the end result is stored values with the same name on the object.
04:50 It isn’t magic though. There’s no extra special stuff going on here. From the interpreter’s point of view, it’s a class attribute that references a particular kind of instance object, which implements the descriptor, which when assigned to passes the value through the descriptor protocol to the owning instance object. Does it spoil the magic to know how the trick works?
05:13 Alright, magic or not, let’s see it in practice. Importing …
05:21
and I’ll create an ellipse
.
05:25 I’m gonna scroll the top window back up so that you can see the descriptor class. Since I put print statements in the descriptor, you can sort of see what’s going on.
05:35
The .__init__()
for Ellipse
assigns .width
and .height
. That assignment triggers .__set_name__()
for PositiveInteger
because it’s registered as a descriptor, as a class attribute. This gets called twice, once for .width
and once for .height
. Now, when I access the .width
attribute,
05:55
the print()
in the getter is issued and returns the value.
06:01
Same goes for .height
, and if I try to assign .height
,
06:08 I get the error checking in the setter enforcing our positive outlook. No negativity for you.
06:18
Dictionaries aren’t free. They take up more memory than just their contents. The attributes on an object are stored as a dictionary. By default, That dictionary is named .__dict__
.
06:31
If you don’t want the overhead cost of a dictionary associated with your class, you can slim it down using the special attribute .__slots__
.
06:40
What you put in .__slots__
is a tuple with the names of the attributes your class uses. Technically, it can be a list instead of a tuple, but that has extra overhead as well, and you’re never going to edit the contents. When .__slots__
is present, the underlying .__dict__
gets removed.
06:58
If you attempt to use .__dict__
, you’ll get an AttributeError
because it isn’t there, and of course, without .__dict__
, there is nowhere to put new attributes.
07:08
Attempting to add an attribute that isn’t in .__slots__
will also result in an error.
07:16
Consider this class containing an "x"
and a "y"
coordinate. By putting "x"
and "y"
in .__slots__
, I get rid of the underlying .__dict__
.
07:25 This saves a bit of memory, a little over 400 bytes per object. Remember earlier when I said I’ll save that for later in the lesson? Well, it’s later in the lesson.
07:36
The PositiveInteger
descriptor class worked by assuming the instance object it was associated with had a .__dict__
. If you use .__slots__
, that assumption isn’t valid.
07:48 I played around with this for a while trying to figure out if I could get the two things to work together. I found some code on the internet that showed a way to make it work, but it had two problems.
07:57
Problem one is it resulted in a new .__dict__
being created, which defeats the purpose of .__slots__
, and problem two was I couldn’t get it to work in Python 3.
08:06 Python 3 seems to be a bit more aggressive about the constraint. In theory, you could write a descriptor class that stored values in the descriptor class instead of on the instance object. As there can be multiple instance objects, you’d need to store that value in a nested dictionary with the top level keyed on the instance and the second level storing the values for that instance.
08:29
I haven’t tried this, but there’s no reason it wouldn’t work that I can think of. But if you get what I’m laying down, the short version of it is, I just said, reimplement .__dict__
, but somewhere else.
08:40
You’ll essentially be losing almost all the gains of .__slots__
. Not quite all, but close enough. The short version is don’t mix descriptors and slots.
08:49 It’s a recipe for disaster.
08:53 I hope you’re not tired of dunder stuff. More to come in the next lesson.
Ash D on Dec. 9, 2023
Updating to my previous comment (I can’t see an edit button): after watching further, then going back and re-watching 2:00-4:00 several times, I can mostly understand how the PositiveInteger methods are working. What I’m still struggling with is the explanation between 4:00 and 5:00. I think this lesson needs to be reworked a bit.
Christopher Trudeau RP Team on Dec. 9, 2023
Yep Ash,
As I attempted to say in the video, this isn’t easy stuff. Let me take one more stab at it.
Let’s start with two bits of terminology: first is the “class” which is the declaration, and second is the “object” which is an instance. When I write a method that belongs to a class, it is part of the declaration, whereas when I assign a value using “self”, it is adding that value to the object instance.
In the Ellipse example, the first use of “width” and “height” are part of the class declaration. So Ellipse gets two class attributes which themselves are instances of PostiveInteger objects.
An object instance has access to all of the class properties. Although “width” is declared at the class level, an instance of the Ellipse object can access it as well. If I had an instance of Ellipse called e
, then e.width
would be accessing “width” which was declared as a class attribute.
Inside __init__
, that is what is happening. The line saying self.width = width
is assigning the method’s “width” argument to self.width
. In a normal case, that would overwrite the value contained in self.width
. But, in this case, because self.width
is an instance of a descriptor, instead of overwriting the value, the descriptors __set__
method is called.
The descriptor has implemented storing the value, so future access to self.width
still gives you whatever was passed in as an argument to __init__
, but it has side effects. The side effects check that the value being set is positive and raise ValueError
if the validation fails.
Accessing self.width
also invokes the descriptor. On access of self.width
, instead of the object directly giving you the value, it uses the associated descriptor, calling its __get__
method. As the implementation of __get__
simply returns the stored value, the caller sees no difference in the behavior than if this wasn’t a descriptor.
Descriptors are tricky things and honestly you don’t come across them very frequently in Python. If you’re new to OO, you can accomplish a lot without fully groking descriptors, coming back to them later when you’ve had some practice.
I hope my second attempt at an explanation has made it better, rather than worse :)
Ash D on Dec. 11, 2023
Hi Christopher, thanks for writing such a thorough response. I think I’ve improved my understanding of this, and narrowed down where I think I’m still hung up. Here’s how I’m currently thinking:
The way that ‘width’ and ‘height’ are declared at the Ellipse class level and at the instance level gets tangled up in my understanding (from part 1) of class vs. instance attributes. Eg. declaring self.width in __init__
whilst there is also a ‘width’ class attribute, would surely result in an unrelated but identically named instance variable being created. I initially couldn’t see how you could create a new Ellipse instance - which results in __init__
setting the instance attributes width and height to the passed width and height arguments - when width and height are also class attributes (bear with me, I think I partly answer this further down). Here’s a simple non-descriptor example of what I’m getting at:
class Thing:
size = 5
def __init__(self, size):
self.size = size
thing = Thing(10)
>>> Thing.size
5
>>> thing.size
10
In terms of line-by-line execution, class attribute ‘size’ is initialised as an int object with a value of 5, which any instances of Thing can access. Then later if/when a new Thing object is instantiated and __init__
is executed, its instance attribute ‘size’ is initialised to whatever ‘size’ argument was passed during instantiation, in this case 10. At that point, Thing.size and thing.size are coincidentally-named but totally separate attributes, right?
I then wonder: How does the Ellipse example not result in the class attributes (which are PositiveInteger objects) being overwritten by their same-named instance attribute counterparts (which are probably integers)? I think I figured out the answer:
In the Ellipse example, by the time that __init__
is called, due to the two class variables the interpreter already knows about PositiveInteger and its descriptoryness, so when anything (including __init__
) tries to set self.size to something, the interpreter will call the descriptor’s __set__
method instead. But part of me still struggles with this.
What type is e.width - is it a PositiveInteger instance? (with an e.__dict__[width]
attribute that’s an integer value 50?).
Christopher Trudeau RP Team on Dec. 11, 2023
Hi Ash,
Your example is actually different from the descriptor example.
In your example, what is happening underneath when you assign self.width
is there is no descriptor, so the instance variable overrides the class variable, and a new instance level attribute gets created. This is specific to the instance:
>>> class Thing:
... width = 5
...
>>> Thing.width
5
>>> t = Thing()
>>> t.width
5
>>> t.width = 10
>>> t.width
10
>>> Thing.width
5
>>> u = Thing()
>>> u.width
5
The override of t.width
only effects t
, not changing Thing
or u
.
In the descriptor case, when you attempt to assign self.width
the override hasn’t happened yet. The descriptor gets triggered instead. The mechanism in the language that decides what to do when you say self.width = X
, is the same mechanism that triggers the descriptor. When there isn’t a descriptor, that mechanism overrides the value. When there is a descriptor, it calls the descriptor instead.
I’m not 100% sure how they’ve implemented it underneath, but you can almost think of it as there always being a descriptor, and the default descriptor’s behaviour is to overload the attribute in the instance.
Instance values actually are stored in a dictionary-like thing underneath, so you could think of it as a default descriptor’s job to replace the value in the dictionary.
To be clear, I haven’t played with the actual underlying C-code in Python, so it may not be built this way, but it isn’t a bad mental model.
Ash D on Dec. 11, 2023
Yep, that makes sense - your “In the descriptor case…” paragraph confirms where I’d started to get to, in terms of assumptions about how it must be working - so it’s great to get confirmation. It’s very rewarding to finally get it - thanks for your patience.
So this would mean that e.width is actually an int object, right?
Of course I had to go and confirm this, and found it interesting how deeply hooked-in to the language the descriptor protocol is - there’s no way around the descriptor’s __get__
method being called:
>>>type(e.width)
Getting width # nice try
<class 'int'>
>>>e.width.__class__
Getting width # but no cigar
<class 'int'>
>>>dir(e.width)
Getting width # resistance is futile
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
Become a Member to join the conversation.
Ash D on Dec. 9, 2023
This lesson seems like it suddenly becomes way too advanced, and I’m really struggling to understand the descriptor methods, and what their point or purpose is. Some intermediate step seems to have been skipped or not explained clearly.