Using __getstate__ and __setstate__
00:00
In the last lesson, you saw how you can use dill to extend the capabilities of pickle. This doesn’t always work, however, as there are still some cases where dill can’t serialize a certain data type.
00:12
When this occurs, you can sometimes exclude things from the serialization process. In this lesson, you’re going to learn how to exclude items from serialization and then reinitialize them when deserializing. When you pickle an object, pickle looks for the .__getstate__() dunder method to determine what needs to be serialized.
00:30
If there isn’t anything defined, it will use the object’s .__dict__ attribute to determine what needs to be stored. To see this, I’ve created a new Python script called custom_pickling.py.
00:42
In your text editor, go ahead and import pickle and then define a new class called foobar. You can then make a constructor, so define your .__init__() method, which will take self. Then set the .a property to something like 35, the .b property to the string "test", and then get a little complicated and set .c equal to a lambda expression.
01:09
So from before, pickle shouldn’t have any trouble serializing .a or .b, because it knows how to handle integers and strings.
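For reference, the class described so far might look like the sketch below. The .a and .b values are fine on their own, but the lambda stored in .c stops the whole instance from being pickled. The pickled_ok flag is a name made up for this illustration, and the exact exception type and message depend on your Python version, so the sketch catches both likely candidates.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x  # pickle cannot store this by value


try:
    pickle.dumps(foobar())
    pickled_ok = True  # hypothetical flag, only used for this demonstration
except (pickle.PicklingError, AttributeError) as error:
    # Which of the two exceptions is raised varies between Python versions.
    pickled_ok = False
    print(f"Could not pickle: {error}")
```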
01:17
But we know that there’s a problem with this lambda. To get around this, you can define a new method called .__getstate__(). So with two underscores on each side, type in __getstate__(), and this will also take self. Here you want to take the properties to be serialized and return them.
01:33
You can say something like attributes and set this equal to self, and then access its .__dict__ property, and from .__dict__ call the .copy() method.
01:43
In its default behavior, pickle is going to look at this self.__dict__ to get all of the data that needs to be serialized. Because the .c property is a lambda function and cannot be serialized, you can delete that. So say del, go to attributes, and get rid of 'c' like so.
02:02
And then now you can return attributes. To see how this works, go ahead and make an instance of the foobar class,
02:12
set this equal to foobar(). Now my_pickle_string is going to equal the result of pickle.dumps(), which dumps to a bytes string, with the foobar instance passed in. Now you can go ahead and deserialize it, so say my_new_instance and set this equal to pickle.loads(), which loads from a bytes string, and pass in my_pickle_string.
02:39
And to see what this looks like, go ahead and print my_new_instance and access .__dict__ off of it. Okay! Before running this, let’s go ahead and take a look at what happened here.
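Put together, the script at this point might look roughly like the following. This is a sketch rather than the exact file from the video: the instance variable name my_foobar_instance is an assumption, since the lesson doesn’t say what the instance is called.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        # Work on a copy so the live instance keeps its own .c attribute.
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes


my_foobar_instance = foobar()  # assumed name, not given in the lesson
my_pickle_string = pickle.dumps(my_foobar_instance)
my_new_instance = pickle.loads(my_pickle_string)
print(my_new_instance.__dict__)
```

Because .__getstate__() deletes 'c' from a copy, the original instance still has its lambda after pickling.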
02:51
You’ve defined a new custom class that contains a property that pickle cannot handle.
02:59
You then modified what gets pickled by defining the .__getstate__() method and removing the attribute that pickle can’t handle, and then you returned what was left.
03:09
So if you run this, try and think about what you expect to see. I’m going to save it, and then I’m going to run python custom_pickling.py. And like you may have expected, you can see that the 'a' property and the 'b' property made it over. The 'c' property that contained the lambda function is nowhere to be found. So while this works, you did lose a pretty significant part of your custom class. If you don’t want this to happen, you can get around it by reinitializing the property with the .__setstate__() dunder method. If this is present, this method is called when deserializing the object and can modify what comes out. So going back to your custom class,
03:51
go ahead and define a new method called .__setstate__(),
03:58
which will take self and state.
04:03
Inside here, you’ll take self.__dict__ and set it equal to state, and now you can reinitialize that property by saying self.c is equal to the lambda expression that you defined earlier, so lambda x: x * x.
04:21
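So now this method is part of the custom class, so when you deserialize an instance, it’s called. Keep in mind that pickle stores a reference to the class rather than the method’s code, so the class definition has to be available when you load. The state argument here contains the .a and .b properties, and then you’ve re-added .c right here.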
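With .__setstate__() added, the class might look like the sketch below. As before, this is a reconstruction from the description rather than the exact file from the video.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]  # leave out what pickle cannot handle
        return attributes

    def __setstate__(self, state):
        # state is whatever __getstate__() returned: the 'a' and 'b' entries.
        self.__dict__ = state
        # Rebuild the attribute that was left out when pickling.
        self.c = lambda x: x * x


my_pickle_string = pickle.dumps(foobar())
my_new_instance = pickle.loads(my_pickle_string)
print(my_new_instance.__dict__)
```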
04:36
So let’s save this and see what comes out! Okay. So you can see you still have the 'a' and 'b' properties, and now you have 'c', which is representing this function over here.
04:50
This may seem strange: you can’t save a lambda function as a property of a custom class, but you can restore it by reinitializing it when you deserialize. One way to think about this is that pickle doesn’t know how to handle the lambda expression itself, but it can call .__setstate__(), which holds the instructions on how to redefine that lambda expression.
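One way to convince yourself where those instructions live: pickle records which class an object belongs to, not the body of its methods, so .__setstate__() is looked up on the class at load time. The sketch below swaps in a different .__setstate__() after pickling, purely for illustration. The cube_setstate function is a hypothetical name, and replacing a method like this isn’t something you’d normally do.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes

    def __setstate__(self, state):
        self.__dict__ = state
        self.c = lambda x: x * x


data = pickle.dumps(foobar())


def cube_setstate(self, state):
    # Hypothetical replacement that restores .c as a cube function instead.
    self.__dict__ = state
    self.c = lambda x: x ** 3


# Replace the method after the pickle bytes were already produced.
foobar.__setstate__ = cube_setstate

restored = pickle.loads(data)
print(restored.c(2))  # prints 8, so the new method was used
```

The pickled bytes didn’t change, yet the restored .c did, which shows that the lambda is rebuilt by whatever class definition is present when loading.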
05:11
Now keep in mind, because this method is run every time an instance is deserialized, there are some security concerns here, because deserializing runs code. To see this, go ahead and add a print() call here, like 'I am deserializing', and then save it and rerun it.
05:28
And you’ll see that when running the script, I am deserializing printed out. So anything that gets put into the .__setstate__() method is executed by whatever is deserializing it.
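A minimal sketch of that experiment is shown below. The print() call stands in for any code that could end up inside .__setstate__().

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes

    def __setstate__(self, state):
        # Runs every time an instance is unpickled.
        print("I am deserializing")
        self.__dict__ = state
        self.c = lambda x: x * x


my_pickle_string = pickle.dumps(foobar())
my_new_instance = pickle.loads(my_pickle_string)  # triggers the print() call
```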
05:38
We’ll talk about this in a little bit more detail in a later lesson. So anyway, you should have a pretty good idea of how you can get around some of these unserializable data types by using the .__getstate__() and .__setstate__() methods. In the next video, you’re going to see how you can compress your serialized output when using pickle.