Using __getstate__ and __setstate__
In the last lesson, you saw how you can use
dill to extend the capabilities of
pickle. This doesn’t always work, however, as there are still some cases where
dill can’t serialize a certain data type.
When this occurs, you can sometimes exclude things from the serialization process. In this lesson, you’re going to learn how to exclude items from serialization and then reinitialize them when deserializing. When you call
pickle on an object, it looks for the
.__getstate__() dunder to determine what needs to be serialized.
In your text editor, go ahead and import
pickle and then define a new class called
foobar. You can then make a constructor, so define your
.__init__() method, which will take
self. Then set the
.a property to a simple value, the
.b property to a string of
"test", and then get a little complicated and set
.c equal to a lambda expression.
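As a rough sketch of the class so far, the block below assumes a placeholder value for .a, since any picklable value would do. It also tries to pickle an instance to show where the lambda gets in the way. The exact exception type differs between Python versions, so two likely candidates are caught here.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 1  # placeholder value, assumed for illustration
        self.b = "test"
        self.c = lambda x: x * x  # pickle cannot serialize this attribute


try:
    pickle.dumps(foobar())
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as error:
    # Which of the two exceptions is raised depends on the Python version
    pickled_ok = False
    print(f"Could not pickle: {error}")
```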
But we know that there’s a problem with this lambda. To get around this, you can define a new method called
.__getstate__(). So with two underscores, type in
__getstate__(), and this will also take
self. And here you want to take the properties to be serialized and return them.
In its default behavior,
pickle is going to look at this
self.__dict__ to return all of the data that needs to be serialized. Because the
.c property is a
lambda function and cannot be serialized, you can delete that. So say
del, go to
attributes, and get rid of
'c' like so.
Then return attributes. Next, create an instance of the class, and
my_pickle_string is going to equal
pickle.dumps(), so you’ll dump to a string and pass in the
foobar instance. Now you can go ahead and deserialize it, so say
my_new_instance and set this equal to
pickle.loads(), so this time you’ll load from a string and pass in
my_pickle_string.
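Put together, the steps so far might look something like the sketch below. A few details are assumptions rather than part of the lesson: the value of .a, the instance name my_foobar_instance, the final print() call, and the use of .copy() so that deleting 'c' does not also remove it from the live object.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 1  # placeholder value, assumed for illustration
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        # Work on a copy (an assumption) so the live instance keeps .c
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes


my_foobar_instance = foobar()  # instance name is an assumption
my_pickle_string = pickle.dumps(my_foobar_instance)
my_new_instance = pickle.loads(my_pickle_string)

# Show which attributes survived the round trip
print(my_new_instance.__dict__)
```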
So if you run this, try and think about what you expect to see. I’m going to save it, and then I’m going to run
python custom_pickling.py. And like you may have expected, you can see that the
'a' property and the
'b' property made it over. The
'c' property that contained the
lambda function is nowhere to be found. So while this works, you did lose a pretty significant part of your custom class. If you don’t want this to happen, you can get around it by reinitializing the property with the
.__setstate__() dunder method. If this is present, this method is called when deserializing the object and can modify what comes out. So going back to your custom class, define
.__setstate__(), which takes
self and
state.
Inside here, you’ll set
self.__dict__ equal to
state, and now you can reinitialize that property by saying
self.c is equal to the lambda expression that you defined earlier, so
lambda x: x * x.
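Here is a minimal sketch of the class with both methods in place. As before, the value of .a and the .copy() call are assumptions, and the call to .c at the end is only there to check that the lambda came back.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 1  # placeholder value, assumed for illustration
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        # Leave the unpicklable lambda out of the saved state
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes

    def __setstate__(self, state):
        # Restore the saved attributes, then rebuild the lambda
        self.__dict__ = state
        self.c = lambda x: x * x


my_new_instance = pickle.loads(pickle.dumps(foobar()))
print(my_new_instance.c(4))  # 16
```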
This may seem strange: you can’t save a
lambda function as a property of a custom class, yet you can bring it back by reinitializing it when you deserialize. One way to think about this is that
pickle doesn’t know how to handle the lambda expression itself, but it can call the instructions in your class that redefine that lambda expression.
Now keep in mind that this method runs every time an object is deserialized, so loading a pickle means running code, and that raises some security concerns. To see this, go ahead and add in a
print() statement here, like
'I am deserializing', and then save it and rerun it.
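To make the point concrete, here is a trimmed-down sketch with only one attribute and the print() call in .__setstate__(). The class is simplified for illustration and is not the full class from the lesson. Each call to pickle.loads() runs the method, so the message shows up once per load.

```python
import pickle


class foobar:
    def __init__(self):
        self.b = "test"

    def __setstate__(self, state):
        # This line runs on every deserialization
        print("I am deserializing")
        self.__dict__ = state


my_pickle_string = pickle.dumps(foobar())

# Two loads, so the message is printed twice
first_copy = pickle.loads(my_pickle_string)
second_copy = pickle.loads(my_pickle_string)
```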
We’ll talk about this in a little bit more detail in a later lesson. So anyway, you should have a pretty good idea of how you can get around some of these unserializable data types by using the
.__getstate__() and
.__setstate__() methods. In the next video, you’re going to see how you can compress your serialized output when using