Using __getstate__ and __setstate__

Serializing Objects With the Python pickle Module Joe Tatusko 06:01

00:00 In the last lesson, you saw how to use dill to extend what pickle can serialize. That doesn’t always work, however: there are still some data types that even dill can’t serialize.

00:12 When that happens, you can sometimes exclude those items from serialization. In this lesson, you’ll learn how to exclude items from serialization and then reinitialize them when deserializing. When you pickle an object, pickle looks for a .__getstate__() dunder method to determine what needs to be serialized.

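00:30 If that method isn’t defined, pickle falls back to the instance’s .__dict__ attribute to decide what to store. (Since Python 3.11, object itself provides a default .__getstate__() that returns .__dict__ for ordinary instances.) To see this in action, I’ve created a new Python script called custom_pickling.py.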
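Here’s a minimal sketch of that default behavior, using the foobar class built in the next step. Without its own .__getstate__(), pickle tries to serialize everything in .__dict__, including the lambda, and fails. The exact exception type and message depend on your Python version, so the sketch catches both pickle.PicklingError and AttributeError, which recent versions use for this case.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x  # pickle can't serialize this lambda


obj = foobar()
print(obj.__dict__)  # this dict is what pickle tries to serialize by default

try:
    pickle.dumps(obj)
except (pickle.PicklingError, AttributeError) as exc:
    # The exception type and message vary between Python versions
    print(f"pickling failed: {exc!r}")
```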

00:42 In your text editor, import pickle and then define a new class called foobar. Give it a constructor by defining an .__init__() method that takes self. Inside it, set the .a attribute to something like 35 and the .b attribute to the string "test". Then, to make things a little more complicated, set .c equal to a lambda expression.

01:09 As you saw earlier, pickle shouldn’t have any trouble serializing .a or .b, because it knows how to handle integers and strings.

01:17 The lambda, however, is a problem. To get around it, you can define a new method called .__getstate__(), with two underscores on each side, that also takes self. This method should collect the attributes to be serialized and return them.

01:33 Start with something like attributes = self.__dict__.copy(), which makes a copy of the instance’s .__dict__.

01:43 By default, pickle looks at self.__dict__ to find all the data that needs to be serialized. Because the .c attribute is a lambda function and can’t be serialized, delete it from the copy with del attributes['c']. Working on a copy matters: deleting 'c' from self.__dict__ directly would also remove the attribute from the live object.
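A minimal sketch of the class with .__getstate__() in place, including the return statement that finishes the method. It shows that the returned state leaves out 'c' while the original object keeps its lambda, and that pickling now succeeds.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        # Copy first, so deleting 'c' doesn't remove it from the live object
        attributes = self.__dict__.copy()
        del attributes["c"]  # drop the unpicklable lambda
        return attributes


obj = foobar()
print(obj.__getstate__())  # {'a': 35, 'b': 'test'}
print(callable(obj.c))     # True: the original still has its lambda
data = pickle.dumps(obj)   # succeeds now that 'c' is excluded
```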

02:02 Now you can return attributes. To see how this works, make an instance of the foobar class

02:12 by calling foobar(). Next, set my_pickle_string equal to pickle.dumps(), passing in the foobar instance. Despite the variable’s name, pickle.dumps() returns a bytes object rather than a str. Then deserialize it: set my_new_instance equal to pickle.loads(), passing in my_pickle_string.

02:39 To see the result, print my_new_instance.__dict__. Before running this, take a look at what happened here.

02:51 You’ve defined a new custom class that contains an attribute pickle can’t handle.

02:59 You then changed what gets pickled by defining a .__getstate__() method that removes that attribute and returns the rest.
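Assembled into custom_pickling.py, the script might look like the sketch below. The instance name my_foobar_instance is my own placeholder; the other names follow the lesson. Run it as a script, because pickle records the class by module and name (here __main__.foobar) and must be able to find it again when loading. The output contains only 'a' and 'b': by default, unpickling creates the instance without calling .__init__(), so nothing re-creates .c.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes


my_foobar_instance = foobar()  # hypothetical name for the instance
my_pickle_string = pickle.dumps(my_foobar_instance)  # returns bytes
my_new_instance = pickle.loads(my_pickle_string)
print(my_new_instance.__dict__)  # {'a': 35, 'b': 'test'}
```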

03:51 Go ahead and define a new method called .__setstate__(),

03:58 which will take self and state.

04:03 Inside it, set self.__dict__ equal to state. Then reinitialize the missing attribute by setting self.c to the same lambda expression you defined earlier, lambda x: x * x.

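04:21 The method itself isn’t stored in the pickle; only the class’s name and the saved state are. When you deserialize, pickle looks up foobar, creates an instance without calling .__init__(), and then calls .__setstate__() with the saved state. Here, state holds the .a and .b attributes, and the method re-adds .c.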

04:36 Save the script and run it. You still have the 'a' and 'b' attributes, and now you also have 'c', which holds the reinitialized function.
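Here’s a sketch of the finished script with both methods in place. Calling the restored .c shows that the function was rebuilt by .__setstate__() rather than loaded from the pickle. As before, my_foobar_instance is a placeholder name.

```python
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]  # exclude the lambda from the pickle
        return attributes

    def __setstate__(self, state):
        self.__dict__ = state
        self.c = lambda x: x * x  # re-create the lambda on load


my_foobar_instance = foobar()  # hypothetical name for the instance
my_pickle_string = pickle.dumps(my_foobar_instance)
my_new_instance = pickle.loads(my_pickle_string)
print(sorted(my_new_instance.__dict__))  # ['a', 'b', 'c']
print(my_new_instance.c(4))              # 16
```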

04:50 It may seem strange that you can’t pickle a lambda stored as an attribute of a custom class and yet can still get it back by reinitializing it when you deserialize. One way to think about it is that pickle doesn’t know how to serialize the lambda expression itself, but the class definition holds the instructions for re-creating it. That definition isn’t stored in the pickle, so it has to be importable wherever you unpickle the object.
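One way to check this yourself is to look inside the pickle with the standard library’s pickletools module, assuming the full foobar class with both methods. pickletools.dis() prints the opcodes, which reference the class by module and name and store the a and b values but contain no function. The two membership checks are a quick, informal confirmation of the same thing.

```python
import pickle
import pickletools


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes

    def __setstate__(self, state):
        self.__dict__ = state
        self.c = lambda x: x * x


data = pickle.dumps(foobar())
pickletools.dis(data)      # opcode listing: class reference plus the state dict
print(b"foobar" in data)   # True: the class is recorded by name
print(b"lambda" in data)   # False: the function itself isn't stored
```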

05:11 Keep in mind that because this method runs every time an object is deserialized, there are security concerns: unpickling runs code. To see this, add a print() statement such as print('I am deserializing') to .__setstate__(), then save and rerun the script.

05:28 When the script runs, you’ll see I am deserializing printed out. Anything you put in .__setstate__() is executed by whatever process deserializes the object.
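The sketch below assumes the same foobar class with the print() call added to .__setstate__(). It captures standard output around each call with contextlib.redirect_stdout so you can see exactly when the message appears: nothing is printed while pickling, and the message appears once when loading.

```python
import contextlib
import io
import pickle


class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes["c"]
        return attributes

    def __setstate__(self, state):
        print("I am deserializing")  # runs on every load
        self.__dict__ = state
        self.c = lambda x: x * x


dump_output = io.StringIO()
with contextlib.redirect_stdout(dump_output):
    my_pickle_string = pickle.dumps(foobar())

load_output = io.StringIO()
with contextlib.redirect_stdout(load_output):
    my_new_instance = pickle.loads(my_pickle_string)

print(repr(dump_output.getvalue()))  # ''
print(repr(load_output.getvalue()))  # 'I am deserializing\n'
```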

05:38 We’ll talk about this in more detail in a later lesson. You should now have a good idea of how to work around some unserializable data types by using the .__getstate__() and .__setstate__() methods. In the next video, you’ll see how to compress your serialized output when using pickle.
