Python Notes

The pickle module

The pickle module provides functions and classes for serializing and deserializing Python objects. Serialization is the conversion of an object into a sequence of bytes, which is then usually saved to a file or transmitted over a network. Deserialization is the reverse: restoring the object from its byte representation.

Serialization is often used to preserve user data between sessions of an application, such as a saved game. A simpler example: you are working interactively and build a list or dictionary that you want to reuse later. Use the dump() function of the pickle module to save the object to a file, and load() to restore it in the next session. Several objects can be written to one file; load() returns them in the order they were written.

>>> a = [1, 10, 0, -3, 9]
>>> b = {'a': 1, 'b': 2}
>>> import pickle
>>> f = open('data', 'wb')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> del a
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> f = open('data', 'rb')
>>> a = pickle.load(f)
>>> a
[1, 10, 0, -3, 9]
>>> c = pickle.load(f)
>>> c
{'a': 1, 'b': 2}
>>> c == b
True
>>> c is b
False
>>> f.close()
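
The same round trip can also be written as a script. The following is a minimal sketch, not the only way to do it: it assumes a temporary directory as the storage location, and the variable names are chosen only for illustration. The with statement closes each file automatically, and the final load() shows that reading past the last stored object raises EOFError.

```python
import os
import pickle
import tempfile

scores = [1, 10, 0, -3, 9]
settings = {'a': 1, 'b': 2}

# A temporary directory keeps the sketch self-contained; a real program
# would use its own data path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data')

    # pickle writes bytes, so the file is opened in binary mode ('wb').
    with open(path, 'wb') as f:
        pickle.dump(scores, f)
        pickle.dump(settings, f)

    with open(path, 'rb') as f:
        restored_scores = pickle.load(f)     # first object written
        restored_settings = pickle.load(f)   # second object written
        try:
            pickle.load(f)                   # nothing is left to read
            reached_end = False
        except EOFError:
            reached_end = True

print(restored_scores)    # [1, 10, 0, -3, 9]
print(restored_settings)  # {'a': 1, 'b': 2}
print(reached_end)        # True
```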

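Most Python objects can be serialized, but not all: open files, sockets, and lambda functions, for example, cannot. Take care when serializing instances of your own classes. Only the instance's data is serialized, not the code of the class; the pickle records just the name of the class and of the module that defines it. On deserialization that module is imported and the class is looked up by name, so the class must be importable wherever the data is loaded. For this reason it is better to define such classes in a separate module than in the script that creates and serializes the objects: a class defined in a script is recorded as belonging to __main__, and a different program will not find it there.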
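
To make this concrete, here is a small sketch that is meant to be run as a script. Point is a hypothetical class invented for the illustration. The example checks that the pickled bytes contain the names of the class and its module rather than its code, and that a lambda, which has no importable name, is rejected.

```python
import pickle


class Point:
    """A tiny example class (hypothetical, for illustration only)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
blob = pickle.dumps(p)

# The pickle stores a reference to the class (module name + class name),
# not its source code. When run as a script the module is '__main__'.
module_name = Point.__module__.encode()
print(module_name in blob, b'Point' in blob)

# Loading works here only because Point can be found in this program.
q = pickle.loads(blob)
print(q.x, q.y)

# A lambda has no name that could be looked up later, so pickle refuses it.
try:
    pickle.dumps(lambda v: v + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False
print(lambda_pickled)
```

If the loading program cannot import a class with that module and name, pickle.loads() fails instead of returning the object.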

In addition to the dump() and load() functions, there are the similar dumps() and loads(), which work without files: dumps() returns the serialized object as a bytes object, and loads() restores an object from one.

>>> a = [1, 10, 0, -3, 9]
>>> ab = pickle.dumps(a)
>>> ab
b'\x80\x03]q\x00(K\x01K\nK\x00J\xfd\xff\xff\xffK\te.'
>>> b = pickle.loads(ab)
>>> b
[1, 10, 0, -3, 9]
>>> type(ab)
<class 'bytes'>

It is best to use pickle only for exchanging data between Python programs. For exchanging data with programs written in other languages, a language-neutral format such as JSON (the json module) is usually used. pickle is also not secure: deserializing can execute arbitrary code, so never unpickle data from an untrusted source.
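
Both points can be illustrated with a short sketch. The first part shows that JSON output is plain text that any language can parse. The second part uses a hypothetical class, Demo, whose __reduce__ method asks pickle to call eval() on a harmless expression; it is only meant to show that the data decides what gets executed during loads(), and a hostile pickle could name any other callable instead.

```python
import json
import pickle

# JSON is plain text that programs in other languages can read.
record = {'a': 1, 'b': [1, 10, 0]}
text = json.dumps(record)
print(text)                        # {"a": 1, "b": [1, 10, 0]}
print(json.loads(text) == record)  # True


class Demo:
    """Hypothetical class used only to show what loads() can do."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling eval("1 + 2").
        return (eval, ("1 + 2",))


payload = pickle.dumps(Demo())
result = pickle.loads(payload)     # eval() runs here, chosen by the data
print(result)                      # 3
```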

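Earlier versions of Python may not understand the serialization protocols introduced in later versions. For compatibility, you can pass a protocol number to dump() or dumps(); load() and loads() detect the protocol automatically. The default and highest protocol numbers depend on the Python version; the values below are what Python 3.4 through 3.7 report.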

>>> pickle.DEFAULT_PROTOCOL
3
>>> pickle.HIGHEST_PROTOCOL
4
>>> a = [1, 2, 3]
>>> ab = pickle.dumps(a, 2)
>>> b = pickle.loads(ab)
>>> b
[1, 2, 3]
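
As a rough sketch of how the protocol choice behaves, the loop below serializes the same list with every protocol the running interpreter supports and checks that each result loads back unchanged. The variable names are illustrative, and the sizes printed depend on the Python version.

```python
import pickle

data = [1, 2, 3]

sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    # loads() needs no protocol argument: it detects the version itself.
    assert pickle.loads(blob) == data
    sizes[proto] = len(blob)

print(sizes)  # protocol number -> size in bytes

# Protocol 2 and later begin with the PROTO opcode (0x80) and the version.
blob2 = pickle.dumps(data, protocol=2)
print(blob2[:2])  # b'\x80\x02'
```

When data must be read by an older interpreter, choose the highest protocol that interpreter supports rather than the default of the writing program.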