The pickle module has an optimized cousin called the cPickle module. As its name implies, cPickle is
written in C, so it can be up to 1000 times faster than pickle.
However, it does not support subclassing of the Pickler() and Unpickler() classes, because in cPickle these
are functions, not classes. Most applications have no need for this functionality, and can
benefit from the improved performance of cPickle. Other than that, the
interfaces of the two modules are nearly identical; the common interface is described in this
manual and differences are pointed out where necessary. In the following discussions, we use
the term "pickle" to collectively describe the pickle and cPickle modules.
The data streams the two modules produce are guaranteed to be interchangeable.
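A minimal sketch of how an application might prefer cPickle when it is available and still read the result with pickle. The alias fast_pickle and the sample dictionary are illustrative names, not part of either module. The fallback branch assumes an interpreter that ships no separate cPickle module (as in Python 3, where pickle picks up its C implementation on its own).

```python
import pickle

try:
    import cPickle as fast_pickle  # C implementation, where it exists
except ImportError:
    fast_pickle = pickle  # assumption: no separate cPickle on this interpreter

# Illustrative data; any picklable object would do.
data = {"name": "example", "values": [1, 2, 3]}

# Write the stream with one module and read it back with the other.
stream = fast_pickle.dumps(data)
restored = pickle.loads(stream)
```

Because both modules expose the same dumps() and loads() functions, the rest of the program does not need to know which one was imported.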
Python has a more primitive serialization module called marshal, but in general pickle
should always be the preferred way to serialize Python objects. marshal
exists primarily to support Python's .pyc files.
The pickle module differs from marshal in several significant ways:
- The pickle module keeps track of the objects it has already
serialized, so that later references to the same object won't be serialized again. marshal doesn't do this.
This has implications both for recursive objects and object sharing. Recursive objects
are objects that contain references to themselves. These are not handled by marshal, and
in fact, attempting to marshal recursive objects will crash your Python interpreter.
Object sharing happens when there are multiple references to the same object in different
places in the object hierarchy being serialized. pickle stores
such objects only once, and ensures that all other references point to the master copy.
Shared objects remain shared, which can be very important for mutable objects.
- marshal cannot be used to serialize user-defined classes and
their instances. pickle can save and restore class instances
transparently; however, the class definition must be importable and live in the same module
as when the object was stored.
- The marshal serialization format is not guaranteed to be
portable across Python versions. Because its primary job in life is to support .pyc files, the Python implementers reserve the right to change the
serialization format in backwards-incompatible ways should the need arise. The pickle serialization format is guaranteed to be backwards compatible
across Python releases.
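The first two differences listed above can be sketched as follows. The variable names are illustrative. Fraction from the standard fractions module stands in here for a user-defined class, on the assumption that any class defined in an importable module behaves the same way. The sketch deliberately does not pass a recursive object to marshal.

```python
import marshal
import pickle
from fractions import Fraction

# Recursive object: a list that contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)
restored = pickle.loads(pickle.dumps(recursive))
is_recursive = restored[2] is restored  # the cycle survives the round trip

# Object sharing: two dictionary values refer to the same list.
shared = ["mutable"]
container = {"first": shared, "second": shared}
copied = pickle.loads(pickle.dumps(container))
still_shared = copied["first"] is copied["second"]
copied["first"].append("changed")  # also visible through copied["second"]

# Class instances: marshal rejects them, pickle restores them.
ratio = Fraction(1, 3)
try:
    marshal.dumps(ratio)
    marshal_ok = True
except ValueError:
    marshal_ok = False
ratio_copy = pickle.loads(pickle.dumps(ratio))
```

After the round trip, a change made through one reference to the shared list is seen through the other, which is the behaviour that matters for mutable objects.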
Warning: The pickle module is not intended to
be secure against erroneous or maliciously constructed data. Never unpickle data received
from an untrusted or unauthenticated source.
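A deliberately harmless sketch of why the warning matters: a pickle stream can name a callable and its arguments, and the callable is invoked while the stream is being loaded. The class name Payload is hypothetical, and the built-in len() stands in for whatever function an attacker might choose.

```python
import pickle


class Payload(object):
    """Hypothetical object whose pickled form asks for a function call."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling len("abc").
        # A hostile stream could name a far more dangerous callable.
        return (len, ("abc",))


stream = pickle.dumps(Payload())

# Loading the stream calls len("abc"); no Payload instance comes back.
result = pickle.loads(stream)
```

Here the loaded value is the return value of the named call rather than a Payload instance, which shows that the stream, not the receiving program, decides what gets executed.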
Note that serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent access to
persistent objects. The pickle module can transform a complex object
into a byte stream, and it can transform the byte stream back into an object with the same internal
structure. Perhaps the most obvious thing to do with these byte streams is to write them to
a file, but it is also conceivable to send them across a network or store them in a database.
The shelve module provides a simple
interface to pickle and unpickle objects on DBM-style database files.
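A short sketch of the three uses mentioned above: a byte stream held in memory, a byte stream written to a file, and a shelve database. The file names, the key "first" and the sample record are illustrative; a temporary directory is used so that nothing is left behind.

```python
import os
import pickle
import shelve
import shutil
import tempfile

record = {"id": 7, "tags": ["a", "b"]}  # illustrative data

# 1. Object -> byte stream -> object, entirely in memory.
blob = pickle.dumps(record)
from_memory = pickle.loads(blob)

directory = tempfile.mkdtemp()
try:
    # 2. The same stream written to, and read back from, a file.
    path = os.path.join(directory, "record.pkl")
    with open(path, "wb") as handle:
        pickle.dump(record, handle)
    with open(path, "rb") as handle:
        from_file = pickle.load(handle)

    # 3. shelve pickles values into a DBM-style database keyed by strings.
    db = shelve.open(os.path.join(directory, "records"))
    try:
        db["first"] = record
        from_shelf = db["first"]
    finally:
        db.close()
finally:
    shutil.rmtree(directory)  # remove the temporary files
```

Note that shelve only adds keyed storage on top of pickling; naming conventions and concurrent access remain the application's responsibility.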