1.2.1 Reference Counts
The reference count is important because today's computers have a finite (and often
severely limited) memory size; it counts how many different places there are that have a
reference to an object. Such a place could be another object, or a global (or static) C
variable, or a local variable in some C function. When an object's reference count becomes
zero, the object is deallocated. If it contains references to other objects, their reference
count is decremented. Those other objects may be deallocated in turn, if this decrement makes
their reference count become zero, and so on. (There's an obvious problem with objects that
reference each other here; for now, the solution is "don't do that.")
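The cascade described above, and the problem with objects that reference each other, can be sketched with a toy model. This is not the interpreter's implementation: the `Node` type, the `live_nodes` counter, and all function names are hypothetical, chosen only for this illustration, and each toy object holds at most one reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object: a reference count plus at most one owned reference. */
typedef struct Node {
    long refcnt;
    struct Node *ref;   /* owned reference to another node, or NULL */
} Node;

static int live_nodes = 0;   /* allocated nodes that have not been freed */

static Node *node_new(void) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->refcnt = 1;           /* the caller owns the first reference */
    n->ref = NULL;
    live_nodes++;
    return n;
}

/* Drop one reference; at zero, free the node and release what it held. */
static void node_decref(Node *n) {
    if (n == NULL)
        return;
    if (--n->refcnt == 0) {
        Node *child = n->ref;
        free(n);
        live_nodes--;
        node_decref(child);  /* may deallocate the child in turn */
    }
}

/* Make `from` hold a reference to `to` (assumes from->ref is NULL). */
static void node_set_ref(Node *from, Node *to) {
    to->refcnt++;
    from->ref = to;
}

/* Chain a -> b: releasing a also releases b. Returns nodes left alive. */
int demo_chain(void) {
    int before = live_nodes;
    Node *a = node_new();
    Node *b = node_new();
    node_set_ref(a, b);      /* b now has two references */
    node_decref(b);          /* b survives: a still refers to it */
    node_decref(a);          /* a reaches zero, then b does */
    return live_nodes - before;
}

/* Cycle a <-> b: neither count reaches zero. Returns nodes left alive. */
int demo_cycle(void) {
    int before = live_nodes;
    Node *a = node_new();
    Node *b = node_new();
    node_set_ref(a, b);
    node_set_ref(b, a);
    node_decref(a);          /* 2 -> 1, kept alive by b */
    node_decref(b);          /* 2 -> 1, kept alive by a */
    int stranded = live_nodes - before;
    /* Break the cycle by hand so the demo itself does not leak:
       clearing b->ref gives up b's reference to a. */
    b->ref = NULL;
    node_decref(a);
    return stranded;
}
```

In this sketch `demo_chain()` leaves no nodes alive, while `demo_cycle()` reports two stranded nodes until the cycle is broken by hand.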
Reference counts are always manipulated explicitly. The normal way is to use the macro Py_INCREF()
to increment an object's reference count by one, and Py_DECREF()
to decrement it by one. The Py_DECREF() macro is considerably
more complex than the incref one, since it must check whether the reference count becomes zero
and then cause the object's deallocator to be called. The deallocator is a function pointer
contained in the object's type structure. The type-specific deallocator takes care of
decrementing the reference counts for other objects contained in the object if this is a
compound object type, such as a list, as well as performing any additional finalization that's
needed. There's no chance that the reference count can overflow; at least as many bits are
used to hold the reference count as there are distinct memory locations in virtual memory
(assuming sizeof(long) >= sizeof(char*)). Thus, the reference count increment
is a simple operation.
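The asymmetry between the two operations can be illustrated with a small model in which the decrement dispatches to a deallocator stored in a type structure. The names `Type`, `Object`, `toy_incref`, `toy_decref`, and the two object kinds are assumptions made for this sketch; they are simplified stand-ins, not the definitions used by the interpreter.

```c
#include <assert.h>
#include <stdlib.h>

struct Object;

/* Toy type structure: holds the type-specific deallocator. */
typedef struct Type {
    const char *name;
    void (*dealloc)(struct Object *);
} Type;

/* Toy object header shared by every object kind. */
typedef struct Object {
    long refcnt;
    const Type *type;
} Object;

static int live_objects = 0;

/* Increment: a single addition. */
static void toy_incref(Object *op) {
    op->refcnt++;
}

/* Decrement: must test for zero and call the type's deallocator. */
static void toy_decref(Object *op) {
    if (--op->refcnt == 0)
        op->type->dealloc(op);
}

typedef struct { Object base; int value; } IntObject;
typedef struct { Object base; Object *items[2]; } PairObject;

/* Simple object: nothing else to release. */
static void int_dealloc(Object *op) {
    free(op);
    live_objects--;
}

/* Compound object: release the contained references first. */
static void pair_dealloc(Object *op) {
    PairObject *p = (PairObject *)op;
    toy_decref(p->items[0]);
    toy_decref(p->items[1]);
    free(p);
    live_objects--;
}

static const Type IntType = { "int", int_dealloc };
static const Type PairType = { "pair", pair_dealloc };

static Object *int_new(int value) {
    IntObject *o = malloc(sizeof *o);
    assert(o != NULL);
    o->base.refcnt = 1;
    o->base.type = &IntType;
    o->value = value;
    live_objects++;
    return &o->base;
}

/* The pair takes its own reference to each item. */
static Object *pair_new(Object *a, Object *b) {
    PairObject *p = malloc(sizeof *p);
    assert(p != NULL);
    p->base.refcnt = 1;
    p->base.type = &PairType;
    toy_incref(a);
    toy_incref(b);
    p->items[0] = a;
    p->items[1] = b;
    live_objects++;
    return &p->base;
}

/* Returns the number of objects still alive after releasing the pair. */
int demo_dealloc(void) {
    int before = live_objects;
    Object *x = int_new(1);
    Object *y = int_new(2);
    Object *pair = pair_new(x, y);
    toy_decref(x);                       /* pair still holds x */
    toy_decref(y);                       /* pair still holds y */
    assert(live_objects - before == 3);
    toy_decref(pair);                    /* pair_dealloc releases x and y */
    return live_objects - before;
}
```

Here the increment never needs to know the object's type, whereas the decrement has to consult it, which is the extra complexity the text attributes to Py_DECREF().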
It is not necessary to increment an object's reference count for every local variable that
contains a pointer to an object. In theory, the object's reference count goes up by one when
the variable is made to point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the reference count hasn't
changed. The only real reason to use the reference count is to prevent the object from being
deallocated as long as our variable is pointing to it. If we know that there is at least one
other reference to the object that lives at least as long as our variable, there is no need to
increment the reference count temporarily. An important situation where this arises is in
objects that are passed as arguments to C functions in an extension module that are called
from Python; the call mechanism guarantees to hold a reference to every argument for the
duration of the call.
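The argument-passing case can be modelled as follows. This is a simplified sketch under stated assumptions: `Obj`, `call_with`, and `callee_double` are hypothetical names, and `call_with` merely imitates a call mechanism that holds a reference to the argument while the callee runs.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long refcnt;
    int value;
} Obj;

static Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcnt = 1;
    o->value = value;
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o) {
    if (--o->refcnt == 0)
        free(o);
}

/* Callee: the local variable points at the argument without touching
   the count, because the caller's reference outlives the variable. */
static int callee_double(Obj *arg) {
    Obj *local = arg;
    return 2 * local->value;
}

/* Stand-in for a call mechanism that holds a reference to the
   argument for the duration of the call. */
static int call_with(Obj *arg, int (*fn)(Obj *)) {
    obj_incref(arg);
    int result = fn(arg);
    obj_decref(arg);
    return result;
}

int demo_borrowed_argument(void) {
    Obj *o = obj_new(21);
    int result = call_with(o, callee_double);
    assert(o->refcnt == 1);   /* unchanged once the call has returned */
    obj_decref(o);
    return result;
}
```

The callee never adjusts the count, yet the object cannot disappear underneath it, since the reference held by `call_with` lasts at least as long as `local`.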
However, a common pitfall is to extract an object from a list and hold on to it for a while
without incrementing its reference count. Some other operation might conceivably remove the
object from the list, decrementing its reference count and possibly deallocating it. The real
danger is that innocent-looking operations may invoke arbitrary Python code which could do
this; there is a code path which allows control to flow back to the user from a Py_DECREF(), so almost any operation is potentially dangerous.
A safe approach is to always use the generic operations (functions whose name begins with
"PyObject_", "PyNumber_",
"PySequence_" or "PyMapping_").
These operations always increment the reference count of the object they return. This leaves
the caller with the responsibility to call Py_DECREF() when they
are done with the result; this soon becomes second nature.
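The difference between holding an extracted item with and without its own reference can be sketched with a one-slot container. Everything here (`Obj`, `Slot`, `slot_get_borrowed`, `slot_get_new`) is a hypothetical toy, not part of the Python API; the pattern resembles the contrast between PyList_GetItem(), which returns a borrowed reference, and PySequence_GetItem(), which returns a new one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long refcnt;
    int value;
} Obj;

static int live_objs = 0;

static Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcnt = 1;
    o->value = value;
    live_objs++;
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o) {
    if (--o->refcnt == 0) {
        free(o);
        live_objs--;
    }
}

/* Toy one-slot "list" that owns a reference to its item. */
typedef struct {
    Obj *item;
} Slot;

/* Store `o` (may be NULL), releasing whatever the slot held before. */
static void slot_set(Slot *s, Obj *o) {
    if (o != NULL)
        obj_incref(o);
    Obj *old = s->item;
    s->item = o;
    if (old != NULL)
        obj_decref(old);
}

/* Borrowed: valid only as long as the slot keeps the item. */
static Obj *slot_get_borrowed(Slot *s) {
    return s->item;
}

/* New reference: the caller must call obj_decref() when done. */
static Obj *slot_get_new(Slot *s) {
    obj_incref(s->item);
    return s->item;
}

int demo_safe_get(void) {
    int before = live_objs;
    Slot s = { NULL };
    Obj *a = obj_new(10);
    Obj *b = obj_new(20);

    slot_set(&s, a);
    obj_decref(a);                  /* the slot now holds the only reference */

    /* A borrowed pointer is fine if it is used immediately. */
    int peek = slot_get_borrowed(&s)->value;

    Obj *held = slot_get_new(&s);   /* our own reference to the item */
    slot_set(&s, b);                /* slot drops the old item */
    obj_decref(b);
    /* With a borrowed pointer, the old item would already be freed here. */
    int value = held->value;        /* safe: we still own a reference */
    obj_decref(held);               /* now the old item is deallocated */

    slot_set(&s, NULL);             /* release b */
    assert(peek == value);
    assert(live_objs == before);
    return value;
}
```

The cost of the safe variant is the matching `obj_decref(held)`, which mirrors the caller's responsibility to call Py_DECREF() on the result of a generic operation.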