The ZODB/ZEO Programming Guide has been moved into its own package

This directory contains Andrew Kuchling's programmer's guide to ZODB
and ZEO. It was originally taken from Andrew's project on
SourceForge. Because the original version is no longer updated, this
version is best viewed as an independent fork now.
Write section on __setstate__
Continue working on it
Suppress the full GFDL text in the PDF/PS versions
% Administration
% Importing and exporting data
% Disaster recovery/avoidance
% Security
import sys, time, os, random

import transaction
from persistent import Persistent

from ZEO import ClientStorage
import ZODB
from ZODB.POSException import ConflictError
from BTrees import OOBTree

class ChatSession(Persistent):

    """Class for a chat session.
    Messages are stored in a B-tree, indexed by the time the message
    was created.  (Eventually we'd want to throw messages out.)

    add_message(message) -- add a message to the channel
    new_messages()       -- return new messages since the last call to
                            this method
    """

    def __init__(self, name):
        """Initialize new chat session.
        name -- the channel's name
        """

        self.name = name

        # Internal attribute: _messages holds all the chat messages.
        self._messages = OOBTree.OOBTree()

    def new_messages(self):
        "Return new messages."

        # self._v_last_time is the time of the most recent message
        # returned to the user of this class.
        if not hasattr(self, '_v_last_time'):
            self._v_last_time = 0

        new = []
        T = self._v_last_time

        for T2, message in self._messages.items():
            if T2 > T:
                new.append( message )
                self._v_last_time = T2

        return new

    def add_message(self, message):
        """Add a message to the channel.
        message -- text of the message to be added
        """

        while 1:
            try:
                now = time.time()
                self._messages[ now ] = message
                transaction.commit()
            except ConflictError:
                # Conflict occurred; this process should pause and
                # wait for a little bit, then try again.  Discard the
                # failed transaction first.  (The 0.2-second pause is
                # an arbitrary choice.)
                transaction.abort()
                time.sleep(.2)
            else:
                # No ConflictError exception raised, so break
                # out of the enclosing while loop.
                break
        # end while

def get_chat_session(conn, channelname):
    """Return the chat session for a given channel, creating the session
    if required."""

    # We'll keep a B-tree of sessions, mapping channel names to
    # session objects.  The B-tree is stored at the ZODB's root under
    # the key 'chat_sessions'.
    root = conn.root()
    if not root.has_key('chat_sessions'):
        print 'Creating chat_sessions B-tree'
        root['chat_sessions'] = OOBTree.OOBTree()
        transaction.commit()

    sessions = root['chat_sessions']

    # Get a session object corresponding to the channel name, creating
    # it if necessary.
    if not sessions.has_key( channelname ):
        print 'Creating new session:', channelname
        sessions[ channelname ] = ChatSession(channelname)
        transaction.commit()

    session = sessions[ channelname ]
    return session

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print 'Usage: %s <channelname>' % sys.argv[0]
        sys.exit(0)

    storage = ClientStorage.ClientStorage( ('localhost', 9672) )
    db = ZODB.DB( storage )
    conn = db.open()

    s = session = get_chat_session(conn, sys.argv[1])

    messages = ['Hi.', 'Hello', 'Me too', "I'M 3L33T!!!!"]

    while 1:
        # Send a random message
        msg = random.choice(messages)
        session.add_message( '%s: pid %i' % (msg,os.getpid() ))

        # Display new messages
        for msg in session.new_messages():
            print msg

        # Wait for a few seconds
        pause = random.randint( 1, 4 )
        time.sleep( pause )
% Indexing Data
% BTrees
% Full-text indexing
% What is ZODB?
% What is ZEO?
% OODBs vs. Relational DBs
% Other OODBs
This guide explains how to write Python programs that use the Z Object
Database (ZODB) and Zope Enterprise Objects (ZEO). The latest version
of the guide is always available at
\subsection{What is the ZODB?}
The ZODB is a persistence system for Python objects. Persistent
programming languages provide facilities that automatically write
objects to disk and read them in again when they're required by a
running program.  By installing the ZODB, you add such facilities to
Python.

It's certainly possible to build your own system for making Python
objects persistent. The usual starting points are the \module{pickle}
module, for converting objects into a string representation, and
various database modules, such as the \module{gdbm} or \module{bsddb}
modules, that provide ways to write strings to disk and read them
back. It's straightforward to combine the \module{pickle} module and
a database module to store and retrieve objects, and in fact the
\module{shelve} module, included in Python's standard library, does
this.

The downside is that the programmer has to explicitly manage objects,
reading an object when it's needed and writing it out to disk when the
object is no longer required. The ZODB manages objects for you,
keeping them in a cache, writing them out to disk when they are
modified, and dropping them from the cache if they haven't been used
in a while.
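
To make the ``explicit management'' point concrete, here is a minimal
sketch using only the standard \module{shelve} module. The key
\code{"run-123"}, the sample data, and the helper name
\function{demo_shelve()} are made up for illustration. Note how a change
to a fetched object is lost unless the program writes the object back
itself, which is exactly the bookkeeping the ZODB takes over.

```python
import os
import shelve
import tempfile

def demo_shelve(path):
    # Store an object: shelve pickles the value and writes it under the key.
    with shelve.open(path) as db:
        db["run-123"] = {"owner": "amk", "steps": ["deposit"]}

    # Changing a fetched object only changes the in-memory copy ...
    with shelve.open(path) as db:
        run = db["run-123"]
        run["steps"].append("etch")
        # ... so the programmer has to write it back explicitly.
        db["run-123"] = run

    # Read the object again to confirm the change reached the disk.
    with shelve.open(path) as db:
        return db["run-123"]["steps"]

with tempfile.TemporaryDirectory() as tmp:
    steps = demo_shelve(os.path.join(tmp, "runs"))
```
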
\subsection{OODBs vs. Relational DBs}
Another way to look at it is that the ZODB is a Python-specific
object-oriented database (OODB). Commercial object databases for C++
or Java often require that you jump through some hoops, such as using
a special preprocessor or avoiding certain data types. As we'll see,
the ZODB has some hoops of its own to jump through, but in comparison
the naturalness of the ZODB is astonishing.
Relational databases (RDBs) are far more common than OODBs.
Relational databases store information in tables; a table consists of
any number of rows, each row containing several columns of
information.  (Tables are more formally called relations, which is where
the term ``relational database'' originates.)
Let's look at a concrete example. The example comes from my day job
working for the MEMS Exchange, in a greatly simplified version. The
job is to track process runs, which are lists of manufacturing steps
to be performed in a semiconductor fab. A run is owned by a
particular user, and has a name and assigned ID number. Runs consist
of a number of operations; an operation is a single step to be
performed, such as depositing something on a wafer or etching
something off it.
Operations may have parameters, which are additional information
required to perform an operation. For example, if you're depositing
something on a wafer, you need to know two things: 1) what you're
depositing, and 2) how much should be deposited. You might deposit
100 microns of silicon oxide, or 1 micron of copper.
Mapping these structures to a relational database is straightforward:

\begin{verbatim}
CREATE TABLE runs (
  run_id    int,
  owner     varchar,
  title     varchar,
  acct_num  int,
  PRIMARY KEY(run_id)
);

CREATE TABLE operations (
  run_id      int,
  step_num    int,
  process_id  varchar,
  PRIMARY KEY(run_id, step_num),
  FOREIGN KEY(run_id) REFERENCES runs(run_id)
);

CREATE TABLE parameters (
  run_id       int,
  step_num     int,
  param_name   varchar,
  param_value  varchar,
  PRIMARY KEY(run_id, step_num, param_name),
  FOREIGN KEY(run_id, step_num)
     REFERENCES operations(run_id, step_num)
);
\end{verbatim}

In Python, you would write three classes named \class{Run},
\class{Operation}, and \class{Parameter}. I won't present code for
defining these classes, since that code is uninteresting at this
point. Each class would contain a single method to begin with, an
\method{__init__} method that assigns default values, such as 0 or
\code{None}, to each attribute of the class.
It's not difficult to write Python code that will create a \class{Run}
instance and populate it with the data from the relational tables;
with a little more effort, you can build a straightforward tool,
usually called an object-relational mapper, to do this automatically.
(See \url{} for a quick hack
at a Python object-relational mapper, and
for Joel Shprentz's more successful implementation of the same idea;
unlike mine, Shprentz's system has been used for actual work.)
However, it is difficult to make an object-relational mapper
reasonably quick; a simple-minded implementation like mine is quite
slow because it has to do several queries to access all of an object's
data. Higher performance object-relational mappers cache objects to
improve performance, only performing SQL queries when they actually
need to.
That helps if you want to access run number 123 all of a sudden. But
what if you want to find all runs where a step has a parameter named
'thickness' with a value of 2.0? In the relational version, you have
two unappealing choices:
\item Write a specialized SQL query for this case: \code{SELECT run_id
  FROM parameters WHERE param_name = 'thickness' AND param_value = '2.0'}.
If such queries are common, you can end up with lots of specialized
queries. When the database tables get rearranged, all these queries
will need to be modified.
\item An object-relational mapper doesn't help much. Scanning
through the runs means that the mapper will perform the required
SQL queries to read run \#1, and then a simple Python loop can check
whether any of its steps have the parameter you're looking for.
Repeat for run \#2, 3, and so forth. This does a vast
number of SQL queries, and therefore is incredibly slow.
An object database such as ZODB simply stores internal pointers from
object to object, so reading in a single object is much faster than
doing a bunch of SQL queries and assembling the results. Scanning all
runs, therefore, is still inefficient, but not grossly inefficient.
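
The relational side of this comparison can be tried out with the
standard \module{sqlite3} module. The following is only a sketch: the
table layout follows the three tables described earlier, while the
owners, titles, and parameter values are invented sample data. It runs
the kind of specialized query discussed in the first option above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    run_id INTEGER PRIMARY KEY,
    owner TEXT, title TEXT, acct_num INTEGER
);
CREATE TABLE operations (
    run_id INTEGER, step_num INTEGER, process_id TEXT,
    PRIMARY KEY (run_id, step_num),
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
CREATE TABLE parameters (
    run_id INTEGER, step_num INTEGER,
    param_name TEXT, param_value TEXT,
    PRIMARY KEY (run_id, step_num, param_name),
    FOREIGN KEY (run_id, step_num)
        REFERENCES operations(run_id, step_num)
);
""")

# Invented sample data: two runs with one deposition step each.
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)",
                 [(1, "alice", "oxide run", 100),
                  (2, "bob", "copper run", 101)])
conn.executemany("INSERT INTO operations VALUES (?, ?, ?)",
                 [(1, 1, "deposit"), (2, 1, "deposit")])
conn.executemany("INSERT INTO parameters VALUES (?, ?, ?, ?)",
                 [(1, 1, "thickness", "2.0"),
                  (1, 1, "material", "silicon oxide"),
                  (2, 1, "thickness", "0.5")])

# The specialized query: runs with a step whose 'thickness' is 2.0.
rows = conn.execute(
    "SELECT DISTINCT run_id FROM parameters "
    "WHERE param_name = 'thickness' AND param_value = '2.0'")
matching = [row[0] for row in rows]
```

The query is tied to the table layout, which is the maintenance cost
the first option describes.
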
\subsection{What is ZEO?}
The ZODB comes with a few different classes that implement the
\class{Storage} interface. Such classes handle the job of
writing out Python objects to a physical storage medium, which can be
a disk file (the \class{FileStorage} class), a BerkeleyDB file
(\class{BDBFullStorage}), a relational database
(\class{DCOracleStorage}), or some other medium. ZEO adds
\class{ClientStorage}, a new \class{Storage} that doesn't write to
physical media but just forwards all requests across a network to a
server. The server, which is running an instance of the
\class{StorageServer} class, simply acts as a front-end for some
physical \class{Storage} class. It's a fairly simple idea, but as
we'll see later on in this document, it opens up many possibilities.
\subsection{About this guide}
The primary author of this guide works on a project which uses the
ZODB and ZEO as its primary storage technology. We use the ZODB to
store process runs and operations, a catalog of available processes,
user information, accounting information, and other data. Part of the
goal of writing this document is to make our experience more widely
available. A few times we've spent hours or even days trying to
figure out a problem, and this guide is an attempt to gather up the
knowledge we've gained so that others don't have to make the same
mistakes we did while learning.
The author's ZODB project is described in a paper available here,
This document will always be a work in progress. If you wish to
suggest clarifications or additional topics, please send your comments to
Andrew Kuchling wrote the original version of this guide, which
provided some of the first ZODB documentation for Python programmers.
His initial version has been updated over time by Jeremy Hylton and
Tim Peters.
I'd like to thank the people who've pointed out inaccuracies and bugs,
offered suggestions on the text, or proposed new topics that should be
covered: Jeff Bauer, Willem Broekema, Thomas Guettler,
Chris McDonough, George Runyan.
% links.tex
% Collection of relevant links
Introduction to the Zope Object Database, by Jim Fulton:
Goes into much greater detail, explaining advanced uses of the ZODB and
how it's actually implemented. A definitive reference, and highly recommended.
Persistent Programming with ZODB, by Jeremy Hylton and Barry Warsaw:
Slides for a tutorial presented at the 10th Python conference. Covers
much of the same ground as this guide, with more details in some areas
and less in others.
% Related Modules
% PersistentMapping
% PersistentList
% BTrees
% Total Ordering and Persistence
% Iteration and Mutation
% BTree Diagnostic Tools
\section{Related Modules}
The ZODB package includes a number of related modules that provide
useful data types such as BTrees.
The \class{PersistentMapping} class is a wrapper for mapping objects
that will set the dirty bit when the mapping is modified by setting or
deleting a key.
\begin{funcdesc}{PersistentMapping}{container = \{\}}
Create a \class{PersistentMapping} object that wraps the
mapping object \var{container}.  If you don't specify a
value for \var{container}, a regular Python dictionary is used.
\end{funcdesc}

\class{PersistentMapping} objects support all the same methods as
Python dictionaries do.
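
The idea of a wrapper that ``sets the dirty bit'' can be sketched with
the standard library alone. \class{DirtyTrackingMapping} below is a
hypothetical stand-in written for this illustration, not the real
\class{PersistentMapping}; the real class reports changes to the
persistence machinery through \code{_p_changed}, where this sketch just
keeps a plain \code{changed} attribute.

```python
from collections import UserDict

class DirtyTrackingMapping(UserDict):
    """Hypothetical dict wrapper that records key assignment and deletion."""

    def __init__(self, container=None):
        super().__init__(container or {})
        # Copying the initial data went through __setitem__, so reset
        # the flag: freshly wrapped data counts as clean.
        self.changed = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed = True

    def __delitem__(self, key):
        super().__delitem__(key)
        self.changed = True

m = DirtyTrackingMapping({"a": 1})
clean_after_read = (m["a"] == 1 and not m.changed)   # reading is harmless
m["b"] = 2
dirty_after_write = m.changed                        # setting a key is noticed
```
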
The \class{PersistentList} class is a wrapper for mutable sequence objects,
much as \class{PersistentMapping} is a wrapper for mappings.
\begin{funcdesc}{PersistentList}{initlist = []}
Create a \class{PersistentList} object that wraps the
mutable sequence object \var{initlist}.  If you don't specify a
value for \var{initlist}, a regular Python list is used.
\end{funcdesc}

\class{PersistentList} objects support all the same methods as
Python lists do.
\subsection{BTrees Package}
When programming with the ZODB, Python dictionaries aren't always what
you need. The most important case is where you want to store a very
large mapping. When a Python dictionary is accessed in a ZODB, the
whole dictionary has to be unpickled and brought into memory. If
you're storing something very large, such as a 100,000-entry user
database, unpickling such a large object will be slow.  BTrees are
balanced tree data structures that behave like mappings but distribute
keys throughout a number of tree nodes. The nodes are stored in
sorted order (this has important consequences -- see below). Nodes are
then only unpickled and brought into memory as they're accessed, so the
entire tree doesn't have to occupy memory (unless you really are
touching every single key).
The BTrees package provides a large collection of related data
structures. There are variants of the data structures specialized to
integers, which are faster and use less memory. There
are five modules that handle the different variants. The first two
letters of the module name specify the types of the keys and values in
mappings -- O for any object, I for 32-bit signed integer, and (new in
ZODB 3.4) F for 32-bit C float. For example, the \module{BTrees.IOBTree}
module provides a mapping with integer keys and arbitrary objects as values.
The four data structures provided by each module are a BTree, a Bucket,
a TreeSet, and a Set. The BTree and Bucket types are mappings and
support all the usual mapping methods, e.g. \function{update()} and
\function{keys()}. The TreeSet and Set types are similar to mappings
but they have no values; they support the methods that make sense for
a mapping with no values, e.g. \function{keys()} but not
\function{items()}. The Bucket and Set types are the individual
building blocks for BTrees and TreeSets, respectively. A Bucket or
Set can be used when you are sure that it will have few elements. If
the data structure will grow large, you should use a BTree or TreeSet.
Like Python lists, Buckets and Sets are allocated in one
contiguous piece, and insertions and deletions can take time
proportional to the number of existing elements. Also like Python lists,
a Bucket or Set is a single object, and is pickled and unpickled in its
entirety. BTrees and TreeSets are multi-level tree structures with
much better (logarithmic) worst-case time bounds, and the tree structure
is built out of multiple objects, which ZODB can load individually
as needed.
The five modules are named \module{OOBTree}, \module{IOBTree},
\module{OIBTree}, \module{IIBTree}, and (new in ZODB 3.4)
\module{IFBTree}.  The two-letter prefixes are repeated in the data type
names. The \module{BTrees.OOBTree} module defines the following types:
\class{OOBTree}, \class{OOBucket}, \class{OOSet}, and \class{OOTreeSet}.
Similarly, the other four modules each define their own variants of those
four types.
The \function{keys()}, \function{values()}, and \function{items()}
methods on BTree and TreeSet types do not materialize a list with all
of the data. Instead, they return lazy sequences that fetch data
from the BTree as needed. They also support optional arguments to
specify the minimum and maximum values to return, often called "range
searching". Because all these types are stored in sorted order, range
searching is very efficient.
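
To see why sorted storage makes range searching cheap, here is a small
sketch that uses only the standard \module{bisect} module on a plain
sorted list. It is not how BTrees are implemented, and
\function{range_search()} is a hypothetical helper; unlike the lazy
sequences described above, it returns an ordinary list. The point is
only that both endpoints are found by binary search rather than by
scanning every key.

```python
from bisect import bisect_left, bisect_right

def range_search(sorted_keys, lo=None, hi=None):
    """Return the keys k with lo <= k <= hi from an already sorted list."""
    # Binary search for the first key >= lo and the last key <= hi.
    start = 0 if lo is None else bisect_left(sorted_keys, lo)
    stop = len(sorted_keys) if hi is None else bisect_right(sorted_keys, hi)
    return sorted_keys[start:stop]

keys = [1, 2, 3, 4]
between_1_and_2 = range_search(keys, 1, 2)
from_2_upward = range_search(keys, 2)
```
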
The \function{keys()}, \function{values()}, and \function{items()}
methods on Bucket and Set types do return lists with all the data.
Starting in ZODB 3.3, there are also \function{iterkeys()},
\function{itervalues()}, and \function{iteritems()} methods that
return iterators (in the Python 2.2 sense). Those methods also apply to
BTree and TreeSet objects.
A BTree object supports all the methods you would expect of a mapping,
with a few extensions that exploit the fact that the keys are sorted.
The example below demonstrates how some of the methods work. The
extra methods are \function{minKey()} and \function{maxKey()}, which
find the minimum and maximum key value subject to an optional bound
argument, and \function{byValue()}, which should probably be ignored
(it's hard to explain exactly what it does, and as a result it's
almost never used -- best to consider it deprecated). The various
methods for enumerating keys, values and items also accept minimum
and maximum key arguments ("range search"), and (new in ZODB 3.3)
optional Boolean arguments to control whether a range search is
inclusive or exclusive of the range's endpoints.
\begin{verbatim}
>>> from BTrees.OOBTree import OOBTree
>>> t = OOBTree()
>>> t.update({1: "red", 2: "green", 3: "blue", 4: "spades"})
>>> len(t)
4
>>> t[2]
'green'
>>> s = t.keys() # this is a "lazy" sequence object
>>> s
<OOBTreeItems object at 0x0088AD20>
>>> len(s)  # it acts like a Python list
4
>>> s[-2]
3
>>> list(s) # materialize the full list
[1, 2, 3, 4]
>>> list(t.values())
['red', 'green', 'blue', 'spades']
>>> list(t.values(1, 2)) # values at keys in 1 to 2 inclusive
['red', 'green']
>>> list(t.values(2))    # values at keys >= 2
['green', 'blue', 'spades']
>>> list(t.values(min=1, max=4))  # keyword args new in ZODB 3.3
['red', 'green', 'blue', 'spades']
>>> list(t.values(min=1, max=4, excludemin=True, excludemax=True))
['green', 'blue']
>>> t.minKey()     # smallest key
1
>>> t.minKey(1.5)  # smallest key >= 1.5
2
>>> for k in t.keys():
...     print k,
1 2 3 4
>>> for k in t:    # new in ZODB 3.3
...     print k,
1 2 3 4
>>> for pair in t.iteritems():  # new in ZODB 3.3
...     print pair,
(1, 'red') (2, 'green') (3, 'blue') (4, 'spades')
>>> t.has_key(4)  # returns a true value, but exactly what undefined
>>> t.has_key(5)
>>> 4 in t  # new in ZODB 3.3
True
>>> 5 in t  # new in ZODB 3.3
False
\end{verbatim}

% XXX I'm not sure all of the following is actually correct. The
% XXX set functions have complicated behavior.
Each of the modules also defines some functions that operate on
BTrees -- \function{difference()}, \function{union()}, and
\function{intersection()}. The \function{difference()} function returns
a Bucket, while the other two functions return a Set.
If the keys are integers, then the module also defines
\function{multiunion()}. If the values are integers or floats, then the
module also defines \function{weightedIntersection()} and
\function{weightedUnion()}. The function doc strings describe each
function briefly.
\code{BTrees/} defines the operations, and is the official
documentation. Note that the interfaces don't define the concrete types
returned by most operations, and you shouldn't rely on the concrete types
that happen to be returned: stick to operations guaranteed by the
interface. In particular, note that the interfaces don't specify anything
about comparison behavior, and so nothing about it is guaranteed. In ZODB
3.3, for example, two BTrees happen to use Python's default object
comparison, which amounts to comparing the (arbitrary but fixed) memory
addresses of the BTrees. This may or may not be true in future releases.
If the interfaces don't specify a behavior, then whether that behavior
appears to work, and exactly what happens if it does appear to work, are
undefined and should not be relied on.
\subsubsection{Total Ordering and Persistence}
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison,
the BTree-based structures don't use hash codes and require a total
ordering on keys.
Total ordering means three things:
\item Reflexive. For each \var{x}, \code{\var{x} == \var{x}} is true.
\item Trichotomy. For each \var{x} and \var{y}, exactly one of
\code{\var{x} < \var{y}}, \code{\var{x} == \var{y}}, and
\code{\var{x} > \var{y}} is true.
\item Transitivity. Whenever \code{\var{x} <= \var{y}} and
\code{\var{y} <= \var{z}}, it's also true that
\code{\var{x} <= \var{z}}.
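
The three rules can be checked mechanically on a sample of candidate
keys. The sketch below is written for current Python 3, where an
unsupported comparison raises \exception{TypeError};
\function{is_total_order()} is a hypothetical helper, and passing the
check on a sample is of course evidence, not proof, that a type is safe.

```python
from itertools import product

def is_total_order(sample):
    """Check the three rules on every combination drawn from sample."""
    try:
        for x in sample:
            if not x == x:                                 # reflexive
                return False
        for x, y in product(sample, repeat=2):
            if [x < y, x == y, x > y].count(True) != 1:    # trichotomy
                return False
        for x, y, z in product(sample, repeat=3):
            if x <= y and y <= z and not x <= z:           # transitivity
                return False
    except TypeError:
        # A comparison that is not supported at all rules the keys out.
        return False
    return True

ints_ok = is_total_order([3, 1, 2])
strings_ok = is_total_order(["b", "a", "c"])
complex_ok = is_total_order([1j, 2j])       # only == and != are supported
mixed_ok = is_total_order([1, "a"])         # mixing types breaks the rules
```
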
The default comparison functions for most objects that come with Python
satisfy these rules, with some crucial cautions explained later. Complex
numbers are an example of an object whose default comparison function
does not satisfy these rules: complex numbers only support \code{==}
and \code{!=} comparisons, and raise an exception if you try to compare
them in any other way. They don't satisfy the trichotomy rule, and must
not be used as keys in BTree-based data structures (although note that
complex numbers can be used as keys in Python dicts, which do not require
a total ordering).
Examples of objects that are wholly safe to use as keys in BTree-based
structures include ints, longs, floats, 8-bit strings, Unicode strings,
and tuples composed (possibly recursively) of objects of wholly safe
types.

It's important to realize that even if two types satisfy the
rules on their own, mixing objects of those types may not. For example,
8-bit strings and Unicode strings both supply total orderings, but mixing
the two loses trichotomy; e.g., \code{'x' < chr(255)} and
\code{u'x' == 'x'}, but trying to compare \code{chr(255)} to
\code{u'x'} raises an exception. Partly for this reason (another is
given later), it can be dangerous to use keys with multiple types in
a single BTree-based structure. Don't try to do that, and you don't
have to worry about it.
Another potential problem is mutability: when a key is inserted in a
BTree-based structure, it must retain the same order relative to the
other keys over time. This is easy to run afoul of if you use mutable
objects as keys. For example, lists supply a total ordering, and then
\begin{verbatim}
>>> L1, L2, L3 = [1], [2], [3]
>>> from BTrees.OOBTree import OOSet
>>> s = OOSet((L2, L3, L1))  # this is fine, so far
>>> list(s.keys())           # note that the lists are in sorted order
[[1], [2], [3]]
>>> s.has_key([3])           # and [3] is in the set
>>> L2[0] = 5                # horrible -- the set is insane now
>>> s.has_key([3])           # for example, it's insane this way
>>> s
OOSet([[1], [5], [3]])
\end{verbatim}

Key lookup relies on the keys remaining in sorted order (an efficient
form of binary search is used). By mutating key L2 after inserting it,
we destroyed the invariant that the OOSet is sorted. As a result, all
future operations on this set are unpredictable.
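
The same failure can be reproduced without BTrees, using a plain sorted
list and the standard \module{bisect} module. This is a sketch of the
general mechanism only (binary search over keys assumed to be sorted),
not of the OOSet implementation, and \function{contains()} is a
hypothetical helper.

```python
from bisect import bisect_left

def contains(sorted_keys, key):
    """Binary-search membership test; correct only if sorted_keys is sorted."""
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

L1, L2, L3 = [1], [2], [3]
keys = sorted([L2, L3, L1])          # [[1], [2], [3]]
found_before = contains(keys, [3])   # binary search finds [3]

L2[0] = 5                            # mutate a key in place: [[1], [5], [3]]
found_after = contains(keys, [3])    # the search now looks in the wrong half
still_there = [3] in keys            # a linear scan shows [3] never left
```

The key is still stored, but a lookup that trusts the sort order can no
longer reach it.
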
A subtler variant of this problem arises due to persistence: by default,
Python does several kinds of comparison by comparing the memory
addresses of two objects. Because Python never moves an object in memory,
this does supply a usable (albeit arbitrary) total ordering across the
life of a program run (an object's memory address doesn't change). But
if objects compared in this way are used as keys of a BTree-based
structure that's stored in a database, when the objects are loaded from
the database again they will almost certainly wind up at different
memory addresses. There's no guarantee then that if key K1 had a memory
address smaller than the memory address of key K2 at the time K1 and
K2 were inserted in a BTree, K1's address will also be smaller than
K2's when that BTree is loaded from a database later. The result will
be an insane BTree, where various operations do and don't work as
expected, seemingly at random.
Now each of the types identified above as "wholly safe to use" never
compares two instances of that type by memory address, so there's
nothing to worry about here if you use keys of those types. The most
common mistake is to use keys that are instances of a user-defined class
that doesn't supply its own \method{__cmp__()} method. Python compares
such instances by memory address. This is fine if such instances are
used as keys in temporary BTree-based structures used only in a single
program run. It can be disastrous if that BTree-based structure is
stored to a database, though.
\begin{verbatim}
>>> class C:
...     pass
...
>>> a, b = C(), C()
>>> print a < b   # this may print 0 if you try it
>>> del a, b
>>> a, b = C(), C()
>>> print a < b   # and this may print 0 or 1
\end{verbatim}

That example illustrates that comparison of instances of classes that
don't define \method{__cmp__()} yields arbitrary results (but consistent
results within a single program run).
Another problem occurs with instances of classes that do define
\method{__cmp__()}, but define it incorrectly. It's possible but
rare for a custom \method{__cmp__()} implementation to violate one
of the three required formal properties directly. It's more common for
it to "fall back" to address-based comparison by mistake.
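For example,

\begin{verbatim}
class Mine:
    def __cmp__(self, other):
        if other.__class__ is Mine:
            # "data" is a placeholder attribute name
            return cmp(self.data, other.data)
        else:
            return cmp(self.data, other)
\end{verbatim}
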
It's quite possible there that the \keyword{else} clause allows
a result to be computed based on memory address. The bug won't show
up until a BTree-based structure uses objects of class \class{Mine} as
keys, and also objects of other types as keys, and the structure is
loaded from a database, and a sequence of comparisons happens to execute
the \keyword{else} clause in a case where the relative order of object
memory addresses happened to change.
This is as difficult to track down as it sounds, so best to stay far
away from the possibility.
You'll stay out of trouble by following these rules, violating them
only with great care:
\item Use objects of simple immutable types as keys in
BTree-based data structures.
\item Within a single BTree-based data structure, use objects of
a single type as keys. Don't use multiple key types in a
single structure.
\item If you want to use class instances as keys, and there's
any possibility that the structure may be stored in a
database, it's crucial that the class define a
\method{__cmp__()} method, and that the method is
carefully implemented.
Any part of a comparison implementation that relies (explicitly
or implicitly) on an address-based comparison result will
eventually cause serious failure.
\item Do not use \class{Persistent} objects as keys, or objects of a
subclass of \class{Persistent}.
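
As an illustration of the third rule, here is a sketch of a key class
whose ordering depends only on a stored value and never on identity.
It is written for current Python 3, where \method{__cmp__()} is no
longer consulted and rich comparison methods take its place; the class
\class{Part} and its \code{_number} attribute are hypothetical.

```python
from functools import total_ordering

@total_ordering
class Part:
    """Hypothetical key class ordered by a part number fixed at creation."""

    def __init__(self, number):
        self._number = int(number)

    def __eq__(self, other):
        if not isinstance(other, Part):
            return NotImplemented
        return self._number == other._number

    def __lt__(self, other):
        if not isinstance(other, Part):
            # Refuse to order against other types instead of guessing.
            return NotImplemented
        return self._number < other._number

    def __hash__(self):
        return hash(self._number)

ordered = [p._number for p in sorted([Part(3), Part(1), Part(2)])]

try:
    Part(1) < "x"
    mixed_comparison_refused = False
except TypeError:
    mixed_comparison_refused = True
```

Refusing mixed-type comparisons outright also enforces the second rule:
an attempt to mix key types fails loudly instead of producing an
arbitrary order.
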
That last item may be surprising. It stems from details of how
conflict resolution is implemented: the states passed to conflict
resolution do not materialize persistent subobjects (if a persistent
object P is a key in a BTree, then P is a subobject of the bucket
containing P). Instead, if an object O references a persistent subobject
P directly, and O is involved in a conflict, the states passed to
conflict resolution contain an instance of an internal
\class{PersistentReference} stub class everywhere O references P.
Two \class{PersistentReference} instances compare equal if and only if
they "represent" the same persistent object; when they're not equal,
they compare by memory address, and, as explained before, memory-based
comparison must never happen in a sane persistent BTree. Note that it
doesn't help in this case if your \class{Persistent} subclass defines
a sane \method{__cmp__()} method: conflict resolution doesn't know
about your class, and so also doesn't know about its \method{__cmp__()}
method. It only sees instances of the internal \class{PersistentReference}
stub class.
\subsubsection{Iteration and Mutation}
As with a Python dictionary or list, you should not mutate a BTree-based
data structure while iterating over it, except that it's fine to replace
the value associated with an existing key while iterating. You won't
create internal damage in the structure if you try to remove, or add new
keys, while iterating, but the results are undefined and unpredictable. A
weak attempt is made to raise \exception{RuntimeError} if the size of a
BTree-based structure changes while iterating, but it doesn't catch most
such cases, and is also unreliable. Example:
\begin{verbatim}
>>> from BTrees.IIBTree import *
>>> s = IISet(range(10))
>>> list(s)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> for i in s:  # the output is undefined
...     print i,
...     s.remove(i)
0 2 4 6 8
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
RuntimeError: the bucket being iterated changed size
>>> list(s)      # this output is also undefined
[1, 3, 5, 7, 9]
\end{verbatim}

Also as with Python dictionaries and lists, the safe and predictable way
to mutate a BTree-based structure while iterating over it is to iterate
over a copy of the keys. Example:
\begin{verbatim}
>>> from BTrees.IIBTree import *
>>> s = IISet(range(10))
>>> for i in list(s.keys()):  # this is well defined
...     print i,
...     s.remove(i)
0 1 2 3 4 5 6 7 8 9
>>> list(s)
[]
\end{verbatim}

\subsubsection{BTree Diagnostic Tools}
A BTree (or TreeSet) is a complex data structure, really a graph of
variable-size nodes, connected in multiple ways via three distinct kinds
of C pointers. There are some tools available to help check internal
consistency of a BTree as a whole.
Most generally useful is the \module{BTrees.check} module. The
\function{check.check()} function examines a BTree (or Bucket, Set, or
TreeSet) for value-based consistency, such as that the keys are in
strictly increasing order. See the function docstring for details.
The \function{check.display()} function displays the internal structure
of a BTree.
BTrees and TreeSets also have a \method{_check()} method. This verifies
that the (possibly many) internal pointers in a BTree or TreeSet
are mutually consistent, and raises \exception{AssertionError} if they're
not.

If a \function{check.check()} or \method{_check()} call fails,
it may point to a bug in the implementation of BTrees or conflict
resolution, or may point to database corruption.
Repairing a damaged BTree is usually best done by making a copy of it.
For example, if a variable (call it \var{tree}) is bound to a corrupted
IOBTree,

\begin{verbatim}
    tree = IOBTree(tree)
\end{verbatim}

usually suffices.  If object identity needs to be preserved,

\begin{verbatim}
    acopy = IOBTree(tree)
    tree.clear()
    tree.update(acopy)
\end{verbatim}

does the same, but leaves \var{tree} bound to the same object.
%ZODB Programming
% How ZODB works (ExtensionClass, dirty bits)
% Installing ZODB
% Rules for Writing Persistent Classes
\section{ZODB Programming}
\subsection{Installing ZODB}
ZODB is packaged using the standard distutils tools.
You will need Python 2.3 or higher. Since the code is packaged using
distutils, it is simply a matter of untarring or unzipping the release
package, and then running \code{python setup.py install}.
You'll need a C compiler to build the packages, because there are
various C extension modules. Binary installers are provided for
Windows users.
\subsubsection{Installing the Packages}
Download the ZODB tarball containing all the packages for both ZODB
and ZEO from \url{}. See
the \file{README.txt} file in the top level of the release directory
for details on building, testing, and installing.
You can find information about ZODB and the most current releases in
the ZODB Wiki at \url{}.
\subsection{How ZODB Works}
The ZODB is conceptually simple. Python classes subclass a
\class{persistent.Persistent} class to become ZODB-aware.
Instances of persistent objects are brought in from a permanent
storage medium, such as a disk file, when the program needs them, and
remain cached in RAM. The ZODB traps modifications to objects, so
that when a statement such as \code{obj.size = 1} is executed, the
modified object is marked as ``dirty.'' On request, any dirty objects
are written out to permanent storage; this is called committing a
transaction. Transactions can also be aborted or rolled back, which
results in any changes being discarded, dirty objects reverting to
their initial state before the transaction began.
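
The mechanism just described (trap attribute assignment, mark the
object dirty, then either commit or abort) can be imitated in a few
lines of plain Python. \class{TrackedObject} is a hypothetical
miniature written for this illustration and is not part of the ZODB;
``committing'' here only means remembering a copy of the state in
memory.

```python
import copy

class TrackedObject:
    """Hypothetical miniature of a persistent object (not the ZODB API)."""

    def __init__(self, **state):
        # Use object.__setattr__ so bookkeeping does not mark us dirty.
        object.__setattr__(self, "_dirty", False)
        object.__setattr__(self, "_saved", copy.deepcopy(state))
        self.__dict__.update(copy.deepcopy(state))

    def __setattr__(self, name, value):
        # Trap assignments such as obj.size = 1 and mark the object dirty.
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_dirty", True)

    def _state(self):
        return {k: v for k, v in self.__dict__.items()
                if k not in ("_dirty", "_saved")}

    def commit(self):
        # "Write out" the current state by remembering a copy of it.
        object.__setattr__(self, "_saved", copy.deepcopy(self._state()))
        object.__setattr__(self, "_dirty", False)

    def abort(self):
        # Discard changes and revert to the last committed state.
        saved = self._saved
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(saved))
        object.__setattr__(self, "_saved", saved)
        object.__setattr__(self, "_dirty", False)

obj = TrackedObject(size=0)
obj.size = 1
dirty_after_assignment = obj._dirty
obj.abort()
size_after_abort = obj.size          # back to the initial state
obj.size = 2
obj.commit()
obj.size = 3
obj.abort()
size_after_second_abort = obj.size   # back to the committed state
```
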
The term ``transaction'' has a specific technical meaning in computer
science. It's extremely important that the contents of a database
don't get corrupted by software or hardware crashes, and most database
software offers protection against such corruption by supporting four
useful properties, Atomicity, Consistency, Isolation, and Durability.
In computer science jargon these four terms are collectively dubbed
the ACID properties, forming an acronym from their names.
The ZODB provides all of the ACID properties. Definitions of the
ACID properties are:
\begin{description}
\item[Atomicity] means that any changes to data made during a transaction
are all-or-nothing. Either all the changes are applied, or none of
them are. If a program makes a bunch of modifications and then
crashes, the database won't be partially modified, potentially leaving
the data in an inconsistent state; instead all the changes will be
forgotten. That's bad, but it's better than having a
partially-applied modification put the database into an inconsistent
state.
\item[Consistency] means that each transaction executes a valid
transformation of the database state. Some databases, but not ZODB,
provide a variety of consistency checks in the database or language;
for example, a relational database can constrain columns to be of
particular types and can enforce relations across tables. Viewed more
generally, atomicity and isolation make it possible for applications
to provide consistency.
\item[Isolation] means that two programs or threads running in two
different transactions cannot see each other's changes until they
commit their transactions.
\item[Durability] means that once a transaction has been committed,
a subsequent crash will not cause any data to be lost or corrupted.
\end{description}
\subsection{Opening a ZODB}
There are 3 main interfaces supplied by the ZODB:
\class{Storage}, \class{DB}, and \class{Connection} classes. The
\class{DB} and \class{Connection} interfaces both have single
implementations, but there are several different classes that
implement the \class{Storage} interface.
\begin{itemize}
\item \class{Storage} classes are the lowest layer, and handle
storing and retrieving objects from some form of long-term storage.
A few different types of Storage have been written, such as
\class{FileStorage}, which uses regular disk files, and
\class{BDBFullStorage}, which uses Sleepycat Software's BerkeleyDB
database. You could write a new Storage that stored objects in a
relational database, for example, if that would
better suit your application. Two example storages,
\class{DemoStorage} and \class{MappingStorage}, are available to use
as models if you want to write a new Storage.
\item The \class{DB} class sits on top of a storage, and mediates the
interaction between several connections. One \class{DB} instance is
created per process.
\item Finally, the \class{Connection} class caches objects, and moves
them into and out of object storage. A multi-threaded program should
open a separate \class{Connection} instance for each thread.
Different threads can then modify objects and commit their
modifications independently.
\end{itemize}
Preparing to use a ZODB requires 3 steps: you have to open the
\class{Storage}, then create a \class{DB} instance that uses the
\class{Storage}, and then get a \class{Connection} from the \class{DB}
instance. All this is only a few lines of code:
\begin{verbatim}
from ZODB import FileStorage, DB

storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
db = DB(storage)
conn = db.open()
\end{verbatim}
Note that you can use a completely different data storage mechanism by
changing the first line that opens a \class{Storage}; the above example uses a
\class{FileStorage}. In section~\ref{zeo}, ``How ZEO Works'',
you'll see how ZEO uses this flexibility to good effect.
\subsection{Using a ZODB Configuration File}
ZODB also supports configuration files written in the ZConfig format.
A configuration file can be used to separate the configuration logic
from the application logic. The storage classes and the \class{DB}
class support a variety of keyword arguments; all these options can be
specified in a config file.
The configuration file is simple. The example in the previous section
could use the following configuration:
\begin{verbatim}
<zodb>
  <filestorage>
  path /tmp/test-filestorage.fs
  </filestorage>
</zodb>
\end{verbatim}
The \module{ZODB.config} module includes several functions for opening
databases and storages from configuration files.
\begin{verbatim}
import ZODB.config

db = ZODB.config.databaseFromURL('/tmp/test.conf')
conn = db.open()
\end{verbatim}
The ZConfig documentation, included in the ZODB3 release, explains
the format in detail. Each configuration file is described by a
schema, by convention stored in a \file{component.xml} file. ZODB,
ZEO, zLOG, and zdaemon all have schemas.
\subsection{Writing a Persistent Class}
Making a Python class persistent is quite simple; it simply needs to
subclass from the \class{Persistent} class, as shown in this example:
\begin{verbatim}
from persistent import Persistent

class User(Persistent):
    pass
\end{verbatim}
The \class{Persistent} base class is a new-style class implemented in C.
For simplicity, in the examples the \class{User} class will
simply be used as a holder for a bunch of attributes. Normally the
class would define various methods that add functionality, but that
has no impact on the ZODB's treatment of the class.
The ZODB uses persistence by reachability; starting from a set of root
objects, all the attributes of those objects are made persistent,
whether they're simple Python data types or class instances. There's
no method to explicitly store objects in a ZODB database; simply
assign them as an attribute of an object, or store them in a mapping,
that's already in the database. This chain of containment must
eventually reach back to the root object of the database.
As an example, we'll create a simple database of users that allows
retrieving a \class{User} object given the user's ID. First, we
retrieve the primary root object of the ZODB using the \method{root()}
method of the \class{Connection} instance. The root object behaves
like a Python dictionary, so you can just add a new key/value pair for
your application's root object. We'll insert an \class{OOBTree} object
that will contain all the \class{User} objects. (The
\class{BTree} module is also included as part of Zope.)
\begin{verbatim}
dbroot = conn.root()

# Ensure that a 'userdb' key is present
# in the root
if 'userdb' not in dbroot:
    from BTrees.OOBTree import OOBTree
    dbroot['userdb'] = OOBTree()

userdb = dbroot['userdb']
\end{verbatim}
Inserting a new user is simple: create the \class{User} object, fill
it with data, insert it into the \class{BTree} instance, and commit
this transaction.
\begin{verbatim}
# Create new User instance
import transaction

newuser = User()

# Add whatever attributes you want to track
newuser.id = 'amk'
newuser.first_name = 'Andrew'
newuser.last_name = 'Kuchling'

# Add object to the BTree, keyed on the ID
userdb[newuser.id] = newuser

# Commit the change
transaction.commit()
\end{verbatim}
The \module{transaction} module defines a few top-level functions for
working with transactions. \function{commit()} writes any modified
objects to disk, making the changes permanent. \function{abort()} rolls
back any changes that have been made, restoring the original state of
the objects. If you're familiar with database transactional
semantics, this is all what you'd expect. \function{get()} returns a
\class{Transaction} object that has additional methods like
\method{note()}, to add a note to the transaction metadata.
More precisely, the \module{transaction} module exposes an instance of
the \class{ThreadTransactionManager} transaction manager class as
\code{transaction.manager}, and the \module{transaction} functions
\function{get()} and \function{begin()} redirect to the same-named
methods of \code{transaction.manager}. The \function{commit()} and
\function{abort()} functions apply the methods of the same names to
the \class{Transaction} object returned by \code{transaction.manager.get()}.
This is for convenience. It's also possible to create your own transaction
manager instances, and to tell \code{} to use your transaction
manager instead.
Because the integration with Python is so complete, it's a lot like
having transactional semantics for your program's variables, and you
can experiment with transactions at the Python interpreter's prompt:
\begin{verbatim}
>>> newuser
<User instance at 81b1f40>
>>> newuser.first_name           # Print initial value
'Andrew'
>>> newuser.first_name = 'Bob'   # Change first name
>>> newuser.first_name           # Verify the change
'Bob'
>>> transaction.abort()          # Abort transaction
>>> newuser.first_name           # The value has changed back
'Andrew'
\end{verbatim}
\subsection{Rules for Writing Persistent Classes}
Practically all persistent languages impose some restrictions on
programming style, warning against constructs they can't handle or
adding subtle semantic changes, and the ZODB is no exception.
Happily, the ZODB's restrictions are fairly simple to understand, and
in practice it isn't too painful to work around them.
The summary of rules is as follows:
\begin{itemize}
\item If you modify a mutable object that's the value of an object's
attribute, the ZODB can't catch that, and won't mark the object as
dirty. The solution is to either set the dirty bit yourself when you
modify mutable objects, or use a wrapper for Python's lists and
dictionaries (\class{PersistentList}, \class{PersistentMapping})
that will set the dirty bit properly.
\item Recent versions of the ZODB allow writing a class with
\method{__setattr__}, \method{__getattr__}, or \method{__delattr__}
methods. (Older versions didn't support this at all.) If you write
such a \method{__setattr__} or \method{__delattr__} method, its code
has to set the dirty bit manually.
\item A persistent class should not have a \method{__del__} method.
The database moves objects freely between memory and storage. If an
object has not been used in a while, it may be released and its
contents loaded from storage the next time it is used. Since the
Python interpreter is unaware of persistence, it would call
\method{__del__} each time the object was freed.
\end{itemize}
Let's look at each of these rules in detail.
\subsubsection{Modifying Mutable Objects}
The ZODB uses various Python hooks to catch attribute accesses, and
can trap most of the ways of modifying an object, but not all of them.
If you modify a \class{User} object by assigning to one of its
attributes, as in \code{userobj.first_name = 'Andrew'}, the ZODB will
mark the object as having been changed, and it'll be written out on
the following \method{commit()}.
The most common idiom that \emph{isn't} caught by the ZODB is
mutating a list or dictionary. If \class{User} objects have an
attribute named \code{friends} containing a list, calling
\code{userobj.friends.append(otherUser)} doesn't mark
\code{userobj} as modified; from the ZODB's point of
view, \code{userobj.friends} was only read, and its value, which
happened to be an ordinary Python list, was returned. The ZODB isn't
aware that the object returned was subsequently modified.
This is one of the few quirks you'll have to remember when using the
ZODB; if you modify a mutable attribute of an object in place, you
have to manually mark the object as having been modified by setting
its dirty bit to true. This is done by setting the
\member{_p_changed} attribute of the object to true:
\begin{verbatim}
userobj._p_changed = True
\end{verbatim}
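Why the in-place mutation goes unnoticed can be seen in a toy class that
traps assignment the same general way. This is only a sketch:
\class{Tracked} is a hypothetical stand-in, not \class{Persistent}, and
its \code{changed} attribute merely plays the role of
\member{_p_changed}.

```python
# Toy stand-in for attribute trapping; "Tracked" is a hypothetical class,
# not ZODB's Persistent, and "changed" stands in for _p_changed.

class Tracked:
    def __init__(self):
        object.__setattr__(self, 'changed', False)
        object.__setattr__(self, 'friends', [])

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != 'changed':
            object.__setattr__(self, 'changed', True)


user = Tracked()

user.friends.append('other')     # only *reads* user.friends
after_append = user.changed      # the append went unnoticed

user.changed = True              # manual marking, like _p_changed = True
after_manual = user.changed

user.changed = False
user.friends = user.friends + ['third']   # plain assignment is trapped
after_assign = user.changed

print(after_append, after_manual, after_assign)   # False True True
```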
You can hide the implementation detail of having to mark objects as
dirty by designing your class's API to not use direct attribute
access; instead, you can use the Java-style approach of accessor
methods for everything, and then set the dirty bit within the accessor
method. For example, you might forbid accessing the \code{friends}
attribute directly, and add a \method{get_friend_list()} accessor and
an \method{add_friend()} modifier method to the class. \method{add_friend()}
would then look like this:
\begin{verbatim}
def add_friend(self, friend):
    self.friends.append(friend)
    self._p_changed = True
\end{verbatim}
Alternatively, you could use a ZODB-aware list or mapping type that
handles the dirty bit for you. The ZODB comes with a
\class{PersistentMapping} class, and I've contributed a
\class{PersistentList} class that's included in my ZODB distribution,
and may make it into a future upstream release of Zope.
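The idea behind such a wrapper can be sketched in a few lines. The code
below is a toy, not the real \class{PersistentList}:
\class{NotifyingList} and \class{Owner} are hypothetical names, only
three mutating operations are covered, and the change is reported through
a callback chosen for this illustration.

```python
# Toy sketch of a change-aware list. "NotifyingList" and "Owner" are
# hypothetical; the real PersistentList tracks its own _p_changed flag.

class NotifyingList(list):
    """A list that reports in-place mutations through a callback."""

    def __init__(self, items=(), on_change=None):
        super().__init__(items)
        self._on_change = on_change or (lambda: None)

    def append(self, item):
        super().append(item)
        self._on_change()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._on_change()

    def __delitem__(self, index):
        super().__delitem__(index)
        self._on_change()


class Owner:
    def __init__(self):
        self.changed = False
        self.friends = NotifyingList(on_change=self._mark)

    def _mark(self):
        self.changed = True


owner = Owner()
owner.friends.append('amk')
print(owner.changed)    # True
```

A complete wrapper would have to cover every mutating method
(\method{extend()}, \method{pop()}, \method{sort()}, and so on), which is
the main reason to prefer a ready-made class over writing your own.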
% XXX It'd be nice to discuss what happens when an object is ``ghosted'' (e.g.
% you set an object's _p_changed = None). The __p_deactivate__ method should
% not be used (it's also obsolete).
\subsubsection{\method{__getattr__}, \method{__delattr__}, and \method{__setattr__}}
ZODB allows persistent classes to have hook methods like
\method{__getattr__} and \method{__setattr__}. There are four special
methods that control attribute access; the rules for each are a little
different.
The \method{__getattr__} method works pretty much the same for
persistent classes as it does for other classes. No special handling
is needed. If an object is a ghost, then it will be activated before
\method{__getattr__} is called.
The other methods are more delicate. They will override the hooks
provided by \class{Persistent}, so user code must call special methods
to invoke those hooks anyway.
The \method{__getattribute__} method will be called for all attribute
access; it overrides the attribute access support inherited from
\class{Persistent}. A user-defined
\method{__getattribute__} must always give the \class{Persistent} base
class a chance to handle special attributes, as well as
\member{__dict__} or \member{__class__}. The user code should
call \method{_p_getattr}, passing the name of the attribute as the
only argument. If it returns True, the user code should call
\class{Persistent}'s \method{__getattribute__} to get the value. If
not, the custom user code can run.
A \method{__setattr__} hook will also override the \class{Persistent}
\method{__setattr__} hook. User code must treat it much like
\method{__getattribute__}. The user-defined code must call
\method{_p_setattr} first to allow \class{Persistent} to handle special
attributes; \method{_p_setattr} takes the attribute name and value.
If it returns True, \class{Persistent} handled the attribute. If not,
the user code can run. If the user code modifies the object's state,
it must assign to \member{_p_changed}.
A \method{__delattr__} hook must be implemented the same way as the
last two hooks. The user code must call \method{_p_delattr}, passing
the name of the attribute as an argument. If the call returns True,
\class{Persistent} handled the attribute; if not, the user code can
run.
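The calling pattern described above can be sketched without the real
base class. In the code below, \class{FakePersistent} is a hypothetical
stand-in whose \method{_p_getattr} and \method{_p_setattr} only imitate
the documented contract (return True when the base class should deal
with the name); the \class{Upper} class and its upper-casing behaviour
are likewise invented for the example. The helper methods are called
through the class rather than through \code{self} so that the overridden
\method{__getattribute__} is not re-entered.

```python
# Sketch of the calling pattern only. "FakePersistent" is a hypothetical
# stand-in; its _p_getattr/_p_setattr merely imitate the documented
# contract: return True when the base class should handle the name.

class FakePersistent:
    def _p_getattr(self, name):
        return name.startswith('_p_') or name in ('__dict__', '__class__')

    def _p_setattr(self, name, value):
        if name.startswith('_p_'):
            object.__setattr__(self, name, value)
            return True
        return False


class Upper(FakePersistent):
    """Returns string attributes upper-cased when they are read."""

    def __getattribute__(self, name):
        if FakePersistent._p_getattr(self, name):
            # Special attribute: let the base class handle it.
            return FakePersistent.__getattribute__(self, name)
        # Custom user code runs here.
        value = FakePersistent.__getattribute__(self, name)
        return value.upper() if isinstance(value, str) else value

    def __setattr__(self, name, value):
        if FakePersistent._p_setattr(self, name, value):
            return                      # base class handled it
        FakePersistent.__setattr__(self, name, value)
        self._p_changed = True          # user code changed state


u = Upper()
u.name = 'bob'
print(u.name)          # BOB
print(u._p_changed)    # True
```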
\subsubsection{\method{__del__} methods}
A \method{__del__} method is invoked just before the memory occupied by an
unreferenced Python object is freed. Because ZODB may materialize, and
dematerialize, a given persistent object in memory any number of times,
there isn't a meaningful relationship between when a persistent object's
\method{__del__} method gets invoked and any natural aspect of a
persistent object's life cycle. For example, it is emphatically not the
case that a persistent object's \method{__del__} method gets invoked only
when the object is no longer referenced by other objects in the database.
\method{__del__} is only concerned with reachability from objects in
memory.
Worse, a \method{__del__} method can interfere with the persistence
machinery's goals. For example, some number of persistent objects reside
in a \class{Connection}'s memory cache. At various times, to reduce memory
burden, objects that haven't been referenced recently are removed from the
cache. If a persistent object with a \method{__del__} method is so
removed, and the cache was holding the last memory reference to the object,
the object's \method{__del__} method will be invoked. If the
\method{__del__} method then references any attribute of the object, ZODB
needs to load the object from the database again, in order to satisfy the
attribute reference. This puts the object back into the cache again: such
an object is effectively immortal, occupying space in the memory cache
forever, as every attempt to remove it from cache puts it back into the
cache. In ZODB versions prior to 3.2.2, this could even cause the cache
reduction code to fall into an infinite loop. The infinite loop no longer
occurs, but such objects continue to live in the memory cache forever.
Because \method{__del__} methods don't make good sense for persistent
objects, and can create problems, persistent classes should not define
\method{__del__} methods.
\subsection{Writing Persistent Classes}
Now that we've looked at the basics of programming using the ZODB,
we'll turn to some more subtle tasks that are likely to come up for
anyone using the ZODB in a production system.
\subsubsection{Changing Instance Attributes}
Ideally, before making a class persistent you would get its interface
right the first time, so that no attributes would ever need to be
added, removed, or have their interpretation change over time. It's a
worthy goal, but also an impractical one unless you're gifted with
perfect knowledge of the future. Such unnatural foresight can't be
required of any person, so you therefore have to be prepared to handle
such structural changes gracefully. In object-oriented database
terminology, this is a schema update. The ZODB doesn't have an actual
schema specification, but you're changing the software's expectations
of the data contained by an object, so you're implicitly changing the
schema.
One way to handle such a change is to write a one-time conversion
program that will loop over every single object in the database and
update them to match the new schema. This can be easy if your network
of object references is quite structured, making it easy to find all
the instances of the class being modified. For example, if all
\class{User} objects can be found inside a single dictionary or
BTree, then it would be a simple matter to loop over every
\class{User} instance with a \keyword{for} statement.
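A conversion of this kind might look roughly like the following sketch.
Everything in it is assumed for illustration: a plain dictionary stands
in for the BTree of users, \class{User} is an ordinary class, and the
schema change (splitting a legacy \code{name} attribute into
\code{first_name} and \code{last_name}) is hypothetical.

```python
# Hypothetical one-time conversion: plain objects and a dict stand in for
# persistent User objects stored in a BTree.

class User:
    pass


def upgrade_user(user):
    """Bring one user up to the (assumed) new schema by splitting a
    legacy 'name' attribute into first_name and last_name.
    Returns True if the object was converted."""
    if hasattr(user, 'name'):
        first, _, last = user.name.partition(' ')
        user.first_name = first
        user.last_name = last
        del user.name
        return True
    return False


# Made-up sample data: one old-style object, one already converted.
old = User()
old.name = 'Andrew Kuchling'
new = User()
new.first_name, new.last_name = 'Bob', 'Smith'
userdb = {'amk': old, 'bob': new}

converted = sum(upgrade_user(u) for u in userdb.values())
print(converted)                         # 1
print(old.first_name, old.last_name)     # Andrew Kuchling
```

Because \function{upgrade_user()} skips objects that are already in the
new form, the loop can be rerun safely; against a real database it would
be followed by a commit.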
This is more difficult if your object graph is less structured; if
\class{User} objects can be found as attributes of any number of
different class instances, then there's no longer any easy way to find
them all, short of writing a generalized object traversal function
that would walk over every single object in a ZODB, checking each one
to see if it's an instance of \class{User}.
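The shape of such a traversal function is sketched below for ordinary
in-memory objects. \function{find_instances()} is a hypothetical helper,
not a ZODB API, and it only follows instance attributes and the built-in
container types; it ignores practical concerns such as the size of a
real database.

```python
# Generic in-memory walk; a hypothetical helper, not a ZODB API.

def find_instances(root, cls):
    """Yield each instance of cls reachable from root exactly once."""
    seen = set()          # id() of every object already visited
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, cls):
            yield obj
        # Follow references held by containers and instance attributes.
        if isinstance(obj, dict):
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set)):
            stack.extend(obj)
        elif hasattr(obj, '__dict__'):
            stack.extend(vars(obj).values())


class User:
    def __init__(self, uid):
        self.uid = uid


class Group:
    def __init__(self, owner, members):
        self.owner = owner
        self.members = members


amk = User('amk')
root = {'groups': [Group(amk, [User('bob'), amk])], 'admin': User('root')}

found = sorted(u.uid for u in find_instances(root, User))
print(found)    # ['amk', 'bob', 'root']
```

The \code{seen} set matters: without it, an object referenced from
several places would be reported more than once, and a reference cycle
would make the walk loop forever.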
Some OODBs support a feature called extents, which allow quickly
finding all the instances of a given class, no matter where they are
in the object graph; unfortunately the ZODB doesn't offer extents as a
feature.
% XXX Rest of section not written yet: __getstate__/__setstate__
% Storages
% FileStorage
% BerkeleyStorage
% OracleStorage
This chapter will examine the different \class{Storage} subclasses
that are considered stable, discuss their varying characteristics, and
explain how to administer them.
\subsection{Using Multiple Storages}
XXX explain mounting substorages
%Transactions and Versioning
% Committing and Aborting
% Subtransactions
% Undoing
% Versions
% Multithreaded ZODB Programs
\section{Transactions and Versioning}
\subsection{Committing and Aborting}
Changes made during a transaction don't appear in the database until
the transaction commits. This is done by calling the \method{commit()}
method of the current \class{Transaction} object, where the latter is
obtained from the \method{get()} method of the current transaction
manager. If the default thread transaction manager is being used, then
\code{transaction.commit()} suffices.
Similarly, a transaction can be explicitly aborted (all changes within
the transaction thrown away) by invoking the \method{abort()} method
of the current \class{Transaction} object, or simply
\code{transaction.abort()} if using the default thread transaction manager.
Prior to ZODB 3.3, if a commit failed (meaning the \code{commit()} call
raised an exception), the transaction was implicitly aborted and a new
transaction was implicitly started. This could be very surprising if the
exception was suppressed, and especially if the failing commit was one
in a sequence of subtransaction commits.
So, starting with ZODB 3.3, if a commit fails, all further attempts to
commit, join, or register with the transaction raise
\exception{ZODB.POSException.TransactionFailedError}. You must explicitly
start a new transaction then, either by calling the \method{abort()} method
of the current transaction, or by calling the \method{begin()} method of the
current transaction's transaction manager.
Subtransactions can be created within a transaction. Each
subtransaction can be individually committed and aborted, but the
changes within a subtransaction are not truly committed until the
containing transaction is committed.
The primary purpose of subtransactions is to decrease the memory usage
of transactions that touch a very large number of objects. Consider a
transaction during which 200,000 objects are modified. All the
objects that are modified in a single transaction have to remain in
memory until the transaction is committed, because the ZODB can't
discard them from the object cache. This can potentially make the
memory usage quite large. With subtransactions, a commit can be
performed at intervals, say, every 10,000 objects. Those 10,000
objects are then written to permanent storage and can be purged from
the cache to free more space.
To commit a subtransaction instead of a full transaction,
pass a true value to the \method{commit()}
or \method{abort()} method of the \class{Transaction} object.
\begin{verbatim}
# Commit a subtransaction
transaction.commit(True)

# Abort a subtransaction
transaction.abort(True)
\end{verbatim}
A new subtransaction is automatically started upon successfully committing
or aborting the previous subtransaction.
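The ``commit at intervals'' pattern can be sketched independently of any
particular commit call. In the code below,
\function{modify_in_batches()} and its \var{commit_batch} callback are
hypothetical names; in the setting described above the callback would be
a subtransaction commit, with a full commit issued once the whole job is
done.

```python
# Batching sketch. "commit_batch" is a caller-supplied callback standing in
# for whatever intermediate commit the application uses; names hypothetical.

def modify_in_batches(objects, modify, commit_batch, batch_size=10000):
    """Apply modify() to each object, calling commit_batch() after every
    batch_size objects and once more for a final partial batch."""
    pending = 0
    for obj in objects:
        modify(obj)
        pending += 1
        if pending == batch_size:
            commit_batch()
            pending = 0
    if pending:
        commit_batch()


# Demonstration with made-up data and a callback that just counts calls.
commits = []
items = [{'n': i} for i in range(25)]
modify_in_batches(items,
                  modify=lambda d: d.update(n=d['n'] * 2),
                  commit_batch=lambda: commits.append('commit'),
                  batch_size=10)
print(len(commits))     # 3
print(items[24]['n'])   # 48
```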
\subsection{Undoing Changes}
Some types of \class{Storage} support undoing a transaction even after
it's been committed. You can tell if this is the case by calling the
\method{supportsUndo()} method of the \class{DB} instance, which
returns true if the underlying storage supports undo. Alternatively
you can call the \method{supportsUndo()} method on the underlying
storage instance.
If a database supports undo, then the \method{undoLog(\var{start},
\var{end}\optional{, func})} method on the \class{DB} instance returns
the log of past transactions, returning transactions between the times
\var{start} and \var{end}, measured in seconds from the epoch.
If present, \var{func} is a function that acts as a filter on the
transactions to be returned; it's passed a dictionary representing
each transaction, and only transactions for which \var{func} returns
true will be included in the list of transactions returned to the
caller of \method{undoLog()}. The dictionary contains keys for
various properties of the transaction. The most important keys are
\samp{id}, for the transaction ID, and \samp{time}, for the time at
which the transaction was committed.
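A filter function of the kind described is just a predicate over one of
these dictionaries. The sketch below builds one and applies it to made-up
records; \function{committed_by()} is a hypothetical helper, the sample
IDs and times are invented, and the list comprehension only emulates the
filtering that \method{undoLog()} is described as performing.

```python
# The records below are made-up sample data shaped like the dictionaries
# described above; no real undo log is being read.

def committed_by(user_name):
    """Build a filter that keeps transactions recorded for user_name."""
    def func(record):
        return record.get('user_name') == user_name
    return func


sample_log = [
    {'id': b'tid-3', 'time': 300.0, 'user_name': 'amk', 'description': ''},
    {'id': b'tid-2', 'time': 200.0, 'user_name': 'bob', 'description': ''},
    {'id': b'tid-1', 'time': 100.0, 'user_name': 'amk', 'description': ''},
]

# Emulates what passing func to undoLog() is described as doing.
func = committed_by('amk')
kept = [rec for rec in sample_log if func(rec)]
print([rec['time'] for rec in kept])    # [300.0, 100.0]
```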
\begin{verbatim}
>>> print storage.undoLog(0, sys.maxint)
[{'description': '',
  'time': 981126744.98,
  'user_name': ''},
 {'description': '',
  'time': 981126478.202,
  'user_name': ''}
]
\end{verbatim}
To store a description and a user name on a commit, get the current
transaction and call the \method{note(\var{text})} method to store a
description, and the
\method{setUser(\var{user_name})} method to store the user name.
While \method{setUser()} overwrites the current user name and replaces
it with the new value, the \method{note()} method always adds the text
to the transaction's description, so it can be called several times to
log several different changes made in the course of a single
transaction.
\begin{verbatim}
transaction.get().note('Change ownership')
\end{verbatim}
To undo a transaction, call the \method{DB.undo(\var{id})} method,
passing it the ID of the transaction to undo. If the transaction
can't be undone, a \exception{ZODB.POSException.UndoError} exception
will be raised, with the message ``non-undoable
transaction''. Usually this will happen because later transactions
modified the objects affected by the transaction you're trying to
undo.
After you call \method{undo()} you must commit the transaction for the
undo to actually be applied.
\footnote{There are actually two different ways a storage can
implement the undo feature. Most of the storages that ship with ZODB
use the transactional form of undo described in the main text. Some
storages may use a non-transactional undo that makes changes visible
immediately.} There is one glitch in the undo process. The thread
that calls undo may not see the changes to the object until it calls
\method{Connection.sync()} or commits another transaction.
Versions should be avoided. They're going to be deprecated,
replaced by better approaches to long-running transactions.
While many subtransactions can be contained within a single regular
transaction, it's also possible to contain many regular transactions
within a long-running transaction, called a version in ZODB
terminology. Inside a version, any number of transactions can be
created and committed or rolled back, but the changes within a version
are not made visible to other connections to the same ZODB.
Not all storages support versions, but you can test for versioning
ability by calling the \method{supportsVersions()} method of the
\class{DB} instance, which returns true if the underlying storage
supports versioning.
A version can be selected when creating the \class{Connection}
instance using the \method{DB.open(\optional{\var{version}})} method.
The \var{version} argument must be a string that will be used as the
name of the version.
\begin{verbatim}
vers_conn = db.open('Working version')
\end{verbatim}
Transactions can then be committed and aborted using this versioned
connection. Other connections that don't specify a version, or
provide a different version name, will not see changes committed
within the version named \samp{Working~version}. To commit or abort a
version, which will either make the changes visible to all clients or
roll them back, call the \method{DB.commitVersion()} or
\method{DB.abortVersion()} methods.
XXX what are the source and dest arguments for?
The ZODB makes no attempt to reconcile changes between different
versions. Instead, the first version which modifies an object will
gain a lock on that object. Attempting to modify the object from a
different version or from an unversioned connection will cause a
\exception{ZODB.POSException.VersionLockError} to be raised:
\begin{verbatim}
from ZODB.POSException import VersionLockError

try:
    transaction.commit()
except VersionLockError, (obj_id, version):
    print ('Cannot commit; object %s '
           'locked by version %s' % (obj_id, version))
\end{verbatim}
The exception provides the ID of the locked object, and the name of
the version having a lock on it.
\subsection{Multithreaded ZODB Programs}
ZODB databases can be accessed from multithreaded Python programs.
The \class{Storage} and \class{DB} instances can be shared among
several threads, as long as individual \class{Connection} instances
are created for each thread.
% Installing ZEO
% How ZEO works (ClientStorage)
% Configuring ZEO
\subsection{How ZEO Works}
The ZODB, as I've described it so far, can only be used within a
single Python process (though perhaps with multiple threads). ZEO,
Zope Enterprise Objects, extends the ZODB machinery to provide access
to objects over a network. The name ``Zope Enterprise Objects'' is a
bit misleading; ZEO can be used to store Python objects and access
them in a distributed fashion without Zope ever entering the picture.
The combination of ZEO and ZODB is essentially a Python-specific
object database.
ZEO consists of about 12,000 lines of Python code, excluding tests. The
code is relatively small because it contains only code for a TCP/IP
server, and for a new type of Storage, \class{ClientStorage}.
\class{ClientStorage} simply makes remote procedure calls to the
server, which then passes them on to a regular \class{Storage} class such
as \class{FileStorage}. The following diagram lays out the system:
XXX insert diagram here later
Any number of processes can create a \class{ClientStorage}
instance, and any number of threads in each process can be using that
instance. \class{ClientStorage} aggressively caches objects
locally, so in order to avoid using stale data the ZEO server sends
an invalidation message to all the connected \class{ClientStorage}
instances on every write operation. The invalidation message contains
the object ID for each object that's been modified, letting the
\class{ClientStorage} instances delete the old data for the
given object from their caches.
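The invalidation scheme can be modelled in a few lines with no networking
at all. The following is a toy: \class{ToyServer} and \class{ToyClient}
are hypothetical classes, object IDs are plain strings, and messages are
ordinary method calls. It only shows how a write on one client forces the
others to refetch.

```python
# Toy model of cache invalidation; ToyServer and ToyClient are hypothetical
# and no networking is involved.

class ToyServer:
    def __init__(self):
        self.data = {}
        self.clients = []

    def load(self, oid):
        return self.data[oid]

    def store(self, oid, value):
        self.data[oid] = value
        # Tell every connected client which object has changed.
        for client in self.clients:
            client.invalidate([oid])


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.loads = 0            # number of round trips to the server
        server.clients.append(self)

    def read(self, oid):
        if oid not in self.cache:
            self.cache[oid] = self.server.load(oid)
            self.loads += 1
        return self.cache[oid]

    def write(self, oid, value):
        self.server.store(oid, value)

    def invalidate(self, oids):
        # Drop stale copies; the next read() will refetch them.
        for oid in oids:
            self.cache.pop(oid, None)


server = ToyServer()
a, b = ToyClient(server), ToyClient(server)

a.write('post-1', 'first draft')
b.read('post-1')
b.read('post-1')                 # second read is served from b's cache
a.write('post-1', 'edited')      # b's cached copy is invalidated
print(b.read('post-1'))          # edited
print(b.loads)                   # 2
```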
This design decision has some consequences you should be aware of.
First, while ZEO isn't tied to Zope, it was first written for use with
Zope, which stores HTML, images, and program code in the database. As
a result, reads from the database are \emph{far} more frequent than
writes, and ZEO is therefore better suited for read-intensive
applications. If every \class{ClientStorage} is writing to the
database all the time, this will result in a storm of invalidate
messages being sent, and this might take up more processing time than
the actual database operations themselves. These messages are
small and sent in batches, so there would need to be a lot of writes
before it became a problem.
On the other hand, for applications that have few writes in comparison
to the number of read accesses, this aggressive caching can be a major
win. Consider a Slashdot-like discussion forum that divides the load
among several Web servers. If news items and postings are represented
by objects and accessed through ZEO, then the most heavily accessed
objects -- the most recent or most popular postings -- will very
quickly wind up in the caches of the
\class{ClientStorage} instances on the front-end servers. The
back-end ZEO server will do relatively little work, only being called
upon to return the occasional older posting that's requested, and to
send the occasional invalidate message when a new posting is added.
The ZEO server isn't going to be contacted for every single request,
so its workload will remain manageable.
\subsection{Installing ZEO}
This section covers how to install the ZEO package, and how to
configure and run a ZEO Storage Server on a machine.
The ZEO server software is included in ZODB3. As with the rest of
ZODB3, you'll need Python 2.3 or higher.
\subsubsection{Running a server}
The script in the ZEO directory can be used to start a
server. Run it with the -h option to see the various options. If
you're just experimenting, a good choice is to use
\code{python ZEO/ -a /tmp/zeosocket -f /tmp/test.fs} to run
ZEO with a Unix domain socket and a \class{FileStorage}.
\subsection{Testing the ZEO Installation}
Once a ZEO server is up and running, using it is just like using ZODB
with a more conventional disk-based storage; no new programming
details are introduced by using a remote server. The only difference
is that programs must create a \class{ClientStorage} instance instead
of a \class{FileStorage} instance. From that point onward, ZODB-based
code is happily unaware that objects are being retrieved from a ZEO
server, and not from the local disk.
As an example, and to test whether ZEO is working correctly, try
running the following lines of code, which will connect to the server,
add some bits of data to the root of the ZODB, and commit the
transaction:
\begin{verbatim}
from ZEO import ClientStorage
from ZODB import DB
import transaction

# Change next line to connect to your ZEO server
addr = '', 1975
storage = ClientStorage.ClientStorage(addr)
db = DB(storage)
conn = db.open()
root = conn.root()

# Store some things in the root
root['list'] = ['a', 'b', 1.0, 3]
root['dict'] = {'a':1, 'b':4}

# Commit the transaction
transaction.commit()
\end{verbatim}
If this code runs properly, then your ZEO server is working correctly.
You can also use a configuration file.
\begin{verbatim}
<zodb>
    <zeoclient>
    server localhost:9100
    </zeoclient>
</zodb>
\end{verbatim}
One nice feature of the configuration file is that you don't need to
specify imports for a specific storage. That makes the code a little
shorter and allows you to change storages without changing the code.
\begin{verbatim}
import ZODB.config

db = ZODB.config.databaseFromURL('/tmp/zeo.conf')
\end{verbatim}
\subsection{ZEO Programming Notes}
ZEO is written using \module{asyncore}, from the Python standard
library. It assumes that some part of the user application is running
an \module{asyncore} mainloop. For example, Zope runs the loop in a
separate thread and ZEO uses that. If your application does not have
a mainloop, ZEO will not process incoming invalidation messages until
you make some call into ZEO. The \method{Connection.sync} method can
be used to process pending invalidation messages. You can call it
when you want to make sure the \class{Connection} has the most recent
version of every object, but you don't have any other work for ZEO to do.
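
For a program without a mainloop, one way to use
\method{Connection.sync} is a simple polling loop. The sketch below is
an assumption about how such a loop might look, not code from ZEO:
\code{poll_new_messages} and its parameters are hypothetical, and the
only thing it requires of \code{conn} is a \method{sync()} method.
Note that in current ZODB releases \method{sync()} also aborts the
current transaction, so it should be called between units of work, not
in the middle of one.

```python
import time

def poll_new_messages(conn, session, handle, interval=1.0, rounds=None,
                      sleep=time.sleep):
    """Periodically sync a connection and pass new messages to a callback.

    conn     -- assumed to be a ZODB Connection (only sync() is used)
    session  -- an object with a new_messages() method
    handle   -- callable invoked once for each new message
    interval -- seconds to wait between polls (arbitrary default)
    rounds   -- number of polls to perform; None means loop forever
    sleep    -- injectable so the loop can be exercised without waiting
    """
    done = 0
    while rounds is None or done < rounds:
        # Process pending invalidations so that stale objects are
        # reloaded the next time they are accessed.
        conn.sync()
        for message in session.new_messages():
            handle(message)
        done += 1
        sleep(interval)
```

Passing \code{rounds} and \code{sleep} explicitly makes the loop easy
to try out with stand-in objects before pointing it at a real
connection.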
\subsection{Sample Application:}
For an example application, we'll build a little chat application.
What's interesting is that none of the application's code deals with
network programming at all; instead, an object will hold chat
messages, and be magically shared between all the clients through ZEO.
I won't present the complete script here; it's included in my ZODB
distribution, and you can download it from
\url{}. Only the interesting portions of
the code will be covered here.
The basic data structure is the \class{ChatSession} object,
which provides an \method{add_message()} method that adds a
message, and a \method{new_messages()} method that returns a list
of new messages that have accumulated since the last call to
\method{new_messages()}. Internally, \class{ChatSession}
maintains a B-tree that uses the time as the key, and stores the
message as the corresponding value.

The constructor for \class{ChatSession} is simple; it just
creates an attribute containing a B-tree:

\begin{verbatim}
class ChatSession(Persistent):
    def __init__(self, name):
        self.name = name
        # Internal attribute: _messages holds all the chat messages.
        # (OOBTree is the module imported with "from BTrees import OOBTree".)
        self._messages = OOBTree.OOBTree()
\end{verbatim}

\method{add_message()} has to add a message to the
\code{_messages} B-tree. A complication is that it's possible
that some other client is trying to add a message at the same time;
when this happens, the client that commits first wins, and the second
client will get a \exception{ConflictError} exception when it tries to
commit. For this application, \exception{ConflictError} isn't serious
but simply means that the operation has to be retried; other
applications might treat it as a fatal error. The code uses
\code{try...except...else} inside a \code{while} loop,
breaking out of the loop when the commit works without raising an
exception.

\begin{verbatim}
    def add_message(self, message):
        """Add a message to the channel.
        message -- text of the message to be added
        """

        while 1:
            try:
                now = time.time()
                self._messages[now] = message
                transaction.commit()
            except ConflictError:
                # Conflict occurred; discard the failed transaction,
                # then this process should pause and wait for a
                # little bit before trying again.
                transaction.abort()
                time.sleep(0.2)   # the pause length is arbitrary
            else:
                # No ConflictError exception raised, so break
                # out of the enclosing while loop.
                break
        # end while
\end{verbatim}

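
The retry pattern described above can be factored into a reusable
helper. The following is a minimal sketch under stated assumptions:
\code{commit_with_retry} is a hypothetical name, the attempt limit and
delay are arbitrary, and a local stand-in \exception{ConflictError} is
defined only so that the example runs without ZODB installed. In a
real program you would import the exception from
\module{ZODB.POSException} and pass \code{transaction.commit} and
\code{transaction.abort} as the callables.

```python
import time

class ConflictError(Exception):
    """Stand-in so the sketch runs without ZODB installed.

    In a real program, use: from ZODB.POSException import ConflictError
    """

def commit_with_retry(change, commit, abort, max_attempts=5, delay=0.2,
                      sleep=time.sleep):
    """Apply change() and commit(), retrying when a conflict is reported.

    change, commit, abort -- callables; in real use commit and abort would
                             be transaction.commit and transaction.abort
    max_attempts, delay   -- arbitrary illustrative defaults
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            change()
            commit()
        except ConflictError:
            # Discard the failed transaction before trying again.
            abort()
            if attempt == max_attempts:
                raise
            sleep(delay)
        else:
            return attempt
```

Unlike the open-ended \code{while} loop, this version gives up after a
fixed number of attempts and re-raises the last
\exception{ConflictError}, which may suit applications that would
rather report an error than retry indefinitely.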
\method{new_messages()} introduces the use of \textit{volatile}
attributes. Attributes of a persistent object that begin with
\code{_v_} are considered volatile and are never stored in the
database. \method{new_messages()} needs to store the last time
the method was called, but if the time was stored as a regular
attribute, its value would be committed to the database and shared
with all the other clients. \method{new_messages()} would then
return the new messages accumulated since any other client called
\method{new_messages()}, which isn't what we want.

\begin{verbatim}
    def new_messages(self):
        "Return new messages."

        # self._v_last_time is the time of the most recent message
        # returned to the user of this class.
        if not hasattr(self, '_v_last_time'):
            self._v_last_time = 0

        new = []
        T = self._v_last_time

        for T2, message in self._messages.items():
            if T2 > T:
                new.append(message)
                self._v_last_time = T2

        return new
\end{verbatim}

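
The effect of the \code{_v_} naming convention can be imitated in
plain Python. The sketch below is an illustration only and does not
use \module{persistent} or show how it is implemented: the
\class{Session} class and the \code{save} and \code{load} helpers are
hypothetical, and a dictionary of attributes stands in for the stored
database record.

```python
class Session:
    """Plain-Python stand-in for a persistent object."""

    def __init__(self):
        self.messages = {}       # ordinary attribute: saved
        self._v_last_time = 0    # volatile by naming convention: not saved

    def __getstate__(self):
        # Leave out every attribute whose name starts with _v_.
        return {name: value for name, value in self.__dict__.items()
                if not name.startswith('_v_')}

def save(obj):
    """Return the state that would be written to storage."""
    return obj.__getstate__()

def load(cls, state):
    """Rebuild an object from saved state without calling __init__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj

original = Session()
original.messages[1.0] = 'hello'
original._v_last_time = 1.0

# Another client loading the object sees the messages, but not the
# first client's private bookkeeping.
restored = load(Session, save(original))
```

Because \code{restored} has no \code{_v_last_time} attribute at all,
code that reads a volatile attribute needs a fallback such as the
\code{hasattr()} check in \method{new_messages()}.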
This application is interesting because it uses ZEO to easily share a
data structure; ZEO and ZODB are being used for their networking
ability, not primarily for their data storage ability. I can foresee
many interesting applications using ZEO in this way:

\begin{itemize}
\item With a Tkinter front-end, and a cleverer, more scalable data
  structure, you could build a shared whiteboard using the same
  technique.

\item A shared chessboard object would make writing a networked chess
  game easy.

\item You could create a Python class containing a CD's title and
  track information. To make a CD database, a read-only ZEO server
  could be opened to the world, or an HTTP or XML-RPC interface could
  be written on top of the ZODB.

\item A program like Quicken could use a ZODB on the local disk to
  store its data. This avoids the need to write and maintain
  specialized I/O code that reads in your objects and writes them out;
  instead you can concentrate on the problem domain, writing objects
  that represent cheques, stock portfolios, or whatever.
\end{itemize}

\title{ZODB/ZEO Programming Guide}
\author{A.M.\ Kuchling}
\copyright{Copyright 2002 A.M. Kuchling.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the appendix entitled ``GNU
Free Documentation License''.}
\input links.tex
\input gfdl.tex
