Commit a4924845 authored by Laurence Rowe

The ZODB/ZEO Programming Guide has been moved into its own package
(zodbguide) and published at http://docs.zope.org/zodb.
parent 9f48ce3a
This directory contains Andrew Kuchling's programmer's guide to ZODB
and ZEO. It was originally taken from Andrew's zodb.sf.net project on
SourceForge. Because the original version is no longer updated, this
version is best viewed as an independent fork now.
Write section on __setstate__
Continue working on it
Suppress the full GFDL text in the PDF/PS versions
% Administration
% Importing and exporting data
% Disaster recovery/avoidance
% Security
import sys, time, os, random
import transaction
from persistent import Persistent
from ZEO import ClientStorage
import ZODB
from ZODB.POSException import ConflictError
from BTrees import OOBTree
class ChatSession(Persistent):
    """Class for a chat session.
    Messages are stored in a B-tree, indexed by the time the message
    was created.  (Eventually we'd want to throw old messages out;
    this version keeps them all.)

    add_message(message) -- add a message to the channel
    new_messages()       -- return new messages since the last call to
                            this method
    """

    def __init__(self, name):
        """Initialize new chat session.
        name -- the channel's name
        """
        self.name = name

        # Internal attribute: _messages holds all the chat messages.
        self._messages = OOBTree.OOBTree()

    def new_messages(self):
        "Return new messages."

        # self._v_last_time is the time of the most recent message
        # returned to the user of this class.  Attributes whose names
        # start with _v_ are volatile and never written to the database.
        if not hasattr(self, '_v_last_time'):
            self._v_last_time = 0

        new = []
        T = self._v_last_time

        for T2, message in self._messages.items():
            if T2 > T:
                new.append(message)
                self._v_last_time = T2

        return new

    def add_message(self, message):
        """Add a message to the channel.
        message -- text of the message to be added
        """
        while True:
            try:
                now = time.time()
                self._messages[now] = message
                transaction.commit()
            except ConflictError:
                # Conflict occurred.  Abort the failed transaction (a
                # failed transaction can't be committed again), pause
                # for a little bit, then try again.
                transaction.abort()
                time.sleep(.2)
            else:
                # No ConflictError exception raised, so break
                # out of the enclosing while loop.
                break
        # end while
def get_chat_session(conn, channelname):
    """Return the chat session for a given channel, creating the session
    if required."""

    # We'll keep a B-tree of sessions, mapping channel names to
    # session objects.  The B-tree is stored at the ZODB's root under
    # the key 'chat_sessions'.
    root = conn.root()
    if 'chat_sessions' not in root:
        print('Creating chat_sessions B-tree')
        root['chat_sessions'] = OOBTree.OOBTree()
        transaction.commit()

    sessions = root['chat_sessions']

    # Get a session object corresponding to the channel name, creating
    # it if necessary.
    if channelname not in sessions:
        print('Creating new session: %s' % channelname)
        sessions[channelname] = ChatSession(channelname)
        transaction.commit()

    session = sessions[channelname]
    return session
if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Usage: %s <channelname>' % sys.argv[0])
        sys.exit(1)

    storage = ClientStorage.ClientStorage(('localhost', 9672))
    db = ZODB.DB(storage)
    conn = db.open()

    session = get_chat_session(conn, sys.argv[1])

    messages = ['Hi.', 'Hello', 'Me too', "I'M 3L33T!!!!"]

    while True:
        # Send a random message
        msg = random.choice(messages)
        session.add_message('%s: pid %i' % (msg, os.getpid()))

        # Display new messages
        for msg in session.new_messages():
            print(msg)

        # Wait for a few seconds
        pause = random.randint(1, 4)
        time.sleep(pause)
% Indexing Data
% BTrees
% Full-text indexing
%Introduction
% What is ZODB?
% What is ZEO?
% OODBs vs. Relational DBs
% Other OODBs
\section{Introduction}
This guide explains how to write Python programs that use the Z Object
Database (ZODB) and Zope Enterprise Objects (ZEO). The latest version
of the guide is always available at
\url{http://www.zope.org/Wikis/ZODB/guide/index.html}.
\subsection{What is the ZODB?}
The ZODB is a persistence system for Python objects. Persistent
programming languages provide facilities that automatically write
objects to disk and read them in again when they're required by a
running program. By installing the ZODB, you add such facilities to
Python.
It's certainly possible to build your own system for making Python
objects persistent. The usual starting points are the \module{pickle}
module, for converting objects into a string representation, and
various database modules, such as the \module{gdbm} or \module{bsddb}
modules, that provide ways to write strings to disk and read them
back. It's straightforward to combine the \module{pickle} module and
a database module to store and retrieve objects, and in fact the
\module{shelve} module, included in Python's standard library, does
this.
The downside is that the programmer has to explicitly manage objects,
reading an object when it's needed and writing it out to disk when the
object is no longer required. The ZODB manages objects for you,
keeping them in a cache, writing them out to disk when they are
modified, and dropping them from the cache if they haven't been used
in a while.
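The manual bookkeeping described above is easy to see with the standard \module{shelve} module alone. This sketch uses Python 3 syntax; the function name \code{shelve_demo}, the key \code{'run'} and the dictionary layout are arbitrary choices for illustration. Because a shelf hands back a freshly unpickled copy, mutating that copy changes nothing on disk until the program explicitly assigns the object back.

```python
import os
import shelve
import tempfile


def shelve_demo(path):
    """Show that a shelf only saves objects that are explicitly written back."""
    # Store a mutable object; shelve pickles it into a dbm file.
    with shelve.open(path) as db:
        db['run'] = {'owner': 'amk', 'steps': []}

    # db['run'] unpickles a new copy, so this append is silently lost.
    with shelve.open(path) as db:
        db['run']['steps'].append('deposit oxide')
    with shelve.open(path) as db:
        lost = list(db['run']['steps'])

    # The programmer must read the object, modify it, and write it out again.
    with shelve.open(path) as db:
        run = db['run']
        run['steps'].append('deposit oxide')
        db['run'] = run
    with shelve.open(path) as db:
        kept = list(db['run']['steps'])

    return lost, kept


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmpdir:
        print(shelve_demo(os.path.join(tmpdir, 'runs')))
```

Opening the shelf with \code{writeback=True} hides part of this by caching every accessed entry and writing all of them back on close, at the cost of keeping them in memory; deciding what to keep and when to drop it is still the programmer's job.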
\subsection{OODBs vs. Relational DBs}
Another way to look at it is that the ZODB is a Python-specific
object-oriented database (OODB). Commercial object databases for C++
or Java often require that you jump through some hoops, such as using
a special preprocessor or avoiding certain data types. As we'll see,
the ZODB has some hoops of its own to jump through, but in comparison
the naturalness of the ZODB is astonishing.
Relational databases (RDBs) are far more common than OODBs.
Relational databases store information in tables; a table consists of
any number of rows, each row containing several columns of
information. (Tables are more formally called relations, and rows are
called tuples; the term ``relational database'' comes from relations.)
Let's look at a concrete example. The example comes from my day job
working for the MEMS Exchange, in a greatly simplified version. The
job is to track process runs, which are lists of manufacturing steps
to be performed in a semiconductor fab. A run is owned by a
particular user, and has a name and an assigned ID number. Runs consist
of a number of operations; an operation is a single step to be
performed, such as depositing something on a wafer or etching
something off it.
Operations may have parameters, which are additional information
required to perform an operation. For example, if you're depositing
something on a wafer, you need to know two things: 1) what you're
depositing, and 2) how much should be deposited. You might deposit
100 microns of silicon oxide, or 1 micron of copper.
Mapping these structures to a relational database is straightforward:
\begin{verbatim}
CREATE TABLE runs (
  run_id int,
  owner varchar,
  title varchar,
  acct_num int,
  PRIMARY KEY(run_id)
);

CREATE TABLE operations (
  run_id int,
  step_num int,
  process_id varchar,
  PRIMARY KEY(run_id, step_num),
  FOREIGN KEY(run_id) REFERENCES runs(run_id)
);

CREATE TABLE parameters (
  run_id int,
  step_num int,
  param_name varchar,
  param_value varchar,
  PRIMARY KEY(run_id, step_num, param_name),
  FOREIGN KEY(run_id, step_num)
     REFERENCES operations(run_id, step_num)
);
\end{verbatim}
In Python, you would write three classes named \class{Run},
\class{Operation}, and \class{Parameter}. I won't present code for
defining these classes, since that code is uninteresting at this
point. Each class would contain a single method to begin with, an
\method{__init__} method that assigns default values, such as 0 or
\code{None}, to each attribute of the class.
It's not difficult to write Python code that will create a \class{Run}
instance and populate it with the data from the relational tables;
with a little more effort, you can build a straightforward tool,
usually called an object-relational mapper, to do this automatically.
(See
\url{http://www.amk.ca/python/unmaintained/ordb.html} for a quick hack
at a Python object-relational mapper, and
\url{http://www.python.org/workshops/1997-10/proceedings/shprentz.html}
for Joel Shprentz's more successful implementation of the same idea.
Unlike mine, Shprentz's system has been used for actual work.)
However, it is difficult to make an object-relational mapper
reasonably quick; a simple-minded implementation like mine is quite
slow because it has to do several queries to access all of an object's
data. Higher-performance object-relational mappers cache objects to
improve performance, only performing SQL queries when they actually
need to.
That helps if you want to access run number 123 all of a sudden. But
what if you want to find all runs where a step has a parameter named
'thickness' with a value of 2.0? In the relational version, you have
two unappealing choices:
\begin{enumerate}
\item Write a specialized SQL query for this case: \code{SELECT DISTINCT
run_id FROM parameters WHERE param_name = 'thickness' AND param_value =
'2.0'}. If such queries are common, you can end up with lots of
specialized queries. When the database tables get rearranged, all these
queries will need to be modified.
\item Use an object-relational mapper, which doesn't help much. Scanning
through the runs means that the mapper will perform the required
SQL queries to read run \#1, and then a simple Python loop can check
whether any of its steps have the parameter you're looking for.
Repeat for run \#2, 3, and so forth. This does a vast
number of SQL queries, and therefore is incredibly slow.
\end{enumerate}
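To make the trade-off concrete, here is a small sketch using Python's built-in \module{sqlite3} module with an in-memory database and made-up sample data. It loads the three tables from the schema above and then finds the runs with a \samp{thickness} of 2.0 both ways: with one specialized query, and with a mapper-style scan that fetches each run's operations and parameters separately. The scan is a hypothetical stand-in for a simple-minded mapper, not any real object-relational mapper; it only counts the queries such a mapper would issue.

```python
import sqlite3

SCHEMA = """
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, owner VARCHAR,
                   title VARCHAR, acct_num INTEGER);
CREATE TABLE operations (
    run_id INTEGER REFERENCES runs(run_id), step_num INTEGER,
    process_id VARCHAR, PRIMARY KEY (run_id, step_num));
CREATE TABLE parameters (
    run_id INTEGER, step_num INTEGER, param_name VARCHAR, param_value VARCHAR,
    PRIMARY KEY (run_id, step_num, param_name),
    FOREIGN KEY (run_id, step_num) REFERENCES operations(run_id, step_num));
"""


def make_db(n_runs=50):
    """Build an in-memory database of made-up runs with three steps each."""
    conn = sqlite3.connect(':memory:')
    conn.executescript(SCHEMA)
    for run_id in range(1, n_runs + 1):
        conn.execute('INSERT INTO runs VALUES (?, ?, ?, ?)',
                     (run_id, 'amk', 'run %d' % run_id, 0))
        for step in range(3):
            conn.execute('INSERT INTO operations VALUES (?, ?, ?)',
                         (run_id, step, 'deposit'))
            # Every seventh run gets a thickness of 2.0 on its first step.
            value = '2.0' if (run_id % 7 == 0 and step == 0) else '1.0'
            conn.execute('INSERT INTO parameters VALUES (?, ?, ?, ?)',
                         (run_id, step, 'thickness', value))
    return conn


def specialized_query(conn):
    """Option 1: one hand-written query.  Returns (run ids, query count)."""
    rows = conn.execute(
        "SELECT DISTINCT run_id FROM parameters "
        "WHERE param_name = 'thickness' AND param_value = '2.0' "
        "ORDER BY run_id").fetchall()
    return [row[0] for row in rows], 1


def mapper_scan(conn):
    """Option 2: load every run like a naive mapper, then test it in Python.
    Returns (run ids, query count)."""
    queries = 1
    run_ids = [row[0] for row in
               conn.execute('SELECT run_id FROM runs ORDER BY run_id')]
    found = []
    for run_id in run_ids:
        # One query for the run's operations...
        queries += 1
        steps = conn.execute(
            'SELECT step_num FROM operations WHERE run_id = ? '
            'ORDER BY step_num', (run_id,)).fetchall()
        # ...and one more per operation for its parameters.
        step_params = []
        for (step_num,) in steps:
            queries += 1
            step_params.append(dict(conn.execute(
                'SELECT param_name, param_value FROM parameters '
                'WHERE run_id = ? AND step_num = ?', (run_id, step_num))))
        if any(p.get('thickness') == '2.0' for p in step_params):
            found.append(run_id)
    return found, queries
```

Both functions return the same run IDs, but with 50 runs of three steps the scan issues 201 queries against one, and that count grows linearly with the number of runs.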
An object database such as ZODB simply stores internal pointers from
object to object, so reading in a single object is much faster than
doing a bunch of SQL queries and assembling the results. Scanning all
runs, therefore, is still inefficient, but not grossly inefficient.
\subsection{What is ZEO?}
The ZODB comes with a few different classes that implement the
\class{Storage} interface. Such classes handle the job of
writing out Python objects to a physical storage medium, which can be
a disk file (the \class{FileStorage} class), a BerkeleyDB file
(\class{BDBFullStorage}), a relational database
(\class{DCOracleStorage}), or some other medium. ZEO adds
\class{ClientStorage}, a new \class{Storage} that doesn't write to
physical media but just forwards all requests across a network to a
server. The server, which is running an instance of the
\class{StorageServer} class, simply acts as a front-end for some
physical \class{Storage} class. It's a fairly simple idea, but as
we'll see later on in this document, it opens up many possibilities.
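The forwarding idea can be sketched in a few lines of plain Python. Everything here is hypothetical: \class{DictStorage}, \class{ToyServer} and \class{ToyClientStorage} are invented names with a two-method \method{load}/\method{store} interface that is far simpler than the real \class{Storage} interface, and a direct method call stands in for the network connection that \class{ClientStorage} uses to reach a \class{StorageServer}.

```python
class DictStorage:
    """Hypothetical 'physical' storage that keeps data in a dict."""

    def __init__(self):
        self._data = {}

    def load(self, oid):
        return self._data[oid]

    def store(self, oid, data):
        self._data[oid] = data


class ToyServer:
    """Hypothetical server: a thin front-end for one physical storage."""

    def __init__(self, storage):
        self._storage = storage

    def handle(self, method, *args):
        # In ZEO this request would arrive over a network connection.
        return getattr(self._storage, method)(*args)


class ToyClientStorage:
    """Hypothetical client: same interface, but forwards every request."""

    def __init__(self, server):
        self._server = server

    def load(self, oid):
        return self._server.handle('load', oid)

    def store(self, oid, data):
        self._server.handle('store', oid, data)


def save_and_reload(storage):
    """Code written against the interface works with either kind of storage."""
    storage.store(b'\x00' * 8, b'pickled root')
    return storage.load(b'\x00' * 8)
```

Two \class{ToyClientStorage} instances built on the same \class{ToyServer} see each other's writes, which is the property that lets several processes share one database.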
\subsection{About this guide}
The primary author of this guide works on a project which uses the
ZODB and ZEO as its primary storage technology. We use the ZODB to
store process runs and operations, a catalog of available processes,
user information, accounting information, and other data. Part of the
goal of writing this document is to make our experience more widely
available. A few times we've spent hours or even days trying to
figure out a problem, and this guide is an attempt to gather up the
knowledge we've gained so that others don't have to make the same
mistakes we did while learning.
The author's ZODB project is described in a paper available at
\url{http://www.amk.ca/python/writing/mx-architecture/}.
This document will always be a work in progress. If you wish to
suggest clarifications or additional topics, please send your comments to
\email{zodb-dev@zope.org}.
\subsection{Acknowledgements}
Andrew Kuchling wrote the original version of this guide, which
provided some of the first ZODB documentation for Python programmers.
His initial version has been updated over time by Jeremy Hylton and
Tim Peters.
I'd like to thank the people who've pointed out inaccuracies and bugs,
offered suggestions on the text, or proposed new topics that should be
covered: Jeff Bauer, Willem Broekema, Thomas Guettler,
Chris McDonough, George Runyan.
% links.tex
% Collection of relevant links
\section{Resources}
Introduction to the Zope Object Database, by Jim Fulton:
\\
Goes into much greater detail, explaining advanced uses of the ZODB and
how it's actually implemented. A definitive reference, and highly recommended.
\\
\url{http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html}
Persistent Programming with ZODB, by Jeremy Hylton and Barry Warsaw:
\\
Slides for a tutorial presented at the 10th Python conference. Covers
much of the same ground as this guide, with more details in some areas
and less in others.
\\
\url{http://www.zope.org/Members/bwarsaw/ipc10-slides}
% Storages
% FileStorage
% BerkeleyStorage
% OracleStorage
\section{Storages}
This chapter will examine the different \class{Storage} subclasses
that are considered stable, discuss their varying characteristics, and
explain how to administer them.
\subsection{Using Multiple Storages}
XXX explain mounting substorages
\subsection{FileStorage}
\subsection{BDBFullStorage}
\subsection{OracleStorage}
%Transactions and Versioning
% Committing and Aborting
% Subtransactions
% Undoing
% Versions
% Multithreaded ZODB Programs
\section{Transactions and Versioning}
\subsection{Committing and Aborting}
Changes made during a transaction don't appear in the database until
the transaction commits. This is done by calling the \method{commit()}
method of the current \class{Transaction} object, where the latter is
obtained from the \method{get()} method of the current transaction
manager. If the default thread transaction manager is being used, then
\code{transaction.commit()} suffices.
Similarly, a transaction can be explicitly aborted (all changes within
the transaction thrown away) by invoking the \method{abort()} method
of the current \class{Transaction} object, or simply
\code{transaction.abort()} if using the default thread transaction manager.
Prior to ZODB 3.3, if a commit failed (meaning the \code{commit()} call
raised an exception), the transaction was implicitly aborted and a new
transaction was implicitly started. This could be very surprising if the
exception was suppressed, and especially if the failing commit was one
in a sequence of subtransaction commits.
So, starting with ZODB 3.3, if a commit fails, all further attempts to
commit, join, or register with the transaction raise
\exception{ZODB.POSException.TransactionFailedError}. You must then
explicitly start a new transaction, either by calling the \method{abort()} method
of the current transaction, or by calling the \method{begin()} method of the
current transaction's transaction manager.
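A common consequence is the retry loop used when a commit raises a conflict: abort first, then redo the work in a fresh transaction. The helper below is a hypothetical sketch; \code{run_with_retries} and its parameters are invented names, and it takes the work, commit and abort steps as plain callables so that it can be exercised without a database. With the default thread transaction manager you would pass \code{transaction.commit} and \code{transaction.abort}, with \exception{ZODB.POSException.ConflictError} as \var{retry_on}.

```python
import time


def run_with_retries(work, commit, abort, retry_on, attempts=3, pause=0.2):
    """Run work() then commit(); on a retry_on exception, abort and retry.

    Returns the number of the attempt that succeeded.  The last failure
    is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            work()
            commit()
        except retry_on:
            # A failed transaction can't be committed again; abort()
            # discards its changes so the next attempt starts clean.
            abort()
            if attempt == attempts:
                raise
            time.sleep(pause)
        else:
            return attempt
```

The work is redone inside the loop because \method{abort()} throws away the changes made by the failed attempt.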
\subsection{Subtransactions}
Subtransactions can be created within a transaction. Each
subtransaction can be individually committed and aborted, but the
changes within a subtransaction are not truly committed until the
containing transaction is committed.
The primary purpose of subtransactions is to decrease the memory usage
of transactions that touch a very large number of objects. Consider a
transaction during which 200,000 objects are modified. All the
objects that are modified in a single transaction have to remain in
memory until the transaction is committed, because the ZODB can't
discard them from the object cache. This can potentially make the
memory usage quite large. With subtransactions, a commit can be
performed at intervals, say, every 10,000 objects. Those 10,000
objects are then written to permanent storage and can be purged from
the cache to free more space.
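To commit or abort a subtransaction instead of a full transaction,
pass a true value to the \method{commit()} or \method{abort()} method
of the \class{Transaction} object, or to the module-level
\function{transaction.commit()} and \function{transaction.abort()}
functions when the default thread transaction manager is being used: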
\begin{verbatim}
# Commit a subtransaction
transaction.commit(True)
# Abort a subtransaction
transaction.abort(True)
\end{verbatim}
A new subtransaction is automatically started after the previous
subtransaction is successfully committed or aborted.
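The batching pattern described above fits in a short loop. In this hypothetical sketch, \code{modify_in_batches} is an invented name, \code{checkpoint} stands for \code{transaction.commit(True)} (a subtransaction commit), and the caller still performs the final full \code{transaction.commit()}; the default batch size of 10,000 simply mirrors the figure in the text.

```python
def modify_in_batches(objects, modify, checkpoint, batch_size=10000):
    """Apply modify() to every object, calling checkpoint() after each
    batch_size modifications so modified objects can leave memory.

    Returns the number of checkpoints taken.
    """
    checkpoints = 0
    for count, obj in enumerate(objects, 1):
        modify(obj)
        if count % batch_size == 0:
            checkpoint()   # e.g. transaction.commit(True) in ZODB 3.x
            checkpoints += 1
    return checkpoints
```

For the 200,000-object transaction mentioned earlier, this takes twenty subtransaction commits before the final full commit.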
\subsection{Undoing Changes}
Some types of \class{Storage} support undoing a transaction even after
it's been committed. You can tell if this is the case by calling the
\method{supportsUndo()} method of the \class{DB} instance, which
returns true if the underlying storage supports undo. Alternatively
you can call the \method{supportsUndo()} method on the underlying
storage instance.
If a database supports undo, then the \method{undoLog(\var{start},
\var{end}\optional{, func})} method on the \class{DB} instance returns
descriptions of past transactions as a list of dictionaries, most
recent first. \var{start} and \var{end} are indices that select a slice
of that list; a negative \var{end} asks for at most -\var{end}
descriptions beginning at \var{start}.
If present, \var{func} is a function that acts as a filter on the
transactions to be returned; it's passed a dictionary representing
each transaction, and only transactions for which \var{func} returns
true will be included in the list of transactions returned to the
caller of \method{undoLog()}. The dictionary contains keys for
various properties of the transaction. The most important keys are
\samp{id}, for the transaction ID, and \samp{time}, for the time at
which the transaction was committed.
\begin{verbatim}
>>> print storage.undoLog(0, sys.maxint)
[{'description': '',
  'id': 'AzpGEGqU/0QAAAAAAAAGMA',
  'time': 981126744.98,
  'user_name': ''},
 {'description': '',
  'id': 'AzpGC/hUOKoAAAAAAAAFDQ',
  'time': 981126478.202,
  'user_name': ''}
 ...
\end{verbatim}
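A filter function receives one of these dictionaries and returns a true or false value. The two factories below are hypothetical helpers that rely only on the \samp{time} and \samp{description} keys shown above, assuming the description is text; in a program you would pass one of them as the third argument, for example \code{db.undoLog(0, 20, committed_since(cutoff))}.

```python
def committed_since(cutoff):
    """Return a filter accepting transactions committed at or after
    cutoff (seconds since the epoch)."""
    def accept(info):
        return info['time'] >= cutoff
    return accept


def described_as(text):
    """Return a filter accepting transactions whose description
    contains text."""
    def accept(info):
        return text in info['description']
    return accept


# Made-up entries shaped like the undoLog() output above.
sample_log = [
    {'description': 'Change ownership', 'id': 'AzpGEGqU/0QAAAAAAAAGMA',
     'time': 981126744.98, 'user_name': ''},
    {'description': '', 'id': 'AzpGC/hUOKoAAAAAAAAFDQ',
     'time': 981126478.202, 'user_name': ''},
]
```

Since the storage calls the filter once per transaction it examines, keeping it a cheap dictionary lookup is sensible.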
To store a description and a user name on a commit, get the current
transaction and call the \method{note(\var{text})} method to store a
description, and the
\method{setUser(\var{user_name})} method to store the user name.
While \method{setUser()} overwrites the current user name and replaces
it with the new value, the \method{note()} method always adds the text
to the transaction's description, so it can be called several times to
log several different changes made in the course of a single
transaction.
\begin{verbatim}
transaction.get().setUser('amk')
transaction.get().note('Change ownership')
\end{verbatim}
To undo a transaction, call the \method{DB.undo(\var{id})} method,
passing it the ID of the transaction to undo. If the transaction
can't be undone, a \exception{ZODB.POSException.UndoError} exception
will be raised, with the message ``non-undoable
transaction''. Usually this will happen because later transactions
modified the objects affected by the transaction you're trying to
undo.
After you call \method{undo()} you must commit the transaction for the
undo to actually be applied.
\footnote{There are actually two different ways a storage can
implement the undo feature. Most of the storages that ship with ZODB
use the transactional form of undo described in the main text. Some
storages may use a non-transactional undo, which makes changes visible
immediately.} There is one glitch in the undo process. The thread
that calls undo may not see the changes to the object until it calls
\method{Connection.sync()} or commits another transaction.
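Putting the undo steps together, here is a hypothetical helper; \code{undo_latest} and its \var{window} parameter are invented for illustration. It relies only on \method{undoLog()} and \method{undo()} as described above, and receives the commit step as a callable (normally \code{transaction.commit}) so that it can be tried against a stand-in object.

```python
def undo_latest(db, commit, accept, window=20):
    """Undo the most recent transaction, among the last `window`, that
    accept(info) approves, then commit so the undo takes effect.

    Returns the undone transaction's id, or None if nothing matched.
    """
    # undoLog() lists the most recent transactions first.
    matches = db.undoLog(0, window, accept)
    if not matches:
        return None
    tid = matches[0]['id']
    db.undo(tid)   # may raise ZODB.POSException.UndoError
    commit()       # the undo is not applied until the transaction commits
    return tid
```

Passing the commit step in also makes it obvious that nothing is undone until that call succeeds.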
\subsection{Versions}
\begin{notice}[warning]
Versions should be avoided. They're going to be deprecated,
replaced by better approaches to long-running transactions.
\end{notice}
While many subtransactions can be contained within a single regular
transaction, it's also possible to contain many regular transactions
within a long-running transaction, called a version in ZODB
terminology. Inside a version, any number of transactions can be
created and committed or rolled back, but the changes within a version
are not made visible to other connections to the same ZODB.
Not all storages support versions, but you can test for versioning
ability by calling the \method{supportsVersions()} method of the
\class{DB} instance, which returns true if the underlying storage
supports versioning.
A version can be selected when creating the \class{Connection}
instance using the \method{DB.open(\optional{\var{version}})} method.
The \var{version} argument must be a string that will be used as the
name of the version.
\begin{verbatim}
vers_conn = db.open(version='Working version')
\end{verbatim}
Transactions can then be committed and aborted using this versioned
connection. Other connections that don't specify a version, or
provide a different version name, will not see changes committed
within the version named \samp{Working~version}. To commit or abort a
version, which will either make the changes visible to all clients or
roll them back, call the \method{DB.commitVersion()} or
\method{DB.abortVersion()} methods.
XXX what are the source and dest arguments for?
The ZODB makes no attempt to reconcile changes between different
versions. Instead, the first version which modifies an object will
gain a lock on that object. Attempting to modify the object from a
different version or from an unversioned connection will cause a
\exception{ZODB.POSException.VersionLockError} to be raised:
\begin{verbatim}
import transaction
from ZODB.POSException import VersionLockError

try:
    transaction.commit()
except VersionLockError, (obj_id, version):
    print ('Cannot commit; object %s '
           'locked by version %s' % (obj_id, version))
\end{verbatim}
The exception provides the ID of the locked object, and the name of
the version having a lock on it.
\subsection{Multithreaded ZODB Programs}
ZODB databases can be accessed from multithreaded Python programs.
The \class{Storage} and \class{DB} instances can be shared among
several threads, as long as individual \class{Connection} instances
are created for each thread.
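A minimal sketch of that rule: \code{run_workers} and \code{work} are hypothetical names, and the only things the code assumes about \code{db} are an \method{open()} method returning a connection that has a \method{close()} method, which \class{DB} and \class{Connection} provide. The database object is shared, while each thread opens, uses and closes its own connection. With ZODB, \code{work} would also commit through the default thread transaction manager, which keeps a separate transaction for each thread.

```python
import threading


def run_workers(db, work, n_threads=4):
    """Run work(conn, index) in n_threads threads sharing one db object.

    Each thread opens its own connection and closes it when done.
    Returns the list of results, one per thread.
    """
    results = [None] * n_threads

    def worker(index):
        conn = db.open()   # never share a connection between threads
        try:
            results[index] = work(conn, index)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```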
\documentclass{howto}
\title{ZODB/ZEO Programming Guide}
\release{3.7.0b3}
\date{\today}
\author{A.M.\ Kuchling}
\authoraddress{\email{amk@amk.ca}}
\begin{document}
\maketitle
\tableofcontents
\copyright{Copyright 2002 A.M. Kuchling.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the appendix entitled ``GNU
Free Documentation License''.}
\input{introduction}
\input{prog-zodb}
\input{zeo}
\input{transactions}
\input{modules}
\appendix
\input links.tex
\input gfdl.tex
\end{document}