Commit e32f8dd1 authored by Laurence Rowe

Move ZODB docs to own project

parent 6e86042e
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html web pickle json htmlhelp latex changes linkcheck

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html      to make standalone HTML files"
	@echo "  pickle    to make pickle files"
	@echo "  json      to make JSON files"
	@echo "  htmlhelp  to make HTML files and a HTML help project"
	@echo "  latex     to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes   to make an overview over all changed/added/deprecated items"
	@echo "  linkcheck to check all external links for integrity"

clean:
	-rm -rf build/*

html:
	mkdir -p build/html build/doctrees
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) build/html
	@echo
	@echo "Build finished. The HTML pages are in build/html."

pickle:
	mkdir -p build/pickle build/doctrees
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) build/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

web: pickle

json:
	mkdir -p build/json build/doctrees
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) build/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	mkdir -p build/htmlhelp build/doctrees
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) build/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in build/htmlhelp."

latex:
	mkdir -p build/latex build/doctrees
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) build/latex
	@echo
	@echo "Build finished; the LaTeX files are in build/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	mkdir -p build/changes build/doctrees
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) build/changes
	@echo
	@echo "The overview file is in build/changes."

linkcheck:
	mkdir -p build/linkcheck build/doctrees
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) build/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in build/linkcheck/output.txt."
``zodbdocs`` contains all ZODB-relevant documentation, such as the "ZODB/ZEO
Programming Guide", some ZODB articles, and links to the ZODB release notes.
All documentation is formatted as reStructuredText. To generate HTML using
Sphinx, simply type "make html".
An overview of the ZODB (by Laurence Rowe)
==========================================
ZODB in comparison to relational databases, transactions, scalability and best
practice. Originally delivered to the Plone Conference 2007, Naples.
Comparison to other database types
----------------------------------
**Relational Databases** are great at handling large quantities of homogeneous
data. If you’re building a ledger system a Relational Database is a great fit.
But Relational Databases only support hierarchical data structures to a
limited degree. A foreign-key relationship must refer to a single table,
so only a single type can be contained.
**Hierarchical databases** (such as LDAP or a filesystem) are much more
suitable for modelling the flexible containment hierarchies required for
content management applications. But most of these systems do not support
transactional semantics. ORMs such as `SQLAlchemy
<http://www.sqlalchemy.org>`_ make working with Relational Databases in an
object-oriented manner much more pleasant. But they don’t overcome the
restrictions inherent in a relational model.
The **ZODB** is an (almost) transparent Python object persistence system,
heavily influenced by Smalltalk. As an object-oriented database it gives you
the flexibility to build a data model that fits your application. For the most
part you don’t have to worry about persistence: you only work with Python
objects and persistence just happens in the background.
Of course this power comes at a price. While changing the methods your classes
provide is not a problem, changing attributes can necessitate writing a
migration script, as you would with a relational schema change. With ZODB
objects, though, explicit schema migrations are not enforced, which can bite
you later.
Transactions
------------
The ZODB has transactional support at its core. Transactions provide
concurrency control and atomicity. Transactions are executed as if they have
exclusive access to the data, so as an application developer you don’t have to
worry about threading. Of course there is nothing to prevent two simultaneous
conflicting requests, so checks are made at transaction commit time to ensure
consistency.
Since Zope 2.8 ZODB has implemented **Multi Version Concurrency Control**.
This means no more ReadConflictErrors: each transaction is guaranteed to be
able to load any object as it was when the transaction began.
You may still see (Write) **ConflictErrors**. These can be minimised using
data structures that support conflict resolution, primarily B-Trees in the
BTrees library. These scalable data structures are used in Large Plone Folders
and many parts of Zope. One downside is that they don’t support user definable
ordering.
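To make the idea of conflict resolution concrete, here is a minimal sketch in plain Python. It is not ZODB code: it only mimics the three-state merge that ZODB requests from a persistent class through its ``_p_resolveConflict(oldState, savedState, newState)`` hook. The function name ``resolve_counter_conflict`` and the counter state layout are assumptions made for illustration.

```python
def resolve_counter_conflict(old_state, saved_state, new_state):
    """Merge two concurrent changes to a counter.

    old_state   -- state both transactions started from
    saved_state -- state already committed by the other transaction
    new_state   -- state this transaction is trying to commit
    """
    saved_delta = saved_state["count"] - old_state["count"]
    new_delta = new_state["count"] - old_state["count"]
    merged = dict(old_state)
    # Apply both changes on top of the common starting point.
    merged["count"] = old_state["count"] + saved_delta + new_delta
    return merged


old = {"count": 10}
saved = {"count": 12}   # another transaction added 2 and committed first
new = {"count": 15}     # this transaction added 5
merged = resolve_counter_conflict(old, saved, new)
print(merged["count"])  # 17
```

Because both increments can be replayed on the common starting state, the second commit does not need to fail. A data structure without such a merge rule has no choice but to raise a ConflictError.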
The hot points for ConflictErrors are the catalogue indexes. Some of the
indexes do not support conflict resolution and you will see ConflictErrors
under write-intensive loads. One solution is to defer catalogue updates using
`QueueCatalog <http://pypi.python.org/pypi/Products.QueueCatalog>`_
(`PloneQueueCatalog
<http://pypi.python.org/pypi/Products.PloneQueueCatalog>`_), which allows
indexing operations to be serialized using a separate ZEO client. This can
bring big performance benefits as request retries are reduced, but the
downside is that index updates are no longer reflected immediately in the
application. Another alternative is to offload text indexing to a dedicated
search engine using `collective.solr
<http://pypi.python.org/pypi/collective.solr>`_.
This brings us to **Atomicity**, the other key feature of ZODB transactions. A
transaction will either succeed or fail; your data is never left in an
inconsistent state if an error occurs. This makes Zope a forgiving system to
work with.
You must, though, be careful with interactions with external systems. If a
ConflictError occurs Zope will attempt to replay a transaction up to three
times. Interactions with an external system should be made through a Data
Manager that participates in the transaction. If you’re talking to a database
use a Zope DA or a SQLAlchemy wrapper like `zope.sqlalchemy
<http://pypi.python.org/pypi/zope.sqlalchemy>`_.
Unfortunately the default MailHost implementation used by Plone is not
transaction aware. With it you can see duplicate emails sent. If this is a
problem use TransactionalMailHost.
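The duplicate-email problem can be sketched in plain Python. This is an illustration under stated assumptions, not Zope's publisher code: ``ConflictError`` is a local stand-in for ``ZODB.POSException.ConflictError``, and ``run_with_retries``, ``request`` and the two mail lists are hypothetical names.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""


def run_with_retries(request, attempts=3):
    """Replay `request` until it commits, discarding failed attempts."""
    for attempt in range(1, attempts + 1):
        pending_mail = []          # side effects queued for this attempt only
        try:
            request(attempt, pending_mail)
        except ConflictError:
            continue               # abort: queued mail is thrown away
        return pending_mail        # commit: now it is safe to send
    raise ConflictError("gave up after %d attempts" % attempts)


sent_immediately = []


def request(attempt, pending_mail):
    sent_immediately.append("welcome")   # non-transactional send
    pending_mail.append("welcome")       # transaction-aware send
    if attempt == 1:
        raise ConflictError("simulated write conflict")


delivered = run_with_retries(request)
print(sent_immediately)  # ['welcome', 'welcome']
print(delivered)         # ['welcome']
```

The immediate send runs once per attempt, so the replay produces a duplicate. The queued send is only released by the attempt that commits, which is the behaviour a transaction-aware data manager provides.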
Scalability
-----------
Python is limited to a single CPU by the Global Interpreter Lock,
but that’s OK: ZEO lets us run multiple Zope application servers sharing a
single database. You should run one Zope client for each processor on your
server. ZEO also lets you connect a debug session to your database at the same
time as your Zope web server, which is invaluable for debugging.
ZEO tends to be IO bound, so the GIL is not an issue.
ZODB also supports **partitioning**, allowing you to spread data over multiple
storages. However you should be careful about cross database references
(especially when copying and pasting between two databases) as they can be
problematic.
Another common reason to use partitioning is because the ZODB in memory cache
settings are made per database. Separating the catalogue into another storage
lets you set a higher target cache size for catalogue objects than for your
content objects. As much of the Plone interface is catalogue driven this can
have a significant performance benefit, especially on a large site.
.. image:: images/zeo-diagram.png
Storage Options
---------------
**FileStorage** is the default. Everything in one big Data.fs file, which is
essentially a transaction log. Use this unless you have a very good reason not
to.
**DirectoryStorage** (`site <http://dirstorage.sourceforge.net>`_) stores one
file per object revision. Does not require the Data.fs.index to be rebuilt on
an unclean shutdown (which can take a significant time for a large database).
It has only a small number of users.
**RelStorage** (`pypi <http://pypi.python.org/pypi/RelStorage>`_) stores
pickles in a relational database. PostgreSQL, MySQL and Oracle are supported
and no ZEO server is required. You benefit from the faster network layers of
these database adapters. However, conflict resolution is moved to the
application server, which can be bad for worst case performance when you have
high network latency.
BDBStorage, OracleStorage, PGStorage and APE have now fallen by the wayside.
Other features
--------------
**Savepoints** (previously sub-transactions) allow fine grained error control
and objects to be garbage collected during a transaction, saving memory.
Versions are deprecated (and will be removed in ZODB 3.9). The application
layer is responsible for versioning, e.g. CMFEditions / ZopeVersionControl.
**Undo**, don’t rely on it! If your object is indexed it may prove impossible
to undo the transaction (independently) if a later transaction has changed the
same index. Undo is only performed on a single database, so if you have
separated out your catalogue it will get out of sync. Fine for undoing in
portal_skins/custom though.
**BLOBs** are new in ZODB 3.8 / Zope 2.11, bringing efficient large file
support. Great for document management applications.
**Packing** removes old revisions of objects. Similar to `Routine Vacuuming
<http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html>`_ in
PostgreSQL.
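As a rough illustration of what packing does to a transaction log such as Data.fs, the toy function below keeps only the newest revision of each object. It is a sketch in plain Python, not the FileStorage algorithm: the record layout ``(transaction_id, object_id, data)`` and the name ``pack`` are assumptions, and a real pack takes a pack time and keeps every revision still needed after that time.

```python
def pack(log):
    """Keep only the latest revision of each object in an append-only log.

    `log` is a list of (transaction_id, object_id, data) records in
    commit order.
    """
    latest = {}
    for record in log:
        tid, oid, data = record
        latest[oid] = record            # later records replace earlier ones
    return sorted(latest.values())      # back in transaction order


log = [
    (1, "page", "draft"),
    (2, "user", "alice"),
    (3, "page", "reviewed"),
    (4, "page", "published"),
]
packed = pack(log)
print(packed)  # [(2, 'user', 'alice'), (4, 'page', 'published')]
```

The two older revisions of ``page`` are dropped, which is also why a transaction can no longer be undone once it has been packed away.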
Some best practice
------------------
**Don’t write on read**. Your Data.fs should not grow on a read. Beware of
setDefault and avoid in-place migration.
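A small sketch of how a read can turn into a write, assuming the warning above is about ``setdefault``-style calls on persistent mappings. ``TrackedDict`` is a hypothetical stand-in that records modification in a ``changed`` flag, roughly the way a persistent object is marked as changed; it is not a ZODB class.

```python
class TrackedDict(dict):
    """A dict that records whether it was modified."""

    changed = False

    def __setitem__(self, key, value):
        self.changed = True
        super().__setitem__(key, value)

    def setdefault(self, key, default=None):
        if key not in self:
            self[key] = default     # a missing key turns the read into a write
        return self[key]


settings = TrackedDict()

title = settings.get("title", "Untitled")         # pure read
changed_after_get = settings.changed

title = settings.setdefault("title", "Untitled")  # read that also writes
changed_after_setdefault = settings.changed

print(changed_after_get, changed_after_setdefault)  # False True
```

In a ZODB application the second call would cause the object to be stored again at commit time, so a page view that should only read ends up growing the database.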
**Keep your code on the filesystem**. Too much stuff in the custom folder will
just lead to pain further down the track. Though this can be very convenient
for getting things done when they are needed yesterday...
**Use scalable data structures** such as BTrees. Keep your content objects
simple, add functionality with adapters and views.
ZODB articles
=============
Contents
--------
.. toctree::
   :maxdepth: 2

   ZODB-overview.rst
   ZODB1.rst
   ZODB2.rst
Other ZODB Resources
--------------------
- `ZODB/ZEO Programming Guide <../zodbguide/index.html>`_
- Jim Fulton's `Introduction to the Zope Object Database <http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html>`_
- IBM developerWorks `Example-driven ZODB
<http://www.ibm.com/developerworks/aix/library/au-zodb/>`_
- Martijn Faassen's `A misconception about the ZODB
  <http://faassen.n--tree.net/blog/view/weblog/2008/06/20/0>`_
- `How To Love ZODB and Forget RDBMS
<http://zope.org/Members/adytumsolutions/HowToLoveZODB_PartI>`_
- `ZODB Wiki <http://www.zope.org/Wikis/ZODB/FrontPage>`_ and `Documentation
page <http://wiki.zope.org/ZODB/Documentation>`_
- `ZODB-dev <http://mail.zope.org/mailman/listinfo/zodb-dev>`_ mailing list
and `archive <http://mail.zope.org/pipermail/zodb-dev/>`_
##############################################################################
#
# Copyright (c) 2006 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Bootstrap a buildout-based project
Simply run this script in a directory containing a buildout.cfg.
The script accepts buildout command-line options, so you can
use the -c option to specify an alternate configuration file.
$Id: bootstrap.py 90478 2008-08-27 22:44:46Z georgyberdyshev $
"""
import os, shutil, sys, tempfile, urllib2

tmpeggs = tempfile.mkdtemp()

is_jython = sys.platform.startswith('java')

try:
    import pkg_resources
except ImportError:
    ez = {}
    exec urllib2.urlopen('http://peak.telecommunity.com/dist/ez_setup.py'
                         ).read() in ez
    ez['use_setuptools'](to_dir=tmpeggs, download_delay=0)

    import pkg_resources

if sys.platform == 'win32':
    def quote(c):
        if ' ' in c:
            return '"%s"' % c # work around spawn lamosity on windows
        else:
            return c
else:
    def quote (c):
        return c

cmd = 'from setuptools.command.easy_install import main; main()'
ws = pkg_resources.working_set

if is_jython:
    import subprocess

    assert subprocess.Popen([sys.executable] + ['-c', quote(cmd), '-mqNxd',
           quote(tmpeggs), 'zc.buildout'],
           env=dict(os.environ,
               PYTHONPATH=
               ws.find(pkg_resources.Requirement.parse('setuptools')).location
               ),
           ).wait() == 0

else:
    assert os.spawnle(
        os.P_WAIT, sys.executable, quote (sys.executable),
        '-c', quote (cmd), '-mqNxd', quote (tmpeggs), 'zc.buildout',
        dict(os.environ,
             PYTHONPATH=
             ws.find(pkg_resources.Requirement.parse('setuptools')).location
             ),
        ) == 0

ws.add_entry(tmpeggs)
ws.require('zc.buildout')
import zc.buildout.buildout
zc.buildout.buildout.main(sys.argv[1:] + ['bootstrap'])
shutil.rmtree(tmpeggs)
[buildout]
develop =
parts =
    stxpy

eggs-directory = ${buildout:directory}/eggs
versions = versions
unzip = true
eggs =

[versions]
zc.buildout =
zc.recipe.egg =

[stxpy]
recipe = zc.recipe.egg
eggs =
    Sphinx
interpreter = stxpy
scripts =
    sphinx-build
    sphinx-quickstart
# -*- coding: utf-8 -*-
#
# ZODB documentation and articles documentation build configuration file, created by
# sphinx-quickstart on Sat Feb 21 09:17:33 2009.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# The contents of this file are pickled, so don't put values in the namespace
# that aren't pickleable (module imports are okay, they're removed automatically).
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
# If your extensions are in another directory, add it here. If the directory
# is relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
#sys.path.append(os.path.abspath('.'))
# General configuration
# ---------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['.templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'ZODB'
copyright = u'2009, Zope Developers Community'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '3.9'
# The full version, including alpha/beta/rc tags.
release = '3.9.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of documents that shouldn't be included in the build.
#unused_docs = []
# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = ['build']
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# Options for HTML output
# -----------------------
# The style sheet to use for HTML and HTML Help pages. A file of that name
# must exist either in Sphinx' static/ path, or in one of the custom paths
# given in html_static_path.
html_style = 'default.css'
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
html_logo = 'logo.png'
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['.static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_use_modindex = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, the reST sources are included in the HTML build as _sources/<name>.
#html_copy_source = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''
# Output file base name for HTML help builder.
htmlhelp_basename = 'ZODBdocumentationandarticlesdoc'
# Options for LaTeX output
# ------------------------
# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'
# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, document class [howto/manual]).
latex_documents = [
('index', 'ZODBdocumentationandarticles.tex', ur'ZODB documentation and articles Documentation',
ur'Zope Developers Community', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# Additional stuff for the LaTeX preamble.
#latex_preamble = ''
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_use_modindex = True
ZODB related documentation and articles
=======================================
.. toctree::
   :maxdepth: 1

   zodbguide/index
   articles/index
Release information
===================
- `Release information at pypi <http://pypi.python.org/pypi/ZODB3>`_
logo.png (binary file, 3.01 KB)
This directory contains Andrew Kuchling's programmer's guide to ZODB
and ZEO. The tex source was not being updated in the ZODB docs directory.
It was originally taken from Andrew's zodb.sf.net project on
SourceForge. Because the original version is no longer updated, this
version [in the zodb docs dir] is best viewed as an independent fork now.
Write section on __setstate__
Continue working on it
Suppress the full GFDL text in the PDF/PS versions
.. % Administration
.. % Importing and exporting data
.. % Disaster recovery/avoidance
.. % Security
import sys, time, os, random

import transaction
from persistent import Persistent

from ZEO import ClientStorage
import ZODB
from ZODB.POSException import ConflictError
from BTrees import OOBTree

class ChatSession(Persistent):

    """Class for a chat session.
    Messages are stored in a B-tree, indexed by the time the message
    was created. (Eventually we'd want to throw messages out,

    add_message(message) -- add a message to the channel
    new_messages()       -- return new messages since the last call to
                            this method
    """

    def __init__(self, name):
        """Initialize new chat session.
        name -- the channel's name
        """

        self.name = name

        # Internal attribute: _messages holds all the chat messages.
        self._messages = OOBTree.OOBTree()

    def new_messages(self):
        "Return new messages."

        # self._v_last_time is the time of the most recent message
        # returned to the user of this class.
        if not hasattr(self, '_v_last_time'):
            self._v_last_time = 0

        new = []
        T = self._v_last_time

        for T2, message in self._messages.items():
            if T2 > T:
                new.append( message )
                self._v_last_time = T2

        return new

    def add_message(self, message):
        """Add a message to the channel.
        message -- text of the message to be added
        """

        while 1:
            try:
                now = time.time()
                self._messages[ now ] = message
                transaction.commit()
            except ConflictError:
                # Conflict occurred; discard the failed transaction so the
                # next attempt starts from fresh state, then pause for a
                # little bit before trying again.
                transaction.abort()
                time.sleep(.2)
            else:
                # No ConflictError exception raised, so break
                # out of the enclosing while loop.
                break
        # end while

def get_chat_session(conn, channelname):
    """Return the chat session for a given channel, creating the session
    if required."""

    # We'll keep a B-tree of sessions, mapping channel names to
    # session objects. The B-tree is stored at the ZODB's root under
    # the key 'chat_sessions'.
    root = conn.root()
    if not root.has_key('chat_sessions'):
        print 'Creating chat_sessions B-tree'
        root['chat_sessions'] = OOBTree.OOBTree()
        transaction.commit()

    sessions = root['chat_sessions']

    # Get a session object corresponding to the channel name, creating
    # it if necessary.
    if not sessions.has_key( channelname ):
        print 'Creating new session:', channelname
        sessions[ channelname ] = ChatSession(channelname)
        transaction.commit()

    session = sessions[ channelname ]
    return session

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print 'Usage: %s <channelname>' % sys.argv[0]
        sys.exit(0)

    storage = ClientStorage.ClientStorage( ('localhost', 9672) )
    db = ZODB.DB( storage )
    conn = db.open()

    s = session = get_chat_session(conn, sys.argv[1])

    messages = ['Hi.', 'Hello', 'Me too', "I'M 3L33T!!!!"]

    while 1:
        # Send a random message
        msg = random.choice(messages)
        session.add_message( '%s: pid %i' % (msg,os.getpid() ))

        # Display new messages
        for msg in session.new_messages():
            print msg

        # Wait for a few seconds
        pause = random.randint( 1, 4 )
        time.sleep( pause )
# Use the python docs converter to convert to rst
# Requires http://svn.python.org/projects/doctools/converter
from converter import restwriter, convert_file

import sys
import os

if __name__ == '__main__':
    try:
        rootdir = sys.argv[1]
        destdir = os.path.abspath(sys.argv[2])
    except IndexError:
        print "usage: convert.py docrootdir destdir"
        sys.exit()

    os.chdir(rootdir)

    class IncludeRewrite:
        def get(self, a, b=None):
            if os.path.exists(a + '.tex'):
                return a + '.rst'
            print "UNKNOWN FILE %s" % a
            return a

    restwriter.includes_mapping = IncludeRewrite()

    for infile in os.listdir('.'):
        if infile.endswith('.tex'):
            convert_file(infile, os.path.join(destdir, infile[:-3]+'rst'))
******************************
ZODB/ZEO Programming Guide
******************************
:Author: A.M. Kuchling
:Date: |today|
.. |release| replace:: 3.7.0b3
©Copyright 2002 A.M. Kuchling. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documentation
License, Version 1.1 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and no
Back-Cover Texts. A copy of the license is included in the appendix entitled
"GNU Free Documentation License".
Contents
--------
.. toctree::
   :maxdepth: 2

   introduction.rst
   prog-zodb.rst
   zeo.rst
   transactions.rst
   modules.rst
   links.rst
   gfdl.rst
.. % Indexing Data
.. % BTrees
.. % Full-text indexing
.. % Introduction
.. % What is ZODB?
.. % What is ZEO?
.. % OODBs vs. Relational DBs
.. % Other OODBs
Introduction
============
This guide explains how to write Python programs that use the Z Object Database
(ZODB) and Zope Enterprise Objects (ZEO). The latest version of the guide is
always available at `<http://www.zope.org/Wikis/ZODB/guide/index.html>`_.
What is the ZODB?
-----------------
The ZODB is a persistence system for Python objects. Persistent programming
languages provide facilities that automatically write objects to disk and read
them in again when they're required by a running program. By installing the
ZODB, you add such facilities to Python.
It's certainly possible to build your own system for making Python objects
persistent. The usual starting points are the :mod:`pickle` module, for
converting objects into a string representation, and various database modules,
such as the :mod:`gdbm` or :mod:`bsddb` modules, that provide ways to write
strings to disk and read them back. It's straightforward to combine the
:mod:`pickle` module and a database module to store and retrieve objects, and in
fact the :mod:`shelve` module, included in Python's standard library, does this.
The downside is that the programmer has to explicitly manage objects, reading an
object when it's needed and writing it out to disk when the object is no longer
required. The ZODB manages objects for you, keeping them in a cache, writing
them out to disk when they are modified, and dropping them from the cache if
they haven't been used in a while.
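The explicit management described above can be seen with the standard library's :mod:`shelve` module. This is a minimal sketch; the key ``run-1``, the record contents and the temporary file location are made up for illustration.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "runs")

# Write: the programmer decides when objects go to disk.
db = shelve.open(path)
db["run-1"] = {"owner": "alice", "steps": ["deposit"]}
db.close()

# Read back, change, and explicitly store again.
db = shelve.open(path)
run = db["run-1"]               # an unpickled copy of the stored object
run["steps"].append("etch")     # changes only the in-memory copy
unsaved_steps = list(db["run-1"]["steps"])
db["run-1"] = run               # explicit write-back is required
saved_steps = list(db["run-1"]["steps"])
db.close()

print(unsaved_steps)  # ['deposit']
print(saved_steps)    # ['deposit', 'etch']
```

Forgetting the write-back line silently loses the change, which is the bookkeeping the ZODB takes over.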
OODBs vs. Relational DBs
------------------------
Another way to look at it is that the ZODB is a Python-specific object-oriented
database (OODB). Commercial object databases for C++ or Java often require that
you jump through some hoops, such as using a special preprocessor or avoiding
certain data types. As we'll see, the ZODB has some hoops of its own to jump
through, but in comparison the naturalness of the ZODB is astonishing.
Relational databases (RDBs) are far more common than OODBs. Relational databases
store information in tables; a table consists of any number of rows, each row
containing several columns of information. (Tables are more formally called
relations, and their rows tuples, which is where the term "relational database"
originates.)
Let's look at a concrete example. The example comes from my day job working for
the MEMS Exchange, in a greatly simplified version. The job is to track process
runs, which are lists of manufacturing steps to be performed in a semiconductor
fab. A run is owned by a particular user, and has a name and assigned ID
number. Runs consist of a number of operations; an operation is a single step
to be performed, such as depositing something on a wafer or etching something
off it.
Operations may have parameters, which are additional information required to
perform an operation. For example, if you're depositing something on a wafer,
you need to know two things: 1) what you're depositing, and 2) how much should
be deposited. You might deposit 100 microns of silicon oxide, or 1 micron of
copper.
Mapping these structures to a relational database is straightforward::

   CREATE TABLE runs (
     run_id    int,
     owner     varchar,
     title     varchar,
     acct_num  int,
     PRIMARY KEY (run_id)
   );

   CREATE TABLE operations (
     run_id      int,
     step_num    int,
     process_id  varchar,
     PRIMARY KEY (run_id, step_num),
     FOREIGN KEY (run_id) REFERENCES runs(run_id)
   );

   CREATE TABLE parameters (
     run_id       int,
     step_num     int,
     param_name   varchar,
     param_value  varchar,
     PRIMARY KEY (run_id, step_num, param_name),
     FOREIGN KEY (run_id, step_num)
        REFERENCES operations(run_id, step_num)
   );

In Python, you would write three classes named :class:`Run`, :class:`Operation`,
and :class:`Parameter`. I won't present code for defining these classes, since
that code is uninteresting at this point. Each class would contain a single
method to begin with, an :meth:`__init__` method that assigns default values,
such as 0 or ``None``, to each attribute of the class.
It's not difficult to write Python code that will create a :class:`Run` instance
and populate it with the data from the relational tables; with a little more
effort, you can build a straightforward tool, usually called an
object-relational mapper, to do this automatically. (See
`<http://www.amk.ca/python/unmaintained/ordb.html>`_ for a quick hack at a
Python object-relational mapper, and
`<http://www.python.org/workshops/1997-10/proceedings/shprentz.html>`_ for Joel
Shprentz's more successful implementation of the same idea; unlike mine,
Shprentz's system has been used for actual work.)
However, it is difficult to make an object-relational mapper reasonably quick; a
simple-minded implementation like mine is quite slow because it has to do
several queries to access all of an object's data. Higher performance
object-relational mappers cache objects to improve performance, only performing
SQL queries when they actually need to.
That helps if you want to access run number 123 all of a sudden. But what if
you want to find all runs where a step has a parameter named 'thickness' with a
value of 2.0? In the relational version, you have two unappealing choices:
#. Write a specialized SQL query for this case: ``SELECT run_id FROM parameters
   WHERE param_name = 'thickness' AND param_value = '2.0'``

   If such queries are common, you can end up with lots of specialized queries.
   When the database tables get rearranged, all these queries will need to be
   modified.

#. An object-relational mapper doesn't help much. Scanning through the runs
   means that the mapper will perform the required SQL queries to read run #1,
   and then a simple Python loop can check whether any of its steps have the
   parameter you're looking for. Repeat for run #2, 3, and so forth. This does a
   vast number of SQL queries, and therefore is incredibly slow.
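The two choices can be compared with the standard library's :mod:`sqlite3` module. This is a sketch under assumptions: it uses an in-memory SQLite database, a cut-down version of the schema (no ``operations`` table), made-up sample rows, and a ``queries`` counter that stands in for the work a naive mapper would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar, title varchar);
    CREATE TABLE parameters (
        run_id int, step_num int, param_name varchar, param_value varchar,
        PRIMARY KEY (run_id, step_num, param_name)
    );
    INSERT INTO runs VALUES (1, 'alice', 'oxide test');
    INSERT INTO runs VALUES (2, 'bob', 'copper test');
    INSERT INTO parameters VALUES (1, 1, 'material', 'silicon oxide');
    INSERT INTO parameters VALUES (1, 1, 'thickness', '100');
    INSERT INTO parameters VALUES (2, 1, 'material', 'copper');
    INSERT INTO parameters VALUES (2, 1, 'thickness', '2.0');
""")

# Choice 1: one specialized query.
rows = conn.execute(
    "SELECT DISTINCT run_id FROM parameters "
    "WHERE param_name = ? AND param_value = ?",
    ("thickness", "2.0"),
).fetchall()
matching_by_query = [run_id for (run_id,) in rows]

# Choice 2: load each run separately and test it in Python.
queries = 1  # the query that lists the runs
matching_by_scan = []
run_ids = conn.execute("SELECT run_id FROM runs ORDER BY run_id").fetchall()
for (run_id,) in run_ids:
    queries += 1  # one more query per run
    params = conn.execute(
        "SELECT param_name, param_value FROM parameters WHERE run_id = ?",
        (run_id,),
    ).fetchall()
    if ("thickness", "2.0") in params:
        matching_by_scan.append(run_id)

print(matching_by_query, matching_by_scan, queries)  # [2] [2] 3
```

Both approaches find the same run, but the scan issues one query per run on top of the initial listing, so its cost grows with the number of runs rather than with the number of matches.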
An object database such as ZODB simply stores internal pointers from object to
object, so reading in a single object is much faster than doing a bunch of SQL
queries and assembling the results. Scanning all runs, therefore, is still
inefficient, but not grossly inefficient.
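For comparison, here is a sketch of the same search over plain Python objects that reference each other directly, which is roughly how the data looks once the ZODB has loaded it. The class bodies are assumptions (the guide names :class:`Run`, :class:`Operation` and :class:`Parameter` but does not define them), persistence is left out entirely, and ``runs_with_parameter`` is a hypothetical helper.

```python
class Parameter:
    def __init__(self, name=None, value=None):
        self.name = name
        self.value = value


class Operation:
    def __init__(self, process_id=None, parameters=None):
        self.process_id = process_id
        self.parameters = parameters if parameters is not None else []


class Run:
    def __init__(self, run_id=0, owner=None, title=None, operations=None):
        self.run_id = run_id
        self.owner = owner
        self.title = title
        self.operations = operations if operations is not None else []


def runs_with_parameter(runs, name, value):
    """Return the runs in which some step has the given parameter value."""
    return [
        run for run in runs
        if any(p.name == name and p.value == value
               for op in run.operations
               for p in op.parameters)
    ]


runs = [
    Run(1, "alice", "oxide test",
        [Operation("deposit", [Parameter("thickness", 100.0)])]),
    Run(2, "bob", "copper test",
        [Operation("deposit", [Parameter("thickness", 2.0)])]),
]
found = [run.run_id for run in runs_with_parameter(runs, "thickness", 2.0)]
print(found)  # [2]
```

The loop still visits every run, but each step is an attribute access on an object that is already reachable, not a round trip to a SQL server.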
What is ZEO?
------------
The ZODB comes with a few different classes that implement the :class:`Storage`
interface. Such classes handle the job of writing out Python objects to a
physical storage medium, which can be a disk file (the :class:`FileStorage`
class), a BerkeleyDB file (:class:`BDBFullStorage`), a relational database
(:class:`DCOracleStorage`), or some other medium. ZEO adds
:class:`ClientStorage`, a new :class:`Storage` that doesn't write to physical
media but just forwards all requests across a network to a server. The server,
which is running an instance of the :class:`StorageServer` class, simply acts as
a front-end for some physical :class:`Storage` class. It's a fairly simple
idea, but as we'll see later on in this document, it opens up many
possibilities.
About this guide
----------------
The primary author of this guide works on a project which uses the ZODB and ZEO
as its primary storage technology. We use the ZODB to store process runs and
operations, a catalog of available processes, user information, accounting
information, and other data. Part of the goal of writing this document is to
make our experience more widely available. A few times we've spent hours or
even days trying to figure out a problem, and this guide is an attempt to gather
up the knowledge we've gained so that others don't have to make the same
mistakes we did while learning.
The author's ZODB project is described in a paper available at
`<http://www.amk.ca/python/writing/mx-architecture/>`_.
This document will always be a work in progress. If you wish to suggest
clarifications or additional topics, please send your comments to
zodb-dev@zope.org.
Acknowledgements
----------------
Andrew Kuchling wrote the original version of this guide, which provided some of
the first ZODB documentation for Python programmers. His initial version has
been updated over time by Jeremy Hylton and Tim Peters.
I'd like to thank the people who've pointed out inaccuracies and bugs, offered
suggestions on the text, or proposed new topics that should be covered: Jeff
Bauer, Willem Broekema, Thomas Guettler, Chris McDonough, and George Runyan.
.. % links.tex
.. % Collection of relevant links
Resources
=========
Introduction to the Zope Object Database, by Jim Fulton: --- Goes into much
greater detail, explaining advanced uses of the ZODB and how it's actually
implemented. A definitive reference, and highly recommended. ---
`<http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html>`_
Persistent Programming with ZODB, by Jeremy Hylton and Barry Warsaw: --- Slides
for a tutorial presented at the 10th Python conference. Covers much of the same
ground as this guide, with more details in some areas and less in others. ---
`<http://www.zope.org/Members/bwarsaw/ipc10-slides>`_
.. % Storages
.. % FileStorage
.. % BerkeleyStorage
.. % OracleStorage
Storages
========
This chapter will examine the different :class:`Storage` subclasses that are
considered stable, discuss their varying characteristics, and explain how to
administer them.
Using Multiple Storages
-----------------------
XXX explain mounting substorages
FileStorage
-----------
BDBFullStorage
--------------
OracleStorage
-------------
.. % Transactions and Versioning
.. % Committing and Aborting
.. % Subtransactions
.. % Undoing
.. % Versions
.. % Multithreaded ZODB Programs
Transactions and Versioning
===========================
Committing and Aborting
-----------------------
Changes made during a transaction don't appear in the database until the
transaction commits. This is done by calling the :meth:`commit` method of the
current :class:`Transaction` object, where the latter is obtained from the
:meth:`get` method of the current transaction manager. If the default thread
transaction manager is being used, then ``transaction.commit()`` suffices.
Similarly, a transaction can be explicitly aborted (all changes within the
transaction thrown away) by invoking the :meth:`abort` method of the current
:class:`Transaction` object, or simply ``transaction.abort()`` if using the
default thread transaction manager.
Prior to ZODB 3.3, if a commit failed (meaning the ``commit()`` call raised an
exception), the transaction was implicitly aborted and a new transaction was
implicitly started. This could be very surprising if the exception was
suppressed, and especially if the failing commit was one in a sequence of
subtransaction commits.
So, starting with ZODB 3.3, if a commit fails, all further attempts to commit,
join, or register with the transaction raise
:exc:`ZODB.POSException.TransactionFailedError`. You must explicitly start a
new transaction then, either by calling the :meth:`abort` method of the current
transaction, or by calling the :meth:`begin` method of the current transaction's
transaction manager.
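One way to follow this rule is to abort explicitly whenever a commit raises. The helper below is a minimal sketch, not part of the ZODB: ``commit_or_abort`` and ``FailingTransaction`` are hypothetical names, and the helper only assumes it is given something with ``commit()`` and ``abort()`` methods, such as the ``transaction`` module when the default thread transaction manager is in use.

```python
def commit_or_abort(txn):
    """Try to commit; on failure, abort so that a fresh transaction can begin.

    `txn` is any object (or module) offering commit() and abort().
    """
    try:
        txn.commit()
    except Exception:
        txn.abort()  # explicitly discard the failed transaction
        raise        # let the caller decide whether to retry


class FailingTransaction:
    """Stand-in used only to exercise the helper without a database."""
    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append('commit')
        raise RuntimeError('simulated commit failure')

    def abort(self):
        self.calls.append('abort')


txn = FailingTransaction()
try:
    commit_or_abort(txn)
except RuntimeError:
    outcome = 'commit failed, transaction aborted'
```

Re-raising the exception keeps the failure visible to the caller, which avoids the suppressed-exception surprise described above.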
Subtransactions
---------------
Subtransactions can be created within a transaction. Each subtransaction can be
individually committed and aborted, but the changes within a subtransaction are
not truly committed until the containing transaction is committed.
The primary purpose of subtransactions is to decrease the memory usage of
transactions that touch a very large number of objects. Consider a transaction
during which 200,000 objects are modified. All the objects that are modified in
a single transaction have to remain in memory until the transaction is
committed, because the ZODB can't discard them from the object cache. This can
potentially make the memory usage quite large. With subtransactions, a commit
can be performed at intervals, say, every 10,000 objects. Those 10,000
objects are then written to permanent storage and can be purged from the cache
to free more space.
To commit or abort a subtransaction instead of a full transaction, pass a true
value to the :meth:`commit` or :meth:`abort` method of the :class:`Transaction`
object.
::
# Commit a subtransaction
transaction.commit(True)
# Abort a subtransaction
transaction.abort(True)
A new subtransaction is automatically started once the previous subtransaction
has been successfully committed or aborted.
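The batching pattern described in this section might look like the following sketch. ``modify_in_batches`` is a hypothetical helper, and the commit is passed in as a callable so that the example runs without a database. With the API described here, the callable would presumably be ``lambda: transaction.commit(True)``, and the final full commit is left to the caller.

```python
def modify_in_batches(objects, modify, commit_subtransaction, batch_size=10000):
    """Apply modify() to each object, committing a subtransaction per batch.

    Returns the number of subtransaction commits performed.
    """
    commits = 0
    for count, obj in enumerate(objects, start=1):
        modify(obj)
        if count % batch_size == 0:
            commit_subtransaction()  # lets the cache release this batch
            commits += 1
    return commits


# Demonstration with stand-ins: dictionaries play the role of persistent
# objects, and the "commit" simply records that it was called.
committed = []
items = [{'n': i} for i in range(25)]
n_commits = modify_in_batches(
    items,
    modify=lambda item: item.update(done=True),
    commit_subtransaction=lambda: committed.append('sub'),
    batch_size=10,
)
```

With 25 objects and a batch size of 10, the last five objects are modified after the final subtransaction commit, so they are only written when the containing transaction commits.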
Undoing Changes
---------------
Some types of :class:`Storage` support undoing a transaction even after it's
been committed. You can tell if this is the case by calling the
:meth:`supportsUndo` method of the :class:`DB` instance, which returns true if
the underlying storage supports undo. Alternatively you can call the
:meth:`supportsUndo` method on the underlying storage instance.
If a database supports undo, then the :meth:`undoLog(start, end[, func])` method
on the :class:`DB` instance returns the log of past transactions, returning
transactions between the times *start* and *end*, measured in seconds from the
epoch. If present, *func* is a function that acts as a filter on the
transactions to be returned; it's passed a dictionary representing each
transaction, and only transactions for which *func* returns true will be
included in the list of transactions returned to the caller of :meth:`undoLog`.
The dictionary contains keys for various properties of the transaction. The
most important keys are ``id``, for the transaction ID, and ``time``, for the
time at which the transaction was committed. ::
>>> print storage.undoLog(0, sys.maxint)
[{'description': '',
'id': 'AzpGEGqU/0QAAAAAAAAGMA',
'time': 981126744.98,
'user_name': ''},
{'description': '',
'id': 'AzpGC/hUOKoAAAAAAAAFDQ',
'time': 981126478.202,
'user_name': ''}
...
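A filter suitable for the *func* argument could be written as below. This is only a sketch: ``by_user`` is a hypothetical helper, and the sample records merely imitate the shape of the undo log entries shown above, with a made-up user name filled in.

```python
def by_user(user_name):
    """Build a filter that accepts only transactions recorded for user_name."""
    def accept(txn_info):
        # txn_info is the dictionary describing one transaction
        return txn_info.get('user_name') == user_name
    return accept


# Sample records shaped like undo log entries; the user name is invented.
log = [
    {'description': '', 'id': 'AzpGEGqU/0QAAAAAAAAGMA',
     'time': 981126744.98, 'user_name': 'amk'},
    {'description': '', 'id': 'AzpGC/hUOKoAAAAAAAAFDQ',
     'time': 981126478.202, 'user_name': ''},
]

func = by_user('amk')
selected = [entry['id'] for entry in log if func(entry)]
```

Here the filter is applied by hand to show its effect; passed as *func*, it would be called once per transaction by :meth:`undoLog` itself.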
To store a description and a user name on a commit, get the current transaction
and call the :meth:`note(text)` method to store a description, and the
:meth:`setUser(user_name)` method to store the user name. While :meth:`setUser`
overwrites the current user name and replaces it with the new value, the
:meth:`note` method always adds the text to the transaction's description, so it
can be called several times to log several different changes made in the course
of a single transaction. ::
transaction.get().setUser('amk')
transaction.get().note('Change ownership')
To undo a transaction, call the :meth:`DB.undo(id)` method, passing it the ID of
the transaction to undo. If the transaction can't be undone, a
:exc:`ZODB.POSException.UndoError` exception will be raised, with the message
"non-undoable transaction". Usually this will happen because later transactions
modified the objects affected by the transaction you're trying to undo.
After you call :meth:`undo` you must commit the transaction for the undo to
actually be applied. [#]_ There is one glitch in the undo process. The thread
that calls undo may not see the changes to the object until it calls
:meth:`Connection.sync` or commits another transaction.
Versions
--------
.. warning::
Versions should be avoided. They're going to be deprecated, replaced by better
approaches to long-running transactions.
While many subtransactions can be contained within a single regular transaction,
it's also possible to contain many regular transactions within a long-running
transaction, called a version in ZODB terminology. Inside a version, any number
of transactions can be created and committed or rolled back, but the changes
within a version are not made visible to other connections to the same ZODB.
Not all storages support versions, but you can test for versioning ability by
calling the :meth:`supportsVersions` method of the :class:`DB` instance, which
returns true if the underlying storage supports versioning.
A version can be selected when creating the :class:`Connection` instance using
the :meth:`DB.open([*version*])` method. The *version* argument must be a string
that will be used as the name of the version. ::
vers_conn = db.open(version='Working version')
Transactions can then be committed and aborted using this versioned connection.
Other connections that don't specify a version, or provide a different version
name, will not see changes committed within the version named ``Working
version``. To commit or abort a version, which will either make the changes
visible to all clients or roll them back, call the :meth:`DB.commitVersion` or
:meth:`DB.abortVersion` methods. XXX what are the source and dest arguments for?
The ZODB makes no attempt to reconcile changes between different versions.
Instead, the first version which modifies an object will gain a lock on that
object. Attempting to modify the object from a different version or from an
unversioned connection will cause a :exc:`ZODB.POSException.VersionLockError` to
be raised::
from ZODB.POSException import VersionLockError
try:
transaction.commit()
except VersionLockError, (obj_id, version):
print ('Cannot commit; object %s '
'locked by version %s' % (obj_id, version))
The exception provides the ID of the locked object, and the name of the version
having a lock on it.
Multithreaded ZODB Programs
---------------------------
ZODB databases can be accessed from multithreaded Python programs. The
:class:`Storage` and :class:`DB` instances can be shared among several threads,
as long as individual :class:`Connection` instances are created for each thread.
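The one-connection-per-thread rule can be sketched as follows. ``run_in_threads``, ``FakeDB``, and ``FakeConnection`` are hypothetical names used so that the example runs without a database. The helper only assumes that the shared object has an ``open()`` method returning a connection with a ``close()`` method, as a :class:`DB` instance does.

```python
import threading


def run_in_threads(db, work, thread_count):
    """Run work(connection) in several threads, one connection per thread."""
    def worker():
        conn = db.open()   # each thread opens its own connection
        try:
            work(conn)
        finally:
            conn.close()   # release it when the thread finishes

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


class FakeConnection:
    """Stand-in for a connection; it does nothing."""
    def close(self):
        pass


class FakeDB:
    """Stand-in for a shared database object that hands out connections."""
    def __init__(self):
        self.opened = []
        self._lock = threading.Lock()

    def open(self):
        conn = FakeConnection()
        with self._lock:
            self.opened.append(conn)
        return conn


db = FakeDB()
seen = []
run_in_threads(db, lambda conn: seen.append(id(conn)), 4)
```

The single ``db`` object is shared by all threads, while no connection is ever used by more than one of them.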
.. rubric:: Footnotes
.. [#] There are actually two different ways a storage can implement the undo feature.
Most of the storages that ship with ZODB use the transactional form of undo
described in the main text. Some storages may use a non-transactional undo
that makes changes visible immediately.