Practically all persistent languages impose some restrictions on
programming style, warning against constructs they can't handle or
adding subtle semantic changes, and the ZODB is no exception.
Happily, the ZODB's restrictions are fairly simple to understand, and
in practice it isn't too painful to work around them.
<P>
The summary of rules is as follows:
<P>
<UL>
<LI>If you modify a mutable object that's the value of an object's
attribute, the ZODB can't catch that, and won't mark the object as
dirty.
The solution is to either set the dirty bit yourself when you modify
mutable objects, or use a wrapper for Python's lists and dictionaries
(<tt class="class">PersistentList</tt>,
<tt class="class">PersistentMapping</tt>)
that will set the dirty bit properly.
<P>
</LI>
<LI>Certain of Python's special methods don't work when they're
defined on ExtensionClasses. The most important ones are the
<tt class="method">__cmp__</tt> method, and the reversed versions of binary
arithmetic operations: <tt class="method">__radd__</tt>, <tt class="method">__rsub__</tt>, and so
forth.
<P>
</LI>
<LI>Recent versions of the ZODB allow writing a class with
<tt class="method">__setattr__</tt>, <tt class="method">__getattr__</tt>, or <tt class="method">__delattr__</tt> methods. (Older versions didn't support this at all.)
If you write such a <tt class="method">__setattr__</tt> or <tt class="method">__delattr__</tt> method,
</DIV>
<H2><A NAME="SECTION000220000000000000000">
1.2 OODBs vs. Relational DBs</A>
</H2>
<P>
Another way to look at it is that the ZODB is a Python-specific
object-oriented database (OODB). Commercial object databases for C++
or Java often require that you jump through some hoops, such as using
a special preprocessor or avoiding certain data types. As we'll see,
the ZODB has some hoops of its own to jump through, but in comparison
the naturalness of the ZODB is astonishing.
<P>
Relational databases (RDBs) are far more common than OODBs.
Relational databases store information in tables; a table consists of
any number of rows, each row containing several columns of
information. (Tables are more formally called relations, which is where
the term ``relational database'' originates; rows are formally called tuples.)
<P>
Let's look at a concrete example. The example comes from my day job
working for the MEMS Exchange, in a greatly simplified version. The
job is to track process runs, which are lists of manufacturing steps
to be performed in a semiconductor fab. A run is owned by a
particular user, and has a name and assigned ID number. Runs consist
of a number of operations; an operation is a single step to be
performed, such as depositing something on a wafer or etching
something off it.
<P>
Operations may have parameters, which are additional information
required to perform an operation. For example, if you're depositing
something on a wafer, you need to know two things: 1) what you're
depositing, and 2) how much should be deposited. You might deposit
100 microns of silicon oxide, or 1 micron of copper.
<P>
Mapping these structures to a relational database is straightforward:
<P>
<div class="verbatim"><pre>
-- One row per process run.
CREATE TABLE runs (
  run_id int,
  owner varchar,
  title varchar,
  acct_num int,
  PRIMARY KEY (run_id)
);
-- One row per step of a run.
CREATE TABLE operations (
  run_id int,
  step_num int,
  process_id varchar,
  PRIMARY KEY (run_id, step_num),
  FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
-- One row per named parameter of a step.
CREATE TABLE parameters (
  run_id int,
  step_num int,
  param_name varchar,
  param_value varchar,
  PRIMARY KEY (run_id, step_num, param_name),
  FOREIGN KEY (run_id, step_num)
     REFERENCES operations(run_id, step_num)
);
</pre></div>
<P>
In Python, you would write three classes named <tt class="class">Run</tt>,
<tt class="class">Operation</tt>, and <tt class="class">Parameter</tt>. I won't present code for
defining these classes, since that code is uninteresting at this
point. Each class would contain a single method to begin with, an
<tt class="method">__init__</tt> method that assigns default values, such as 0 or
<code>None</code>, to each attribute of the class.
<P>
It's not difficult to write Python code that will create a <tt class="class">Run</tt>
instance and populate it with the data from the relational tables;
with a little more effort, you can build a straightforward tool,
usually called an object-relational mapper, to do this automatically.
(See
<a class="url" href="http://www.amk.ca/python/unmaintained/ordb.html">http://www.amk.ca/python/unmaintained/ordb.html</a> for a quick hack
at a Python object-relational mapper, and
<a class="url" href="http://www.python.org/workshops/1997-10/proceedings/shprentz.html">http://www.python.org/workshops/1997-10/proceedings/shprentz.html</a> for Joel Shprentz's more successful implementation of the same idea;
unlike mine, Shprentz's system has been used for actual work.)
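<P>
As a rough illustration of what such hand-written loading code might look
like, here is a minimal sketch using the standard <code>sqlite3</code> module
and an in-memory database. The class attributes, the
<code>make_sample_db()</code> and <code>load_run()</code> helpers, and the
sample rows are all assumptions made up for this example; they are not taken
from either of the mappers mentioned above.

```python
import sqlite3


class Parameter:
    def __init__(self, name=None, value=None):
        self.name = name
        self.value = value


class Operation:
    def __init__(self, step_num=0, process_id=None):
        self.step_num = step_num
        self.process_id = process_id
        self.parameters = []


class Run:
    def __init__(self, run_id=0, owner=None, title=None, acct_num=0):
        self.run_id = run_id
        self.owner = owner
        self.title = title
        self.acct_num = acct_num
        self.operations = []


def make_sample_db():
    """Create an in-memory database holding one made-up run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar,
                           title varchar, acct_num int);
        CREATE TABLE operations (run_id int, step_num int,
                                 process_id varchar,
                                 PRIMARY KEY (run_id, step_num));
        CREATE TABLE parameters (run_id int, step_num int,
                                 param_name varchar, param_value varchar,
                                 PRIMARY KEY (run_id, step_num, param_name));
        INSERT INTO runs VALUES (123, 'alice', 'Test run', 42);
        INSERT INTO operations VALUES (123, 1, 'deposit');
        INSERT INTO operations VALUES (123, 2, 'etch');
        INSERT INTO parameters VALUES (123, 1, 'material', 'copper');
        INSERT INTO parameters VALUES (123, 1, 'thickness', '1.0');
    """)
    return conn


def load_run(conn, run_id):
    """Build a Run object from the three tables."""
    # Query 1: the run itself.
    row = conn.execute(
        "SELECT owner, title, acct_num FROM runs WHERE run_id = ?",
        (run_id,)).fetchone()
    if row is None:
        raise KeyError(run_id)
    run = Run(run_id, *row)

    # Query 2: the run's operations, in step order.
    ops = conn.execute(
        "SELECT step_num, process_id FROM operations "
        "WHERE run_id = ? ORDER BY step_num", (run_id,)).fetchall()

    # One further query per operation to fetch its parameters.
    for step_num, process_id in ops:
        op = Operation(step_num, process_id)
        params = conn.execute(
            "SELECT param_name, param_value FROM parameters "
            "WHERE run_id = ? AND step_num = ?", (run_id, step_num))
        op.parameters = [Parameter(n, v) for n, v in params]
        run.operations.append(op)
    return run


conn = make_sample_db()
run = load_run(conn, 123)
```

Note that even this tiny run with two operations costs four separate
queries: one for the run, one for its operations, and one per operation for
the parameters.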
<P>
However, it is difficult to make an object-relational mapper
reasonably quick; a simple-minded implementation like mine is quite
slow because it has to do several queries to access all of an object's
data. Higher-performance object-relational mappers cache objects to
improve performance, only performing SQL queries when they actually
need to.
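<P>
The caching idea can be sketched in a few lines. The
<code>CachingRunMapper</code> class below is hypothetical, written only for
this illustration, and it uses the standard <code>sqlite3</code> module with
a single made-up row. It deliberately ignores the hard part of a real cache,
namely noticing when the underlying rows change.

```python
import sqlite3


class Run:
    def __init__(self, run_id=0, owner=None, title=None, acct_num=0):
        self.run_id = run_id
        self.owner = owner
        self.title = title
        self.acct_num = acct_num


class CachingRunMapper:
    """Hypothetical mapper that remembers the Run objects it has built."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}        # run_id -> Run instance
        self.query_count = 0    # SQL queries actually issued

    def get(self, run_id):
        # Serve the object from memory if it was loaded before.
        if run_id in self._cache:
            return self._cache[run_id]
        # Otherwise go to the database, then remember the result.
        self.query_count += 1
        row = self.conn.execute(
            "SELECT owner, title, acct_num FROM runs WHERE run_id = ?",
            (run_id,)).fetchone()
        if row is None:
            raise KeyError(run_id)
        run = Run(run_id, *row)
        self._cache[run_id] = run
        return run


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar, "
             "title varchar, acct_num int)")
conn.execute("INSERT INTO runs VALUES (123, 'alice', 'Test run', 42)")

mapper = CachingRunMapper(conn)
first = mapper.get(123)    # issues a SQL query
second = mapper.get(123)   # answered from the cache
```

After both calls, <code>mapper.query_count</code> is 1 and
<code>first</code> and <code>second</code> are the same object.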
<P>
That helps if you want to access run number 123 all of a sudden. But
what if you want to find all runs where a step has a parameter named
'thickness' with a value of 2.0? In the relational version, you have
two unappealing choices:
<P>
<OL>
<LI>Write a specialized SQL query for this case: <code>SELECT DISTINCT run_id
FROM parameters WHERE param_name = 'thickness' AND param_value = '2.0'</code>
<P>
If such queries are common, you can end up with lots of specialized
queries. When the database tables get rearranged, all these queries
will need to be modified.
<P>
</LI>
<LI>Use an object-relational mapper, which doesn't help much here. Scanning
through the runs means that the mapper will perform the required
SQL queries to read run #1, and then a simple Python loop can check
whether any of its steps have the parameter you're looking for.
Repeat for runs #2, #3, and so forth. This performs a vast
number of SQL queries, and is therefore very slow.
<P>
</LI>
</OL>
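<P>
To make the difference between the two choices concrete, the following
sketch runs both against a small in-memory <code>sqlite3</code> database and
counts the SQL queries each one issues. The helper names
(<code>query()</code>, <code>find_with_sql()</code>,
<code>find_by_scanning()</code>) and the three sample runs are assumptions
invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar,
                       title varchar, acct_num int);
    CREATE TABLE operations (run_id int, step_num int, process_id varchar,
                             PRIMARY KEY (run_id, step_num));
    CREATE TABLE parameters (run_id int, step_num int,
                             param_name varchar, param_value varchar,
                             PRIMARY KEY (run_id, step_num, param_name));
    INSERT INTO runs VALUES (1, 'alice', 'Run one', 10);
    INSERT INTO runs VALUES (2, 'bob', 'Run two', 11);
    INSERT INTO runs VALUES (3, 'alice', 'Run three', 10);
    INSERT INTO operations VALUES (1, 1, 'deposit');
    INSERT INTO operations VALUES (1, 2, 'etch');
    INSERT INTO operations VALUES (2, 1, 'deposit');
    INSERT INTO operations VALUES (2, 2, 'etch');
    INSERT INTO operations VALUES (3, 1, 'etch');
    INSERT INTO operations VALUES (3, 2, 'deposit');
    INSERT INTO parameters VALUES (1, 1, 'thickness', '2.0');
    INSERT INTO parameters VALUES (1, 2, 'material', 'copper');
    INSERT INTO parameters VALUES (2, 1, 'thickness', '1.0');
    INSERT INTO parameters VALUES (3, 2, 'thickness', '2.0');
""")

stats = {"queries": 0}


def query(sql, args=()):
    """Run one SQL query, counting it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, args).fetchall()


def find_with_sql(name, value):
    """Choice 1: a single specialized query."""
    rows = query(
        "SELECT DISTINCT run_id FROM parameters "
        "WHERE param_name = ? AND param_value = ? ORDER BY run_id",
        (name, value))
    return [run_id for (run_id,) in rows]


def find_by_scanning(name, value):
    """Choice 2: read every run piece by piece and test it in Python."""
    matches = []
    for (run_id,) in query("SELECT run_id FROM runs ORDER BY run_id"):
        found = False
        steps = query("SELECT step_num FROM operations WHERE run_id = ?",
                      (run_id,))
        for (step_num,) in steps:
            params = query(
                "SELECT param_name, param_value FROM parameters "
                "WHERE run_id = ? AND step_num = ?", (run_id, step_num))
            if (name, value) in params:
                found = True
        if found:
            matches.append(run_id)
    return matches


stats["queries"] = 0
by_sql = find_with_sql("thickness", "2.0")
sql_queries = stats["queries"]

stats["queries"] = 0
by_scan = find_by_scanning("thickness", "2.0")
scan_queries = stats["queries"]
```

Both functions return the same run IDs, but on this sample data the
specialized query costs one SQL statement while the scan costs ten, and the
gap grows with every additional run and step.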
<P>
An object database such as ZODB simply stores internal pointers from
object to object, so reading in a single object is much faster than
doing a bunch of SQL queries and assembling the results. Scanning all
runs, therefore, is still inefficient, but not grossly inefficient.