Commit 287a7d2c, authored Sep 17, 2016 by Jim Fulton; committed by GitHub, Sep 17, 2016
Merge pull request #118 from NextThought/fix-106
Add a section on the pitfalls of __eq__/__hash__. Fixes #106.
Parents: 3674c507, 3e7259a7
Showing 1 changed file with 144 additions and 12 deletions

doc/guide/writing-persistent-objects.rst (+144 -12)
==========================
Writing persistent objects
==========================
============================
Writing persistent objects
============================

In the :ref:`Tutorial <tutorial-label>`, we discussed the basics of
implementing persistent objects by subclassing
...
@@ -11,7 +11,7 @@ of writing persistent classes you should be aware of.
Access and modification
=======================

Two of the main jobs of the ``Persistent`` base class is to detect
Two of the main jobs of the ``Persistent`` base class are to detect
when an object has been accessed and when it has been modified. When
an object is accessed, its state may need to be loaded from the
database. When an object is modified, the modification needs to be
...
@@ -215,6 +215,15 @@ Here's a version of the example that uses a ``TreeSet``::
    True
    >>> db.close()

If you're going to use custom classes as keys in a ``BTree`` or
entries in a ``TreeSet``, they must provide a `total ordering
<https://pythonhosted.org/BTrees/#total-ordering-and-persistence>`_.
The built-in Python ``str`` class is always safe to use as a BTree key.
You can use `zope.keyreference
<https://pypi.python.org/pypi/zope.keyreference>`_ to treat arbitrary
persistent objects as totally orderable based on their persistent
object identity.
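
As a rough illustration of what a totally ordered key class can look
like, here is a minimal sketch in plain Python. ``VersionKey`` is a
hypothetical class, it does not subclass ``Persistent``, and the BTrees
documentation linked above remains the reference for the exact
requirements. The point is only that every pair of keys compares
consistently and that the comparison depends on state that never
changes.

```python
import functools


@functools.total_ordering
class VersionKey:
    """Hypothetical key ordered by an immutable (major, minor) tuple."""

    __slots__ = ("_parts",)

    def __init__(self, major, minor):
        # The ordering depends only on this tuple, which is never mutated.
        self._parts = (int(major), int(minor))

    def __eq__(self, other):
        if not isinstance(other, VersionKey):
            return NotImplemented
        return self._parts == other._parts

    def __lt__(self, other):
        if not isinstance(other, VersionKey):
            return NotImplemented
        return self._parts < other._parts

    def __hash__(self):
        return hash(self._parts)


# total_ordering derives <=, > and >= from __eq__ and __lt__.
keys = [VersionKey(2, 0), VersionKey(1, 5), VersionKey(1, 10)]
ordered = sorted(keys)
```

Comparing on a tuple of integers, rather than on formatted strings,
keeps ``1.10`` after ``1.5``.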
Scalable sequences are a bit more challenging. The `zc.blist
<https://pypi.python.org/pypi/zc.blist/>`_ package provides a scalable
list implementation that works well for some sequence use cases.
...
@@ -503,14 +512,136 @@ ghost:
    >>> book._p_changed, bool(book._p_oid)
    (None, True)

Other things you can do, but shouldn't (advanced)
=================================================
Things you can do, but need to carefully consider (advanced)
============================================================
While you can do anything with a persistent subclass that you can with
a normal subclass, certain things have additional implications for
persistent objects. These often show up as performance issues, or the
result may become hard to maintain.
Implement ``__eq__`` and ``__hash__``
-------------------------------------
When you store an entry into a Python ``dict`` (or the persistent
variant ``PersistentMapping``, or a ``set`` or ``frozenset``), the
key's ``__eq__`` and ``__hash__`` methods are used to determine where
to store the value. Later they are used to look it up via ``in`` or
``__getitem__``.

When that ``dict`` is later loaded from the database, the internal
storage is rebuilt from scratch. This means that every key has its
``__hash__`` method called at least once, and may have its ``__eq__``
method called many times.
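
The effect can be sketched without a database. The example below is
an illustration only: ``CountingKey`` and ``rebuild`` are hypothetical
names, and re-inserting the pairs into a fresh ``dict`` stands in for
what happens when a stored dictionary is loaded.

```python
class CountingKey:
    """Hypothetical key that counts how often it is hashed and compared."""

    hash_calls = 0
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


def rebuild(items):
    """Re-insert every (key, value) pair into a fresh dict and count hashing."""
    CountingKey.hash_calls = 0
    CountingKey.eq_calls = 0
    fresh = {}
    for key, value in items:
        fresh[key] = value  # each insertion has to hash the key
    return fresh, CountingKey.hash_calls


items = [(CountingKey(str(i)), i) for i in range(1000)]
fresh, hash_calls = rebuild(items)
```

Every one of the keys is hashed during the rebuild, whether or not it
is ever looked up afterwards.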

By default, every object, including persistent objects, inherits an
implementation of ``__eq__`` and ``__hash__`` from :class:`object`.
These default implementations are based on the object's *identity*,
that is, its unique identifier within the current Python process.
Calling them is therefore very fast, even on :ref:`ghosts
<ghost-label>`, and doesn't cause a ghost to load its state.
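
A small plain-Python sketch of those identity semantics follows.
``Plain`` is a hypothetical class that is not persistent; it only shows
that the inherited methods never look at an instance's attributes.

```python
class Plain:
    """Hypothetical class that inherits __eq__ and __hash__ from object."""

    def __init__(self, title):
        self.title = title


first = Plain("ZODB")
second = Plain("ZODB")

# Identity-based comparison: an object equals itself and nothing else,
# even when another instance carries exactly the same state.
same_object_equal = first == first
equal_state_equal = first == second

# Lookups therefore succeed only with the very same object.
lookup = {first: "found"}
```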

If you override ``__eq__`` and ``__hash__`` in a custom persistent
subclass, however, an instance used as a key in a ``dict`` has to be
unghosted before it can be put in the dictionary. If you're building a
large dictionary with many such keys that are ghosts, you may find that
loading all the object states takes a considerable amount of time. If
you were to store that dictionary in the database and load it later,
*all* the keys will have to be unghosted at the same time before the
dictionary can be accessed, again possibly taking a long time.

For example, a class that defines ``__eq__`` and ``__hash__`` like this::

    class BookEq(persistent.Persistent):

        def __init__(self, title):
            self.title = title
            self.authors = ()

        def add_author(self, author):
            self.authors += (author, )

        def __eq__(self, other):
            return self.title == other.title and self.authors == other.authors

        def __hash__(self):
            return hash((self.title, self.authors))

.. -> src

    >>> exec(src)

is going to be much slower to use as a key in a persistent dictionary,
or in a new dictionary when the key is a ghost, than the class that
inherits identity-based ``__eq__`` and ``__hash__``.
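
To make the difference concrete without a database, here is a sketch
that imitates ghosts in plain Python. ``LazyRecord``, ``IdentityBook``
and ``EqBook`` are hypothetical stand-ins, not ``Persistent``
subclasses: an instance "loads" its state the first time an attribute
is read, roughly the way a ghost does.

```python
class LazyRecord:
    """Plain-Python stand-in for a ghost; NOT persistent.Persistent.

    The state stays in ``_stored`` until an attribute is first read.
    """

    def __init__(self, title):
        self._stored = {"title": title}

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. state is not loaded yet.
        stored = self.__dict__.get("_stored", {})
        if name not in stored:
            raise AttributeError(name)
        self.__dict__.update(stored)  # "unghost": load the state
        return stored[name]


class IdentityBook(LazyRecord):
    """Inherits identity-based __eq__ and __hash__ from object."""


class EqBook(LazyRecord):
    """Compares and hashes by title, so hashing must load the state."""

    def __eq__(self, other):
        return self.title == other.title

    def __hash__(self):
        return hash(self.title)


def ghosts_after_set_build(cls, count):
    """Build a set of ``count`` instances and count those still unloaded."""
    books = {cls(str(i)) for i in range(count)}
    return sum(1 for book in books if "title" not in vars(book))
```

Building the set leaves every ``IdentityBook`` unloaded, while every
``EqBook`` has to load its state just to be hashed.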

.. Example of the above.

   Here's what that class would look like::

     >>> class Book(persistent.Persistent):
     ...     def __init__(self, title):
     ...         self.title = title
     ...         self.authors = ()
     ...
     ...     def add_author(self, author):
     ...         self.authors += (author, )

The first rule here is don't be clever!!! It's very tempting to be
clever, but it's almost never worth it.

Let's see an example of how these classes behave when stored in a
hashed collection. First, let's store some sets::

Overriding ``__getstate__`` and ``__setstate__``
------------------------------------------------

    >>> import ZODB
    >>> db = ZODB.DB(None)
    >>> conn1 = db.open()
    >>> conn1.root.with_hashes = {BookEq(str(i)) for i in range(5000)}
    >>> conn1.root.with_ident = {Book(str(i)) for i in range(5000)}
    >>> transaction.commit()

Now, in a new connection (so we don't have any objects cached), let's
load the sets::

    >>> conn2 = db.open()
    >>> all((book._p_status == 'ghost' for book in conn2.root.with_ident))
    True
    >>> all((book._p_status == 'ghost' for book in conn2.root.with_hashes))
    False

We can see that all the objects that did have a custom ``__eq__``
and ``__hash__`` were loaded into memory, while those that didn't
stayed ghosts.

There are some alternatives:

- Avoiding the use of persistent objects as keys in dictionaries or
  entries in sets sidesteps the issue.

- If your application can tolerate identity-based comparisons, simply
  don't implement the two methods. This means that objects will be
  compared only by identity, but because persistent objects are
  persistent, the same object will have the same identity in each
  connection, so that often works out.

  It is safe to remove ``__eq__`` and ``__hash__`` methods from a
  class even if you already have dictionaries in a database using
  instances of those classes as keys.

- Make your classes `orderable
  <https://pythonhosted.org/BTrees/#total-ordering-and-persistence>`_
  and use them as keys in a BTree or entries in a TreeSet instead of a
  dictionary or set. Even though your custom comparison methods will
  have to unghost the objects, the nature of a BTree means that only a
  small number of objects will have to be loaded in most cases.

- Any persistent object can be wrapped in a ``zope.keyreference`` to
  make it orderable and hashable based on persistent identity. This
  can be an alternative for some dictionaries if you can't alter the
  class definition but can accept identity comparisons in some
  dictionaries or sets. You must remember to wrap all keys, though.
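
Why an ordered structure touches so few keys can be sketched with the
standard library. This is not a BTree: a sorted list searched with
``bisect`` stands in for one, and ``TrackedKey`` is a hypothetical key
class that records whether it ever took part in a comparison.

```python
import bisect


class TrackedKey:
    """Hypothetical orderable key that records when it is compared."""

    def __init__(self, value):
        self.value = value
        self.touched = False

    def __lt__(self, other):
        # In a real persistent class, this is where a ghost would load.
        self.touched = True
        other.touched = True
        return self.value < other.value


def count_touched_by_lookup(size, wanted):
    """Search a sorted list of keys and count how many were compared."""
    keys = [TrackedKey(i) for i in range(size)]  # already in sorted order
    probe = TrackedKey(wanted)
    index = bisect.bisect_left(keys, probe)
    found = index < size and keys[index].value == wanted
    touched = sum(1 for key in keys if key.touched)
    return found, touched


found, touched = count_touched_by_lookup(5000, 4321)
```

The lookup compares against a number of keys on the order of
``log2(size)`` and leaves the rest untouched, whereas a hash table has
to hash every key it holds when it is rebuilt.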

Implement ``__getstate__`` and ``__setstate__``
-----------------------------------------------

When an object is saved in a database, its ``__getstate__`` method is
called without arguments to get the object's state. The default
...
@@ -528,14 +659,15 @@ tasks like providing more efficient state representations or for
the result was to make object implementations brittle and/or complex
and the benefit usually wasn't worth it.
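
For readers who have not seen the two hooks, here is a minimal
plain-Python sketch of the kind of customization being discussed.
``Report`` is a hypothetical class that does not subclass
``Persistent``, and the hooks are called by hand rather than by a
database. It mainly shows how much bookkeeping even a trivial case
adds.

```python
class Report:
    """Hypothetical class that leaves a derived cache out of its state."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._total_cache = None  # derived value, cheap to recompute

    def total(self):
        if self._total_cache is None:
            self._total_cache = sum(self.rows)
        return self._total_cache

    def __getstate__(self):
        # Save everything except the derived cache.
        state = dict(self.__dict__)
        state.pop("_total_cache", None)
        return state

    def __setstate__(self, state):
        # Every attribute left out above must be restored here, and this
        # method has to keep accepting states written by older versions.
        self.__dict__.update(state)
        self._total_cache = None


original = Report([1, 2, 3])
original.total()                   # fills the cache
state = original.__getstate__()    # the cache is not part of the state
restored = Report.__new__(Report)  # no __init__, as when loading
restored.__setstate__(state)
```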

Overriding ``__getattr__``, ``__getattribute__``, or ``__setattribute__``
-------------------------------------------------------------------------
Implement ``__getattr__``, ``__getattribute__``, or ``__setattr__``
-------------------------------------------------------------------

This is something extremely clever people might attempt, but it's
probably never worth the bother. It's possible, but it requires such
deep understanding of persistence and internals that we're not even
going to document it. :)

Links
=====
...