Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Z
ZODB
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
Nicolas Wavrant
ZODB
Commits
bce996ca
Commit
bce996ca
authored
Sep 14, 2016
by
Jason Madden
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Add a section on the pitfalls of __eq__/__hash__. Fixes #106.
parent
09010439
Changes
1
Show whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
129 additions
and
4 deletions
+129
-4
doc/guide/writing-persistent-objects.rst
doc/guide/writing-persistent-objects.rst
+129
-4
No files found.
doc/guide/writing-persistent-objects.rst
View file @
bce996ca
==========================
Writing
persistent
objects
==========================
==========================
==
Writing
persistent
objects
==========================
==
In
the
:
ref
:`
Tutorial
<
tutorial
-
label
>`,
we
discussed
the
basics
of
implementing
persistent
objects
by
subclassing
...
...
@@ -11,7 +11,7 @@ of writing persistent classes you should be aware of.
Access
and
modification
=======================
Two
of
the
main
jobs
of
the
``
Persistent
``
base
class
is
to
detect
Two
of
the
main
jobs
of
the
``
Persistent
``
base
class
are
to
detect
when
an
object
has
been
accessed
and
when
it
has
been
modified
.
When
an
object
is
accessed
,
its
state
may
need
to
be
loaded
from
the
database
.
When
an
object
is
modified
,
the
modification
needs
to
be
...
...
@@ -215,6 +215,15 @@ Here's a version of the example that uses a ``TreeSet``::
True
>>> db.close()
If you'
re
going
to
use
custom
classes
as
keys
in
a
``
BTree
``
or
entries
in
a
``
TreeSet
``,
they
must
provide
a
`
total
ordering
<
https
://
pythonhosted
.
org
/
BTrees
/#
total
-
ordering
-
and
-
persistence
>`
_
.
The
builtin
python
`
str
`
class
is
always
safe
to
use
as
BTree
key
.
You
can
use
`
zope
.
keyreference
<
https
://
pypi
.
python
.
org
/
pypi
/
zope
.
keyreference
>`
_
to
treat
arbitrary
persistent
objects
as
totally
orderable
based
on
their
persistent
object
identity
.
Scalable
sequences
are
a
bit
more
challenging
.
The
`
zc
.
blist
<
https
://
pypi
.
python
.
org
/
pypi
/
zc
.
blist
/>`
_
package
provides
a
scalable
list
implementation
that
works
well
for
some
sequence
use
cases
.
...
...
@@ -503,6 +512,121 @@ ghost:
>>> book._p_changed, bool(book._p_oid)
(None, True)
Things you can do, but need to carefully consider (advanced)
============================================================
While you can do anything with a persistent subclass that you can with
a normal subclass, certain things have additional implications for
persistent objects. These often show up as performance issues.
Implement ``__eq__`` and ``__hash__``
-------------------------------------
When you store an entry into a Python ``dict`` (or the persistent
variant ``PersistentMapping``) the key'
s
``
__eq__
``
and
``
__hash__
``
methods
are
used
to
determine
where
to
store
the
value
.
Later
they
are
used
to
look
it
up
via
``
in
``
or
``
__getitem__
``.
When
that
``
dict
``
is
later
loaded
from
the
database
,
the
internal
storage
is
rebuild
from
scratch
.
This
means
that
every
key
has
its
``
__hash__
``
method
called
at
least
once
,
and
may
have
its
``
__eq__
``
method
called
many
times
.
By
default
,
every
object
,
including
persistent
objects
,
inherits
an
implementation
of
``
__eq__
``
and
``
__hash__
``
from
:
class
:`
object
`.
These
default
implementations
are
based
on
the
object
's *identity*,
that is, its unique identifier within the current Python process.
Calling, them, therefore is very fast, even on ghosts, and doesn'
t
cause
a
ghost
to
load
its
state
.
If
you
override
``
__eq__
``
and
``
__hash__
``
in
a
custom
persistent
subclass
,
however
,
when
you
use
that
instances
of
that
class
as
a
key
in
a
``
dict
``,
then
the
instance
will
have
to
be
a
unghosted
before
it
can
be
put
in
the
dictionary
.
If
you
're building a large dictionary
with many such keys that are ghosts, you may find that loading all the
object states takes a considerable amount of time. If you were to
store that dictionary in the database and load it later, *all* the
keys will have to be unghosted at the same time before the dictionary
can be accessed, again, possibly taking a long time.
For example, a class that defines ``__eq__`` and ``__hash__`` like this::
class BookEq(persistent.Persistent):
def __init__(self, title):
self.title = title
self.authors = ()
def add_author(self, author):
self.authors += (author, )
def __eq__(self, other):
return self.title == other.title and self.authors == other.authors
def __hash__(self):
return hash((self.title, self.authors))
.. -> src
>>> exec(src)
is going to be much slower to use as a key in a persistent dictionary,
or in a new dictionary when the key is a ghost, than the class that
inherits identity-based ``__eq__`` and ``__hash__``::
class Book(persistent.Persistent):
def __init__(self, title):
self.title = title
self.authors = ()
def add_author(self, author):
self.authors += (author, )
Lets see an example of how these classes behave when stored in a
dictionary. First, lets store some dictionaries::
>>> import ZODB
>>> db = ZODB.DB(None)
>>> conn1 = db.open()
>>> conn1.root.with_hashes = {BookEq(str(i)) for i in range(5000)}
>>> conn1.root.with_ident = {Book(str(i)) for i in range(5000)}
>>> transaction.commit()
Now, in a new connection (so we don'
t
have
any
objects
cached
),
lets
load
the
dictionaries
::
>>>
conn2
=
db
.
open
()
>>>
all
((
book
.
_p_status
==
'ghost'
for
book
in
conn2
.
root
.
with_ident
))
True
>>>
all
((
book
.
_p_status
==
'ghost'
for
book
in
conn2
.
root
.
with_hashes
))
False
We
can
see
that
all
the
objects
that
did
have
a
custom
``
__eq__
``
and
``
__hash__
``
were
loaded
into
memory
,
while
those
that
did
weren
't.
There are two possible solutions:
- If your application can tolerate identity based comparisons, simply
don'
t
implement
the
two
methods
.
This
means
that
objects
will
be
compared
only
by
identity
,
but
because
persistent
objects
are
persistent
,
the
same
object
will
have
the
same
identity
in
each
connection
,
so
that
often
works
out
.
It
is
safe
to
remove
``
__eq__
``
and
``
__hash__
``
methods
from
a
class
even
if
you
already
have
dictionaries
in
a
database
using
instances
of
those
classes
as
keys
.
-
Make
your
classes
`
orderable
<
https
://
pythonhosted
.
org
/
BTrees
/#
total
-
ordering
-
and
-
persistence
>`
_
and
use
them
as
keys
in
a
BTree
instead
of
a
dictionary
.
Even
though
your
custom
comparison
methods
will
have
to
unghost
the
objects
,
the
nature
of
a
BTree
means
that
only
a
small
number
of
objects
will
have
to
be
loaded
in
most
cases
.
Other
things
you
can
do
,
but
shouldn
't (advanced)
=================================================
...
...
@@ -536,6 +660,7 @@ probably never worth the bother. It's possible, but it requires such
deep
understanding
of
persistence
and
internals
that
we
're not even
going to document it. :)
Links
=====
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment