1. 23 Apr, 2021 4 commits
  2. 21 Apr, 2021 3 commits
    • Kirill Smelkov's avatar
    • Kirill Smelkov's avatar
      tests: Add test for load vs external invalidation race · e923c9a8
      Kirill Smelkov authored
      For ZEO this data corruption bug was reported at
      https://github.com/zopefoundation/ZEO/issues/155 and fixed at
      https://github.com/zopefoundation/ZEO/pull/169.
      
      Without that fix the failure shows e.g. as follows when running ZEO test
      suite:
      
          Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
          Traceback (most recent call last):
            File "/usr/lib/python2.7/unittest/case.py", line 329, in run
              testMethod()
            File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
              self.fail([_ for _ in failure if _])
            File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
              raise self.failureException(msg)
          AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']
      
      Even if added test is somewhat similar to
      check_race_loadopen_vs_local_invalidate, it is added anew without trying
      to unify code. The reason here is that the probability to catch load vs
      external invalidation race is significantly reduced when there are only
      1 modify and 1 verify workers. The unification with preserving both
      tests semantic would make test for "load vs local invalidate" harder to
      follow. Sometimes a little copying is better than trying to unify too
      much.
      
      For the test to work, test infrastructure is amended with
      ._new_storage_client() method that complements ._storage attribute:
      client-server storages like ZEO, NEO and RelStorage allow several
      storage clients to be connected to single storage server. For
      client-server storages test subclasses should implement
      _new_storage_client to return new storage client that is connected to
      the same storage server self._storage is connected to.
      
      For ZEO ._new_storage_client() is added by https://github.com/zopefoundation/ZEO/pull/170
      
      Other client-server storages can follow to implement ._new_storage_client()
      and this way automatically activate this "load vs external invalidation"
      test when their testsuite is run.
      
      Contrary to test for "load vs local invalidate" N is set to lower value (100),
      because with 8 workers the bug is usually reproduced at not-so-high iteration
      number (5-10-20).
      
      /cc @d-maurer, @jamadden, @jmuchemb
      /reviewed-on https://github.com/zopefoundation/ZODB/pull/345
      e923c9a8
    • Kirill Smelkov's avatar
      tests: Add test for open vs invalidation race · 5b4dd5f7
      Kirill Smelkov authored
      Add test that exercises open vs invalidation race condition that, if
      happen, leads to data corruption. We are seeing such race happening on
      storage level in ZEO (https://github.com/zopefoundation/ZEO/issues/166),
      and previously we've seen it also to happen on Connection level
      (https://github.com/zopefoundation/ZODB/issues/290). By adding this test
      to be exercised wrt all storages we make sure that all storages stay
      free from this race.
      
      And it payed out. Besides catching original problems from
      https://github.com/zopefoundation/ZODB/issues/290 and
      https://github.com/zopefoundation/ZEO/issues/166 , this test also
      discovered a concurrency bug in MVCCMappingStorage:
      
          Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests)
          Traceback (most recent call last):
            File "/usr/lib/python2.7/unittest/case.py", line 329, in run
              testMethod()
            File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate
              self.fail(failure[0])
            File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
              raise self.failureException(msg)
          AssertionError: T1: obj1.value (24)  !=  obj2.value (23)
      
      The problem with MVCCMappingStorage was that instance.poll_invalidations
      was correctly taking main_lock with intention to make sure main data is
      not mutated during analysis, but instance.tpc_finish and
      instance.tpc_abort did _not_ taken main lock, which was leading to
      committed data to be propagating into main storage in non-atomic way.
      
      This bug was also observable if both obj1 and obj2 in the added test
      were always loaded from the storage (added obj2._p_invalidate after
      obj1._p_invalidate).
      
      -> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage
      instance when processing transaction completion.
      
      /cc @d-maurer, @jamadden, @jmuchemb
      /reviewed-on https://github.com/zopefoundation/ZODB/pull/345
      5b4dd5f7
  3. 20 Apr, 2021 5 commits
  4. 01 Apr, 2021 1 commit
    • Claudius Ellsel's avatar
      Update README.rst · dad77801
      Claudius Ellsel authored
      Fix tiny issue with a remaining colon that was probably not deleted during updates of the README.
      dad77801
  5. 31 Mar, 2021 1 commit
  6. 29 Mar, 2021 2 commits
    • Kirill Smelkov's avatar
      changes: Correct link to UnboundLocalError fsoids.py fix · 2798502e
      Kirill Smelkov authored
      Commit fc4c86e6 (Fix unbound local error when using the fsoids.py script
      (#295)) wanted to refer to "issue 285", but put it as "issue 268" into
      visible text.
      2798502e
    • Kirill Smelkov's avatar
      fsrefs: Optimize IO (take 2) (#340) · 79078049
      Kirill Smelkov authored
      * fsrefs: Optimize IO  (take 2)
      
      Access objects in the order of their position in file instead of in the order
      of their OID. This should give dramatical speedup when data are on HDD.
      
      For example @perrinjerome reports that on a 73Go database it takes
      almost 8h to run fsrefs (where on the same database, fstest takes 15
      minutes) [1,2]. After the patch fsrefs took ~80 minutes to run on the same
      database. In other words this is ~ 6x improvement.
      
      Fsrefs has no tests. I tested it only lightly via generating a bit
      corrupt database with deleted referred object(*), and it gives the same
      output as unmodified fsrefs.
      
          oid 0x0 __main__.Object
          last updated: 1979-01-03 21:00:42.900001, tid=0x285cbacb70a3db3
          refers to invalid objects:
                  oid 0x07 missing: '<unknown>'
                  oid 0x07 object creation was undone: '<unknown>'
      
      This "take 2" version is derived from https://github.com/zopefoundation/ZODB/pull/338
      and only iterates objects in the order of their in-file position without
      building complete references graph in-RAM, because that in-RAM graph would
      consume ~12GB of memory.
      
      Added pos2oid in-RAM index also consumes memory: for the 73GB database in
      question fs._index takes ~700MB, while pos2oid takes ~2GB. In theory it could be less,
      because we need only array of oid sorted by key(oid)=fs._index[oid]. However
      array.array does not support sorting, and if we use plain list to keep just
      []oid, the memory consumption just for that list is ~5GB. Also because
      list.sort(key=...) internally allocates memory for key array (and
      list.sort(cmp=...) was removed from Python3), total memory consumption just to
      produce list of []oid ordered by pos is ~10GB.
      So without delving into C/Cython and/or manually sorting the array in Python (=
      slow), using QQBTree seems to be the best out-of-the-box option for oid-by-pos index.
      
      [1] nexedi/zodbtools!19 (comment 129480)
      [2] nexedi/zodbtools!19 (comment 129551)
      
      (*) test database generated via a bit modified gen_testdata.py from
      zodbtools:
      
      https://lab.nexedi.com/nexedi/zodbtools/blob/v0.0.0.dev8-28-g129afa6/zodbtools/test/gen_testdata.py
      
      +
      
      ```diff
      --- a/zodbtools/test/gen_testdata.py
      +++ b/zodbtools/test/gen_testdata.py
      @@ -229,7 +229,7 @@ def ext(subj): return {}
               # delete an object
               name = random.choice(list(root.keys()))
               obj = root[name]
      -        root[name] = Object("%s%i*" % (name, i))
      +#       root[name] = Object("%s%i*" % (name, i))
               # NOTE user/ext are kept empty on purpose - to also test this case
               commit(u"", u"predelete %s" % unpack64(obj._p_oid), {})
      ```
      
      /cc @tim-one, @jeremyhylton, @jamadden
      /reviewed-by @jamadden, @perrinjerome 
      /reviewed-on https://github.com/zopefoundation/ZODB/pull/340
      79078049
  7. 19 Feb, 2021 1 commit
  8. 28 Oct, 2020 2 commits
  9. 23 Sep, 2020 1 commit
  10. 04 Sep, 2020 2 commits
  11. 31 Aug, 2020 2 commits
    • Kirill Smelkov's avatar
      interface: Require invalidations to be called with full set of objects and not to skip transactions · c1e08052
      Kirill Smelkov authored
      Currently invalidate documentation is not clear whether it should be
      called for every transaction and whether it should include full set of
      objects created/modified by that transaction. Until now this was working
      relatively well for the sole purpose of invalidating client ZEO cache,
      because for that particular task it is relatively OK not to include just
      created objects into invalidation messages, and even to completely skip
      sending invalidation if transaction only create - not modify - objects.
      Due to this fact the workings of the client cache was indifferent to the
      ambiguity of the interface.
      
      In 2016 skipping transactions with only created objects was reconsidered
      as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
      semantic and needs to be notified about every transaction committed to
      storage to be able to properly update ZODB.Connection view:
      
      https://github.com/zopefoundation/ZEO/commit/02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
      https://github.com/zopefoundation/ZEO/commit/9613f09b
      
      However just-created objects were not included into invalidation
      messages until, hopefully, recently:
      
      https://github.com/zopefoundation/ZEO/pull/160
      
      As ZODB is started to be used more widely in areas where it was not
      traditionally used before, the ambiguity in invalidate interface and the
      lack of guarantees - for any storage - to be notified with full set of
      information, creates at least the following problems:
      
      - a ZODB client (not necessarily native ZODB/py client) can maintain
        raw cache for the storage. If such client tries to load an oid at
        database view when that object did not existed yet, gets "no object"
        reply and stores that information into raw cache, to properly invalidate
        the cache it needs an invalidation message from ZODB server that
        *includes* created object.
      
      - tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
        if not all objects modified/created by a transaction are included into
        invalidation messages.
      
      - similarly to `zodb watch`, a monitoring tool, that would want to be
        notified of all created/modified objects, won't see full
        database-change picture, and so won't work properly without knowing
        which objects were created.
      
      - wendelin.core 2 - which builds data from ZODB BTrees and data objects
        into virtual filesystem - needs to get invalidation messages with both
        modified and created objects to properly implement its own lazy
        invalidation and isolation protocol for file blocks in OS cache: when
        a block of file is accessed, all clients, that have this block mmaped,
        need to be notified and asked to remmap that block into particular
        revision of the file depending on a client's view of the filesystem and
        database [4,5].
      
        To compute to where a client needs to remmap the block, WCFS server
        (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
        to see whether client's view of the filesystem is before object creation
        (and then ask that client to pin that block to hole), or after creation
        (and then ask the client to pin that block to corresponding revision).
      
        This computation needs ZODB server to send invalidation messages in
        full: with both modified and just created objects.
      
      Also:
      
      - the property that all objects - both modified and just created -
        are included into invalidation messages is required and can help to
        remove `next_serial` from `loadBefore` return in the future.
        This, in turn, can help to do 2x less SQL queries in loadBefore for
        NEO and RelStorage (and maybe other storages too):
        https://github.com/zopefoundation/ZODB/issues/318#issuecomment-657685745
      
      Current state of storages with respect to new requirements:
      
      - ZEO: does not skip transactions, but includes only modified - not
        created - objects. This is fixed by https://github.com/zopefoundation/ZEO/pull/160
      
      - NEO: already implements the requirements in full
      
      - RelStorage: already implements the requirements in full, if I
        understand correctly:
      
        https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145
      
      While editing invalidate documentation, use the occasion to document
      recently added property that invalidate(tid) is always called before
      storage starts to report its lastTransaction() ≥ tid - see 4a6b0283
      (mvccadapter: check if the last TID changed without invalidation).
      
      /cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083
      /reviewed-on https://github.com/zopefoundation/ZODB/pull/319
      /reviewed-by @dataflake
      /reviewed-by @jmuchemb
      
      [1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
      [2] neo@e0d59f5d
      [3] neo@c41c2907
      
      [4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
      [5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
      c1e08052
    • Jérome Perrin's avatar
      Merge pull request #327 from perrinjerome/doc-sphinx-python3 · b4f233cd
      Jérome Perrin authored
      Fix requirements for sphinx on python2
      b4f233cd
  12. 26 Aug, 2020 2 commits
  13. 19 Aug, 2020 1 commit
    • Julien Muchembled's avatar
      Relax assertion in check_tid_ordering_w_commit test · 4aa62185
      Julien Muchembled authored
      It is pointless for lastTransaction() to block until it is allowed to
      return the TID of a transaction that has just been committed, because
      it may still not be the real last TID (e.g. for some storage
      implementations, invalidations are received from a shared server
      via the network). While invalidations are still being processed,
      it's fine to return immediately with the previous last TID.
      
      This was clarified in commit 4a6b0283
      ("mvccadapter: check if the last TID changed without invalidation").
      
      See pull request #316
      4aa62185
  14. 31 Jul, 2020 1 commit
    • Kirill Smelkov's avatar
      Kill leftovers of pre-MVCC read conflicts · 3a493b01
      Kirill Smelkov authored
      In the early days, before MVCC was introduced, ZODB used to raise
      ReadConflictError on access to object that was simultaneously changed by
      another client in concurrent transaction. However, as
      doc/articles/ZODB-overview.rst says
      
      	Since Zope 2.8 ZODB has implemented **Multi Version Concurrency Control**.
      	This means no more ReadConflictErrors, each transaction is guaranteed to be
      	able to load any object as it was when the transaction begun.
      
      So today the only way to get a ReadConflictError should be
      
        1) at commit time for an object that was requested to stay unchanged
           via checkCurrentSerialInTransaction, and
      
        2) at plain access time, if a pack running simultaneously to current
           transaction, removes object revision that we try to load.
      
      The second point is a bit unfortunate, since when load discovers that
      object was deleted or not yet created, it is logically more clean to
      raise POSKeyError. However due to backward compatibility we still want
      to raise ReadConflictError in this case - please see comments added to
      MVCCAdapter for details.
      
      Anyway, let's remove leftovers of handling regular read-conflicts from
      pre-MVCC era:
      
      Adjust docstring of ReadConflictError to explicitly describe that this
      error can only happen at commit time for objects requested to be
      current, or at plain access if pack is running simultaneously under
      connection foot.
      
      There were also leftover code, comment and test bits in Connection,
      interfaces, testmvcc and testZODB, that are corrected/removed
      correspondingly. testZODB actually had ReadConflictTests that was
      completely deactivated: commit b0f992fd ("Removed the mvcc option..."; 2007)
      moved read-conflict-on-access related tests out of ZODBTests, but did not
      activated moved parts at all, because as that commit says when MVCC is
      always on unconditionally, there is no on-access conflicts:
      
          Removed the mvcc option.  Everybody wants mvcc and removing us lets us
          simplify the code a little. (We'll be able to simplify more when we
          stop supporting versions.)
      
      Today, if I try to manually activate that ReadConflictTests via
      
          @@ -637,6 +637,7 @@ def __init__(self, poisonedjar):
           def test_suite():
               return unittest.TestSuite((
                   unittest.makeSuite(ZODBTests, 'check'),
          +        unittest.makeSuite(ReadConflictTests, 'check'),
                   ))
      
           if __name__ == "__main__":
      
      it fails in dumb way showing that this tests were unmaintained for ages:
      
          Error in test checkReadConflict (ZODB.tests.testZODB.ReadConflictTests)
          Traceback (most recent call last):
            File "/usr/lib/python2.7/unittest/case.py", line 320, in run
              self.setUp()
            File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/testZODB.py", line 451, in setUp
              ZODB.tests.utils.TestCase.setUp(self)
          AttributeError: 'module' object has no attribute 'utils'
      
      Since today ZODB always uses MVCC and there is no way to get
      ReadConflictError on concurrent plain read/write access, those tests
      should be also gone together with old pre-MVCC way of handling
      concurrency.
      
      /cc @jimfulton
      /reviewed-on https://github.com/zopefoundation/ZODB/pull/320
      /reviewed-by @jamadden
      3a493b01
  15. 12 Jun, 2020 2 commits
  16. 11 Jun, 2020 2 commits
  17. 10 Jun, 2020 1 commit
  18. 09 Jun, 2020 1 commit
    • Julien Muchembled's avatar
      mvccadapter: check if the last TID changed without invalidation · 4a6b0283
      Julien Muchembled authored
      Since commit b5895a5c ("mvccadapter:
      fix race with invalidations when starting a new transaction"),
      a ZEO test fails as follows:
      
          File "src/ZEO/tests/drop_cache_rather_than_verify.txt", line 114, in drop_cache_rather_than_verify.txt
          Failed example:
              conn.root()[1].x
          Expected:
              6
          Got:
              1
      
      Earlier in the test, the ZEO server is restarted and then another
      client commits. When disconnected, the first client does not receive
      invalidations anymore and the connection gets stuck in the past until
      there's a new commit after it reconnected. It was possible to make the
      test pass with the following patch:
      
      --- a/src/ZEO/ClientStorage.py
      +++ b/src/ZEO/ClientStorage.py
      @@ -357,6 +357,7 @@ def notify_connected(self, conn, info):
      
               # invalidate our db cache
               if self._db is not None:
      +            self._db.invalidate(self.lastTransaction(), ())
                   self._db.invalidateCache()
      
               logger.info("%s %s to storage: %s",
      
      Other implementations like NEO are probably affected the same way.
      
      Rather than changing interfaces in a backward-incompatible way,
      this commit revert to the original behaviour, and all the changes
      that were done in existing tests are reverted.
      
      However, the interfaces are clarified about the fact that storage
      implementations must update at a precise moment the value that is
      returned by lastTransaction(): just after invalidate() or
      tpc_finish callback.
      4a6b0283
  19. 02 Jun, 2020 1 commit
  20. 20 May, 2020 4 commits
  21. 31 Mar, 2020 1 commit
    • Kirill Smelkov's avatar
      FileStorage: Save committed transaction to disk even if changed data is empty · bb9bf539
      Kirill Smelkov authored
      ZODB tries to avoid saving empty transactions to storage on
      `transaction.commit()`. The way it works is: if no objects were changed
      during ongoing transaction, ZODB.Connection does not join current
      TransactionManager, and transaction.commit() performs two-phase commit
      protocol only on joined DataManagers. In other words if no objects were
      changed, no tpc_*() methods are called at all on ZODB.Connection at
      transaction.commit() time.
      
      This way application servers like Zope/ZServer/ERP5/... can have
      something as
      
          try:
              # process incoming request
              transaction.commit()    # processed ok
          except:
              transaction.abort()
              # problem: log + reraise
      
      in top-level code to process requests without creating many on-disk
      transactions with empty data changes just because read-only requests
      were served.
      
      Everything is working as intended.
      
      However at storage level, FileStorage currently also checks whether
      transaction that is being committed also comes with empty data changes,
      and _skips_ saving transaction into disk *at all* for such cases, even
      if it has been explicitly told to commit the transaction via two-phase
      commit protocol calls done at storage level.
      
      This creates the situation, where contrary to promise in
      ZODB/interfaces.py(*), after successful tpc_begin/tpc_vote/tpc_finish()
      calls made at storage level, transaction is _not_ made permanent,
      despite tid of "committed" transaction being returned to caller. In other
      words FileStorage, when asked to commit a transaction, even if one with
      empty data changes, reports "ok" and gives transaction ID to the caller,
      without creating corresponding transaction record on disk.
      
      This behaviour is
      
      a) redundant to application-level avoidance to create empty transaction
         on storage described in the beginning, and
      
      b) creates problems:
      
      The first problem is that application that works at storage-level might
      be interested in persisting transaction, even with empty changes to
      data, just because it wants to save the metadata similarly to e.g.
      `git commit --allow-empty`.
      
      The other problem is that an application view and data in database
      become inconsistent: an application is told that a transaction was
      created with corresponding transaction ID, but if the storage is
      actually inspected, e.g. by iteration, the transaction is not there.
      This, in particular, can create problems if TID of committed transaction
      is reported elsewhere and that second database client does not find the
      transaction it was told should exist.
      
      I hit this particular problem with wendelin.core. In wendelin.core,
      there is custom virtual memory layer that keeps memory in sync with
      data in ZODB. At commit time, the memory is inspected for being dirtied,
      and if a page was changed, virtual memory layer joins current
      transaction _and_ forces corresponding ZODB.Connection - via which it
      will be saving data into ZODB objects - to join the transaction too,
      because it would be too late to join ZODB.Connection after 2PC process
      has begun(+). One of the format in which data are saved tries to
      optimize disk space usage, and it actually might happen, that even if
      data in RAM were dirtied, the data itself stayed the same and so nothing
      should be saved into ZODB. However ZODB.Connection is already joined
      into transaction and it is hard not to join it because joining a
      DataManager when the 2PC is already ongoing does not work.
      
      This used to work ok with wendelin.core 1, but with wendelin.core 2 -
      where separate virtual filesystem is also connected to the database to
      provide base layer for arrays mappings - this creates problem, because
      when wcfs (the filesystem) is told to synchronize to view the database
      @tid of committed transaction, it can wait forever waiting for that, or
      later, transaction to appear on disk in the database, creating
      application-level deadlock.
      
      I agree that some more effort might be made at wendelin.core side to
      avoid committing transactions with empty data at storage level.
      
      However the most clean way to fix this problem in my view is to fix
      FileStorage itself, because if at storage level it was asked to commit
      something, it should not silently skip doing so and dropping even non-empty
      metadata + returning ok and committed transaction ID to the caller.
      
      As described in the beginning this should not create problems for
      application-level ZODB users, while at storage-level the implementation
      is now consistently matching interface and common sense.
      
      ----
      
      (*) tpc_finish: Finish the transaction, making any transaction changes permanent.
          Changes must be made permanent at this point.
          ...
      
          https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L828-L831
      
      (+) https://lab.nexedi.com/kirr/wendelin.core/blob/9ff5ed32/bigfile/file_zodb.py#L788-822
      bb9bf539