TODO: safer tpc_finish and faster storage

Commit 63324838 authored Oct 27, 2015 by Julien Muchembled

Showing with 43 additions and 17 deletions

BUGS +0 -7

TODO +4 -9

neo/client/app.py +39 -1

--- a/BUGS
+++ b/BUGS
@@ -16,13 +16,6 @@ Although this should happen rarely enough not to affect performance, this can
 be an issue if your application can't afford restarting the transaction,
 e.g. because it interacted with external environment.

-Client always raises in tpc_finish if a failure happen
------------------------------------------------------
-
-This is wrong because the failure may actually happen just after the transaction
-is actually committed. Client should not raise if it can reconnect and note
-that the transaction exist.
-
 Storage failure or update may lead to POSException or break undoLog()
 ---------------------------------------------------------------------


--- a/TODO
+++ b/TODO
@@ -58,9 +58,12 @@
      committed by future transactions.
    - Add a 'devid' storage configuration so that master do not distribute
      replicated partitions on storages with same 'devid'.
+    - Make tpc_finish safer as described in its __doc__: moving work to
+      tpc_vote and recover from master failure when possible.

    Storage
-    - Use HailDB instead of a stand-alone MySQL server.
+    - Use libmysqld instead of a stand-alone MySQL server.
+    - It should be possible to defer the commit at the end of finishTransaction.
    - Notify master when storage becomes available for clients (LATENCY)
      Currently, storage presence is broadcasted to client nodes too early, as
      the storage node would refuse them until it has only up-to-date data (not
@@ -131,14 +134,6 @@
      table hasn't changed by pinging the master and retry if necessary.
    - Implement IStorageRestoreable (ZODB API) in order to preserve data
      serials (i.e. undo information).
-    - tpc_finish might raise while transaction got successfully committed.
-      This can happen if it gets disconnected from primary master while waiting
-      for AnswerFinishTransaction after primary received it and hence will
-      commit transaction independently from client presence. Client could
-      legitimately think transaction is not committed, and might decide to
-      retry. To solve this, client can know if its TTID got successfuly
-      committed by looking at currently unused '(t)trans.ttid' column.
-      See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
    - Fix and reenable deadlock avoidance (SPEED). This is required for
      neo.threaded.test.Test.testDeadlockAvoidance


--- a/neo/client/app.py
+++ b/neo/client/app.py
@@ -660,7 +660,45 @@ class Application(ThreadedApplication):
        self.dispatcher.forget_queue(txn_context['queue'], flush_queue=False)

    def tpc_finish(self, transaction, tryToResolveConflict, f=None):
-        """Finish current transaction."""
+        """Finish current transaction
+
+        To avoid inconsistencies between several databases involved in the
+        same transaction, an IStorage implementation must do its best not to
+        fail in tpc_finish. In particular, making a transaction permanent
+        should ideally be as simple as switching a bit permanently.
+
+        In NEO, tpc_finish breaks this promise by not ensuring earlier that all
+        data and metadata are written, and it is for example vulnerable to
+        ENOSPC errors. In other words, some work should be moved to tpc_vote.
+
+        TODO: - In tpc_vote, all involved storage nodes must be asked to write
+                all metadata to ttrans/tobj and _commit_. AskStoreTransaction
+                can be extended for this: for nodes that don't store anything
+                in ttrans, it can just contain the ttid. The final tid is not
+                known yet, so ttrans/tobj would contain the ttid.
+              - In tpc_finish, AskLockInformation is still required for read
+                locking, ttrans.tid must be updated with the final value and
+                ttrans _committed_.
+              - The Verification phase would need some change because
+                ttrans/tobj may contain data for which tpc_finish was not
+                called. The ttid is also in trans so a mapping ttid<->tid is
+                always possible and can be forwarded via the master so that all
+                storage nodes are still able to update the tid column with the
+                final
+                value when moving rows from tobj to obj.
+              The resulting cost is:
+              - additional RPCs in tpc_vote
+              - 1 updated row in ttrans + commit
+
+        TODO: We should recover from master failures when the transaction got
+              successfully committed. More precisely, we should not raise:
+              - if any failure happens after all storage nodes have
+                successfully processed the LockInformation packets from the
+                master;
+              - and if we can reconnect to the cluster to check that the ttid
+                got successfully committed, which is possible because storage
+                nodes remember the ttid of all transactions.
+              See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
+              This bug exists in ZEO.
+        """
        txn_container = self._txn_container
        if 'voted' not in txn_container.get(transaction):
            self.tpc_vote(transaction, tryToResolveConflict)
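
The recovery rule in the second TODO of the docstring can be illustrated with a toy model. This is only a sketch under stated assumptions, not NEO code: `FakeCluster`, `ConnectionLost`, `lookup_tid` and `safe_finish` are hypothetical names invented for the example, and the real client would have to reconnect to the cluster and query storage nodes instead of reading a dict.

```python
class ConnectionLost(Exception):
    """Hypothetical error: the link to the primary master dropped."""


class FakeCluster:
    """Toy stand-in for a cluster (hypothetical, not NEO's API)."""

    def __init__(self, fail_before_commit=False, fail_after_commit=False):
        self.committed = {}  # maps ttid -> final tid
        self._next_tid = 100
        self.fail_before_commit = fail_before_commit
        self.fail_after_commit = fail_after_commit

    def finish(self, ttid):
        if self.fail_before_commit:
            # Failure before anything was made permanent.
            raise ConnectionLost("request never processed")
        tid = self._next_tid
        self._next_tid += 1
        self.committed[ttid] = tid
        if self.fail_after_commit:
            # The transaction is committed but the answer is lost.
            raise ConnectionLost("answer lost after commit")
        return tid

    def lookup_tid(self, ttid):
        # Assumption: the ttid of every transaction is remembered,
        # so the ttid -> tid mapping can be queried afterwards.
        return self.committed.get(ttid)


def safe_finish(cluster, ttid):
    """Return the final tid; raise only if the commit did not happen."""
    try:
        return cluster.finish(ttid)
    except ConnectionLost:
        tid = cluster.lookup_tid(ttid)
        if tid is None:
            raise  # really not committed: the caller may retry
        return tid  # committed despite the failure: do not raise
```

With this model, a failure that happens after the commit is hidden from the caller, while a failure that happens before it still propagates, which is the behaviour the TODO asks for.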