Stefane Fermigier / neo
Commit 63324838, authored Oct 27, 2015 by Julien Muchembled
TODO: safer tpc_finish and faster storage
parent 6275f7c6
Showing 3 changed files with 43 additions and 17 deletions:
BUGS               +0  -7
TODO               +4  -9
neo/client/app.py  +39 -1
BUGS
@@ -16,13 +16,6 @@ Although this should happen rarely enough not to affect performance, this can
 be an issue if your application can't afford restarting the transaction,
 e.g. because it interacted with external environment.

-Client always raises in tpc_finish if a failure happen
-------------------------------------------------------
-
-This is wrong because the failure may actually happen just after the transaction
-is actually committed. Client should not raise if it can reconnect and note
-that the transaction exist.
-
 Storage failure or update may lead to POSException or break undoLog()
 ---------------------------------------------------------------------
TODO
@@ -58,9 +58,12 @@
   committed by future transactions.
 - Add a 'devid' storage configuration so that master do not distribute
   replicated partitions on storages with same 'devid'.
+- Make tpc_finish safer as described in its __doc__: moving work to
+  tpc_vote and recover from master failure when possible.

 Storage
-- Use HailDB instead of a stand-alone MySQL server.
+- Use libmysqld instead of a stand-alone MySQL server.
+- It should be possible to defer the commit at the end of finishTransaction.
 - Notify master when storage becomes available for clients (LATENCY)
   Currently, storage presence is broadcasted to client nodes too early, as
   the storage node would refuse them until it has only up-to-date data (not
@@ -131,14 +134,6 @@
   table hasn't changed by pinging the master and retry if necessary.
 - Implement IStorageRestoreable (ZODB API) in order to preserve data
   serials (i.e. undo information).
-- tpc_finish might raise while transaction got successfully committed.
-  This can happen if it gets disconnected from primary master while waiting
-  for AnswerFinishTransaction after primary received it and hence will
-  commit transaction independently from client presence. Client could
-  legitimately think transaction is not committed, and might decide to
-  retry. To solve this, client can know if its TTID got successfuly
-  committed by looking at currently unused '(t)trans.ttid' column.
-  See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
 - Fix and reenable deadlock avoidance (SPEED). This is required for
   neo.threaded.test.Test.testDeadlockAvoidance
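The new TODO entry asks to move work from tpc_finish to tpc_vote. As a rough illustration of that idea only, here is a toy, single-node sketch: every step that can fail (simulated here by a row-count limit standing in for ENOSPC) runs during the vote, and finishing only records the final tid. `ToyStorage` and its methods are hypothetical names, not NEO's storage API; the attribute names `tobj`, `ttrans` and `obj` merely echo the table names mentioned in this commit.

```python
import errno


class ToyStorage:
    """Hypothetical in-memory store; NOT NEO's storage node."""

    def __init__(self, capacity):
        self.capacity = capacity  # max number of rows, stands in for disk space
        self.tobj = {}    # ttid -> list of (oid, data): written but not visible
        self.ttrans = {}  # ttid -> final tid, or None while not yet committed
        self.obj = {}     # (oid, tid) -> data: visible to readers

    def _used(self):
        return len(self.obj) + sum(len(rows) for rows in self.tobj.values())

    def vote(self, ttid, rows):
        # Everything that can fail is done here, before the commit point.
        if self._used() + len(rows) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device (simulated)")
        self.tobj[ttid] = list(rows)
        self.ttrans[ttid] = None

    def finish(self, ttid, tid):
        # Commit point: a single update recording the final tid.
        self.ttrans[ttid] = tid

    def move_to_permanent(self, ttid):
        # May run lazily after finish, using the ttid -> tid mapping.
        tid = self.ttrans[ttid]
        for oid, data in self.tobj.pop(ttid):
            self.obj[oid, tid] = data
```

In this sketch a transaction that does not fit is rejected in `vote`, so a caller coordinating several databases can still abort all of them; `finish` has nothing left to do that could run out of space.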
neo/client/app.py
@@ -660,7 +660,45 @@ class Application(ThreadedApplication):
         self.dispatcher.forget_queue(txn_context['queue'], flush_queue=False)

     def tpc_finish(self, transaction, tryToResolveConflict, f=None):
-        """Finish current transaction."""
+        """Finish current transaction
+
+        To avoid inconsistencies between several databases involved in the
+        same transaction, an IStorage implementation must do its best not to
+        fail in tpc_finish. In particular, making a transaction permanent
+        should ideally be as simple as switching a bit permanently.
+
+        In NEO, tpc_finish breaks this promise by not ensuring earlier that all
+        data and metadata are written, and it is for example vulnerable to
+        ENOSPC errors. In other words, some work should be moved to tpc_vote.
+
+        TODO: - In tpc_vote, all involved storage nodes must be asked to write
+                all metadata to ttrans/tobj and _commit_. AskStoreTransaction
+                can be extended for this: for nodes that don't store anything
+                in ttrans, it can just contain the ttid. The final tid is not
+                known yet, so ttrans/tobj would contain the ttid.
+              - In tpc_finish, AskLockInformation is still required for read
+                locking, ttrans.tid must be updated with the final value and
+                ttrans _committed_.
+              - The Verification phase would need some change because
+                ttrans/tobj may contain data for which tpc_finish was not
+                called. The ttid is also in trans so a mapping ttid<->tid is
+                always possible and can be forwarded via the master so that all
+                storage are still able to update the tid column with the final
+                value when moving rows from tobj to obj.
+              The resulting cost is:
+              - additional RPCs in tpc_vote
+              - 1 updated row in ttrans + commit
+
+        TODO: We should recover from master failures when the transaction got
+              successfully committed. More precisely, we should not raise:
+              - if any failure happens after all storage nodes have processed
+                successfully the LockInformation packets from the master;
+              - and if we can reconnect to the cluster to check that the ttid
+                got successfuly committed, which is possible because storage
+                nodes remember the ttid of all transactions.
+              See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
+              This bug exists in ZEO.
+        """
         txn_container = self._txn_container
         if 'voted' not in txn_container.get(transaction):
             self.tpc_vote(transaction, tryToResolveConflict)
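The second TODO in the docstring describes recovering when the connection to the master is lost during tpc_finish although the transaction was committed. The control flow might look like the sketch below, which is an assumption-laden illustration and not code from NEO: `MasterLost`, `finish_with_recovery`, `ask_finish` and `lookup_committed_tid` are hypothetical names, and the lookup callback stands in for "reconnect to the cluster and check whether the ttid got committed".

```python
class MasterLost(Exception):
    """Hypothetical error: the connection to the primary master was lost."""


def finish_with_recovery(ask_finish, lookup_committed_tid, ttid):
    """Return the final tid, even if the master's answer was lost.

    ask_finish(ttid) -> tid; may raise MasterLost.
    lookup_committed_tid(ttid) -> tid, or None if the ttid is not known
    as committed; may itself raise if the cluster cannot be reached.
    """
    try:
        return ask_finish(ttid)
    except MasterLost:
        # The commit may have happened just before the connection dropped,
        # so check before reporting a failure to the caller.
        tid = lookup_committed_tid(ttid)
        if tid is None:
            raise  # re-raise the original MasterLost: not known as committed
        return tid
```

For example, with a dictionary standing in for what the storage nodes remember:

```python
committed = {}

def ask_lost_after_commit(ttid):
    committed[ttid] = 42  # the cluster commits...
    raise MasterLost()    # ...but the answer never reaches the client

print(finish_with_recovery(ask_lost_after_commit, committed.get, 7))
```

The caller only sees an exception when the transaction cannot be confirmed as committed, which is the behaviour the docstring asks for.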