- 27 Feb, 2017 3 commits
Julien Muchembled authored
This happened in 2 cases:
- Commit a4c06242 ("Review aborting of transactions") introduced a race condition causing oids to remain write-locked forever after the transaction modifying them is aborted.
- An unfinished transaction is not locked/unlocked during tpc_finish: oids must be unlocked upon notification that the transaction is finished.
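Both cases break the same invariant: every oid write-locked by a transaction must be released once that transaction is either aborted or finished. The sketch below illustrates that invariant with a hypothetical WriteLockTable class keyed by oid and owning ttid; it is not NEO's actual lock manager.

```python
class WriteLockTable:
    """Hypothetical table mapping each write-locked oid to the ttid holding it."""

    def __init__(self):
        self._owner = {}  # oid -> ttid of the transaction that write-locked it

    def lock(self, oid, ttid):
        """Try to write-lock oid for ttid; False if another transaction holds it."""
        return self._owner.setdefault(oid, ttid) == ttid

    def release(self, ttid):
        """Release every oid held by ttid (on abort, or when ttid is finished)."""
        for oid in [oid for oid, owner in self._owner.items() if owner == ttid]:
            del self._owner[oid]

    def is_locked(self, oid):
        return oid in self._owner
```

Because release() is idempotent, calling it from both the abort path and the "transaction finished" notification cannot leave a stale lock behind, whichever happens first.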
-
Julien Muchembled authored
This was found by the first assertion of answerRebaseObject (client) because a storage node missed a few transactions and reported a conflict with an older serial than the one being stored: this must never happen, and this commit adds a more generic assertion on the storage side.
The above case occurs when the "first phase" of replication of a partition (all history up to the tid before unfinished transactions) ended after the unfinished transactions were finished: this was a corruption bug, where UP_TO_DATE cells could miss data. Otherwise, if the "first phase" ended before, the partition remained stuck in OUT_OF_DATE state. Restarting the storage node was enough to recover.
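The storage-side assertion can be pictured as follows. This is a minimal sketch assuming serials are comparable integers and a hypothetical check_store helper; NEO's real code is organized differently. A storage holding the full history of an oid can never know a revision older than the one the client based its store on.

```python
def check_store(oid, base_serial, last_serial):
    """Hypothetical check before storing a new revision of oid.

    base_serial: serial the client based its modification on.
    last_serial: latest committed serial of oid known to this storage.
    Returns None if there is no conflict, else the conflicting serial.
    """
    # Knowing only an older revision means this node missed transactions.
    assert last_serial >= base_serial, (
        "storage missed transactions for %r: %r < %r"
        % (oid, last_serial, base_serial))
    if last_serial == base_serial:
        return None  # nothing committed meanwhile: no conflict
    return last_serial  # a newer revision was committed: conflict
```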
-
Julien Muchembled authored
Traceback (most recent call last):
  ...
  File "neo/client/app.py", line 507, in tpc_vote
    self.waitStoreResponses(txn_context)
  File "neo/client/app.py", line 500, in waitStoreResponses
    _waitAnyTransactionMessage(txn_context)
  File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
    self._handleConflicts(txn_context)
  File "neo/client/app.py", line 474, in _handleConflicts
    self._store(txn_context, oid, conflict_serial, data)
  File "neo/client/app.py", line 410, in _store
    self._waitAnyTransactionMessage(txn_context, False)
  File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
    self._waitAnyMessage(queue, block=block)
  File "neo/client/app.py", line 133, in _waitAnyMessage
    _handlePacket(conn, packet, kw)
  File "neo/lib/threaded_app.py", line 133, in _handlePacket
    handler.dispatch(conn, packet, kw)
  File "neo/lib/handler.py", line 72, in dispatch
    method(conn, *args, **kw)
  File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
    assert txn_context.conflict_dict[oid] == (serial, conflict)
AssertionError

Scenario:
0. unanswered rebase from S2
1. conflict resolved between t1 and t2 -> S1 & S2
2. S1 reports a new conflict
3. S2 answers the rebase: the returned serial (t1) is smaller than the one in conflict_dict (t2)
4. S2 reports the same conflict as in 2
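In this scenario, the answer of step 3 refers to a conflict the client already resolved in step 1, so a strict equality check on conflict_dict fails. The sketch below shows one way a client could recognize such a stale answer by comparing serials. The handler name and the integer serials (t1=1, t2=2, t3=3) are hypothetical, and this is not necessarily how the commit fixed it.

```python
def on_answer_rebase(conflict_dict, oid, serial, conflict):
    """Hypothetical client handler for a rebase answer.

    conflict_dict maps oid -> (serial, conflict_serial) currently tracked.
    Returns True if the answer matches the tracked conflict, False if it
    is stale (its serial is older than the tracked one).
    """
    tracked_serial, tracked_conflict = conflict_dict[oid]
    if serial < tracked_serial:
        # Step 3: the conflict this answer is about was already resolved.
        return False
    assert (serial, conflict) == (tracked_serial, tracked_conflict)
    return True
```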
-
- 24 Feb, 2017 2 commits
Julien Muchembled authored
Traceback (most recent call last):
  ...
  File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
    self.app.replicator.finish()
  File "neo/storage/replicator.py", line 370, in finish
    self._nextPartition()
  File "neo/storage/replicator.py", line 279, in _nextPartition
    assert app.pt.getCell(offset, app.uuid).isOutOfDate()
AssertionError

The scenario is:
1. partition A: start of replication, with unfinished transactions
2. partition A: all unfinished transactions are finished
3. partition A: end of replication with ReplicationDone notification
4. replication of partition B
5. partition A: AssertionError when starting replication

The bug is that in 3, partition A is only partially replicated, so the storage node must not notify the master.
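The rule behind the fix is that ReplicationDone may only be sent once all committed history of the partition has been copied, including the transactions that were still unfinished when replication started. A minimal sketch, assuming tids are integers and a hypothetical PartitionReplication class (not NEO's Replicator):

```python
class PartitionReplication:
    """Hypothetical per-partition replication progress."""

    def __init__(self, offset):
        self.offset = offset
        self.replicated_tid = 0  # all history <= this tid has been copied

    def fetched(self, up_to_tid):
        """Record that a fetch round copied history up to up_to_tid."""
        self.replicated_tid = max(self.replicated_tid, up_to_tid)

    def is_complete(self, last_committed_tid):
        # Transactions that were unfinished at the start may have been
        # committed since; they must be copied before notifying the master.
        return self.replicated_tid >= last_committed_tid
```

In step 3 of the scenario, is_complete() would be False, so the node would continue replicating partition A instead of sending ReplicationDone.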
-
Julien Muchembled authored
-
- 23 Feb, 2017 1 commit
Julien Muchembled authored
This fixes testBasicStore when run with the MySQL backend; it started to fail with commit 9eb06ff1 when the -L runner option is not used.
-
- 21 Feb, 2017 6 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
This is a first version, with several possible optimizations:
- improve EventQueue (or implement a specific queue) to minimize deadlocks
- turn the RebaseObject packet into a notification

Sorting oids could also be useful to reduce the probability of deadlocks, but that would never be enough to avoid them completely, even if there's a single storage. For example:
1. C1 does a first store (x or y)
2. C2 stores x and y; one is delayed
3. C1 stores the other -> deadlock

When solving the deadlock, the data of the first store may only exist on the storage.

2 functional tests are removed because they're redundant, either with ZODB tests or with the new threaded tests.
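The example can be replayed on a single hypothetical storage to see why sorting oids does not help: stores are pipelined, so C2 obtains y while waiting for x. The sketch records which client waits for which in a wait-for graph and looks for a cycle. Cycle detection in a wait-for graph is a standard technique used here purely for illustration; the commit message does not say NEO detects deadlocks this way.

```python
def find_deadlock(wait_for):
    """Return a cycle in a wait-for graph {txn: set of txns}, or None."""
    visiting, done, path = set(), set(), []

    def visit(txn):
        visiting.add(txn)
        path.append(txn)
        for other in wait_for.get(txn, ()):
            if other in visiting:
                return path[path.index(other):]  # back edge: cycle found
            if other not in done:
                cycle = visit(other)
                if cycle:
                    return cycle
        visiting.remove(txn)
        path.pop()
        done.add(txn)
        return None

    for txn in wait_for:
        if txn not in done:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


def simulate(stores):
    """Apply (client, oid) stores in order; a store on a locked oid is delayed."""
    owner, wait_for = {}, {}
    for client, oid in stores:
        holder = owner.setdefault(oid, client)
        if holder != client:
            wait_for.setdefault(client, set()).add(holder)
    return find_deadlock(wait_for)


# Both clients use sorted oids (x before y), yet they still deadlock:
# C1 stores x, C2 stores x (delayed) and y (granted), C1 stores y (delayed).
scenario = [('C1', 'x'), ('C2', 'x'), ('C2', 'y'), ('C1', 'y')]
```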
-
Julien Muchembled authored
- Make sure that errors while processing a delayed packet are reported to the connection that sent this packet.
- Provide a mechanism to process events for the same connection in chronological order.
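Both points can be illustrated with a small queue of delayed packets. In this sketch, assuming hypothetical EventQueue and Conn classes (not NEO's), once a packet from a connection is delayed again, later packets from that same connection stay queued behind it, and an exception is recorded on the connection that sent the failing packet.

```python
from collections import deque


class Conn:
    """Hypothetical peer connection collecting the errors reported to it."""

    def __init__(self, name):
        self.name = name
        self.errors = []  # (packet, exception) pairs


class EventQueue:
    """Hypothetical queue of delayed packets, retried in arrival order."""

    def __init__(self):
        self._queue = deque()  # (conn, packet) pairs

    def queue(self, conn, packet):
        self._queue.append((conn, packet))

    def execute(self, process):
        """process(conn, packet) returns True if handled, False to delay again."""
        blocked, remaining = set(), deque()
        while self._queue:
            conn, packet = self._queue.popleft()
            if conn in blocked:
                # Keep chronological order for this connection.
                remaining.append((conn, packet))
                continue
            try:
                handled = process(conn, packet)
            except Exception as e:
                conn.errors.append((packet, e))  # report to the sender
                continue
            if not handled:
                blocked.add(conn)
                remaining.append((conn, packet))
        self._queue = remaining
```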
-
Julien Muchembled authored
-
- 14 Feb, 2017 8 commits
Julien Muchembled authored
Fix conflict handling after a successful store to a node being disconnected for having missed a transaction
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
- fail sooner in case of unresolvable conflict
- avoid OOM when there are many conflicts
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 02 Feb, 2017 10 commits
Julien Muchembled authored
-
Julien Muchembled authored
Now that we do inequality comparisons between timestamps, the master must use a monotonic clock to avoid issues when the clock is turned back. Before, the probability that time.time() returned the same value twice was probably negligible.
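A common way to get strictly increasing timestamps on top of a wall clock is to never return a value lower than or equal to the previous one. The sketch below assumes a hypothetical TidGenerator with integer microsecond ticks standing in for real TIDs; the clock is injectable so that a clock turned back can be simulated.

```python
import time


def wall_clock_us():
    """Current wall-clock time in integer microseconds."""
    return int(time.time() * 1000000)


class TidGenerator:
    """Hypothetical master-side generator of strictly increasing timestamps."""

    def __init__(self, clock=wall_clock_us):
        self._clock = clock
        self._last = 0

    def new_tid(self):
        # If the clock is turned back or returns the same value twice,
        # move just past the previous timestamp instead of reusing the clock.
        self._last = max(self._clock(), self._last + 1)
        return self._last
```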
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
This optimizes the normal case, and handlers can now take specific action when requests are cancelled because a connection is closed.
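The idea that handlers can react when their requests are cancelled can be sketched with a connection that remembers which handler awaits each answer and notifies them all when it closes. The Connection class and the answered()/cancelled() handler methods are hypothetical names chosen for illustration.

```python
class Connection:
    """Hypothetical connection tracking requests that await an answer."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # msg_id -> handler expecting the answer
        self.closed = False

    def ask(self, handler):
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = handler
        return msg_id

    def answer(self, msg_id, result):
        # Normal case: a single dict lookup, then dispatch to the handler.
        self._pending.pop(msg_id).answered(result)

    def close(self):
        # Tell each handler its request was cancelled, so it can take a
        # specific action (e.g. retry elsewhere) instead of waiting forever.
        self.closed = True
        pending, self._pending = self._pending, {}
        for msg_id, handler in sorted(pending.items(), key=lambda kv: kv[0]):
            handler.cancelled(msg_id)
```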
-
Julien Muchembled authored
It was only used by the now removed HasLock.
-
Julien Muchembled authored
It was disabled a long time ago, and NEO has evolved in such a way that the new implementation will be completely different.
-
Julien Muchembled authored
It's dead code because, 1 year after it was introduced, something else was implemented to detect deadlocks immediately. Anyway, it would be an unacceptable way to detect them.
-
Julien Muchembled authored
-
- 26 Jan, 2017 1 commit
Julien Muchembled authored
-
- 19 Jan, 2017 2 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
- 18 Jan, 2017 4 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
It was failing when some .py files were not compiled.
-
- 17 Jan, 2017 3 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-