  1. 24 Apr, 2017 3 commits
    • Reimplement election (of the primary master) · 23b6a66a
      Julien Muchembled authored
      The election is not a separate process anymore.
      It happens during the RECOVERING phase, and there's no use of timeouts anymore.
      
      Each master node keeps a timestamp of when it started to play the primary role,
      and the node with the smallest timestamp is elected. The election stops when
      the cluster is started: as long as it is operational, the primary master can't
      be deposed.
      
      A new election must happen whenever the cluster is no longer operational, to
      handle the case of a network cut between the primary master and all other
      nodes: another master node (a secondary) then takes over, and when the initial
      primary master comes back, it loses against the new primary master if the
      cluster is already started.
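      
      Below is a minimal sketch of the timestamp-based election, assuming a
      hypothetical node table; names and data layout are illustrative, not NEO's
      actual API:
      
        # Hypothetical sketch: the primary master is the node with the smallest
        # "started playing the primary role" timestamp, and it cannot be deposed
        # while the cluster is started.
        from typing import Dict, Optional
        
        def elect_primary(primary_since: Dict[str, Optional[float]],
                          cluster_started: bool,
                          current_primary: Optional[str]) -> Optional[str]:
            # primary_since maps a master node id to the timestamp at which it
            # started to play the primary role (None if it never did).
            if cluster_started and current_primary is not None:
                return current_primary          # operational cluster: keep primary
            candidates = [(ts, node) for node, ts in primary_since.items()
                          if ts is not None]
            if not candidates:
                return None
            return min(candidates)[1]           # smallest timestamp wins
        
        # The initial primary (older timestamp) loses if the cluster is started:
        print(elect_primary({'m1': 10.0, 'm2': 20.0}, True, 'm2'))    # m2
        print(elect_primary({'m1': 10.0, 'm2': 20.0}, False, None))   # m1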
    • Remove BROKEN node state · 9d7f9795
      Julien Muchembled authored
    • Remove HIDDEN node state · b8210d58
      Julien Muchembled authored
  2. 31 Mar, 2017 2 commits
  3. 21 Feb, 2017 1 commit
    • Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of deadlocks,
      but that would never be enough to avoid them completely, even if there's a
      single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
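      
      A minimal sketch of the oid-sorting idea mentioned above, assuming a
      hypothetical store() call; this is illustrative, not the actual client code:
      
        # Issuing store requests in a global oid order lowers the chance that two
        # clients lock the same oids in opposite orders (as in the C1/C2 example
        # above), but cannot remove deadlocks entirely, e.g. when a client's
        # stores are split across several calls.
        def store_sorted(storage, ttid, object_list):
            # object_list: iterable of (oid, data) pairs for one transaction.
            for oid, data in sorted(object_list):
                storage.store(oid, ttid, data)  # hypothetical store() signature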
  4. 14 Feb, 2017 2 commits
  5. 02 Feb, 2017 1 commit
  6. 25 Nov, 2016 1 commit
  7. 29 Aug, 2016 1 commit
  8. 22 Mar, 2016 1 commit
  9. 04 Mar, 2016 1 commit
    • storage: defer commit when unlocking a transaction (-> better performance) · eaa07e25
      Julien Muchembled authored
      Before this change, a storage node did 3 commits per transaction:
      - once all data are stored
      - when locking the transaction
      - when unlocking the transaction
      
      The last one is not important for ACID. In case of a crash, the transaction
      is unlocked again (verification phase). By deferring it by 1 second, we
      only have 2 commits per transaction during high activity because all pending
      changes are merged with the commits caused by other transactions.
      
      This change compensates for the extra commit(s) per transaction that were
      introduced in commit 7eb7cf1b
      ("Minimize the amount of work during tpc_finish").
  10. 01 Dec, 2015 1 commit
    • Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a partially truncated DB.
      
      -> On a Truncate packet, a storage node only stores the tid somewhere, to send
         it back to the master, which stays in RECOVERING state as long as any node
         has a different value than that of the node with the latest partition table.
      
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      
      -> Truncation is now effective at the end of the VERIFYING phase, just before
         returning the last ids to the master.
      
      Finally, all nodes should be truncated, so that an offline node does not come
      back with a different history. Currently, this would not be an issue since
      replication always restarts from the beginning, but later we'd like nodes to
      remember where they stopped replicating.
      
      -> If a truncation is requested, the master waits for all nodes to be pending,
         even if the cluster was previously started (the user can still force it to
         start with neoctl). Any node lost during verification also causes the
         master to go back to recovery.
      
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery packet, since it no longer makes sense to ask for
      last ids during recovery.
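      
      A minimal sketch of the recovery-side check described above, with a
      hypothetical node representation (the real master code differs):
      
        def can_leave_recovery(nodes, truncate_requested):
            # nodes: list of dicts with hypothetical keys 'pending' (bool),
            # 'truncate_tid' (tid stored on a Truncate packet, or None) and
            # 'ptid' (id of the node's partition table).
            if truncate_requested and not all(n['pending'] for n in nodes):
                # A requested truncation makes the master wait for all nodes.
                return False
            reference = max(nodes, key=lambda n: n['ptid'])['truncate_tid']
            # Stay in RECOVERING as long as any node disagrees with the node
            # holding the latest partition table.
            return all(n['truncate_tid'] == reference for n in nodes)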
  11. 30 Nov, 2015 1 commit
    • Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk before
      tpc_finish, which made it vulnerable, for example, to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote:
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      
      On the performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate the additional commits is to delay the
        commit done by the unlock.
      
      - By minimizing the time to lock transactions, objects are read-locked for a
        much shorter period. This is all the more important because locked
        transactions must be unlocked in the same order.
      
      Transactions with too many modified objects will now time out inside tpc_vote
      instead of tpc_finish. Of course, such transactions may still cause other
      transactions to time out in tpc_finish.
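      
      A minimal sqlite3 sketch of the ttrans handling described above; NEO's real
      schema and backend code differ:
      
        import sqlite3
        
        con = sqlite3.connect(':memory:')
        con.execute('CREATE TABLE ttrans (tid INTEGER, ttid INTEGER NOT NULL,'
                    ' description TEXT)')
        
        def tpc_vote(ttid, description):
            # All metadata is written and committed now; the final tid is not
            # known yet, so the tid column stays NULL.
            con.execute('INSERT INTO ttrans (tid, ttid, description)'
                        ' VALUES (NULL, ?, ?)', (ttid, description))
            con.commit()
        
        def tpc_finish(ttid, tid):
            # Only a small update remains: set the final tid and commit.
            con.execute('UPDATE ttrans SET tid=? WHERE ttid=?', (tid, ttid))
            con.commit()
        
        tpc_vote(ttid=1, description='example transaction')
        tpc_finish(ttid=1, tid=42)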
  12. 25 Nov, 2015 1 commit
  13. 29 Oct, 2015 1 commit
  14. 05 Oct, 2015 1 commit
  15. 24 Sep, 2015 1 commit
  16. 23 Sep, 2015 1 commit
  17. 15 Sep, 2015 1 commit
  18. 28 Aug, 2015 1 commit
    • Fix occasional deadlocks in threaded tests · 0b93b1fb
      Julien Muchembled authored
      Deadlocks mainly happened while stopping a cluster, hence the complete review
      of NEOCluster.stop().
      
      A major change is to make the client node handle its lock like other nodes
      (i.e. in the polling thread itself) to better know when to call
      Serialized.background() (there was a race condition with the test of
      'self.poll_thread.isAlive()' in ClientApplication.close).
  19. 14 Aug, 2015 1 commit
    • Do not reconnect too quickly to a node after an error · d898a83d
      Julien Muchembled authored
      For example, a backup storage node that was rejected because the upstream
      cluster was not ready could reconnect in a loop without delay, using 100% CPU
      and flooding the logs.
      
      A new 'setReconnectionNoDelay' method on Connection can be used for cases where
      it's legitimate to quickly reconnect.
      
      With this new delayed reconnection, it's possible to remove the remaining
      time.sleep().
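      
      A minimal sketch of the delayed reconnection, with hypothetical attributes
      around the setReconnectionNoDelay() method mentioned above:
      
        import time
        
        class ConnectorState:
            RECONNECT_DELAY = 2.0   # assumed value, for illustration only
            
            def __init__(self):
                self._no_delay = False
                self._not_before = 0
            
            def setReconnectionNoDelay(self):
                # For cases where it's legitimate to reconnect quickly.
                self._no_delay = True
            
            def connectionFailed(self):
                # After an error, do not retry before the delay has elapsed,
                # unless no-delay was explicitly requested.
                if not self._no_delay:
                    self._not_before = time.time() + self.RECONNECT_DELAY
            
            def mayReconnect(self):
                return time.time() >= self._not_before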
  20. 12 Aug, 2015 1 commit
  21. 13 Jul, 2015 1 commit
  22. 10 Jul, 2015 1 commit
  23. 25 Jul, 2014 2 commits
  24. 20 Jun, 2014 1 commit
    • client: clean up import/export code · d562bf8f
      Julien Muchembled authored
      Export:
      - Remove leftover warning about a bug that was fixed in
        commit e76af297
      - In the neomigrate script, open the NEO storage read-only.
      - IStorageIteration is already implemented.
      
      Import:
      - Review comments.
      - In the neomigrate script, warn that IStorageRestoreable is not implemented.
      - Do not call the 'close' method on the source iterator. BaseStorage does not
        do it and this is not part of the ZODB API. In the case of FileStorage,
        resources are freed automatically during garbage collection.
  25. 03 Jun, 2014 1 commit
  26. 29 May, 2014 1 commit
  27. 07 Jan, 2014 1 commit
    • Add test showing that clients may be stuck on an old snapshot in case of failure during tpc_finish · fd4cfaa9
      Julien Muchembled authored
      If anything goes wrong after a transaction is locked and before the end of
      onTransactionCommitted, the recovery phase should be run again, so that the
      master gets the correct last tid.
      
      The following patch by Vincent is an attempt to fix this:
      
      --- a/neo/master/app.py
      +++ b/neo/master/app.py
      @@ -329,8 +329,8 @@ def playPrimaryRole(self):
      
               # recover the cluster status at startup
               try:
      -            self.runManager(RecoveryManager)
                   while True:
      +                self.runManager(RecoveryManager)
                       self.runManager(VerificationManager)
                       try:
                           if self.backup_tid:
      @@ -338,10 +338,6 @@ def playPrimaryRole(self):
                                   raise RuntimeError("No upstream cluster to backup"
                                                      " defined in configuration")
                               self.backup_app.provideService()
      -                        # Reset connection with storages (and go through a
      -                        # recovery phase) when leaving backup mode in order
      -                        # to get correct last oid/tid.
      -                        self.runManager(RecoveryManager)
                               continue
                           self.provideService()
                       except OperationFailure:
  28. 23 Aug, 2012 1 commit
  29. 20 Aug, 2012 2 commits
    • Comment about backup limitations · dd556379
      Julien Muchembled authored
    • More bugfixes to backup mode · 08742377
      Julien Muchembled authored
      - catch OperationFailure
      - reset transaction manager when leaving backup mode
      - send the appropriate target tid to a storage that updates an outdated cell
      - clean up the partition table when leaving BACKINGUP state unexpectedly
      - make sure all readable cells of a partition have the same 'backup_tid'
        if they have the same data, so that we know when internal replication is
        finished when leaving backup mode
      - fix storage nodes that have not finished internal replication when leaving backup mode
  30. 16 Aug, 2012 1 commit
  31. 15 Aug, 2012 1 commit
  32. 10 Aug, 2012 1 commit
    • Start renaming UUID into NID, because node IDs are no longer 128 bits long · b81ae60a
      Julien Muchembled authored
      SQL tables can be upgraded using:
        UPDATE config SET name = 'nid' WHERE name = 'uuid';
      
      then for MySQL:
        ALTER TABLE pt CHANGE uuid nid INT NOT NULL;
      
      or SQLite:
        ALTER TABLE pt RENAME TO old_pt;
        CREATE TABLE pt (rid INTEGER NOT NULL, nid INTEGER NOT NULL, state INTEGER NOT NULL, PRIMARY KEY (rid, nid));
        INSERT INTO pt SELECT * from old_pt;
        DROP TABLE old_pt;
  33. 23 Jul, 2012 2 commits