Commits · v1.6.2 · Stefane Fermigier / neo

09 Mar, 2016 2 commits
- Release version 1.6.2 · 1705d828
  Julien Muchembled authored Mar 09, 2016
- BUGS: possible "uuid" conflict issue after clients got disconnected from the master · 24780e8e
  Julien Muchembled authored Mar 09, 2016
08 Mar, 2016 2 commits
- tests: check case of multiple conflict resolutions for the same (oid, txn) · eee74faf
  Julien Muchembled authored Mar 08, 2016
- tests: new helper to synchronize threads · 15bcd495
  Julien Muchembled authored Mar 08, 2016
04 Mar, 2016 3 commits

storage: move the commit at tpc_vote from the backends to the unique caller · 645920e8
Julien Muchembled authored Mar 04, 2016

storage: defer commit when unlocking a transaction (-> better performance) · eaa07e25

Julien Muchembled authored Mar 04, 2016

Before this change, a storage node did 3 commits per transaction:
- once all data are stored
- when locking the transaction
- when unlocking the transaction

The last one is not required for ACID: in case of a crash, the transaction
is unlocked again (verification phase). By deferring it by 1 second, we
only have 2 commits per transaction during high activity, because all pending
changes are merged with the commits caused by other transactions.

This change compensates for the extra commit(s) per transaction that were
introduced in commit 7eb7cf1b
("Minimize the amount of work during tpc_finish").
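
A minimal sketch of the deferral idea, assuming a hypothetical `DeferredCommitter` that wraps a backend `commit` callable and takes an injectable clock for testing; the class and method names are illustrative, not NEO's actual API:

```python
import time


class DeferredCommitter:
    """Hypothetical helper: commits required for ACID run immediately,
    while a deferrable one (e.g. after unlocking a transaction) is
    postponed by up to `delay` seconds so that it merges with later ones."""

    def __init__(self, commit, delay=1.0, clock=time.monotonic):
        self._commit = commit    # backend commit callable
        self._delay = delay
        self._clock = clock
        self._deadline = None    # time at which a deferred commit is due

    def commit_now(self):
        # Critical commit (data stored, transaction locked). It also
        # flushes any pending deferred changes, so cancel the deadline.
        self._deadline = None
        self._commit()

    def commit_later(self):
        # Not needed for ACID: schedule a commit only if none is pending.
        if self._deadline is None:
            self._deadline = self._clock() + self._delay

    def poll(self):
        # Meant to be called periodically from the event loop.
        if self._deadline is not None and self._clock() >= self._deadline:
            self.commit_now()
```

Because a critical `commit_now()` also persists whatever an unlock left pending, the unlock's own commit disappears whenever another transaction commits within the delay.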

client: optimize cache by not keeping items with counter=0 in history queue · 254878a8
Julien Muchembled authored Mar 02, 2016

02 Mar, 2016 1 commit

client: revert incorrect memory optimization · 763806e0

Julien Muchembled authored Mar 02, 2016

Since commit d2d77437 ("client: make the cache
tolerant to late invalidations when the entry is in the history queue"),
invalidated items became current again when they were moved to the history
queue, which was wrong for 2 reasons:
- only the last item of each _oid_dict value may have next_tid=None,
- and such items could be wrongly reused when caching the real
  current data.

01 Mar, 2016 1 commit
- storage: switch to a maintained fork of MySQL-python · 5f0c93f5
  Julien Muchembled authored Mar 01, 2016
26 Feb, 2016 4 commits
- README: minor update · e0bd2b5b
  Julien Muchembled authored Feb 26, 2016
- doc: rename CHANGES/README/UPGRADE for GitLab · 55eb90c1
  Julien Muchembled authored Feb 26, 2016
- tests: new NEO_DB_SOCKET environment variable to choose the MySQL server to use · cc72e972
  Julien Muchembled authored Feb 26, 2016
- BUGS: deadlock avoidance can also happen with only 1 storage node · 9bd524ab
  Julien Muchembled authored Feb 26, 2016
05 Feb, 2016 1 commit

client: make the cache tolerant to late invalidations when the entry is in the history queue · d2d77437

Julien Muchembled authored Feb 05, 2016

This fixes the following scenario:
1. the master sends invalidations to clients,
   and unlocks to storages (oid1, tid1)
2. the storage receives/processes the unlock
3. the client asks for data (oid1, tid0)
4. the storage returns tid1 as next tid, whereas it's still None in the cache
   (before, it caused an assertion failure)
5. the client processes invalidations
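
The tolerance described above can be sketched with a toy per-oid cache, assuming each oid maps to a list of `[tid, next_tid, data]` items where `next_tid=None` marks the revision believed to be current. `ToyCache` is hypothetical and ignores NEO's LRU/history queues:

```python
class ToyCache:
    """Hypothetical cache: oid -> list of [tid, next_tid, data] items,
    sorted by tid; next_tid=None marks the item believed to be current."""

    def __init__(self):
        self._oid_dict = {}

    def store(self, oid, data, tid, next_tid):
        items = self._oid_dict.setdefault(oid, [])
        for item in items:
            if item[0] == tid:
                if item[1] is None and next_tid is not None:
                    # Late invalidation: the storage already knows a newer
                    # revision the client hasn't been told about yet.
                    # Accept it instead of failing an assertion.
                    item[1] = next_tid
                return
        items.append([tid, next_tid, data])
        items.sort(key=lambda item: item[0])

    def invalidate(self, oid, tid):
        # May arrive after store() already recorded next_tid: then no-op.
        items = self._oid_dict.get(oid)
        if items and items[-1][1] is None:
            items[-1][1] = tid

    def load(self, oid):
        # Return (data, tid) of the current revision, or None if unknown.
        items = self._oid_dict.get(oid)
        if items and items[-1][1] is None:
            return items[-1][2], items[-1][0]
        return None
```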

25 Jan, 2016 2 commits
- Release version 1.6.1 · a7f50dfc
  Julien Muchembled authored Jan 25, 2016
- Update copyright year · 5a8e9d04
  Julien Muchembled authored Jan 25, 2016
21 Jan, 2016 2 commits
- Update neo/debug.py example · 321b0bf8
  Julien Muchembled authored Jan 21, 2016
- tests: document Patch class · e5c056b9
  Julien Muchembled authored Jan 21, 2016
12 Jan, 2016 1 commit
- client: remove obsolete comment in Storage.load · d43bd510
  Julien Muchembled authored Jan 12, 2016
```
See commit c277ed20
("client: really process all invalidations in poll thread").
```
16 Dec, 2015 2 commits
- neoctl: don't print 'None' on successful check/truncate commands · 50a6cf41
  Julien Muchembled authored Dec 14, 2015
- interfaces: check signature of methods · 82d95846
  Julien Muchembled authored Dec 13, 2015
13 Dec, 2015 3 commits
- storage: define interface for backends and check they implement it · f419f974
  Julien Muchembled authored Dec 13, 2015
- importer: allow truncation after the last tid to import, during or after the import · c6b80f7b
  Julien Muchembled authored Dec 13, 2015
```
This is a partial implementation. To truncate at a smaller tid, you must wait
until data is imported up to this tid and stop using the Importer backend.
```
- importer: do not implement deleteTransaction, now only used for replication · 24a9f1b8
  Julien Muchembled authored Dec 13, 2015
```
This backend does not support replication. Even if we implemented it, such a
node could only be a source for other nodes, so we should never delete transactions.
```
12 Dec, 2015 1 commit
- neolog: fix crash on unknown packets · af8a8370
  Julien Muchembled authored Dec 12, 2015
11 Dec, 2015 1 commit
- client: dump cache stats on SIGRTMIN+2 · 9e543d76
  Julien Muchembled authored Dec 11, 2015
09 Dec, 2015 1 commit
- client: fix spurious connection timeouts · 06a64d80
  Julien Muchembled authored Dec 09, 2015
```
This fixes a regression caused by
commit eef52c27
```
02 Dec, 2015 1 commit
- Release version 1.6 · f180b00e
  Julien Muchembled authored Dec 02, 2015
01 Dec, 2015 3 commits

master: fix verification when nodes don't have any readable cell · cd669221
Julien Muchembled authored Nov 24, 2015

Bump protocol version and upgrade storages automatically · ca2caf87
Julien Muchembled authored Nov 25, 2015

Safer DB truncation, new 'truncate' ctl command · d3c8b76d

Julien Muchembled authored Dec 01, 2015

With the previous commit, the request to truncate the DB was not stored
persistently, which means that this operation was still vulnerable to the case
where the master is restarted after some nodes, but not all, have already
truncated. The master didn't have the information to fix this and the result
was a partially truncated DB.

-> On a Truncate packet, a storage node only stores the tid somewhere, to send
   it back to the master, which stays in RECOVERING state as long as any node
   has a different value than that of the node with the latest partition table.

We also want to make sure that there is no unfinished data, because a user may
truncate at a tid higher than a locked one.

-> Truncation is now effective at the end of the VERIFYING phase, just before
   returning the last ids to the master.

Finally, all nodes should be truncated, to prevent an offline node from coming
back with a different history. Currently, this would not be an issue since
replication always restarts from the beginning, but later we'd like nodes to
remember where they stopped replicating.

-> If a truncation is requested, the master waits for all nodes to be pending,
   even if the cluster was previously started (the user can still force the
   cluster to start with neoctl). Any node lost during verification also causes
   the master to go back to recovery.

Obviously, the protocol has been changed to split the LastIDs packet and
introduce a new Recovery packet, since it no longer makes sense to ask for last
ids during recovery.
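
The RECOVERING condition from the first arrow can be sketched as follows, assuming a hypothetical `StorageInfo` record holding each connected node's partition table id and persisted truncation tid (names are illustrative, not NEO's protocol fields):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageInfo:
    # Hypothetical summary of what a storage node reports during RECOVERING.
    name: str
    ptid: int                    # partition table id (higher = newer)
    truncate_tid: Optional[int]  # persisted truncation request, if any


def truncation_agreed(nodes):
    """Return True when every node reports the same truncate tid as the
    node holding the latest partition table; as long as this is False,
    the master stays in RECOVERING."""
    if not nodes:
        return False
    reference = max(nodes, key=lambda n: n.ptid)
    return all(n.truncate_tid == reference.truncate_tid for n in nodes)
```

Since the tid is stored persistently on each node, a restarted master can recompute this condition from what nodes report, instead of relying on its own memory of the request.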

30 Nov, 2015 9 commits

Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b

Julien Muchembled authored Nov 25, 2015

Currently, the database may only be truncated when leaving backup mode, but
the issue will be the same when neoctl gets a new command to truncate at an
arbitrary tid: we want to be sure that all nodes are truncated before anything
else.

Therefore, we no longer send Truncate orders just before stopping operation,
because nodes could fail/exit before actually processing them. Truncation must
also happen before asking nodes for their last ids.

With this commit, if a truncation is requested:
- this is always the first thing done when a storage node connects to the
  primary master during the RECOVERING phase,
- and the cluster does not start automatically if there are missing nodes,
  unless an admin forces it.

Other changes:
- Connections to storage nodes don't need to be aborted anymore when leaving
  backup mode.
- The master always initiates communication when a storage node identifies,
  which simplifies code and reduces the number of exchanged packets.

master: fix possible blockage during recovery after a storage disconnection · 2485f151

Julien Muchembled authored Nov 19, 2015

At some point, the master asks a storage node for its partition table. If this
node is lost before answering, another node (or the same one if it comes back)
must be asked.

Before this change, the master node had to be restarted.
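
A minimal sketch of the fix, assuming hypothetical helpers that track which node the master is currently waiting on for the partition table (`pending`) and react to disconnections and (re)connections:

```python
def on_node_lost(pending, lost, connected):
    """Return the node the master should now wait on: unchanged if `lost`
    was not the one being asked, otherwise another connected node, or
    None until some node (re)connects."""
    if lost != pending:
        return pending
    for node in connected:
        if node != lost:
            return node
    return None


def on_node_connected(pending, node):
    # If nobody is being asked (all candidates were lost), ask the
    # newcomer, which may be the same node coming back.
    return node if pending is None else pending
```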

master: last tid/oid after recovery/verification · dec81519

Julien Muchembled authored Nov 20, 2015

The important bugfix is to update the last oid when the master verifies a
transaction with new oids.

By resetting the transaction manager at the beginning of the recovery phase,
it becomes possible to avoid tid/oid holes:
- by reallocating previously unused allocated oids
- when going back "in the past", i.e. reverting to an older version of the
  database (with fewer oids) and/or adjusting the clock
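
Both effects can be sketched with a toy transaction manager, assuming integer oids/tids and hypothetical method names (this is not NEO's actual `TransactionManager` API):

```python
class ToyTransactionManager:
    """Hypothetical stand-in for the master's transaction manager."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Done at the beginning of RECOVERING: forget oids/tids allocated
        # earlier but never used, so values come only from storage nodes.
        self.last_oid = -1
        self.last_tid = -1

    def set_last_ids(self, last_oid, last_tid):
        # Last ids reported by storage nodes.
        self.last_oid = max(self.last_oid, last_oid)
        self.last_tid = max(self.last_tid, last_tid)

    def transaction_verified(self, tid, oids):
        # The bugfix: a verified transaction may contain new oids,
        # so last_oid must be raised too, not only last_tid.
        self.last_tid = max(self.last_tid, tid)
        if oids:
            self.last_oid = max(self.last_oid, max(oids))

    def new_oids(self, n):
        # Allocate n consecutive oids after the last known one.
        first = self.last_oid + 1
        self.last_oid += n
        return list(range(first, first + n))
```

Without the update in `transaction_verified`, `new_oids` could hand out oids already used by the verified transaction.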

Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da

Julien Muchembled authored Nov 25, 2015

This fixes several cases where the partition table could become corrupt and
the whole cluster could get stuck in VERIFYING state.

This also reduces the probability of having out-of-date cells when restarting
several storage nodes simultaneously.

Finally, if a master node becomes primary again, the cluster must not be started
automatically if nodes with readable cells are missing, in order to avoid
a split of the database. This could happen if this master node was previously
forced to start the cluster.
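
The two rules can be sketched as pure functions, assuming a hypothetical partition table represented as a dict mapping each partition to a list of `(node, readable)` cells:

```python
def is_operational(partition_table, running):
    """Hypothetical check: operational if every partition has at least
    one readable cell on a running node."""
    return all(
        any(readable and node in running for node, readable in cells)
        for cells in partition_table.values())


def may_start_automatically(partition_table, running):
    # Don't auto-start if any node holding a readable cell is missing,
    # to avoid splitting the database; an admin may still force it.
    readable_nodes = {node for cells in partition_table.values()
                      for node, readable in cells if readable}
    return (is_operational(partition_table, running)
            and readable_nodes <= set(running))
```

The gap between the two functions is exactly the case described above: a table can be operational while some readable nodes are missing, which is why starting then requires an explicit admin decision.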

Minimize the amount of work during tpc_finish · 7eb7cf1b

Julien Muchembled authored Nov 25, 2015

NEO did not ensure that all data and metadata are written on disk before
tpc_finish, and it was for example vulnerable to ENOSPC errors.
In other words, some work had to be moved to tpc_vote:

- In tpc_vote, all involved storage nodes are now asked to write all metadata
  to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
  column of ttrans and tobj now contains NULL and the ttid respectively.

- In tpc_finish, AskLockInformation is still required for read locking,
  ttrans.tid is updated with the final value and this change is _committed_.

- The verification phase is greatly simplified, more reliable and faster. For
  all voted transactions, we can know if a tpc_finish was started by getting
  the final tid from the ttid, either from ttrans or from trans. And we know
  that such transactions can't be partial so we don't need to check oids.

So in addition to minimizing the risk of failures during tpc_finish, we also
fix a bug causing the verification phase to discard transactions with objects
for which readCurrent was called.

On performance side:

- Although tpc_vote now asks all involved storages, instead of only those
  storing the transaction metadata, the client has been improved to do this
  in parallel. The additional commits are also all done in parallel.

- A possible improvement to compensate for the additional commits is to delay
  the commit done by the unlock.

- By minimizing the time to lock transactions, objects are read-locked for a
  much shorter period. This is all the more important because locked
  transactions must be unlocked in the same order.

Transactions with too many modified objects will now time out inside tpc_vote
instead of tpc_finish. Of course, such transactions may still cause other
transactions to time out in tpc_finish.
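
The simplified verification can be sketched with SQLite, assuming a drastically reduced, hypothetical schema where `ttrans` and `trans` only have `ttid` and `tid` columns (NEO's real tables have more columns). For a voted transaction, the final tid is looked up in `ttrans` first, then in `trans`, assuming the row has already moved there after unlock:

```python
import sqlite3


def create_toy_schema(conn):
    # Hypothetical, heavily simplified tables: only the columns needed here.
    conn.executescript("""
        CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER);
        CREATE TABLE trans (tid INTEGER PRIMARY KEY, ttid INTEGER);
    """)


def final_tid(conn, ttid):
    """Return the final tid of a voted transaction if tpc_finish was
    started for it, else None."""
    row = conn.execute(
        "SELECT tid FROM ttrans WHERE ttid = ?", (ttid,)).fetchone()
    if row is not None:
        # tid stays NULL (None) until tpc_finish writes the final value.
        return row[0]
    # Assumed: after unlock, the transaction row lives in trans.
    row = conn.execute(
        "SELECT tid FROM trans WHERE ttid = ?", (ttid,)).fetchone()
    return row[0] if row is not None else None
```

In this sketch, a `None` result means tpc_finish was never started, so the voted transaction can be discarded; otherwise it must be completed with the returned tid. No per-oid check is needed because voted transactions can't be partial.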

Do not send useless node information to bootstrapping node · 99ac542c
Julien Muchembled authored Nov 23, 2015

fixup! storage: fix pruning of data when deleting partial transactions during verification · cff279af
Julien Muchembled authored Nov 30, 2015
```
This fixes a regression in commit 83fe64bf
when ttrans has several rows to the same data_id.
```
threaded: prevent neoctl from looping forever when something goes wrong during the test · a63bf12f
Julien Muchembled authored Nov 26, 2015

ssl: fix handshaking connections being stuck when they're aborted · fe487c07
Julien Muchembled authored Nov 27, 2015
