- 25 Apr, 2017 1 commit
-
-
Julien Muchembled authored
-
- 24 Apr, 2017 6 commits
-
-
Julien Muchembled authored
The election is not a separate process anymore. It happens during the RECOVERING phase, and timeouts are not used anymore.

Each master node keeps a timestamp of when it started to play the primary role, and the node with the smallest timestamp is elected. The election stops when the cluster is started: as long as it is operational, the primary master can't be deposed.

An election must happen whenever the cluster is not operational anymore, to handle the case of a network cut between a primary master and all other nodes: another master node (secondary) then takes over, and when the initial primary master is back, it loses against the new primary master if the cluster is already started.
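A minimal sketch of the election rule described above, assuming each candidate master is represented by the timestamp at which it started to play the primary role; the function and sample data are illustrative, not NEO's actual API:

    # Each candidate is a (primary_since_timestamp, node_id) pair: the node
    # that has been playing the primary role the longest (smallest timestamp)
    # is elected.
    def elect_primary(candidates):
        return min(candidates)[1]

    print(elect_primary([(1493100000.0, 'M2'), (1493099000.0, 'M1')]))  # -> 'M1'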
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
In order to do that correctly, this commit contains several other changes.

When connecting to a primary master, a full node list always follows the identification. For storage nodes, this means that they now know all nodes during the RECOVERING phase.

The initial full node list now always contains a node tuple for:
- the server-side node (i.e. the primary master): on a master, this is done by always having a node describing itself in its node manager;
- the client-side node, to make sure it gets an id timestamp: now an admin node also receives a node for itself.
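The idea can be pictured with the following sketch (hypothetical names and attributes, not NEO's real code): the node list sent right after identification always includes the serving node itself and the peer that just identified, so every peer gets its own id timestamp.

    # Build the initial node list sent after identification (illustrative only).
    def initial_node_list(node_manager, self_node, peer_node):
        nodes = {n.uuid: n for n in node_manager.getList()}
        nodes.setdefault(self_node.uuid, self_node)  # the primary master itself
        nodes.setdefault(peer_node.uuid, peer_node)  # the identifying peer
        return [(n.getType(), n.getAddress(), n.uuid, n.getState(), n.id_timestamp)
                for n in nodes.values()]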
-
Julien Muchembled authored
This keeps the connection fully functional when a handler raises an exception.
-
- 19 Apr, 2017 2 commits
-
-
Julien Muchembled authored
Commits like 7eb7cf1b ("Minimize the amount of work during tpc_finish") dropped the protection added in commit 07b48079 ("Ignore some requests, based on connection state") for request handlers when they respond. This commit restores it in a generic way.
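A minimal sketch of this kind of protection, assuming a Connection object with an isClosed() method; the decorator name and API are illustrative, not NEO's actual code.

    from functools import wraps

    def ignore_when_closed(handler):
        # Skip a request handler when the connection is already closed or
        # aborted, instead of letting it fail while trying to answer.
        @wraps(handler)
        def wrapper(self, conn, *args, **kw):
            if conn.isClosed():
                return  # the peer is gone: silently drop the request
            return handler(self, conn, *args, **kw)
        return wrapper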
-
Julien Muchembled authored
-
- 18 Apr, 2017 4 commits
-
-
Julien Muchembled authored
The initial intention was to rely on stable sorting when several events have the same key. For this to happen, sorting must not continue the comparison with the second item of events. Otherwise, this could lead to data corruption (conflict resolution with a wrong base):

    FAIL: testNotifyReplicated (neo.tests.threaded.test.Test)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "neo/tests/threaded/__init__.py", line 1093, in wrapper
        return wrapped(self, cluster, *args, **kw)
      File "neo/tests/threaded/test.py", line 2019, in testNotifyReplicated
        self.assertEqual([15, 11, 13, 16], [r[x].value for x in 'abcd'])
      File "neo/tests/__init__.py", line 187, in assertEqual
        return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
    failureException: Lists differ: [15, 11, 13, 16] != [19, 11, 13, 16]

    First differing element 0:
    15
    19

    - [15, 11, 13, 16]
    ?   ^
    + [19, 11, 13, 16]
    ?   ^
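A quick illustration of the sorting pitfall described above, using plain Python:

    # Two events sharing the same key, in insertion order.
    events = [(1, 'b'), (1, 'a')]

    print(sorted(events))                      # [(1, 'a'), (1, 'b')]: the second
                                               # item was compared too
    print(sorted(events, key=lambda e: e[0]))  # [(1, 'b'), (1, 'a')]: stable sort,
                                               # insertion order kept for equal keys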
-
Julien Muchembled authored
-
Julien Muchembled authored
'aborted' could appear twice.
-
Julien Muchembled authored
-
- 13 Apr, 2017 1 commit
-
-
Julien Muchembled authored
-
- 04 Apr, 2017 1 commit
-
-
Kirill Smelkov authored
zodburi [1] provides a way to open ZODB storages by URL/URI. It already supports the file://, zeo://, zconfig://, memory://, etc. schemes out of the box, and third-party ZODB storages can add support for their own schemes by providing a zodburi.resolvers entry point; for example, relstorage [2] and newtdb [3] do this. Let's also teach NEO to open itself via the neo:// URI scheme.

[1] http://docs.pylonsproject.org/projects/zodburi
[2] https://github.com/zodb/relstorage/blob/2.1a1-15-g68c8cf1/relstorage/zodburi_resolver.py
[3] https://github.com/newtdb/db/blob/0.5.2-1-gbd36e90/src/newt/db/zodburi.py
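A minimal usage sketch: zodburi's resolve_uri() is its real public API, but the exact neo:// URI below is only illustrative, so check NEO's documentation for the actual format of cluster name, master addresses and options.

    import ZODB
    from zodburi import resolve_uri

    # resolve_uri returns a storage factory plus keyword arguments for ZODB.DB
    storage_factory, dbkw = resolve_uri('neo://cluster_name@127.0.0.1:2100')  # URI format assumed
    db = ZODB.DB(storage_factory(), **dbkw)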
-
- 31 Mar, 2017 14 commits
-
-
Julien Muchembled authored
Commit 58d0b602 didn't fix the issue completely. Storage space can be freed with the --repair option. This adds an expectedFailure test.
-
Julien Muchembled authored
This is a follow-up of commit 64afd7d2, which focused on read accesses when there is no transaction activity. This commit also includes a test to check a simpler scenario than the one described in the previous commit.
-
Julien Muchembled authored
-
Julien Muchembled authored
Commit ad43dcd3 should have bumped it as well.
-
Julien Muchembled authored
Unused, but likely to be useful in the future.
-
Julien Muchembled authored
The bug could lead to data corruption (if a partition is wrongly marked as UP_TO_DATE) or crashes (assertion failure on either the storage or the master). The protocol is extended to handle the following scenario:

    S                                   M
                                        partition 0 outdated
    <-- UnfinishedTransactions ------>
    replication of partition 0 ...
                                        partition 1 outdated
    --- UnfinishedTransactions ...
    ...
    replication finished
    --- ReplicationDone ...
                                        tweak
    <-- partition 1 discarded --------
                                        tweak
    <-- partition 1 outdated ---------
        ... UnfinishedTransactions -->
        ... ReplicationDone --------->

The master can't simply mark all outdated cells as being updatable when it receives an UnfinishedTransactions packet.
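One way to picture the constraint stated in the last sentence (an illustrative sketch with assumed names and structures, not NEO's real code): the master remembers which of a storage's cells were outdated at the moment it received UnfinishedTransactions, and a later ReplicationDone is only honoured for those cells, so a cell discarded and outdated again by a tweak in between is not wrongly marked UP_TO_DATE.

    class UpdatableCells(object):
        def __init__(self):
            self._updatable = {}  # storage uuid -> set of partition numbers

        def onUnfinishedTransactions(self, uuid, outdated_partitions):
            # record exactly which cells were outdated when the packet arrived
            self._updatable[uuid] = set(outdated_partitions)

        def onTweak(self, uuid, reassigned_partitions):
            # partitions touched by a tweak must be replicated again
            self._updatable.get(uuid, set()).difference_update(reassigned_partitions)

        def onReplicationDone(self, uuid, partition):
            # True only if this partition is still expected to become up to date
            return partition in self._updatable.get(uuid, ())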
-
Julien Muchembled authored
-
Julien Muchembled authored
After an attempt to read from a non-readable cell, which happens when a client has a newer or older PT than the storage's, the client now retries the read. This bugfix applies to all kinds of read access except undoLog, which can still report incomplete results.
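A minimal retry sketch; the exception name and the app/partition-table methods below are stand-ins for the actual NEO client code.

    class NonReadableCell(Exception):
        """Raised when the queried cell is not readable on that storage."""

    def load_with_retry(app, oid):
        while True:
            cell = app.pt.getCell(oid)            # hypothetical lookup
            try:
                return app.askStorage(cell, oid)  # hypothetical request
            except NonReadableCell:
                # the client's PT is newer or older than the storage's:
                # wait for an up-to-date partition table and try again
                app.waitForUpdatedPartitionTable() # hypothetical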
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
This reverts commit bddc1802, to fix the following storage crash:

    Traceback (most recent call last):
      ...
      File "neo/lib/handler.py", line 72, in dispatch
        method(conn, *args, **kw)
      File "neo/storage/handlers/master.py", line 44, in notifyPartitionChanges
        app.pt.update(ptid, cell_list, app.nm)
      File "neo/lib/pt.py", line 231, in update
        assert node is not None, 'No node found for uuid ' + uuid_str(uuid)
    AssertionError: No node found for uuid S3

Partition table updates must also be processed with InitializationHandler when nodes remain in the PENDING state because they're not added to the cluster.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 30 Mar, 2017 1 commit
-
-
Julien Muchembled authored
-
- 23 Mar, 2017 9 commits
-
-
Julien Muchembled authored
In the worst case, with many clients trying to lock the same oids, the cluster could enter an infinite cascade of deadlocks. Here is an overview with 3 storage nodes and 3 transactions:

    S1      S2      S3      order of locking tids    # abbreviations:
    l1      l1      l2      123                      # l: lock
    q23     q23     d1q3    231                      # d: deadlock triggered
    r1:l3   r1:l2   (r1)           # for S3, we still have l2
                                                     # q: queued
    d2q1    q13     q13     312                      # r: rebase

Above, we show what happens when a random transaction gets a lock just after another one is rebased. Here, the result is that the last 2 lines are a permutation of the first 2, and this can repeat indefinitely with bad luck.

This commit reduces the probability of deadlock by processing delayed stores/checks in the order of their locking tid. In the above example, S1 would give the lock to 2 when 1 is rebased, and 2 would vote successfully.
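A minimal sketch of the ordering idea (illustrative names, not NEO's real code): delayed stores/checks are kept in a priority queue keyed by locking tid and replayed in that order rather than in arrival order.

    import heapq, itertools

    class DelayedOperations(object):
        def __init__(self):
            self._heap = []
            self._counter = itertools.count()  # tie-breaker for equal tids

        def delay(self, locking_tid, operation):
            heapq.heappush(self._heap, (locking_tid, next(self._counter), operation))

        def processDelayed(self):
            # replay delayed stores/checks by increasing locking tid, so the
            # oldest transaction gets the lock first
            while self._heap:
                _, _, operation = heapq.heappop(self._heap)
                operation()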
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes a bug that could lead to data corruption or crashes.
-
Julien Muchembled authored
It becomes possible to answer with several packets:
- the last is the usual associated answer packet;
- all other (previously sent) packets are notifications.

Connection.send does not return the packet id anymore. This is not useful enough, and the caller can inspect the sent packet (getId).
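For illustration only (the connection and packet objects are stand-ins for the real classes), a handler using this pattern would look roughly like:

    def answer_with_notifications(conn, notification_packets, answer_packet):
        for packet in notification_packets:
            conn.send(packet)       # previously sent packets are notifications
        conn.answer(answer_packet)  # the last one is the usual answer packet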
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 22 Mar, 2017 1 commit
-
-
Julien Muchembled authored
In reality, this was tested with taskset 1 neotestrunner ...
-