- 07 Nov, 2018 6 commits
-
-
Julien Muchembled authored
When a client-storage connection breaks, the storage node discards data of all ongoing transactions by the client. Therefore, a reconnection within the context of the transaction is wrong, as it could lead to partially-written transactions. This fixes cases where such reconnection happened. The biggest issue was that the mechanism to dispatch disconnection events only works when waiting for an answer. The client can still reconnect for other purposes but the new connection won't be reused by transactions that already involved the storage node.
-
Julien Muchembled authored
This fixes stuck replication when a client loses connection to the master during a commit.
-
Julien Muchembled authored
-
Julien Muchembled authored
The consequence was that the client never reconnected to that storage node. On commits, writes to that node always failed, causing the master to disconnect it.
-
Julien Muchembled authored
Nothing wrong actually happens. Traceback (most recent call last): File "neo/scripts/neostorage.py", line 32, in main app.run() File "neo/storage/app.py", line 194, in run self._run() File "neo/storage/app.py", line 225, in _run self.doOperation() File "neo/storage/app.py", line 310, in doOperation poll() File "neo/storage/app.py", line 134, in _poll self.em.poll(1) File "neo/lib/event.py", line 168, in poll self._poll(0) File "neo/lib/event.py", line 220, in _poll if conn.readable(): File "neo/lib/connection.py", line 483, in readable self._closure() File "neo/lib/connection.py", line 541, in _closure self.close() File "neo/lib/connection.py", line 533, in close handler.connectionClosed(self) File "neo/storage/handlers/client.py", line 46, in connectionClosed app.tm.abortFor(conn.getUUID()) File "neo/storage/transactions.py", line 594, in abortFor self.abort(ttid) File "neo/storage/transactions.py", line 570, in abort self._replicated.get(self.getPartition(oid))), x AssertionError: ('\x00\x00\x00\x00\x00\x01a\xe5', '\x03\xcaZ\x04\x14o\x8e\xbb', '\x03\xcaZ\x04\x0eX{\xbb', {1: None, 21: '\x03\xcaZ\x04\x11\xc6\x94\xf6'}, set([]))
-
Julien Muchembled authored
-
- 05 Nov, 2018 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
In case of storage disconnection, one packet (VoteTransaction) was not handled the same way as other writes, and the failure was not reported to the master so that it arbitrates the vote. The transaction was therefore partially committed.
-
- 06 Sep, 2018 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
Here is what happened after simulating a network failure between a client and a storage: C8 DEBUG recv failed for <SSLSocketConnectorIPv6 at 0x7f8198027f90 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 53970), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECONNRESET (Connection reset by peer) DEBUG connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f81939a0950> DEBUG connection started for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, connecting, client) at 7f8192eb17d0> PACKET #0x0000 RequestIdentification > S2 ([xxxx:xxxx:60:4c2c::25c3]:39085) | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761) DEBUG SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECDHE-RSA-AES256-GCM-SHA384 256 DEBUG connection completed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, client) at 7f8192eb17d0> (from xxxx:xxxx:120:cd8::90a1:54014) DEBUG <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)> closed in recv DEBUG connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f8192eb17d0> ERROR Connection to <StorageNode(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, state=RUNNING, connection=None, not identified) at 7f81a8874690> failed S2 DEBUG accepted a connection from xxxx:xxxx:120:cd8::90a1:54014 DEBUG SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f657144a910 fileno 22 ('xxxx:xxxx:60:4c2c::25c3', 39085), opened from ('xxxx:xxxx:120:cd8::90a1', 54014)>: ECDHE-RSA-AES256-GCM-SHA384 256 DEBUG connection completed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, fd=22, server) at 7f657144a090> (from xxxx:xxxx:60:4c2c::25c3:39085) PACKET #0x0000 RequestIdentification < None ([xxxx:xxxx:120:cd8::90a1]:54014) | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761) DEBUG connection closed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, closed, server) at 7f657144a090> WARNING A connection was lost during identification ERROR Pre-mortem data: ERROR Traceback (most recent call last): ERROR File "neo/storage/app.py", line 194, in run ERROR self._run() ERROR File "neo/storage/app.py", line 225, in _run ERROR self.doOperation() ERROR File "neo/storage/app.py", line 310, in doOperation ERROR poll() ERROR File "neo/storage/app.py", line 134, in _poll ERROR self.em.poll(1) ERROR File "neo/lib/event.py", line 160, in poll ERROR to_process.process() ERROR File "neo/lib/connection.py", line 499, in process ERROR self._handlers.handle(self, self._queue.pop(0)) ERROR File "neo/lib/connection.py", line 85, in handle ERROR self._handle(connection, packet) ERROR File "neo/lib/connection.py", line 100, in _handle ERROR pending[0][1].packetReceived(connection, packet) ERROR File "neo/lib/handler.py", line 123, in packetReceived ERROR self.dispatch(*args) ERROR File "neo/lib/handler.py", line 72, in dispatch ERROR method(conn, *args, **kw) ERROR File "neo/storage/handlers/identification.py", line 56, in requestIdentification ERROR assert not node.isConnected(), node ERROR AssertionError: <ClientNode(uuid=C8, state=RUNNING, connection=<ServerConnection(uuid=C8, address=[xxxx:xxxx:120:cd8::90a1]:53970, handler=ClientOperationHandler, fd=18, on_close=onConnectionClosed, server) at 7f657147d7d0>) at 7f65714d6cd0>
-
- 03 Sep, 2018 1 commit
-
-
Julien Muchembled authored
Traceback (most recent call last): File "neo/tests/functional/testClient.py", line 241, in testExport self.assertEqual(dump, self.__dump(neo_db.storage, list)) File "neo/tests/functional/testClient.py", line 210, in __dump for t in storage.iterator()} File "neo/tests/functional/testClient.py", line 207, in <dictcomp> return {u64(t.tid): sorted((u64(o.oid), o.data_txn and u64(o.data_txn), File "neo/client/iterator.py", line 69, in iterator max_tid, chunk = app.transactionLog(start, stop, CHUNK_LENGTH) File "neo/client/app.py", line 841, in transactionLog Packets.AskTIDsFrom(start, stop, limit, offset)) File "neo/client/app.py", line 296, in _askStorageForRead return askStorage(conn, packet) File "neo/client/app.py", line 164, in _askStorage return self._ask(conn, packet, handler=self.storage_handler, **kw) File "neo/lib/threaded_app.py", line 144, in _ask _handlePacket(qconn, qpacket, kw, handler) File "neo/lib/threaded_app.py", line 133, in _handlePacket handler.dispatch(conn, packet, kw) File "neo/lib/handler.py", line 72, in dispatch method(conn, *args, **kw) File "neo/lib/handler.py", line 208, in error getattr(self, Errors[code])(conn, message) File "neo/lib/handler.py", line 227, in backendNotImplemented raise NotImplementedError(message) NotImplementedError: neo.storage.database.importer.ImporterDatabaseManager does not implement getReplicationTIDList
-
- 13 Aug, 2018 1 commit
-
-
Julien Muchembled authored
Resulting trace files can be reused to simulate a workload and test new cache algorithms.
-
- 07 Aug, 2018 2 commits
-
-
Julien Muchembled authored
Besides the use of another module for option parsing, the main change is that there's no more Config class that mixes configuration for different components. Application classes now takes a simple 'dict' with parsed values. The changes in 'neoctl' are somewhat ugly, because command-line options are not defined on the command-line class, but this component is likely to disappear in the future. It remains possible to pass options via a configuration file. The code is a bit complex but isolated in neo.lib.config For SSL, the code may be simpler if we change for a single --ssl option that takes 3 paths. Not done to not break compatibility. Hence, the hack with an extra OptionList class in neo.lib.app A new functional test tests the 'neomigrate' script, instead of just the internal API to migrate data.
-
Julien Muchembled authored
-
- 06 Aug, 2018 1 commit
-
-
Julien Muchembled authored
This mainly concerns the storage node, and depending on how its polling loop is changed, the following crash could happen again during replication: File "neo/scripts/neostorage.py", line 66, in main app.run() File "neo/storage/app.py", line 147, in run self._run() File "neo/storage/app.py", line 178, in _run self.doOperation() File "neo/storage/app.py", line 258, in doOperation _poll(0) File "neo/lib/event.py", line 231, in _poll conn.writable() File "neo/lib/connection.py", line 418, in writable if self.connector.send(): File "neo/lib/connector.py", line 179, in send n = self.socket.send(msg) File "ssl.py", line 719, in send v = self._sslobj.write(data) OverflowError: string longer than 2147483647 byte
-
- 30 Jul, 2018 1 commit
-
-
Julien Muchembled authored
-
- 16 Jul, 2018 1 commit
-
-
Julien Muchembled authored
-
- 22 Jun, 2018 2 commits
-
-
Julien Muchembled authored
This commit adds a contraint when tweaking the partition table with replicas, so that cells of each partition are assigned as far as possible from each other, e.g. not on the same machine even if each one has several disks, and in any case not on the same storage device. Currently, the topology path of each node is automatically calculated by the storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev). To be improved: - Add a storage option to override the path: the 'tweak' algorithm can already handle topology paths of any length, so something like (room, machine, disk) could be done easily. - Write OS-specific code to determine the real hardware behind st_dev (e.g. 2 different 'st_dev' values may actually refer to the same disk, because of layers like partitioning, device-mapper, loop, btrfs subvolumes, and so on). - Make 'neoctl' report in some way if the PT is optimal. Meanwhile, if it isn't, the master only logs a WARNING during tweak.
-
Julien Muchembled authored
This is a follow-up of commit b3dd6973 ("Optimize resumption of replication by starting from a greater TID"). I missed the case where a storage node is restarted while it is replicating: it lost the TID where it was interrupted. Although we commit after each replicated chunk, to avoid transferring again all the data from the beginning, it could still waste time to check that the data are already replicated.
-
- 21 Jun, 2018 1 commit
-
-
Julien Muchembled authored
-
- 19 Jun, 2018 4 commits
-
-
Julien Muchembled authored
Otherwise, they were either lost or flushed to a file of a next test.
-
Julien Muchembled authored
The bug is likely to be in the test rather than in NEO.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 04 Jun, 2018 1 commit
-
-
Julien Muchembled authored
-
- 31 May, 2018 1 commit
-
-
Kirill Smelkov authored
NEO functional tests use pdb.wait() in a few places, for example in NEOCluster .run(), .start() and .expectCondition(). The wait implementation uses polling with exponentially growing wait period. With the following instrumentation --- a/neo/tests/cluster.py +++ b/neo/tests/cluster.py @@ -236,6 +236,7 @@ def wait(self, test, timeout): return False finally: cluster_dict.release() + print 'next_sleep:', next_sleep sleep(next_sleep) return True during execution of functional tests it is not uncommon to see the following sleep periods next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.15 next_sleep: 0.225 next_sleep: 0.3375 next_sleep: 0.50625 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.15 next_sleep: 0.225 next_sleep: 0.3375 next_sleep: 0.50625 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.15 next_sleep: 0.225 next_sleep: 0.3375 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.1 next_sleep: 0.15 next_sleep: 0.225 next_sleep: 0.3375 next_sleep: 0.50625 . Without going into reworking the wait mechanism to use real notifications instead of polling, it was observed that the exponential progression tends to create too coarse sleeps. Initial 0.1s interval was found to be also too much. This patch remove the exponential period growth and reduces period by order of one magnitude. For functional tests timings on my computer it is thus: before patch: Functional tests 28 Tests, 0 Failed Title : TestRunner Date : 2018-04-04 Node : deco Machine : x86_64 System : Linux Python : 2.7.14 Directory : /tmp/neo_tests/1522868674.115798 Status : 100.000% NEO_TESTS_ADAPTER : SQLite NEO TESTS REPORT Test Module | run | unexpected | expected | skipped | time --------------------------+-------+------------+----------+---------+---------- Client | 6 | . | . | . | 8.51s Cluster | 7 | . | . | . | 9.84s Master | 4 | . | . | . | 9.68s Storage | 11 | . | . | . | 20.76s --------------------------+-------+------------+----------+---------+---------- neo.tests.functional | | | | | --------------------------+-------+------------+----------+---------+---------- Summary | 28 | . | . | . | 48.79s --------------------------+-------+------------+----------+---------+---------- after patch: Functional tests 28 Tests, 0 Failed Title : TestRunner Date : 2018-04-04 Node : deco Machine : x86_64 System : Linux Python : 2.7.14 Directory : /tmp/neo_tests/1522868527.624376 Status : 100.000% NEO_TESTS_ADAPTER : SQLite NEO TESTS REPORT Test Module | run | unexpected | expected | skipped | time --------------------------+-------+------------+----------+---------+---------- Client | 6 | . | . | . | 7.38s Cluster | 7 | . | . | . | 7.05s Master | 4 | . | . | . | 8.22s Storage | 11 | . | . | . | 19.22s --------------------------+-------+------------+----------+---------+---------- neo.tests.functional | | | | | --------------------------+-------+------------+----------+---------+---------- Summary | 28 | . | . | . | 41.87s --------------------------+-------+------------+----------+---------+---------- in other words ~ 10% improvement for the whole time to run functional tests. /reviewed-by @vpelletier, @jm /reviewed-on nexedi/neoppod!10
-
- 30 May, 2018 6 commits
-
-
Julien Muchembled authored
/reviewed-on !9
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
I made a mistake in commit 13a64cfe ("Simplify definition of packets by computing automatically their codes"). My intention was that the code an answer packet continues to only differ by the highest bit, as implemented now by this commit. Before: 0x0001, 0x8002 Ask1, Answer1 0x0003 Notify2 0x0004, 0x8005 Ask3, Answer3 0x0006, 0x8007 Ask4, Answer4 After: 0x0001, 0x8001 Ask1, Answer1 0x0002 Notify2 0x0003, 0x8003 Ask3, Answer3 0x0004, 0x8004 Ask4, Answer4 This makes the protocol easier to document. And by not wasting the range of possible values, it seems we have enough space to shrink to a single byte. This also removes code that became meaningless since that codes are generated automatically.
-
Julien Muchembled authored
Although data that are already transferred aren't transferred again, checking that the data are there for a whole partition can still be a lot of work for big databases. This commit is a major performance improvement in that a storage node that gets disconnected for a short time now gets fully operational quite instantaneously because it only has to replicate the new data. Before, the time to recover depended on the size of the DB. For OUT_OF_DATE cells, the difficult part was that they are writable and can then contain holes, so we can't just take the last TID in trans/obj (we wrongly did that at the beginning, and then committed 6b1f198f as a workaround). We solve that by storing up to where it was up-to-date: this value is initialized from the last TIDs in trans/obj when the state switches from UP_TO_DATE/FEEDING. There's actually one such OUT_OF_DATE TID per assigned cell (backends store these values in the 'pt' table). Otherwise, a cell that still has a lot to replicate would still cause all other cells to resume from the a very small TID, or even ZERO_TID; the worse case is when a new cell is assigned to a node (as a result of tweak). For UP_TO_DATE cells of a backup cluster, replication was resumed from the maximum TID at which all assigned cells are known to be fully replicated. Like for OUT_OF_DATE cells, the presence of a late cell could cause a lot of extra work for others, the worst case being when setting up a backup cluster (it always restarted from ZERO_TID as long as at least 1 cell was still empty). Because UP_TO_DATE cells are guaranteed to have no holes, there's no need to store extra information: we simply look at the last TIDs in trans/obj. We even handle trans & obj independently, to minimize the work in 1 table (i.e. trans since it's processed first) if the other is late (obj). There's a small change in the protocol so that OUT_OF_DATE enum value equals 0. This way, backends can store the OUT_OF_DATE TID (verbatim) in the same column as the cell state. Note about MySQL changes in commit ca58ccd7: what we did as a workaround is not one any more. Now, we do so much on Python side that it's unlikely we could reduce the number of queries using GROUP BY. We even stopped doing that for SQLite.
-
Julien Muchembled authored
-
- 25 May, 2018 1 commit
-
-
Julien Muchembled authored
-
- 24 May, 2018 7 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
It was confusing and there's already the 'Unlock TXN' log just before abort() is called (in this case, it's more a cleanup than an abort).
-
Julien Muchembled authored
Future migration steps are likely to alter tables, possibly with transformation of data, and this is complicated for both supported backend.
-
Julien Muchembled authored
-
Julien Muchembled authored
Some changes in the storage format are minor and applying them automatically would cost too much for big databases. Here, we apply them manually so that testStorageUpgrade will be able to compare dumps. We hope however that with improvements like https://jira.mariadb.org/browse/MDEV-12836 we'll be able to implement more migration steps and revert parts of this commit.
-
Julien Muchembled authored
These dumps were generated with an old version of NEO, plus a backport of the test that will use them. In MySQL dumps, --hex-blob was used only for inserts in the 'data' table.
-