Commits · 60bcbc5c04edafb4160618f885ba3b5b1e483ad3 · nexedi / neoppod

02 Apr, 2021 2 commits
- PartitionTable: small optimization · 60bcbc5c
  Julien Muchembled authored Apr 02, 2021
  
  60bcbc5c
- PartitionTable: rename getAssignedPartitionList to getReadableOffsetList · aa48adf9
  Julien Muchembled authored Mar 12, 2021
  
  aa48adf9
22 Mar, 2021 1 commit
- qa: renew certificates for tests · fa581be5
  Julien Muchembled authored Mar 22, 2021
  
  fa581be5
04 Mar, 2021 2 commits
- Drop support for ZODB3 · 3a8f6f03
  Julien Muchembled authored Mar 04, 2021
  
  3a8f6f03
- importer: fix assertion failure when loading a deleted oid that is fully imported · 414573b9
  Julien Muchembled authored Mar 04, 2021
  
  414573b9
15 Jan, 2021 2 commits

ssl: don't care whether EOF is ragged or not · d98205d0

Julien Muchembled authored Jan 15, 2021

The purpose of suppress_ragged_eofs=False was to micro-optimize the
normal case: when there's no EOF.

But commit 061cd5d8 showed that this
option only turns ragged EOF into an exception. It may be easier for
alternate NEO implementations to close the SSL connection properly. Or
the performance benefit was not worth the risk of freezing a NEO process.

d98205d0

ssl: Don't ignore non-ragged EOF · 061cd5d8

Kirill Smelkov authored Jan 13, 2021

Testing the NEO/go client against a NEO/py server revealed a bug in NEO/py
SSL handling: a proper non-ragged EOF from a peer is ignored, which leads
to a hang in an infinite loop inside _SSL.receive, with read_buf memory
growing indefinitely. Details are below:

NEO/py wraps raw sockets with

	ssl.wrap_socket(suppress_ragged_eofs=False)

which instructs the SSL layer to convert an unexpected EOF when receiving a
TLS record into an SSLEOFError exception. However, when the remote peer
properly closes its side of the connection, socket.read() still returns b''
to report a regular, non-ragged EOF:

https://github.com/python/cpython/blob/v2.7.18/Lib/ssl.py#L630-L650

The code handled SSLEOFError but not a b'' return from socket recv.
Thus, after the NEO/go client disconnected and properly closed its side
of the connection, the code looped indefinitely in _SSL.receive under
`while 1`, with the b'' returned by self.socket.recv() appended to
read_buf again and again.

-> Fix it by detecting non-ragged EOF as well and, similarly to how
SSLEOFError is handled, converting it into self._error('recv', None).

See merge request !17

061cd5d8
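
The EOF handling described in 061cd5d8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual _SSL.receive code: FakeSocket and drain are hypothetical names introduced here, and only ssl.SSLEOFError and ssl.SSLWantReadError are real standard-library exceptions. The point is that an empty recv() result has to end the loop, just as a ragged EOF does.

```python
import ssl


class FakeSocket:
    """Hypothetical stand-in for an SSL socket: replays scripted results."""

    def __init__(self, events):
        self._events = iter(events)

    def recv(self, bufsize):
        event = next(self._events)
        if isinstance(event, Exception):
            raise event
        return event


def drain(sock, read_buf, bufsize=4096):
    """Append available data to read_buf; return True if the peer reached EOF."""
    while True:
        try:
            data = sock.recv(bufsize)
        except ssl.SSLWantReadError:
            return False  # nothing more to read for now
        except ssl.SSLEOFError:
            return True   # ragged EOF, reported as an exception
        if not data:
            # Regular (non-ragged) EOF: recv() returned b''.
            # Without this check, b'' would be appended forever.
            return True
        read_buf.append(data)
```

With a scripted socket such as FakeSocket([b'ab', b'c', b'']), drain stops at the empty chunk instead of spinning on it.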

11 Jan, 2021 4 commits
- client: fix relative import · 261dd4b4
  Julien Muchembled authored Jan 06, 2021
  
  261dd4b4
- qa: add testStorageGettingReadyDuringRecovery · 80d180e7
  Julien Muchembled authored Dec 17, 2020
  
  80d180e7
- master: ignore late AnswerInformationLocked during recovery · b22847d2
  Julien Muchembled authored Dec 15, 2020
  
  b22847d2
- master: simplify verification by ignoring completely nodes without readable cells · a760258b
  Julien Muchembled authored Nov 06, 2020
```
The scenario that was described in comments was meaningless
because S1 never goes out-of-date.
```
  a760258b
02 Oct, 2020 1 commit

Fix handling of -m/--masters arg · fa63d856

Julien Muchembled authored Oct 02, 2020

For the master, the purpose of -m/--masters is to specify addresses
of other master nodes, since its own address is already known via
-b/--bind. Therefore, an empty value for -m/--masters is valid.
The user remains free to repeat the -b value in -m.

More generally, a node may choose to only specify master addresses
via -D/--dynamic-master-list, so the check that at least one master
address is specified is moved where the NodeManager is expected to be
initialized.

fa63d856
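
The option handling described in fa63d856 could look roughly like the sketch below. It is an illustration only: build_parser and initial_masters are hypothetical helpers, and the whitespace-separated host:port format is an assumption. It shows an empty -m value being accepted and the "at least one master" check being deferred until the list of masters is actually needed.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-b', '--bind', default='')
    # An empty value is valid: a master already knows its own address via -b.
    parser.add_argument('-m', '--masters', default='')
    parser.add_argument('-D', '--dynamic-master-list')
    return parser


def initial_masters(args, is_master=False):
    """Return the statically configured master addresses.

    Assumption: addresses are whitespace-separated 'host:port' strings.
    """
    masters = args.masters.split()
    if is_master and args.bind and args.bind not in masters:
        masters.append(args.bind)
    # Deferred check: performed when the master list is first needed,
    # not while parsing the command line.
    if not masters and not args.dynamic_master_list:
        raise ValueError('no master address specified')
    return masters
```

A non-master node started with only -D passes the check with an empty static list, while a node given neither -m nor -D is rejected.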

29 Sep, 2020 1 commit
- Remove dead code · c34d332f
  Julien Muchembled authored Sep 29, 2020
  
  c34d332f
25 Sep, 2020 4 commits

storage: show whether transaction is voted in its __repr__ · c1c26894
Julien Muchembled authored May 28, 2020

c1c26894

New algorithm for deadlock avoidance · 5e7f34d2

Julien Muchembled authored Jul 25, 2019

The time complexity of the previous one was too high. With several tens of
concurrent transactions, we saw commits take minutes to complete and
the whole application looked frozen.

This new algorithm is much simpler. Instead of asking the oldest
transaction to somewhat restart (we used the "rebase" term because
the concept was similar to what git-rebase does), the storage gives
it priority and the newest is asked to relock (this request is ignored
if vote already happened, which means there was actually no deadlock).

testLocklessWriteDuringConflictResolution was initially more complex
because Transaction.written (client) ignored KeyError (which is not the
case anymore since commit 8ef1ddba).

5e7f34d2
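
The priority rule described in 5e7f34d2 can be illustrated with a toy model. This is only a sketch of the rule as stated in the message, not the storage node's implementation: Transaction and resolve_conflict are hypothetical names, and it is assumed that a smaller ttid means an older transaction.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    ttid: int            # assumption: smaller means older
    voted: bool = False


def resolve_conflict(holder, requester):
    """Decide who gets the lock on a contended object.

    Returns (new_holder, asked_to_relock); asked_to_relock may be None.
    """
    if requester.ttid > holder.ttid:
        # The requester is the newest: it simply waits for the holder.
        return holder, None
    # Possible deadlock: the oldest transaction gets priority
    # and the newest one is asked to relock.
    if holder.voted:
        # Vote already happened: there was actually no deadlock,
        # so the relock request has no effect.
        return holder, None
    return requester, holder
```

In this model, an older requester takes the lock from a newer, not yet voted holder, and the newer transaction is the one asked to relock.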

qa: deindent code · d98b576c
Julien Muchembled authored Aug 13, 2020

d98b576c

Update comments · dbf128b7
Julien Muchembled authored Jul 25, 2019

dbf128b7

10 Sep, 2020 3 commits

storage: commit from time to time when truncating · 910d1e91

Julien Muchembled authored Sep 10, 2020

This is all the more important for RocksDB since it wants to keep all
transaction work in RAM.

Once we had to truncate 40% of a 1TB MyRocks DB with 24 partitions,
4 being processed in parallel. Even when committing between partitions,
MariaDB used up to 200 GB. Without the commit, 1TB RAM would not have
been enough.

910d1e91
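
The idea behind 910d1e91, committing periodically while truncating, can be sketched as below. This is an illustration under stated assumptions: it uses the standard sqlite3 module as a stand-in for the MySQL/MyRocks backend, and the table layout, truncate_after and batch_size are hypothetical.

```python
import sqlite3


def truncate_after(conn, max_tid, batch_size=1000):
    """Delete rows with tid > max_tid, committing after each batch."""
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM trans WHERE tid IN "
            "(SELECT tid FROM trans WHERE tid > ? LIMIT ?)",
            (max_tid, batch_size))
        # Committing here bounds the amount of uncommitted work
        # the database has to keep around.
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```

The batch size trades the number of commits against the amount of pending work held per transaction.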

mysql: make sure any configured session variable is not silently capped · d14de83e
Julien Muchembled authored Sep 10, 2020

d14de83e

mysql: set rocksdb_max_row_locks to the maximum allowed value · edc63c0f
Julien Muchembled authored Sep 10, 2020
```
The default value is quickly exceeded when truncating a DB.
Obviously, you may need a lot of RAM.
```
edc63c0f

04 Sep, 2020 1 commit
- mysql: split queries to avoid exceeding max_allowed_packet when pruning data · a8b9ec0c
  Julien Muchembled authored Apr 22, 2019
  
  a8b9ec0c
21 Aug, 2020 1 commit

qa: fix node names in threaded test logs · e02dffd2

Julien Muchembled authored Aug 21, 2020

Resetting a storage node could mark all TEST log entries as being
emitted by this storage node. For example:

16:18:12.9114 S2         #0x0007 AskStoreObject                 > S1 (...)

e02dffd2

25 Jun, 2020 1 commit
- neolog: new --color option · fb746e6b
  Julien Muchembled authored Jun 15, 2020
  
  fb746e6b
24 Jun, 2020 1 commit
- master: add support for PyPy · 497edbe1
  Julien Muchembled authored Jun 23, 2020
  
  497edbe1
12 Jun, 2020 1 commit

qa: skip broken ZODB test · f4cb59d2

Julien Muchembled authored Jun 12, 2020

======================================================================
FAIL: check_tid_ordering_w_commit (neo.tests.zodb.testBasic.BasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "ZODB/tests/BasicStorage.py", line 397, in check_tid_ordering_w_commit
    self.assertEqual(results.pop('lastTransaction'), tids[1])
  File "neo/tests/__init__.py", line 301, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: '\x03\xd8\x85H\xbffp\xbb' != '\x03\xd8\x85H\xbfs\x0b\xdd'

f4cb59d2

11 Jun, 2020 1 commit
- client: fix race with invalidations when starting a new transaction on ZODB 5 · a7d101ec
  Julien Muchembled authored Jun 05, 2020
```
This requires ZODB >= 5.6.0
```
  a7d101ec
29 May, 2020 3 commits
- stress: add support for old nftables · fc58c089
  Julien Muchembled authored May 29, 2020
  
  fc58c089
- stress: extend -b option to log everything · f21cdd86
  Julien Muchembled authored May 29, 2020
  
  f21cdd86
- debug: fix 'app' when using pdb in stress tool · e3414b6f
  Julien Muchembled authored May 28, 2020
  
  e3414b6f
18 May, 2020 1 commit

admin: fix monitoring timer after 2 identical consecutive checks · c611c48f

Julien Muchembled authored May 18, 2020

This fixes a bug where, with only email notification configured, monitoring
stopped checking whether backup clusters are lagging once the status was
unchanged since the previous check (for lagging, what is compared is the
set of lagging backups). Checks only resumed when another event woke up
monitoring.

The code is also simplified in that there's no need for the moment to
have a different timeout between the normal case and a smtp failure.

c611c48f

20 Mar, 2020 1 commit
- admin: Qualify email sender address. · 2f782572
  Vincent Pelletier authored Mar 20, 2020
  
  2f782572
16 Mar, 2020 2 commits
- qa: add testProtocolVersionMismatch · f4725366
  Julien Muchembled authored Mar 16, 2020
  
  f4725366
- Code clean-up, comment fixes · 43029be2
  Julien Muchembled authored Feb 20, 2020
  
  43029be2
14 Feb, 2020 1 commit

master: fix tpc_finish possibly trying to kill too many nodes after client-storage failures · 82eea0cd

Julien Muchembled authored Feb 14, 2020

When concurrent transactions fail with different storages (e.g. only network
issues between C1-S2 and C2-S1), in such a way that each transaction can be
committed but not both (or the cluster would be non-operational), and if the
first transaction is aborted (between tpc_vote and tpc_finish), then the second
wrongly failed with INCOMPLETE_TRANSACTION.

And if both transactions could be committed (e.g. more than 1 replica),
some nodes would be disconnected for nothing.

82eea0cd

21 Jan, 2020 1 commit

admin: fix possible crash when monitoring a backup cluster that has just switched to BACKINGUP state · 5ee0b0a3

Julien Muchembled authored Jan 21, 2020

This fixes:

  Traceback (most recent call last):
    ...
    File "neo/admin/handler.py", line 200, in answerLastTransaction
      app.maybeNotify(name)
    File "neo/admin/app.py", line 380, in maybeNotify
      self._notify(False)
    File "neo/admin/app.py", line 302, in _notify
      body += '', name, '    ' + backup.formatSummary(upstream)[1]
    File "neo/admin/app.py", line 74, in formatSummary
      tid = self.backup_tid if backup else self.ltid
  AttributeError: 'Backup' object has no attribute 'backup_tid'

5ee0b0a3

10 Jan, 2020 1 commit

master: fix crash of backup master when disconnected from upstream while serving clients · 7e8ca9ec

Julien Muchembled authored Jan 10, 2020

This fixes:

  Traceback (most recent call last):
    File "neo/master/app.py", line 172, in run
      self._run()
    File "neo/master/app.py", line 182, in _run
      self.playPrimaryRole()
    File "neo/master/app.py", line 314, in playPrimaryRole
      self.backup_app.provideService())
    File "neo/master/backup_app.py", line 101, in provideService
      app.changeClusterState(ClusterStates.STARTING_BACKUP)
    File "neo/master/app.py", line 474, in changeClusterState
      ) or not node.isClient(), (state, node)
  AssertionError: (<EnumItem STARTING_BACKUP (4)>, <ClientNode(uuid=C1, state=RUNNING, connection=<ServerConnection(nid=C1, address=127.0.0.1:52430, handler=ClientReadOnlyServiceHandler, fd=59, on_close=onConnectionClosed, server) at 7f38f5628390>) at 7f38f5628ad0>)

7e8ca9ec

07 Jan, 2020 1 commit

admin: fix handling of immediate connection failure to upstream admin · e2b11d54

Julien Muchembled authored Jan 07, 2020

In such a case, it didn't reconnect, but thought it was connected,
which eventually led to crashes like:

  Traceback (most recent call last):
    ...
    File "neo/admin/handler.py", line 130, in answerClusterState
      self.app.updateMonitorInformation(None, cluster_state=state)
    File "neo/admin/app.py", line 274, in updateMonitorInformation
      self.upstream_admin_conn.send(Packets.NotifyMonitorInformation(kw))
    File "neo/lib/connection.py", line 565, in send
      raise ConnectionClosed
  neo.lib.connection.ConnectionClosed

e2b11d54

26 Dec, 2019 2 commits
- client: merge load optimizations · 4d571267
  Julien Muchembled authored Dec 26, 2019
  
  4d571267
- Merge protocol v0 · 8ba42463
  Julien Muchembled authored Nov 27, 2019
  
  8ba42463
13 Nov, 2019 1 commit

admin: fix possible crash when connecting to upstream admin · d4603189

Julien Muchembled authored Nov 13, 2019

This fixes:

  Traceback (most recent call last):
    File "neo/scripts/neoadmin.py", line 31, in main
      app.run()
    File "neo/admin/app.py", line 179, in run
      self._run()
    File "neo/admin/app.py", line 199, in _run
      self.em.poll(1)
    File "neo/lib/event.py", line 155, in poll
      self._poll(blocking)
    File "neo/lib/event.py", line 220, in _poll
      if conn.readable():
    File "neo/lib/connection.py", line 487, in readable
      self._closure()
    File "neo/lib/connection.py", line 545, in _closure
      self.close()
    File "neo/lib/connection.py", line 534, in close
      handler.connectionFailed(self)
    File "neo/admin/handler.py", line 210, in connectionClosed
      app.connectToUpstreamAdmin()
    File "neo/admin/app.py", line 230, in connectToUpstreamAdmin
      None, None, self.name, None, {}))
    File "neo/lib/connection.py", line 574, in ask
      raise ConnectionClosed
  neo.lib.connection.ConnectionClosed

d4603189