Commits · 24fb475793e75c1f658e8ad08464e6595f8d1492 · nexedi / neoppod

15 Jun, 2023 2 commits

Connection: Adjust msg_id a bit so it behaves like stream_id in HTTP/2 · 24fb4757

Kirill Smelkov authored Dec 18, 2016

This is the 2020 edition of my original patch from 2016 ( kirr/neo@dd3bb8b4 ).

It was described in my NEO/go article ( https://navytux.spb.ru/~kirr/neo.html )
in the text quoted below:

Then comes the link layer, which provides the service to exchange messages over
the network. In current NEO/py every message has a `msg_id` field that, similarly to
ZEO/py, marks a request with a serial number, with the requester then waiting for the
corresponding answer to come back with the same message id. Even though there
might be several reply messages coming back to a single request, as e.g. in NEO/py
asynchronous replication code[0], this approach is still similar to the ZEO/py
remote procedure call (RPC) model because of its single-request semantics. One of
the places where this limitation shows is the same replicator code, where
transaction metadata is fetched first with a first series of RPC calls, and only
then object data is fetched with a second series of RPC calls. This may
not be very good, e.g. when there are a lot of transactions/data to
synchronize, because 1) it makes assumptions about, and so constrains, the storage
backend model of how data is stored (separate SQL tables for metadata and
data), and 2) no data will be synchronized at all until all transactions are
synchronized first. The second point prevents, for example, the syncing storage
from in turn providing service, even if read-only, for the already fetched data.
What might be more useful is for the requester to send a request saying that it wants
to fetch ZODB data in the `tid_min..tid_max` range, and then for the sender to send an
intermixed stream of metadata/data in zodbdump-like format.

Keeping this and other examples in mind, NEO/go shifts from thinking about
protocol logic as RPC to thinking of it as a more general network protocol, and
settles on providing a general connection-oriented message exchange service[1]:
whenever a message with a new `msg_id` is sent, a new connection is established,
multiplexed on top of a single node-node TCP link. It is then possible to
send/receive arbitrary messages back and forth until the connection so established
is closed. This works transparently for NEO/py, which still thinks it
operates in simple RPC mode, because of the way messages are put on the wire and
because simple RPC is a subset of general exchange. The `neonet` module also
provides `DialLink` and `ListenLink` primitives[2] that work similarly to
standard Go `net.Dial` and `net.Listen` but wrap the link so created into the
multiplexing layer. What is done this way is very similar to HTTP/2,
which also multiplexes multiple general streams on top of a single
TCP connection ([3], [4]). However, if connection ids (sent in place of
`msg_id` on the wire) are assigned arbitrarily, two nodes could try to
initiate two different new connections to each other with
the same connection id. To prevent such a conflict, a simple rule can be used:
allocate connection ids either even or odd, depending on the role the peer played
while establishing the link. HTTP/2 takes a similar approach[5],
where `"Streams initiated by a client MUST use odd-numbered stream identifiers;
those initiated by the server MUST use even-numbered stream identifiers."`, with
NEO/go doing the same depending on which side was originally the dialer and which
the listener. However, this requires a small patch on the NEO/py side to
increment `msg_id` by 2 instead of 1.

NEO/py currently specifies `msg_id` explicitly for an answer only in a limited
set of cases, by default assuming that a reply goes to the last received message,
whose `msg_id` it remembers globally per TCP link. This approach is error-prone
and cannot work in general when several simultaneous requests are
received over a single link. For this reason NEO/go does not maintain any such global
per-link knowledge and handles every request by always explicitly using the
corresponding connection object created at request reception time.

[0] https://lab.nexedi.com/kirr/neo/blob/463ef9ad/neo/storage/replicator.py
[1] https://lab.nexedi.com/kirr/neo/blob/463ef9ad/go/neo/neonet/connection.go
[2] https://lab.nexedi.com/kirr/neo/blob/463ef9ad/go/neo/neonet/newlink.go
[3] https://tools.ietf.org/html/rfc7540#section-5
[4] https://http2.github.io/faq/#why-is-http2-multiplexed
[5] https://tools.ietf.org/html/rfc7540#section-5.1.1
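The connection-oriented exchange described in the quote can be illustrated with a small Python model. It is a minimal, hypothetical sketch and not the `neonet` code: `Link`, `Conn`, `new_conn` and `on_packet` are illustrative names, and an in-memory list stands in for the TCP socket. It shows two ideas. First, a packet with an unknown `msg_id` opens a new connection. Second, answers are sent through the connection object that received the request, not through a per-link "last received msg_id".

```python
import itertools
from collections import deque


class Conn:
    """One logical connection multiplexed over a link, identified by msg_id."""

    def __init__(self, link, msg_id):
        self.link = link
        self.msg_id = msg_id
        self.inbox = deque()  # payloads received on this connection

    def send(self, payload):
        # Every packet, answers included, carries this connection's own
        # msg_id: no per-link "last received msg_id" is consulted.
        self.link.wire.append((self.msg_id, payload))


class Link:
    """Hypothetical sketch: many connections over one node-node link."""

    def __init__(self, dialer):
        # Odd ids for the side that dialed the link, even ids for the listener,
        # so ids picked independently by the two sides never collide.
        self._ids = itertools.count(1 if dialer else 0, 2)
        self._conns = {}         # msg_id -> Conn
        self.accepted = deque()  # connections opened by the peer
        self.wire = []           # stand-in for the socket: (msg_id, payload)

    def new_conn(self):
        conn = Conn(self, next(self._ids))
        self._conns[conn.msg_id] = conn
        return conn

    def on_packet(self, msg_id, payload):
        conn = self._conns.get(msg_id)
        if conn is None:
            # First packet with an unknown id: the peer opened a new connection.
            conn = self._conns[msg_id] = Conn(self, msg_id)
            self.accepted.append(conn)
        conn.inbox.append(payload)


# Usage: the dialer opens a connection and the listener answers on the same id.
client, server = Link(dialer=True), Link(dialer=False)
req = client.new_conn()
req.send("ask")
for msg_id, payload in client.wire:
    server.on_packet(msg_id, payload)
incoming = server.accepted.popleft()
incoming.send("answer")
for msg_id, payload in server.wire:
    client.on_packet(msg_id, payload)
print(req.msg_id, list(req.inbox))  # prints: 1 ['answer']
```

A real implementation would also need to close connections and decide what to do with packets for ids that are no longer open. Both are omitted here.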

It can be criticized, but the facts are:

- it does no harm to NEO/py and is backward-compatible: a NEO/py node
without this patch can still successfully connect to and interoperate with
another NEO/py node with this patch.

- it is required for NEO/go to be able to interoperate with NEO/py.
Both client and server parts of NEO/go use the same neonet module to exchange messages.

- the NEO/go client is used by wendelin.core 2, which organizes access to on-ZODB
ZBigFile data via the WCFS filesystem implemented in Go.

So on one side this patch is small, simple and does no harm to NEO/py.
On the other side it is required for NEO/go and wendelin.core 2.

To me this clearly indicates that there is no good reason to reject
the inclusion of this patch into NEO/py.

--------

My original patch from 2016 came with corresponding adjustments to neo/tests/testConnection.py
( kirr/neo@dd3bb8b4 ),
but commit f6eb02b4 (Remove packet timeouts; 2017-05-04) removed testConnection.py
completely and, if I understand correctly, did not add any other test to
compensate for that. For this reason I am not trying to restore my Connection
tests either.

Anyway, with this patch there is no regression in any of the other existing NEO/py tests.

--------

My original patch description from 2016 follows:

- even for server-initiated streams
- odd for client-initiated streams

This way I will be able to use Pkt.msg_id as the real stream_id in Go's Conn,
because with the even/odd scheme there is no possibility of id conflicts
between two peers.
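On the NEO/py side, the even/odd scheme amounts to choosing the first `msg_id` according to the node's role and then stepping by 2. Below is a minimal Python sketch. It assumes a 32-bit unsigned `msg_id` field. `MsgIdAllocator` is a hypothetical name and not NEO/py's actual code.

```python
class MsgIdAllocator:
    """Hypothetical helper: allocates msg_id values the HTTP/2 way,
    odd for the dialing (client) node and even for the listening
    (server) node, by stepping by 2 instead of 1."""

    MASK = 0xFFFFFFFF  # assumption: msg_id is a 32-bit unsigned field

    def __init__(self, dialer):
        self._next = 1 if dialer else 0

    def next_id(self):
        msg_id = self._next
        # 2**32 is even, so wrapping around keeps the parity.
        self._next = (msg_id + 2) & self.MASK
        return msg_id


client = MsgIdAllocator(dialer=True)
server = MsgIdAllocator(dialer=False)
client_ids = {client.next_id() for _ in range(1000)}
server_ids = {server.next_id() for _ in range(1000)}
assert not client_ids & server_ids  # the two peers never pick the same id
```

Because the parity is fixed per side, neither peer needs to coordinate with the other before opening a new exchange.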

Add branch for future WC2 compatibility mode · 3ef0fb47

Levin Zimmermann authored Jun 15, 2023

Between NEO/go and NEO/py there are various incompatibilities.
Similarly to the 'wc2' branch [1], this branch aims to transparently
communicate those incompatibilities.

Unlike the 'wc2' branch, we apply those patches here on top of an up-to-date
NEO/py. The background is that the current NEO/go version still uses
the old pre-msgpack protocol and has not fully implemented the new msgpack
protocol yet. That is why the 'wc2' branch still builds upon an old NEO/py
version (1.12, from the oldproto branch). The diff between the
wc2-future branch and master, on the other hand, aims to be as minimal
as possible and to contain only the compatibility patches.

[1] https://lab.nexedi.com/nexedi/neoppod/tree/wc2 and nexedi/neoppod@739096b7

04 Apr, 2023 4 commits
- debug: add an example to profile with yappi · fd87e153
  Julien Muchembled authored Mar 27, 2023
- storage: small code simplification · 3f516cd6
  Julien Muchembled authored Apr 04, 2023
- Small optimization in epoll loop · f7f8533a
  Julien Muchembled authored Mar 11, 2023
- sqlite3: fix rare strange exception when table does not exist · 90a5aa17
  Julien Muchembled authored Apr 01, 2023
```
Found by running testPruneOrphan many times. Once I even got:

  SystemError: NULL result without error in PyObject_Call
```
09 Mar, 2023 2 commits
- qa: do never import MySQL-specific code when testing SQLite · 40b6ba64
  Julien Muchembled authored Mar 09, 2023
- qa: fix interface check of SQLite & MySQL implementations · cf8f2028
  Julien Muchembled authored Mar 03, 2023
```
This reverts a wrong change in commit 30a02bdc
("importer: new option to write back new transactions to the source database").
```
19 Feb, 2023 1 commit
- mysql: drop support for TokuDB · ecc9c63c
  Julien Muchembled authored Feb 19, 2023
16 Feb, 2023 2 commits
- qa: update a test for recent ZConfig · b83668e7
  Julien Muchembled authored Feb 15, 2023
- Add support for PyPy & PyMySQL · 6153a752
  Julien Muchembled authored Feb 13, 2023
14 Feb, 2023 3 commits
- fixup! Drop support for ZODB3 · c29b8b2d
  Julien Muchembled authored Feb 13, 2023
- mysql: minor optimization · f844fe0b
  Julien Muchembled authored Feb 13, 2023
```
It has been many years since we last got 'array' objects; no idea when exactly.
```
- Rename mysqldb.py to mysql.py · 9691459d
  Julien Muchembled authored Feb 14, 2023
10 Feb, 2023 1 commit

sqlite: minor optimization · 933412cd

Julien Muchembled authored Feb 10, 2023

As with commit 243c1a0f
("sqlite: optimize storage of metadata"), the fake changes in test
data are due to the fact that we don't force an upgrade for this optimization.

02 Feb, 2022 1 commit

Fix breakage with zodbpickle >= 2 · d5afef8e

Kirill Smelkov authored Feb 02, 2022

Starting from zodbpickle 2, its binary class does not allow users to set
arbitrary attributes, and so

	binary._pack = bytes.__str__

fails with

	TypeError: can't set attributes of built-in/extension type 'zodbpickle.binary'

-> Fix it by explicitly checking for the binary type on encoding instead of
setting binary._pack.

See nexedi/slapos@27f574bc for pre-history.

/cc @jerome
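The principle of the fix (check the type when encoding, instead of attaching a hook attribute to the class) can be sketched in Python 3 as follows. This is a hypothetical illustration and not the NEO/py patch. The `binary` class below is a stand-in for `zodbpickle.binary`. Unlike the real C type, this pure-Python class would still accept new attributes, so it only serves to show the type check. `pack()` is an illustrative name for the step that normalizes values before serialization.

```python
class binary(bytes):
    """Hypothetical stand-in for zodbpickle.binary (a bytes subclass)."""


def pack(value):
    # Explicit type check at encoding time, instead of a per-class hook
    # such as binary._pack, which zodbpickle >= 2 no longer allows to set.
    if isinstance(value, binary):
        return bytes(value)  # encode as plain bytes
    return value


# Built-in/extension types reject new attributes, as zodbpickle 2's binary does.
try:
    bytes._pack = bytes.__str__
except TypeError as e:
    print("TypeError:", e)

print(pack(binary(b"\x00\x01")))  # prints: b'\x00\x01'
```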

04 Jun, 2021 1 commit

admin: fix crash if not operational and a downstream cluster is RUNNING · 7f81ac2d

Julien Muchembled authored Jun 03, 2021

Traceback (most recent call last):
  ...
  File ".../neo/lib/handler.py", line 75, in dispatch
    method(conn, *args, **kw)
  File ".../neo/admin/handler.py", line 174, in wrapper
    return func(self, name, *args, **kw)
  File ".../neo/admin/handler.py", line 190, in notifyMonitorInformation
    self.app.updateMonitorInformation(name, **info)
  File ".../neo/admin/app.py", line 290, in updateMonitorInformation
    self._notify(self.operational)
  File ".../neo/admin/app.py", line 315, in _notify
    body += '', name, '    ' + backup.formatSummary(upstream)[1]
  File ".../neo/admin/app.py", line 83, in formatSummary
    tid = self.ltid
AttributeError: 'Backup' object has no attribute 'ltid'

11 May, 2021 1 commit
- neoctl: fix tweak command when used without any argument · ba0bc779
  Julien Muchembled authored May 11, 2021
02 Apr, 2021 5 commits
- qa: more Importer tests · de0feb4e
  Julien Muchembled authored Mar 30, 2021
- qa: at the end of each ZODB test, check there is no storage space leak · 34d0725e
  Julien Muchembled authored Mar 17, 2021
- qa: when comparing replicas, checksum metadata & data rather than only keys · 28e097c8
  Julien Muchembled authored Mar 12, 2021
- PartitionTable: small optimization · 60bcbc5c
  Julien Muchembled authored Apr 02, 2021
- PartitionTable: rename getAssignedPartitionList to getReadableOffsetList · aa48adf9
  Julien Muchembled authored Mar 12, 2021
22 Mar, 2021 1 commit
- qa: renew certificates for tests · fa581be5
  Julien Muchembled authored Mar 22, 2021
04 Mar, 2021 2 commits
- Drop support for ZODB3 · 3a8f6f03
  Julien Muchembled authored Mar 04, 2021
- importer: fix assertion failure when loading a deleted oid that is fully imported · 414573b9
  Julien Muchembled authored Mar 04, 2021
15 Jan, 2021 2 commits

ssl: don't care whether EOF is ragged or not · d98205d0

Julien Muchembled authored Jan 15, 2021

The purpose of suppress_ragged_eofs=False was to micro-optimize the
normal case: when there's no EOF.

But commit 061cd5d8 showed that this
option only turns ragged EOF into an exception. It may be easier for
alternate NEO implementations to close the SSL connection properly. Or
the performance benefit was not worth the risk of freezing a NEO process.

ssl: Don't ignore non-ragged EOF · 061cd5d8

Kirill Smelkov authored Jan 13, 2021

Testing the NEO/go client against the NEO/py server revealed a bug in NEO/py SSL
handling: a proper non-ragged EOF from a peer is ignored, and so leads to a
hang in an infinite loop inside _SSL.receive, with read_buf memory growing
indefinitely. Details are below:

NEO/py wraps raw sockets with

	ssl.wrap_socket(suppress_ragged_eofs=False)

which instructs the SSL layer to convert an unexpected EOF when receiving a TLS
record into an SSLEOFError exception. However, when the remote peer properly
closes its side of the connection, socket.read() still returns b'' to
report a regular, non-ragged EOF:

https://github.com/python/cpython/blob/v2.7.18/Lib/ssl.py#L630-L650

The code was handling SSLEOFError but not a b'' return from socket recv.
Thus, after the NEO/go client disconnected and properly closed its side
of the connection, the code started to loop indefinitely in _SSL.receive
under `while 1`, with the b'' returned by self.socket.recv() appended to
read_buf again and again.

-> Fix it by detecting non-ragged EOF as well and, similarly to how
SSLEOFError is handled, converting it into self._error('recv', None).

See merge request nexedi/neoppod!17
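The shape of the fix can be sketched as a standalone receive loop in Python 3. This is a hypothetical illustration, not NEO/py's `_SSL.receive`. `receive` and `PeerClosed` are illustrative names, and raising an exception stands in for `self._error('recv', None)`. The loop works with a plain non-blocking socket, which raises `BlockingIOError` when no data is available, and with a non-blocking `ssl.SSLSocket`, which raises `ssl.SSLWantReadError` in that case. Both kinds of EOF are treated the same way: a ragged EOF raises `ssl.SSLEOFError`, and a proper EOF makes `recv()` return `b''`.

```python
import socket
import ssl


class PeerClosed(Exception):
    """Hypothetical: signals that the peer closed the connection."""


def receive(sock, read_buf):
    """Drain all currently available data from non-blocking `sock` into the
    list `read_buf`; raise PeerClosed on EOF, whether ragged or proper."""
    while True:
        try:
            data = sock.recv(4096)
        except (BlockingIOError, ssl.SSLWantReadError):
            return  # nothing more to read for now
        except ssl.SSLEOFError:
            raise PeerClosed("ragged EOF")
        if not data:
            # Proper EOF: recv() returns b''. Without this check the loop
            # would append b'' to read_buf forever, as in the bug above.
            raise PeerClosed("EOF")
        read_buf.append(data)


# Usage with a local socket pair: the peer sends data, then closes properly.
a, b = socket.socketpair()
a.setblocking(False)
b.sendall(b"hello")
b.close()
buf = []
try:
    receive(a, buf)
except PeerClosed as e:
    print("closed:", e, buf)
a.close()
```

With `suppress_ragged_eofs=True`, which is what commit d98205d0 switches to, a ragged EOF is also reported as `b''`, so only the `if not data` branch is needed.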

11 Jan, 2021 4 commits
- client: fix relative import · 261dd4b4
  Julien Muchembled authored Jan 06, 2021
- qa: add testStorageGettingReadyDuringRecovery · 80d180e7
  Julien Muchembled authored Dec 17, 2020
- master: ignore late AnswerInformationLocked during recovery · b22847d2
  Julien Muchembled authored Dec 15, 2020
- master: simplify verification by ignoring completely nodes without readable cells · a760258b
  Julien Muchembled authored Nov 06, 2020
```
The scenario that was described in comments was meaningless
because S1 never goes out-of-date.
```
02 Oct, 2020 1 commit

Fix handling of -m/--masters arg · fa63d856

Julien Muchembled authored Oct 02, 2020

For the master, the purpose of -m/--masters is to specify addresses
of other master nodes, since its own address is already known via
-b/--bind. Therefore, an empty value for -m/--masters is valid.
The user remains free to repeat the -b value in -m.

More generally, a node may choose to specify master addresses only
via -D/--dynamic-master-list, so the check that at least one master
address is specified is moved to where the NodeManager is expected to be
initialized.
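The resulting rule can be sketched in Python with argparse. The option names come from the text above. Everything else is an assumption made for illustration: the space-separated address format, `known_masters()`, and the `dynamic_masters` parameter, which stands in for addresses read from the -D file. None of this is NEO's code. The check that at least one master is known runs only after all sources have been merged.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("-b", "--bind", default="")
    # An empty -m value is valid: for a master, -b already gives its own address.
    parser.add_argument("-m", "--masters", default="")
    parser.add_argument("-D", "--dynamic-master-list", default=None)
    return parser.parse_args(argv)


def known_masters(args, is_master, dynamic_masters=()):
    """Hypothetical helper: gather master addresses from every source and
    only then check that at least one is known."""
    masters = {addr for addr in args.masters.split() if addr}
    masters.update(dynamic_masters)  # e.g. read from the -D file (not shown)
    if is_master and args.bind:
        masters.add(args.bind)  # a master always knows its own address
    if not masters:
        raise ValueError("no master address specified")
    return masters
```

The same function accepts a master started with only `-b`, and a client configured only through `-D`.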

29 Sep, 2020 1 commit
- Remove dead code · c34d332f
  Julien Muchembled authored Sep 29, 2020
25 Sep, 2020 4 commits

storage: show whether transaction is voted in its __repr__ · c1c26894

Julien Muchembled authored May 28, 2020

New algorithm for deadlock avoidance · 5e7f34d2

Julien Muchembled authored Jul 25, 2019

The time complexity of the previous algorithm was too high. With several tens of
concurrent transactions, we saw commits take minutes to complete, and
the whole application looked frozen.

This new algorithm is much simpler. Instead of asking the oldest
transaction to somewhat restart (we used the "rebase" term because
the concept was similar to what git-rebase does), the storage gives
it priority and the newest is asked to relock (this request is ignored
if vote already happened, which means there was actually no deadlock).

testLocklessWriteDuringConflictResolution was initially more complex
because Transaction.written (client) ignored KeyError (which is not the
case anymore since commit 8ef1ddba).
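The new priority rule can be made concrete with a toy Python model. It is a hypothetical simplification and not NEO's storage code. `ToyStorage`, `store()` and the `relock` list are illustrative names. A smaller ttid is assumed to mean an older transaction, and every lock conflict is treated as a potential deadlock. The older transaction gets the lock, and the newer holder is asked to relock. If the holder has already voted, such a request would be ignored because there was no real deadlock, so the requester just waits.

```python
class ToyStorage:
    """Hypothetical, much simplified model of the priority rule above."""

    def __init__(self):
        self.locks = {}     # oid -> ttid currently holding the write lock
        self.voted = set()  # ttids of transactions that already voted
        self.relock = []    # (ttid, oid): newer transactions asked to relock

    def store(self, ttid, oid):
        holder = self.locks.get(oid)
        if holder is None or holder == ttid:
            self.locks[oid] = ttid
            return "locked"
        if ttid < holder and holder not in self.voted:
            # The older transaction gets priority; the newer holder is
            # asked to relock this oid, i.e. to queue behind `ttid`.
            self.locks[oid] = ttid
            self.relock.append((holder, oid))
            return "locked"
        # Either the requester is the newer one, or the holder already voted
        # (a relock request would be ignored): the requester waits.
        return "wait"
```

For example, if transaction 2 locks A and then transaction 1 locks B, a later store of A by 1 takes the lock and asks 2 to relock A, while a store of B by 2 waits. The two transactions can no longer wait on each other in a cycle.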

qa: deindent code · d98b576c

Julien Muchembled authored Aug 13, 2020

Update comments · dbf128b7

Julien Muchembled authored Jul 25, 2019

10 Sep, 2020 2 commits

storage: commit from time to time when truncating · 910d1e91

Julien Muchembled authored Sep 10, 2020

This is all the more important for RocksDB, as it wants to keep all
transaction work in RAM.

Once we had to truncate 40% of a 1TB MyRocks DB with 24 partitions,
4 being processed in parallel. Even when committing between partitions,
MariaDB used up to 200 GB. Without the commit, 1TB RAM would not have
been enough.
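Committing periodically during a large truncation can be illustrated with Python's standard sqlite3 module. This is an illustration under assumptions, not NEO's storage code. The `obj` table, its `tid` column and `truncate()` are hypothetical. The sketch also commits after every batch of `batch_size` rows, whereas the commit message describes committing between partitions and from time to time within them. Either way, the amount of uncommitted work, and so the memory a backend like MyRocks needs to hold it, stays bounded.

```python
import sqlite3


def truncate(conn, tid, batch_size=1000):
    """Hypothetical sketch: delete all rows with tid > `tid` from a toy
    `obj` table, committing after each batch instead of once at the end."""
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM obj WHERE rowid IN"
            " (SELECT rowid FROM obj WHERE tid > ? LIMIT ?)",
            (tid, batch_size))
        conn.commit()  # keep the pending transaction small
        deleted += cur.rowcount
        if cur.rowcount < batch_size:
            return deleted


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obj (oid INTEGER, tid INTEGER)")
db.executemany("INSERT INTO obj VALUES (?, ?)", [(i, i) for i in range(2500)])
db.commit()
print(truncate(db, 999, batch_size=300))  # prints: 1500
```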

mysql: make sure any configured session variable is not silently capped · d14de83e

Julien Muchembled authored Sep 10, 2020
