1. 24 Sep, 2024 20 commits
    • Kirill Smelkov's avatar
      go/zodbpickle: Switch to unpickle with PyDict=y mode · 1e87fe5b
      Kirill Smelkov authored
      Previously we were unpickling dicts into map[any]any and it was making a
      difference from which Python version the dict was saved: if it was saved
      from py2 with str keys, then the dict needed to be accessed with ByteString
      keys, while if it was saved from py3 with str keys, the dict needed to
      be accessed with string keys. This peculiarity is one of the reasons why
      Map test currently fails on py3_pickle3 - because there - both in the
      test and inside Map.PySetState, dictionaries are accessed via ByteString
      keys which works only wrt py2-saved data.
      
      -> Fix it by switching the unpickler to work in PyDict=y mode(*) which
      results in dicts unpickled as ogórek.Dict instead of map[any]any
      with the ogórek.Dict considering ByteString to be equal to both string and Bytes
      with the same underlying content. This allows programs to access Dict via
      string/bytes keys following Python3 model, while still being able to handle
      dictionaries generated from under Python2.
      
      For ZODB/go this is change in behaviour exposed to outside. However
      there is currently only one known ZODB/go user - WCFS in Wendelin.core -
      and that user will be updated correspondingly as well.
      
      The change fixes the following failure in go/zodb tests
      
          --- FAIL: TestPersistentMapListLoad/py3_pickle3 (0.70s)
              persistent_x_test.go:57: persistent.mapping.PersistentMapping(0000000000000000): activate: pysetstate: noone of ["data" "_container"] is present in state dict
      
      but one more remains there which will be addressed in the next patch.
      
      (*) see https://github.com/kisielk/og-rek/pull/75 for details.
      1e87fe5b
    • Kirill Smelkov's avatar
      go/go.mode: v↑ github.com/kisielk/og-rek to pre-release with PyDict patches · 90799e25
      Kirill Smelkov authored
      We will need PyDict mode in the next patch to fix Map test on
      py3_pickle3 and so that working with unpickled dictionaries is easy
      independently from which Python version - py2 or py3 - they come.
      
      PyDict mode patch: https://github.com/kisielk/og-rek/commit/3bf6c92d
      		  (https://github.com/kisielk/og-rek/pull/75)
      
      Whole set of brought changes:
      
      	https://github.com/kisielk/og-rek/compare/b1bd3f2e3d5e...e691997e3596
      90799e25
    • Kirill Smelkov's avatar
      go/zodb/btree: go/zodb/fs1tools: Test it on all py2/py3 ZODB kinds of data we care about · 6841930a
      Kirill Smelkov authored
      Previously, like with FileStorage, we were testing btree/go only with
      data generated by python2 and pickle protocol=2. However even on py2
      there are more pickle protocols that are in use, and also there is python3.
      
      -> Modernize testdata/gen-testdata to use run_with_all_zodb_pickle_kinds
         that was recently added as part of nexedi/zodbtools@f9d36ba7
         and generate test data with both python2 and python3. It is handy to
         use py2py3-venv(*) to prepare python environment to do that.
      
         Adjust tests on Go side to verify how btree handles all generated zkinds.
      
      All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.
      
      (*) see nexedi/zodbtools@fac2f190
      6841930a
    • Kirill Smelkov's avatar
      go/zodb/fs1: Remove old testdata files · 0dd81a41
      Kirill Smelkov authored
      We recently modrnized FileStorage/go tests to have all ZODB kinds of
      data which live in testdata/<zkind>/ for example testdata/py2_pickle3/ -
      see recent patch "go/zodb/fs1: Test FileStorage on all py2/py3 ZODB
      kinds of data we care about".
      
      We also updated all other packages to use that new FileStorage testdata.
      
      -> Now is the time for old testdata to go.
      0dd81a41
    • Kirill Smelkov's avatar
      go/internal/xtesting: Switch to updated FileStorage testdata to validate LoadDBHistory · bfd9eeb4
      Kirill Smelkov authored
      Old fs1/testdata/1.fs will be going away -> we need to switch to using
      newly generated multi-zkind testdata. However for this test using only
      one zkind should be enough. Let's pick what we currently primarily use:
      py2_pickle3.
      bfd9eeb4
    • Kirill Smelkov's avatar
      go/zodbpickle: Switch to pickling with protocol=3 · 1bb0eb24
      Kirill Smelkov authored
      Pickle protocol 3 allows to natively support all py bytes and string
      types and is supported by zodbpickle/py on both py2 and py3 out of the
      box. ZODB/py3 is using protocol=3 since 2013 and ZODB/py2 since 2018. In
      other words nowdays protocol=3 is the default protocol with which
      ZODB emits pickles, and it can be read practically everywhere.
      
      The difference in between protocol=2 and protocol=3 is addition of
      BINBYTES and SHORT_BINBYTES opcodes to represent bytes. Without those
      opcodes bytes are encoded as
      
          `_codecs.encode(byt.decode('latin1'), 'latin1')`
      
      which, when unpickled, results in bytes on py3 and str on py2.
      
      Compared to using `BINBYTES byt` this form is much bigger in size, but
      what is more important might turn bytes back into str when decoded and
      reencoded on py2.
      
      This form of bytes encoding is also not accepted by ZEO/py which rejects
      it with
      
          2024-07-18T20:44:39 ERROR ZEO.asyncio.server Can't deserialize message
          Traceback (most recent call last):
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/server.py", line 100, in message_received
              message_id, async_, name, args = self.decode(message)
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 119, in pickle_server_decode
              return unpickler.load()  # msgid, flags, name, args
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 176, in server_find_global
              raise ImportError("Module not allowed: %s" % (module,))
          ImportError: Module not allowed: _codecs
      
      All in all using pickle protocol=3 is ok from both py2 and py3 point of
      view, brings size optimization and correctness, and fixes ZEO and
      probably other issues.
      
      So let the pickles ZODB/go saves and other Go places emit be encoded with protocol=3 now as well.
      
      For the reference, the
      
          // TODO 2 -> 3 since ZODB5 switched to it and uses zodbpickle.
      
      is from 2019 added in a16c9e06 (go/zodb: Teach Persistent to serialize
      itself).
      1bb0eb24
    • Kirill Smelkov's avatar
      go/zodb/zeo: Fix ZRPC/pickle binary and string encoding/decoding wrt py3 · 9f402fcc
      Kirill Smelkov authored
      ZEO/go client was asserting "method" field to be bytestring. As the
      result interoperation with ZEO/py3 server was failing as e.g.
      
          2024-07-18T20:41:14 INFO ZEO.asyncio.server received handshake 'Z5'
          2024/07/18 20:41:14 zlink @ - /tmp/zeo2281880331/1.fs.zeosock: serveRecv1: decode: .0: method: got string; expected str
              zeo_test.go:221: openzeo /tmp/zeo2281880331/1.fs.zeosock: zeo: open /tmp/zeo2281880331/1.fs.zeosock: zlink @ - /tmp/zeo2281880331/1.fs.zeosock: call register: zlink is closed
      
      because ZEO/py3 sends "method" as unicode.
      
      -> Fix it by using pickle.AsString when decoding "method" field.
      
      Similarly ZEO/go client was sending strings as bytestr, which on py3 decode to
      unicode ok only if they are ASCII. Adjust string encoding to be emitted
      as unicode always which works wrt both ZEO/py3 and ZEO/py2.
      
      --------
      
      ZEO/go client was sending binary ZRPC arguments as bytestr as well,
      which was resulting in the following exception on ZEO/py3 side:
      
          2024-07-18T20:42:31 ERROR ZEO.asyncio.server Can't deserialize message
          Traceback (most recent call last):
            File "/home/kirr/src/wendelin/z/ZEO/src/ZEO/asyncio/server.py", line 102, in message_received
              message_id, async_, name, args = self.decode(message)
            File "/home/kirr/src/wendelin/z/ZEO/src/ZEO/asyncio/marshal.py", line 122, in pickle_server_decode
              return unpickler.load()  # msgid, flags, name, args
          UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 1: ordinal not in range(128)
          2024/07/18 20:42:31 zlink @ - /tmp/zeo627526011/1.fs.zeosock: recvPkt: EOF
              xtesting.go:361: load 0285cbac70a3d733:0000000000000001: returned err unexpected:
                  have: /tmp/zeo627526011/1.fs.zeosock: load 0285cbac70a3d733:0000000000000001: zlink @ - /tmp/zeo627526011/1.fs.zeosock: call loadBefore: zlink is closed
                  want: nil
      
      -> Fix it by using pickle.Bytes which should be encoded as bytes in the
         pickle stream.
      
      This fix is ok but depends on the next patch to upgrade ZODB
      serialization protocol to 3, because on pickle protocol 2 bytes are
      serialized as `_codecs.encode(byt.decode('latin1'), 'latin1')` and
      ZEO/py forbids import of _codecs:
      
          2024-07-18T20:44:39 ERROR ZEO.asyncio.server Can't deserialize message
          Traceback (most recent call last):
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/server.py", line 100, in message_received
              message_id, async_, name, args = self.decode(message)
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 119, in pickle_server_decode
              return unpickler.load()  # msgid, flags, name, args
            File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 176, in server_find_global
              raise ImportError("Module not allowed: %s" % (module,))
          ImportError: Module not allowed: _codecs
      
      We will address this in the next patch.
      9f402fcc
    • Kirill Smelkov's avatar
      go/zodb/zeo: Test it on all py2/py3 ZODB kinds of data we care about and wrt... · df8d5cc3
      Kirill Smelkov authored
      go/zodb/zeo: Test it on all py2/py3 ZODB kinds of data we care about and wrt both ZEO/py2 and ZEO/py3
      
      Similarly to FileStorage, fs1tools and other Go packages previously we
      were testing ZEO/go client only with old FileStorage testdata generated
      by python2 and pickle protocol=2. However even on py2 there are more
      pickle protocols that are in use, and also there is python3.
      
      We were also testing our ZEO/go client only wrt ZEO/py2 but there is
      also ZEO/py3.
      
      -> Adjust ZEO/go testing to automatically load and test against all ZODB
      kinds from recently updated FileStorage testdata and wrt both ZEO/py2
      and ZEO/py3.
      
      All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.
      However only ZEO/py2 succeeds: tests wrt ZEO/py3 server currently fail
      and so are marked with "xfail".
      
      We will fix tests for ZEO/py3 in the next patch.
      df8d5cc3
    • Kirill Smelkov's avatar
      go/zodb/demo: Test it on all py2/py3 ZODB kinds of data we care about · e76c9501
      Kirill Smelkov authored
      Similarly to FileStorage, fs1tools and zodbtools previously we were testing
      demo only with old FileStorage testdata generated by python2 and pickle
      protocol=2. However even on py2 there are more pickle protocols that
      are in use, and also there is python3.
      
      -> Adjust demo testing to automatically load and test agains all ZODB
      kinds from recently updated FileStorage testdata.
      
      All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.
      e76c9501
    • Kirill Smelkov's avatar
      go/zodb/zodbtools: Test it on all py2/py3 ZODB kinds of data we care about · c67e2fec
      Kirill Smelkov authored
      Like with FileStorage and fs1tools previously we were testing zodbtools/go only
      with data generated by python2 and pickle protocol=2. However even on py2 there
      are more pickle protocols that are in use, and also there is python3.
      
      -> Modernize testdata generation to automatically pick up all ZODB kinds
         from recently-updated FileStorage/go testdata.
      
         Adjust tests on Go side to verify how zodbtools/go handle all generated zkinds.
      
      All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.
      c67e2fec
    • Kirill Smelkov's avatar
      go/zodb/fs1tools: test: Normalize dumpOk from py3 to py2 syntax · 93e2f065
      Kirill Smelkov authored
      Because else, py3_pickle3 tests fail e.g. as
      
              --- FAIL: TestFsDump/py3_pickle3/db=1 (0.00s)
                  dump_test.go:70: fsdump: dump different:
                       Trans #00000 tid=0285cbac70a3d733 size=211 time=1979-01-03 21:00:26.400001 offset=117
                      -    status='p' user=b'user0.12' description=b'step 0.12'
                      +    status='p' user='user0.12' description='step 0.12'
      
              --- FAIL: TestFsDumpv/py3_pickle3/db=1 (0.00s)
                  dump_test.go:70: fsdumpv: dump different:
                       ************************************************************
                      -file identifier: b'FS30'
                      +file identifier: 'FS30'
      
              ...
      
      fs1tools/go sticks to py2 syntax because in my view it is more user
      friendly. The preference of py2 or py3 syntax needs to be made anyway
      because for fs1tools/go it is not possible to know beforehand to which
      py kind its output will be compared against.
      
      I think it is not very good for fstools/py output to be dependent on
      python version. For the reference on this topic: output of zodbtools/py
      is designed to be the same on both py2 and py3.
      93e2f065
    • Kirill Smelkov's avatar
      go/zodb/fs1tools: Test it on all py2/py3 ZODB kinds of data we care about · 117172cf
      Kirill Smelkov authored
      Like with FileStorage previously we were testing fs1tools only with data
      generated by python2 and pickle protocol=2. However even on py2 there are
      more pickle protocols that are in use, and also there is python3.
      
      -> Modernize testdata generation to automatically pick up all ZODB kinds
         from recently-updated FileStorage/go testdata.
      
         Adjust tests on Go side to verify how fs1tools handles all generated zkinds.
      
      py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
      Tests for py3_pickle3 currently fail and so are marked with "xfail".
      
      We will fix tests for py3_pickle3 in follow-up patches.
      117172cf
    • Kirill Smelkov's avatar
      go/zodb/fs1tools: DumperFsDump: Print size of every transaction record · 9d10dcc5
      Kirill Smelkov authored
      Like fsdump/py does.
      
      https://github.com/zopefoundation/ZODB/commit/403f9869 started to print
      size of every transaction record saying that it readded it. And indeed
      this printing was there starting from
      https://github.com/zopefoundation/ZODB/commit/06e757b3 authored in 2003.
      
      Both commits use `size(txn) = trans._tend - trans._tpos` which is wrong
      by 8 because during file iteration tpos points to the beginning of
      transaction and tend points to tpos + tlen, but tlen is full
      transaction length - 8:
      
      https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/FileStorage/FileStorage.py#L1997-L1998
      https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/FileStorage/format.py#L28
      
      Mimic fsdump/py behaviour exactly for compatibility, even if it is a bit
      buggy, since it was there for such a long time.
      
      Without the fix TestFsDump was failing like this:
      
          --- FAIL: TestFsDump/db=1 (0.00s)
              dump_test.go:70: fsdump: dump different:
                  -Trans #00000 tid=0285cbac258bf266 size=151 time=1979-01-03 21:00:08.800000 offset=52
                  +Trans #00000 tid=0285cbac258bf266 time=1979-01-03 21:00:08.800000 offset=52
                       status=' ' user='' description='initial database creation'
                     data #00000 oid=0000000000000000 size=61 class=persistent.mapping.PersistentMapping
                   ...
      9d10dcc5
    • Kirill Smelkov's avatar
      go/zodb/fs1tools: Regenerate testdata with ZODB 5.8.1 · ce122ee7
      Kirill Smelkov authored
      https://github.com/zopefoundation/ZODB/commit/403f9869 adjusted fsdump
      to emit size of transaction record. Everything else remains unchanged.
      
      Our fsdump becomes broken, because it does not yet emit that size(txn).
      We will fix it in the next patch.
      ce122ee7
    • Kirill Smelkov's avatar
      go/zodb/fs1: Fix index save wrt py3 · ef615203
      Kirill Smelkov authored
      We were saving index data with bytestrings inside index records. This
      works for py2 but on py3 it causes failure on unpickling because, by default,
      *STRING opcodes are unpickled into unicode on py3:
      
          === RUN   TestIndexSaveToPy/py2_pickle1/py3
          UnicodeDecodeError: 'ascii' codec can't decode byte 0x8d in position 25: ordinal not in range(128)
      
          The above exception was the direct cause of the following exception:
      
          Traceback (most recent call last):
            File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/./py/indexcmp", line 43, in <module>
              main()
            File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/./py/indexcmp", line 30, in main
              d1 = fsIndex.load(path1)
            File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 138, in load
              v = unpickler.load()
          SystemError: <built-in method read of _io.BufferedReader object at 0x7fb631a86670> returned a result with an error set
              index_test.go:229: zodb/py read/compare index: exit status 1
      
      -> Fix it by explicitly emitting index entries in binary form.
      
      To be able to compare "index from py2 -> loaded into go -> saved by go ->
      compared by py3" implement a bit of backward compatibility for  loading py2
      index files on py3. Do this compatibility only for known-good py2 files, not
      index produced by go code which is loaded by py3 without any compatibility
      shims.
      ef615203
    • Kirill Smelkov's avatar
      go/zodb/fs1: Fix index load from py3 data · 82098c49
      Kirill Smelkov authored
      Python3 emits index entries using bytes, but index.go was expecting only
      bytestrings. As the result loading index produced by py3 was failing
      e.g. as
      
          index_test.go:201: index load: testdata/py3_pickle3/1.fs.index: pickle @6: invalid oidPrefix: type ogórek.Bytes
      
      -> Fix it by accepting both bytes and bytestrings inside the index.
      82098c49
    • Kirill Smelkov's avatar
      go/zodb/fs1: Accept both FS21 and FS30 magics · 8bae132e
      Kirill Smelkov authored
      FileStorage/py2 saves data with FS21 magic, while FileStorage/py3 with
      FS30 magic:
      
      https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/_compat.py#L25-L77
      
      Up till now FileStorage/go was accepting only FS21 and so with
      py3-generated data it was leading to e.g. the following failure:
      
          === RUN   TestEmptyDB/py3_pickle3
              filestorage_test.go:90: testdata/py3_pickle3/empty.fs: invalid fs1 magic "FS30"
      
      -> Fix it by accepting both py2 and py3 FileStorage magics.
      
      We do accept them both without any other change because FileStorage
      format, even when it comes with those two different magic, is really the
      same.
      8bae132e
    • Kirill Smelkov's avatar
      go/zodb/fs1: Test FileStorage on all py2/py3 ZODB kinds of data we care about · ede4b3bb
      Kirill Smelkov authored
      Previously we were testing FileStorage/go only with data generated by
      python2 and pickle protocol=2. However even on py2 there are more pickle
      protocols that are in use, and also there is python3.
      
      -> Modernize py/gen-testdata to use run_with_all_zodb_pickle_kinds
         that was recently added as part of nexedi/zodbtools@f9d36ba7
         and generate test data with both python2 and python3. It is handy to
         use py2py3-venv(*) to prepare python environment to do that.
      
         Adjust tests on Go side to verify how FileStorage handles all generated zkinds.
      
      py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
      Tests for py3_pickle3 currently fail and so are marked with "xfail".
      
      We will fix tests for py3_pickle3 in follow-up patches.
      
      Old testdata are not yet removed because e.g. fs1tools and zodbdump
      tests depend on them. We will remove old fs1 testdata after adjusting
      tests in dependent packages step-by-step.
      
      (*) see nexedi/zodbtools@fac2f190
      ede4b3bb
    • Kirill Smelkov's avatar
      go/zodb: Fix PyData.ClassName for py3_pickle3 · a865708a
      Kirill Smelkov authored
      Test for PyData.ClassName currently fails for py3_pickle3 zkind:
      
          === RUN   TestPyClassName/py3_pickle3
              pydata_test.go:50: class name for "\x80\x03X\x18\x00\x00\x00ZODB.tests.testSerializeq\x00X\x13\x00\x00\x00ClassWithoutNewargsq\x01\x86q\x02N\x86q\x03.":
                  have: "?.?"
                  want: "ZODB.tests.testSerialize.ClassWithoutNewargs"
              pydata_test.go:50: class name for "\x80\x03X\x18\x00\x00\x00ZODB.tests.testSerializeq\x00X\x10\x00\x00\x00ClassWithNewargsq\x01\x86q\x02K\x01\x85q\x03\x86q\x04.":
                  have: "?.?"
                  want: "ZODB.tests.testSerialize.ClassWithNewargs"
      
      This happens because classname decoder was expecting class and module
      names to come as py2 bytestrings, while py3 emits them as unicode.
      
      -> Fix it by accepting both py2 bytestring and unicode for class and
      module names.
      a865708a
    • Kirill Smelkov's avatar
      go/zodb: Test PyData and Map/List on all py2/py3 kinds of data we care about · ed2b96ca
      Kirill Smelkov authored
      Previously we were testing PyData and Map/List only with data generated by python2
      and pickle protocol=2. However even on py2 there are more pickle
      protocols that are in use, and also there is python3.
      
      -> Modernize py/pydata-gen-testdata to use run_with_all_zodb_pickle_kinds
         that was recently added as part of nexedi/zodbtools@f9d36ba7
         and generate test data with both python2 and python3. It is handy to
         use py2py3-venv(*) to prepare python environment to do that.
      
         Adjust tests on Go side to verify how PyData and Map/List handle all generated
         zkinds.
      
      py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
      Tests for py3_pickle3 currently fail and so are marked with "xfail".
      
      We will fix tests for py3_pickle3 in the next patches.
      
      (*) see nexedi/zodbtools@fac2f190
      ed2b96ca
  2. 20 Aug, 2024 11 commits
    • Kirill Smelkov's avatar
      go/internal/xtesting: Prepare to run user's tests with both py2 and py3 · 15401594
      Kirill Smelkov authored
      Tests, that want to verify something wrt both py2 and py3 will do
      WithEachPy(tested_func) which will run tested_func two times:
      
      1) under environment where python on $PATH points to python2, and
      2) under environment where python on $PATH points to python3.
      
      Client tests will now need to use just "python" instead of e.g.
      "python2" or "python3" to run some python program, and each time
      "python" will correspond to current phase of WithEachPy execution.
      
      This will be soon handy to test things wrt e.g. ZEO/py2 and ZEO/py3
      servers and similar situations.
      
      Tests that merely want to use some python program just for their inner
      working, for example to run `zodb commit`, no longer indicate their
      preference for py2. For such tests it is a matter of preference in
      pre-setup environment to where "python" points.
      
      For the reference under environments created with py2py3-venv(*) default
      "python" currently points to "python2".
      
      (*) see nexedi/zodbtools@fac2f190
      15401594
    • Kirill Smelkov's avatar
      go/internal/xtesting += ZTestData & friends · 07e1263a
      Kirill Smelkov authored
      ZTestData is Go counterpart of ztestdata on zodbtools/py side from
      https://lab.nexedi.com/nexedi/zodbtools/-/blob/f9d36ba7/zodbtools/test/conftest.py#L31-65
      that was recently added in nexedi/zodbtools@bf772ce0.
      
      On Go side each ZTestData describes testdata files on the filesystem +
      carry arbitrary extra information. For example FileStorage tests will
      use it to place there "what is expected to load" information
      corresponding to on-filesystem database.
      07e1263a
    • Kirill Smelkov's avatar
      go/internal/xmaps: New package that complements standard package maps · 28489790
      Kirill Smelkov authored
      Currently with the only utility to retrieve map keys in sorted order.
      
      We will need this in several tests in follow-up patches.
      28489790
    • Kirill Smelkov's avatar
      go/zodbpickle: Factor/generalize pickletools.Xstrbytes8 into here as zodbpickle.Unpack64 · 8da1234e
      Kirill Smelkov authored
      Similarly to ZODB/py which provides u64 and p64 utilities. Add
      zodbpickle.Pack64 for symmetry.
      8da1234e
    • Kirill Smelkov's avatar
      go/*: Switch to StrictUnicode mode for pickling/unpickling · 078b6262
      Kirill Smelkov authored
      Because without StrictUnicode both bytestrings (str from py2) and
      unicode (unicode from py2 and str from py3) load into the same Go type
      string so it becomes impossible to distinguish them from each other.
      Re-saving data, thus, also generally introduces changes as e.g.
      string loaded via bytestring will be saved as unicode when pickling with
      protocol 3. Or a loaded unicode will be saved as bytestring with
      pickling via protocol=2.
      
      -> Switching to StrictUnicode mode solves all those problems.
      
      Please see updated documentatin for zodbpickle.go and
      https://github.com/kisielk/og-rek/commit/b28613c2 for more details about
      StrictUnicode mode.
      
      For ZODB/go this is change in behaviour exposed to outside. However
      there is currently only one known ZODB/go user - WCFS in Wendelin.core -
      and that user will be updated correspondingly as well.
      078b6262
    • Kirill Smelkov's avatar
      go/*: Move creation of a Pickler and Unpickler to central place · 02076be2
      Kirill Smelkov authored
      We already have 3 places where we create picklers and unpicklers:
      zodb/pydata.go, fs1/index.go and zeo/proto.go and they are already
      diverging a bit: for example pydata was explicitly specifying protocol
      for pickle encoder, while other places were using defaults.
      
      We will soon want to use StrictUnicode option for all picklers and
      unpicklers we create, which means it is better to keep this
      customization only in one place instead of copy-pasting it in between
      callsites increasing the possibility for divergence and errors.
      
      -> Move the code to create pickler and unpickler to internal zodbpickle
      package as a preparatory step for that.
      
      For the reference: this package is merely a wrapper around ogórek, not a
      fork of any code like zodbpickle/py is.
      02076be2
    • Kirill Smelkov's avatar
      go/go.mod: v↑ go 1.14 -> go 1.18 · 8bc46475
      Kirill Smelkov authored
      We will soon want to use generics and `any` which are available only for go >= 1.18 .
      
      The other changes in go.sum and go.mod are there because after just
      
          -go 1.14
          +go 1.18
      
      the build stopped to work and asked to do `go mod tidy`
      8bc46475
    • Kirill Smelkov's avatar
      go/*: Switch pickletools.Xint64 to ogórek.AsInt64 · e0063782
      Kirill Smelkov authored
      ogórek.AsInt64 was added in https://github.com/kisielk/og-rek/commit/010fbd2e
      with functionality similar to what we had in pickletools.Xint64 before.
      e0063782
    • Kirill Smelkov's avatar
      go/go.mod: v↑ github.com/kisielk/og-rek to pre-release · 2d54cad5
      Kirill Smelkov authored
      This brings in support for StrictUnicode mode, AsInt64, AsBytes and
      AsString helpers, and other fixes.
      
      See https://github.com/kisielk/og-rek/compare/v1.2.0...b1bd3f2e3d5e for
      details.
      2d54cad5
    • Kirill Smelkov's avatar
      go: Add support for Go modules · cb174b5e
      Kirill Smelkov authored
      So that building and using ZODB/go works not only in GOPATH mode.
      cb174b5e
    • Kirill Smelkov's avatar
      go/zodb: Add tests for Map and List · 11f1a2bd
      Kirill Smelkov authored
      As promised in 6c162438 (go/zodb: Draft support for Map and List) add
      test that Map and List implemented on Go side can load PersistentMapping
      and PersistentList data generated by Python.
      
      Test that Py side can load Map and List data generated by Go is still pending.
      
      Testdata is still generated with zodbtools f9d36ba7~ i.e. before
      nexedi/zodbtools!29 was merged.
      11f1a2bd
  3. 19 Aug, 2024 2 commits
    • Levin Zimmermann's avatar
      go/neo/proto: Fix KnownMasterList nesting · dc844465
      Levin Zimmermann authored
      Before this patch, the 'KnownMasterList' field of the 'NotPrimaryMaster'
      was expected to be structured in the following way:
      
      	ArrayHeader (KnownMasterList)
      
      	    ArrayHeader (KnownMaster)
      
      		ArrayHeader (Address)
      
      		    Host (string)
      		    Port (uint16)
      
      However NEO/py sends the following structure:
      
      	ArrayHeader (KnownMasterList)
      
      	    ArrayHeader (Address)
      
      		Host (string)
      		Port (uint16)
      
      This also makes sense, as 'KnownMaster' doesn't need to add another
      nesting, because it only includes the address.
      
      This patch amends the NEO/go protocol definition to transparently
      represent the nesting as it's send by NEO/py. See also
      levin.zimmermann/neoppod@18287612 for a similar issue.
      
      ----
      kirr: tests currently live only on t branch.
      
      /reviewed-by @kirr
      /reviewed-on !6
      dc844465
    • Levin Zimmermann's avatar
      go/neo/proto/RowInfo: Fix representation on the wire · 7f96ad77
      Levin Zimmermann authored
      Some NEO protocol packets have the field 'RowList'. This field contains
      information about each row of a partition table. In NEO/go the information
      of each row is represented with the 'RowInfo' type [1]. This type is defined
      as a struct with the field ‘CellList’. ‘CellList’ is defined as a list of
      'CellInfo' [1] (e.g. an entry for each cell).
      
      NEO/go {en,de}codes struct types with ‘genStructHead’ (structures in
      golang are encoded as arrays in msgpack) [2]. From the 'RowList'
      definition, the msgpack decoder currently expects the following msgpack
      array structure:
      
          ArrayHeader (RowList)
      
              ArrayHeader (RowInfo)
      
                  ArrayHeader (CellList)
      
                      ArrayHeader (CellInfo)
      
                          int32 (NID)
                          enum (State)
      
      However NEO/py actually sends:
      
          ArrayHeader (RowList)
      
              ArrayHeader (CellList)
      
                  ArrayHeader (CellInfo)
      
                      int32 (NID)
                      enum (State)
      
      In other words, currently the NEO/go msgpack encoder expects one more
      nesting, which NEO/py doesn’t provide (and which also doesn’t seem to
      be necessary as the outer nesting would always only contain one
      element). We could adjust the msgpack {en,de}coder to introduce an
      exception for the 'RowInfo' type, however as the protocol definition in
      'proto.go' aims to transparently reflect the structure of the packets on
      the wire, it seems to be more appropriate to fix this straight in the
      protocol definition. This is also less error-prone as we don't have to
      fix all the different positions of the encoder, decoder & sizer and it's
      less code (particularly if 'RowInfo' doesn't stay the only case for such
      an issue).
      
      [1] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/proto.go#L391-394
      [2] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/protogen.go#L1770-1775
      
      --------
      
      kirr: I've applied the following interdiff to the original patch of
      levin.zimmermann/neoppod@c93d5dbc :
      
          --- a/go/neo/neo_test.go
          +++ b/go/neo/neo_test.go
          @@ -100,7 +100,7 @@ func _TestMasterStorage(t0 *tEnv) {
                          PTid:           1,
                          NumReplicas:    0,
                          RowList:        []proto.RowInfo{
          -                       proto.RowInfo{proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
          +                       {proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
                          },
                  }))
      
          @@ -173,7 +173,7 @@ func _TestMasterStorage(t0 *tEnv) {
                          PTid:           1,
                          NumReplicas:    0,
                          RowList:        []proto.RowInfo{
          -                       proto.RowInfo{proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
          +                       {proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
                          },
                  }))
      
          --- a/go/neo/proto/proto_test.go
          +++ b/go/neo/proto/proto_test.go
          @@ -210,9 +210,9 @@ func TestMsgMarshal(t *testing.T) {
                                  PTid:        0x0102030405060708,
                                  NumReplicas: 34,
                                  RowList: []RowInfo{
          -                               {CellInfo{11, UP_TO_DATE}, CellInfo{17, OUT_OF_DATE}},
          -                               {CellInfo{11, FEEDING}},
          -                               {CellInfo{11, CORRUPTED}, CellInfo{15, DISCARDED}, CellInfo{23, UP_TO_DATE}},
          +                               {{11, UP_TO_DATE}, {17, OUT_OF_DATE}},
          +                               {{11, FEEDING}},
          +                               {{11, CORRUPTED}, {15, DISCARDED}, {23, UP_TO_DATE}},
                                  },
                                  },
      
          @@ -229,9 +229,9 @@ func TestMsgMarshal(t *testing.T) {
                                  hex("cf0102030405060708") +
                                  hex("22") +
                                  hex("93") +
          -                               hex("92"+"920bd40001"+"9211d40000") +
          -                               hex("91"+"920bd40002") +
          -                               hex("93"+"920bd40003"+"920fd40004"+"9217d40001"),
          +                               hex("92"+"920bd40401"+"9211d40400") +
          +                               hex("91"+"920bd40402") +
          +                               hex("93"+"920bd40403"+"920fd40404"+"9217d40401"),
                          },
      
                          // map[Oid]struct {Tid,Tid,bool}
      
      for cosmetics and because the tests were failing as
      
          --- FAIL: TestMsgMarshal (0.00s)
              proto_test.go:106: M/proto.AnswerPartitionTable: encode result unexpected:
              proto_test.go:107:  have: 93cf0102030405060708229392920bd404019211d4040091920bd4040293920bd40403920fd404049217d40401
              proto_test.go:108:  want: 93cf0102030405060708229392920bd400019211d4000091920bd4000293920bd40003920fd400049217d40001
      
      /reviewed-by @kirr
      /reviewed-on !6
      7f96ad77
  4. 04 Aug, 2024 2 commits
    • Levin Zimmermann's avatar
      go/neo/neonet: DialLink: Fix SIGSEGV in case client handshake fails · d75f4ac2
      Levin Zimmermann authored
      In case the last 'handshakeClient' call returns an error, 'DialLink'
      returns 'link = nil, err = nil'. Callers of 'DialLink' then don't
      recognize that 'link' is 'nil', as it's the convention to only check if
      'err' is 'nil', which leads to a 'segmentation violation' as soon as
      subsequent code tries to access fields of 'link':
      
      ```
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x14 pc=0x7087ae]
      
      goroutine 5 [running]:
      lab.nexedi.com/kirr/neo/go/neo/neonet.(*NodeLink).NewConn(0x0)
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/neonet/connection.go:404 +0x4e
      lab.nexedi.com/kirr/neo/go/neo/xneo.Dial.func1()
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/xneo/connect.go:138 +0x52
      lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel.func2()
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:114 +0x6a
      created by lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel in goroutine 21
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:109 +0x1ad
      ```
      
      This patch fixes this issue so that now 'err' and 'link' are never both
      'nil' again.
      
      /reviewed-by @kirr
      /reviewed-on !10
      d75f4ac2
    • Kirill Smelkov's avatar
      go/neo/neonet: Demonstrate DialLink misbehaviour when all handshake attempts fail · 03db1d8a
      Kirill Smelkov authored
      Levin found that when all handshake attempts fail DialLink returns
      both link=nil and err=nil which breaks what callers expect and lead to
      segmentation fault when accessing that nil link.
      
      -> Add test to demonstrate the problem.
      
      With xfail removed that test currently fails as
      
          --- FAIL: TestDialLink_AllHandshakeErr (0.00s)
          panic: lab.nexedi.com/kirr/neo/go/neo/neonet.TestDialLink_AllHandshakeErr.gox.func4.1: lab.nexedi.com/kirr/neo/go/neo/neonet.TestDialLink_AllHandshakeErr.func2: DialLink to handshake-rejecting server:
          have: link=<nil> err=<nil>
          want: link=<nil> err=client:1 - server:2: handshake (client): unexpected EOF [recovered]
      
      We will fix the problem in the next patch.
      
      /reported-by @levin.zimmermann
      /reported-at !10
      03db1d8a
  5. 02 Feb, 2024 1 commit
  6. 17 Jul, 2023 3 commits
    • Levin Zimmermann's avatar
      go/neo/neonet: Fix client handshake not to accept server encoding if it is... · 052856ce
      Levin Zimmermann authored
      go/neo/neonet: Fix client handshake not to accept server encoding if it is different from what client indicated
      
      If the peers encoding is different than our encoding two different
      scenarios can happen, because the handshake order is undefined (e.g.
      we don't know if our handshake is received before the peer sends its
      handshake):
      
      1. Our handshake is received before peer sends its handshake, NEO/py
      closes connection if it sees unexpected magic, version, etc.
      
      2. The client already sends a handshake before it proceeds our handshake.
      In this case it initally sends us it version, we can extract its encoding,
      and only later, once it proceeded our handshake with the bad encoding,
      it closes the connection.
      
      Before this patch case (2) wasn't handled correctly by the automatic
      encoding detection of 'DialLink'. 'DialLink' simply accepted the
      different-than-expected encoding, but once the peer proceeded the nodes
      handshake the peer closed the connection and the initially established
      and returned link was immediately closed again. Due to this it was good
      luck whether connecting with a peer different with an encoding different
      from the expected one worked or didn't work (it depended on which handshake
      was faster). Now 'DialLink' should reliably find the correct encoding
      and return a stable link.
      
      --------
      
      kirr: this is based on the following original patch by Levin: levin.zimmermann/neoppod@f6b59772
      
      I updated documentation throughout correspondingly and also added
      corresponding handshake-specific test in the previous patch.
      
      See !5 and b2da69e2
      (go/neo/neonet: Demonstrate problem in handshake with NEO/py) for more
      in-depth description of the problem.
      052856ce
    • Kirill Smelkov's avatar
      go/neo/neonet: Demonstrate problem in handshake with NEO/py · b2da69e2
      Kirill Smelkov authored
      Levin Zimmerman discovered that sometimes NEO/py accepts our handshake
      hello with encoding 'M', then replies its owh handshake ehlo with
      encoding 'N' and then further terminates the connection. In other words
      it looks like that the handshake went successful, but it actually did
      not and NEO/py terminates the link after some time.
      
      This manifests itself e.g. as infrequent TestLoad failures on t branch
      with the following output:
      
      === RUN   TestLoad/py/!ssl
      I: runneo.py: /tmp/neo776618506/1 !ssl: started master(s): 127.0.0.1:21151
      === RUN   TestLoad/py/!ssl/enc=N(dialTryOrder=N,M)
          client_test.go:598: skip: does not excercise client redial
      === RUN   TestLoad/py/!ssl/enc=N(dialTryOrder=M,N)
          xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned err unexpected:
              have: neo://127.0.0.1:21151/1: load 0285cbac258bf266:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56678 - 127.0.0.1:40345: request identification: 127.0.0.1:56678 - 127.0.0.1:40345 .1: recv: EOF
              want: nil
          xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned tid unexpected: 0000000000000000  ; want: 0285cbac258bf266
          xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned buf = nil
          xtesting.go:339: load 0285cbac258bf265:0000000000000000: returned err unexpected:
              have: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56688 - 127.0.0.1:40345: request identification: 127.0.0.1:56688 - 127.0.0.1:40345 .1: recv: EOF
              want: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: 0000000000000000: object was not yet created
          ...
          client_test.go:588: NEO log tail:
      
              log file 'storage_0.log' tail:
      
              2023-07-17 17:21:57.1519 DEBUG     S1         connection completed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50> (from 127.0.0.1:40345)
              2023-07-17 17:21:57.1537 WARNING   S1         Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50>
              2023-07-17 17:21:57.1548 DEBUG     S1         connection closed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, closed, server) at 7f3583fd4f50>
              2023-07-17 17:21:57.1551 WARNING   S1         A connection was lost during identification
              2023-07-17 17:22:00.1582 DEBUG     S1         accepted a connection from 127.0.0.1:51236
              2023-07-17 17:22:00.1585 DEBUG     S1         connection completed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90> (from 127.0.0.1:40345)
              2023-07-17 17:22:00.1604 WARNING   S1         Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90>
              2023-07-17 17:22:00.1622 DEBUG     S1         connection closed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, closed, server) at 7f3583fd4e90>
              2023-07-17 17:22:00.1625 WARNING   S1         A connection was lost during identification
              2023-07-17 17:22:03.1663 DEBUG     S1         accepted a connection from 127.0.0.1:51238
              2023-07-17 17:22:03.1666 DEBUG     S1         connection completed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10> (from 127.0.0.1:40345)
              2023-07-17 17:22:03.1674 WARNING   S1         Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10>
              2023-07-17 17:22:03.1688 DEBUG     S1         connection closed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, closed, server) at 7f3583fd4d10>
              2023-07-17 17:22:03.1691 WARNING   S1         A connection was lost during identification
              2023-07-17 17:22:06.1714 DEBUG     S1         accepted a connection from 127.0.0.1:57072
              2023-07-17 17:22:06.1719 DEBUG     S1         connection completed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50> (from 127.0.0.1:40345)
              2023-07-17 17:22:06.1727 WARNING   S1         Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50>
              2023-07-17 17:22:06.1738 DEBUG     S1         connection closed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, closed, server) at 7f3583fd4b50>
              2023-07-17 17:22:06.1738 WARNING   S1         A connection was lost during identification
      
              log file 'master_0.log' tail:
      
              2023-07-17 17:21:21.0799 PACKET    M1         #0x0012 NotifyNodeInformation          > A1 (127.0.0.1:37906)
              2023-07-17 17:21:21.0799 PACKET    M1          ! C0 | CLIENT |  | RUNNING | 2023-07-17 14:21:21.079838
              2023-07-17 17:21:21.0800 PACKET    M1         #0x0102 NotifyNodeInformation          > S1 (127.0.0.1:37918)
              2023-07-17 17:21:21.0800 PACKET    M1          ! C0 | CLIENT |  | RUNNING | 2023-07-17 14:21:21.079838
              2023-07-17 17:21:21.0801 DEBUG     M1         Handler changed on <ServerConnection(nid=None, address=127.0.0.1:37966, handler=ClientServiceHandler, fd=18, server) at 7f3584245910>
              2023-07-17 17:21:21.0802 PACKET    M1         #0x0001 AnswerRequestIdentification    > C0 (127.0.0.1:37966)
              2023-07-17 17:21:21.0804 PACKET    M1         #0x0000 NotifyNodeInformation          > C0 (127.0.0.1:37966)
              2023-07-17 17:21:21.0804 PACKET    M1          ! C0 | CLIENT  |                 | RUNNING | 2023-07-17 14:21:21.079838
              2023-07-17 17:21:21.0804 PACKET    M1          ! M1 | MASTER  | 127.0.0.1:21151 | RUNNING | None
              2023-07-17 17:21:21.0804 PACKET    M1          ! S1 | STORAGE | 127.0.0.1:40345 | RUNNING | 2023-07-17 14:21:18.737469
              2023-07-17 17:21:21.0805 PACKET    M1         #0x0002 NotifyPartitionTable           > C0 (127.0.0.1:37966)
              2023-07-17 17:21:21.0810 PACKET    M1         #0x0003 LastTransaction                < C0 (127.0.0.1:37966)
              2023-07-17 17:21:21.0811 PACKET    M1         #0x0003 AnswerLastTransaction          > C0 (127.0.0.1:37966)
              2023-07-17 17:22:06.2053 DEBUG     M1         <SocketConnectorIPv4 at 0x7f3584252d10 fileno 18 ('127.0.0.1', 21151), opened from ('127.0.0.1', 37966)> closed in recv
              2023-07-17 17:22:06.2056 DEBUG     M1         connection closed for <ServerConnection(nid=C0, address=127.0.0.1:37966, handler=ClientServiceHandler, closed, server) at 7f3584245910>
              2023-07-17 17:22:06.2058 PACKET    M1         #0x0014 NotifyNodeInformation          > A1 (127.0.0.1:37906)
              2023-07-17 17:22:06.2058 PACKET    M1          ! C0 | CLIENT |  | UNKNOWN | 2023-07-17 14:21:21.079838
              2023-07-17 17:22:06.2059 PACKET    M1         #0x0104 NotifyNodeInformation          > S1 (127.0.0.1:37918)
              2023-07-17 17:22:06.2059 PACKET    M1          ! C0 | CLIENT |  | UNKNOWN | 2023-07-17 14:21:21.079838
      
      The problem is due to that my analysis from e407f725 (go/neo/neonet:
      Rework handshake to differentiate client and server parts) turned out to
      be incorrect. Quoting that patch:
      
          -> Rework handshake so that client always sends its hello first, and
          only then the server side replies. This matches actual NEO/py behaviour:
      
          https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294
      
          even though the "NEO protocol" states that
      
          	Handshake transmissions are not ordered with respect to each other and can go in parallel.
      
          	( https://neo.nexedi.com/P-NEO-Protocol.Specification.2019?portal_skin=CI_slideshow#/9/2 )
      
          If I recall correctly that sentence was authored by me in 2018 based on
          previous understanding of should-be full symmetry in-between client and
          server.
      
      so here "This matches actual NEO/py behaviour" was wrong: even though
      https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294
      indeed says that
      
          #       The NEO protocol is such that a client connection is always the
          #       first to send a packet, as soon as the connection is established,
      
      in reality it is not the case as SocketConnector always queues handshake
      hello upon its creation before receiving anything from remote side:
      https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-93-gfd87e153/neo/lib/connector.py#L77-78 .
      In practice this leads to that in non-SSL case NEO/py server might be
      fast enough to send its prepared hello before receiving hello from us.
      Levin also explains at kirr/neo!5 (comment 187429):
      
          I think what happens is this:
      
          the NEO protocol doesn't specify in which order handshakes happen after
          initial dial. If the peer sends a handshake before receiving our
          handshake and if this peers handshake is received by us, 'DialLink'
          assumes everything is fine (no err is returned), it breaks the loop and
          returns the link. But then, very little time later, when the peer
          finally receives our handshake, this looks strange for the peer and it
          closes the connection.
      
          So in my understanding this should be fixed by explicitly comparing the
          encodings between our expected one and what the peer provided us. If
          encodings don't match we should retry with a new encoding in order to
          prevent the peer from closing the connection.
      
          For me this also explains why sometimes the tests passed and sometimes
          didn't: it depended on which node was faster ('race condition').
      
      -> In this patch we add correspondig handshake test that demonstrates
      this problem. It currently fails as
      
      --- FAIL: TestHandshake (0.01s)
          --- FAIL: TestHandshake/enc=N (0.00s)
              newlink_test.go:154: handshake encoding mismatch: client: unexpected error:
                  have: <nil>
                       "<nil>"
                  want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0000a4190)}
                       "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'M'  ; our side = 'N'"
          --- FAIL: TestHandshake/enc=M (0.00s)
              newlink_test.go:154: handshake encoding mismatch: client: unexpected error:
                  have: <nil>
                       "<nil>"
                  want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0001a22cc)}
                       "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'N'  ; our side = 'M'"
      
      We will fix it in the next patch.
      
      /reported-by @levin.zimmermann
      /reported-on kirr/neo!5
      b2da69e2
    • Kirill Smelkov's avatar
      go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch"... · c100a0c9
      Kirill Smelkov authored
      go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch" as handshake failure cause
      
      We will soon need to detect if a handshake failure was due to mismatch
      of protocol encodings and that would require introduction of dedicated
      error type for that cause. As a preparatory step first refactor "version
      mismatch cause" to follow the same style for symmetry.
      c100a0c9
  7. 18 Jan, 2023 1 commit