1. 20 Aug, 2024 12 commits
    • Levin Zimmermann's avatar
      go/neo/proto/RowInfo: Fix representation on the wire · 6f02bc97
      Levin Zimmermann authored
      Some NEO protocol packets have the field 'RowList'. This field contains
      information about each row of a partition table. In NEO/go the information
      of each row is represented with the 'RowInfo' type [1]. This type is defined
      as a struct with the field ‘CellList’. ‘CellList’ is defined as a list of
      'CellInfo' [1] (e.g. an entry for each cell).
      
      NEO/go {en,de}codes struct types with ‘genStructHead’ (structures in
      golang are encoded as arrays in msgpack) [2]. From the 'RowList'
      definition, the msgpack decoder currently expects the following msgpack
      array structure:
      
          ArrayHeader (RowList)
      
              ArrayHeader (RowInfo)
      
                  ArrayHeader (CellList)
      
                      ArrayHeader (CellInfo)
      
                          int32 (NID)
                          enum (State)
      
      However NEO/py actually sends:
      
          ArrayHeader (RowList)
      
              ArrayHeader (CellList)
      
                  ArrayHeader (CellInfo)
      
                      int32 (NID)
                      enum (State)
      
      In other words, currently the NEO/go msgpack encoder expects one more
      nesting, which NEO/py doesn’t provide (and which also doesn’t seem to
      be necessary as the outer nesting would always only contain one
      element). We could adjust the msgpack {en,de}coder to introduce an
      exception for the 'RowInfo' type, however as the protocol definition in
      'proto.go' aims to transparently reflect the structure of the packets on
      the wire, it seems to be more appropriate to fix this straight in the
      protocol definition. This is also less error-prone as we don't have to
      fix all the different positions of the encoder, decoder & sizer and it's
      less code (particularly if 'RowInfo' doesn't stay the only case for such
      an issue).
      
      [1] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/proto.go#L391-394
      [2] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/protogen.go#L1770-1775
      6f02bc97
    • Levin Zimmermann's avatar
      go/neo/proto/msgpack: Fix decoding of unset IdTime · c5b08706
      Levin Zimmermann authored
      The data type of IdTime is 'Optional[float]' [1], [2]. Before this patch the
      msgpack decoder could only decode 'IdTime' in case its data type was
      'float'. Now it also supports the decoding of IdTime in case it is NIL.
      
      Without this patch, NEO/go client tests fail [3].
      
      [1] See here, the fifth argument is IdTime:
      
          https://lab.nexedi.com/nexedi/neoppod/-/blob/3ddb6663/neo/master/handlers/identification.py#L26-27
      
          This is found to be 'Optional[float]':
      
          https://lab.nexedi.com/nexedi/neoppod/-/blob/3ddb6663/neo/tests/protocol#L98
      
      [2] This seems to be true for both, pre-msgpack and post-msgpack
          protocol, because in the pre-msgpack NEO/go there is already this
          note:
      
          https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/proto.go#L352-357
      
      [3] When running NEO/go client tests without this patch, we can see in the NEO/go log file:
      
      ```
      I0820 13:05:23.813992 1138432 connect.go:115] C?: talk master(127.0.0.1:29203): dial 127.0.0.1:29203 (MASTER): [
      I0820 13:05:23.816456 1138432 connect.go:122] C?: talk master(127.0.0.1:29203): dial 127.0.0.1:29203 (MASTER): dialed ok; requesting identification...
      I0820 13:05:23.817337 1138432 connect.go:180] C?: talk master(127.0.0.1:29203): dial 127.0.0.1:29203 (MASTER): identification accepted
      I0820 13:05:23.817346 1138432 connect.go:181] C?: talk master(127.0.0.1:29203): dial 127.0.0.1:29203 (MASTER): ]
      I0820 13:05:23.817369 1138432 mastered.go:187] C?: talk M1: [
      I0820 13:05:23.817401 1138432 mastered.go:195] C?: talk M1: ] (after identification: expect nodeTab: 127.0.0.1:41678 - 127.0.0.1:29203 .0: decode: decode: M: NotifyNodeInformation.IdTime: msgp: attempted to decode type "nil" with method for "float64")
      W0820 13:05:23.817429 1138432 mastered.go:127] C?: talk master(127.0.0.1:29203): C?: talk M1: after identification: expect nodeTab: 127.0.0.1:41678 - 127.0.0.1:29203 .0: decode: decode: M: NotifyNodeInformation.IdTime: msgp: attempted to decode type "nil" with method for "float64"
      ```
      c5b08706
    • Levin Zimmermann's avatar
      go/neo/proto/msgpack: Fix long 'types.Struct' decoding · afe76d47
      Levin Zimmermann authored
      In NEO/go protocol, we describe some parameters that can
      be send via NEO packages with structures. On the wire all of
      these structures are encoded as msgpack arrays [1]. Msgpack
      arrays can have 1 or more bytes as a header [2]. Therefore
      it's better to use "ReadArrayHeaderBytes" than using
      
        >>> data = data[1:]
      
      which fails in case we have array with 3 or 5 header bytes.
      
      [1] One could also declare a protocol where these parameters aren't
          send as msgpack arrays but as msgpack maps. See here for the
          msgpack map specification:
      
          https://github.com/msgpack/msgpack/blob/9aa092d6ca81/spec.md?plain=1#L338
      
      [2] https://github.com/msgpack/msgpack/blob/9aa092d6ca81/spec.md?plain=1#L315
      afe76d47
    • Levin Zimmermann's avatar
      go/neo/proto: Update msgcode increment logic to new protocol · 5e46e4fc
      Levin Zimmermann authored
      In pre-msgpack protocol, the msgcode increment logic is like this:
      
      Notify 	 add +1 to next msgcode
      Request  add +0 to next msgcode   (next msg is answer & should therefore be the same)
      Answer 	 add +1 to next msgcode
      
      ('Answer' msgcode is adjusted by 'AnswerBit')
      
      In post-msgpack protocol, the logic is a bit different:
      
      Notify 	 add +1 to next msgcode
      Request  add +0 to next msgcode   (next msg is answer & should therefore be the same)
      Answer 	 add +2 to next msgcode
      
      So here we produce gaps after Request/Answer pairs.
      5e46e4fc
    • Levin Zimmermann's avatar
      go/neo/proto: Update enum order to new protocol · 2d27b4a1
      Levin Zimmermann authored
      The enum order has changed in the migration to the msgpack encoded
      protocol.
      
      See here for pre-msgpack order:
      
      https://lab.nexedi.com/nexedi/neoppod/-/blob/6332112cb/neo/lib/protocol.py#L62-141
      
      And here for post-msgpack order:
      
      https://lab.nexedi.com/nexedi/neoppod/-/blob/9d0bf97a1327182ac29e95d65fd9e18742c43d1f/neo/lib/protocol.py#L102-181
      
      As this order defines the encoding of the enum values, this needs to be
      strictly followed to be compatible (to follow the protocol declaration).
      2d27b4a1
    • Levin Zimmermann's avatar
      9db79d7f
    • Levin Zimmermann's avatar
      proto/msgpack: Fix handshake magic · 0e3681c3
      Levin Zimmermann authored
      This fixes the basic NEO handshake in msgpack encoding.
      
      Without this fix NEO/go client test fails with a timeout. In NEO/go log
      we can see
      
      ```
      Log file created at: 2024/08/20 12:50:11
      Running on machine: ubuntu
      Binary: Built with gc go1.18.1 for linux/amd64
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      I0820 12:50:11.300574 1137321 client.go:418] neo: open neo://1@127.0.0.1:23004: [
      I0820 12:50:11.301130 1137321 mastered.go:119] C?: talk master(127.0.0.1:23004): [
      I0820 12:50:11.301140 1137321 connect.go:115] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): [
      I0820 12:50:11.303723 1137321 connect.go:119] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): ] (127.0.0.1:42388 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF)
      W0820 12:50:11.303729 1137321 mastered.go:127] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): 127.0.0.1:42388 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF
      I0820 12:50:12.304076 1137321 connect.go:115] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): [
      I0820 12:50:12.306096 1137321 connect.go:119] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): ] (127.0.0.1:42410 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF)
      W0820 12:50:12.306102 1137321 mastered.go:127] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): 127.0.0.1:42410 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF
      I0820 12:50:13.306311 1137321 connect.go:115] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): [
      I0820 12:50:13.307818 1137321 connect.go:119] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): ] (127.0.0.1:56834 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF)
      W0820 12:50:13.307824 1137321 mastered.go:127] C?: talk master(127.0.0.1:23004): dial 127.0.0.1:23004 (MASTER): 127.0.0.1:56834 - 127.0.0.1:23004: handshake (client): rx hello reply: unexpected EOF
      ```
      
      And in NEO/py logs we can see:
      
      ```
      2024-08-20 12:51:35.5152 DEBUG     M1         Rejecting non-NEO <ServerConnection(nid=None, address=127.0.0.1:35404, handler=IdentificationHandler, fd=18, server) at 7f9417467510>
      2024-08-20 12:51:35.5153 DEBUG     M1         connection closed for <ServerConnection(nid=None, address=127.0.0.1:35404, handler=IdentificationHandler, closed, server) at 7f9417467510>
      2024-08-20 12:51:36.5161 DEBUG     M1         accepted a connection from 127.0.0.1:35408
      2024-08-20 12:51:36.5168 DEBUG     M1         connection completed for <ServerConnection(nid=None, address=127.0.0.1:35408, handler=IdentificationHandler, fd=18, server) at 7f9417467210> (from 127.0.0
      ```
      
      NEO/py defines its handshake packet here:
      
      https://lab.nexedi.com/nexedi/neoppod/-/blob/9d0bf97a1/neo/lib/protocol.py#L27
      
      It's encoded form can be simply revealed with the help of a python shell:
      
      ```python
      >>> from neo.lib import protocol
      >>> protocol.HANDSHAKE_PACKET
      '\x92\xa3NEO\x00'
      ```
      0e3681c3
    • Levin Zimmermann's avatar
      go/neo/proto: Update py test data after protocol change · 74b0904e
      Levin Zimmermann authored
      nexedi/neoppod@9d0bf97a
      changed the msgcode increment logic, therefore we have to re-create the
      test data.
      74b0904e
    • Levin Zimmermann's avatar
      go/neo/client_test: NEO/py now only supports M encoding · 4161dd45
      Levin Zimmermann authored
      With the switch to the msgpack encoding, the NEO/py server
      now only supports the M encoding.
      4161dd45
    • Julien Muchembled's avatar
      protocol: switch to msgpack for packet serialization · 711bd878
      Julien Muchembled authored
      Not only for performance reasons (at least 3% faster) but also because of
      several ugly things in the way packets were defined:
      - packet field names, which are only documentary; for roots fields,
        they even just duplicate the packet names
      - a lot of repetitions for packet names, and even confusion between the name
        of the packet definition and the name of the actual notify/request packet
      - the need to implement field types for anything, like PByte to support new
        compression formats, since PBoolean is not enough
      
      neo/lib/protocol.py is now much smaller.
      711bd878
    • Levin Zimmermann's avatar
      go/neo/neonet: DialLink: Fix SIGSEGV in case client handshake fails · b89b447c
      Levin Zimmermann authored
      In case the last 'handshakeClient' call returns an error, 'DialLink'
      returns 'link = nil, err = nil'. Callers of 'DialLink' then don't
      recognize that 'link' is 'nil', as it's the convention to only check if
      'err' is 'nil', which leads to a 'segmentation violation' as soon as
      subsequent code tries to access fields of 'link':
      
      ```
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x14 pc=0x7087ae]
      
      goroutine 5 [running]:
      lab.nexedi.com/kirr/neo/go/neo/neonet.(*NodeLink).NewConn(0x0)
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/neonet/connection.go:404 +0x4e
      lab.nexedi.com/kirr/neo/go/neo/xneo.Dial.func1()
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/xneo/connect.go:138 +0x52
      lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel.func2()
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:114 +0x6a
      created by lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel in goroutine 21
      	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:109 +0x1ad
      ```
      
      This patch fixes this issue so that now 'err' and 'link' are never both
      'nil' again.
      
      /reviewed-by @kirr
      /reviewed-on kirr/neo!10
      b89b447c
    • Kirill Smelkov's avatar
      go/neo/neonet: Demonstrate DialLink misbehaviour when all handshake attempts fail · 12b3e2af
      Kirill Smelkov authored
      Levin found that when all handshake attempts fail DialLink returns
      both link=nil and err=nil which breaks what callers expect and lead to
      segmentation fault when accessing that nil link.
      
      -> Add test to demonstrate the problem.
      
      With xfail removed that test currently fails as
      
          --- FAIL: TestDialLink_AllHandshakeErr (0.00s)
          panic: lab.nexedi.com/kirr/neo/go/neo/neonet.TestDialLink_AllHandshakeErr.gox.func4.1: lab.nexedi.com/kirr/neo/go/neo/neonet.TestDialLink_AllHandshakeErr.func2: DialLink to handshake-rejecting server:
          have: link=<nil> err=<nil>
          want: link=<nil> err=client:1 - server:2: handshake (client): unexpected EOF [recovered]
      
      We will fix the problem in the next patch.
      
      /reported-by @levin.zimmermann
      /reported-at kirr/neo!10
      12b3e2af
  2. 23 Jul, 2024 2 commits
  3. 21 Jul, 2024 1 commit
    • Kirill Smelkov's avatar
      X: Sync zurl format with NEO/py · 95572d6a
      Kirill Smelkov authored
      Hello Kirill,
      
      in nexedi/neoppod!18 and nexedi/neoppod!21 we could find a common solution for a zurl format that previously diverged between NEOgo and NEOpy. The purpose of this MR is to sync again NEOgo and NEOpy zurl format. After merging this, we can continue to sync NEO zurl format in 'wendelin.core' & 'slapos'. Then we finally have unified approach again, which simplifies understanding and reduces unnecessary mental overhead.
      
      As this is strongly related to nexedi/neoppod!21 I thought it'd be a good idea to generally reduce difference and to replace WIP commits with merged NEOpy upstream commits.
      
      Best, Levin
      
      /reviewed-by @kirr
      /reviewed-on kirr/neo!7
      
      * lev/sync-zurl:
        client: Don't allow oPtion_nAme in zurl
        app: Remember SSL credentials so that it is possible to retrieve them
        client: Allow to force TLS via neos:// scheme
        client: Don't allow master_nodes and name to be present in options
        Revert "."
        Revert "Y client: Fix URI scheme to move credentials out of query"
        Revert "X Adjust NEO/go to neo:// URL change + py fixups"
        Revert "fixup! Y client: Fix URI scheme to move credentials out of query"
        Revert "Y client: Don't allow master_nodes and name to be present in options"
        go/client/zurl: Sync format to py upstream
      95572d6a
  4. 19 Jul, 2024 10 commits
  5. 02 Feb, 2024 3 commits
  6. 29 Jan, 2024 8 commits
  7. 22 Aug, 2023 1 commit
  8. 02 Aug, 2023 3 commits
    • Levin Zimmermann's avatar
      X: Teach NEO/go to handle multiple master nodes · db81e0de
      Levin Zimmermann authored
      See kirr/neo!2 for discussion,
      context and details.
      
      /reviewed-by @kirr
      
      * t-with-multiple-master-nodes:
        fixup! client_test: Add nmaster={1,2} to test matrix
        fixup! client_test: Support test cluster /w >1 master
        fixup! TalkMaster: Switch master if dialed M is secondary
        fixup! Node: Add support for NEO cluster with > 1 master
        fixup! Dial: Catch NotPrimaryMaster & return custom error
        fixup! proto: Implement Error for NotPrimaryMaster
        fixup! proto.NotPrimaryMaster: Fix .Primary data type (2)
        fixup! proto.NotPrimaryMaster: Fix .Primary data type (1)
        client_test: Add nmaster={1,2} to test matrix
        client_test: Support test cluster /w >1 master
        proto.NotPrimaryMaster: Fix .Primary data type
        TalkMaster: Switch master if dialed M is secondary
        Dial: Catch NotPrimaryMaster & return custom error
        proto: Implement Error for NotPrimaryMaster
        openClientByURL: Fix for >1 master (split URL host)
        Client.URL: Fix incomplete URL if > 1 master nodes
        Node: Add support for NEO cluster with > 1 master
      db81e0de
    • Kirill Smelkov's avatar
      fixup! client_test: Add nmaster={1,2} to test matrix · 4605cba1
      Kirill Smelkov authored
      Actually do test nmaster=2 case.
      4605cba1
    • Kirill Smelkov's avatar
      fixup! client_test: Support test cluster /w >1 master · f96c8ccd
      Kirill Smelkov authored
      - show all options in error context and ran test kind
      
      - skip nmaster > 1 for NEO/go as that is currently not implemented on NEO/go server
      
      - use json for interacting with runneo.py, so that we can use
        whatever builtin type for any argument without hardcoding ad-hoc
        handling of specific arguments inside runneo.py.
      
      - adjust comments + cosmetics
      f96c8ccd