1. 22 May, 2022 25 commits
    • David Howells's avatar
      afs: Adjust ACK interpretation to try and cope with NAT · adc9613f
      David Howells authored
      If a client's address changes, say if it is NAT'd, this can disrupt an in
      progress operation.  For most operations, this is not much of a problem,
      but StoreData can be different as some servers modify the target file as
      the data comes in, so if a store request is disrupted, the file can get
      corrupted on the server.
      
      The problem is that the server doesn't recognise packets that come after
      the change of address as belonging to the original client and will bounce
      them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if
      the packet number falls within the initial sequence number window of a call
      or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting
      it.  In both cases, firstPacket will be 1 and previousPacket will be 0 in
      the ACK information.
      
      Fix this by the following means:
      
       (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1
           and previousPacket as 0, assume this indicates that the server saw the
           incoming packets from a different peer and thus as a different call.
           Fail the call with error -ENETRESET.
      
       (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the
           first packet has been hard-ACK'd.  If it hasn't been hard-ACK'd, the
           ACK packet will cause it to get retransmitted, so the call will just
           be repeated.
      
       (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of
           the operation.
      
       (4) Prioritise the error code over things like -ECONNRESET as the server
           did actually respond.
      
       (5) Make writeback treat -ENETRESET as a retryable error and make it
           redirty all the pages involved in a write so that the VM will retry.
      
      Note that there is still a circumstance that I can't easily deal with: if
      the operation is fully received and processed by the server, but the reply
      is lost due to address change.  There's no way to know if the op happened.
      We can examine the server, but a conflicting change could have been made by
      a third party - and we can't tell the difference.  In such a case, a
      message like:
      
          kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a)
      
      will be logged to dmesg on the next op to touch the file and the client
      will reset the inode state, including invalidating clean parts of the
      pagecache.
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
      Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      adc9613f
    • David Howells's avatar
      rxrpc, afs: Fix selection of abort codes · de696c47
      David Howells authored
      The RX_USER_ABORT code should really only be used to indicate that the user
      of the rxrpc service (ie. userspace) implicitly caused a call to be aborted
      - for instance if the AF_RXRPC socket is closed whilst the call was in
      progress.  (The user may also explicitly abort a call and specify the abort
      code to use).
      
      Change some of the points of generation to use other abort codes instead:
      
       (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see
           ENOMEM and EFAULT during received data delivery and abort with
           RX_CALL_DEAD in the default case.
      
       (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a
           reply.
      
       (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had
           heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't.
      
       (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not
           completed successfully or been aborted.
      Reported-by: default avatarJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de696c47
    • David Howells's avatar
      rxrpc: Return an error to sendmsg if call failed · 4ba68c51
      David Howells authored
      If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that
      was being given data was aborted remotely or otherwise failed, return an
      error rather than returning the amount of data buffered for transmission.
      
      The call (presumably) did not complete, so there's not much point
      continuing with it.  AF_RXRPC considers it "complete" and so will be
      unwilling to do anything else with it - and won't send a notification for
      it, deeming the return from sendmsg sufficient.
      
      Not returning an error causes afs to incorrectly handle a StoreData
      operation that gets interrupted by a change of address due to NAT
      reconfiguration.
      
      This doesn't normally affect most operations since their request parameters
      tend to fit into a single UDP packet and afs_make_call() returns before the
      server responds; StoreData is different as it involves transmission of a
      lot of data.
      
      This can be triggered on a client by doing something like:
      
      	dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512
      
      at one prompt, and then changing the network address at another prompt,
      e.g.:
      
      	ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0
      
      Tracing packets on an Auristor fileserver looks something like:
      
      192.168.6.1 -> 192.168.6.3  RX 107 ACK Idle  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
      192.168.6.3 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(64538) (64538)
      192.168.6.3 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(64538) (64538)
      192.168.6.1 -> 192.168.6.3  RX 107 ACK Idle  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
      <ARP exchange for 192.168.6.2>
      192.168.6.2 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(0) (0)
      192.168.6.2 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(0) (0)
      192.168.6.1 -> 192.168.6.2  RX 107 ACK Exceeds Window  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
      192.168.6.1 -> 192.168.6.2  RX 74 ABORT  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
      192.168.6.1 -> 192.168.6.2  RX 74 ABORT  Seq: 29321  Call: 4  Source Port: 7000  Destination Port: 7001
      
      The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort
      code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the
      condition and generates an abort first and the unmarshal error is a
      consequence of that at the application layer.
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
      Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ba68c51
    • David Howells's avatar
      rxrpc: Automatically generate trace tag enums · dc9fd093
      David Howells authored
      Automatically generate trace tag enums from the symbol -> string mapping
      tables rather than having the enums as well, thereby reducing duplicated
      data.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc9fd093
    • David Howells's avatar
      rxrpc: Fix locking issue · ad25f5cb
      David Howells authored
      There's a locking issue with the per-netns list of calls in rxrpc.  The
      pieces of code that add and remove a call from the list use write_lock()
      and the calls procfile uses read_lock() to access it.  However, the timer
      callback function may trigger a removal by trying to queue a call for
      processing and finding that it's already queued - at which point it has a
      spare refcount that it has to do something with.  Unfortunately, if it puts
      the call and this reduces the refcount to 0, the call will be removed from
      the list.  Unfortunately, since the _bh variants of the locking functions
      aren't used, this can deadlock.
      
      ================================
      WARNING: inconsistent lock state
      5.18.0-rc3-build4+ #10 Not tainted
      --------------------------------
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
      ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
      {SOFTIRQ-ON-W} state was registered at:
      ...
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&rxnet->call_lock);
        <Interrupt>
          lock(&rxnet->call_lock);
      
       *** DEADLOCK ***
      
      1 lock held by ksoftirqd/2/25:
       #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d
      
      Changes
      =======
      ver #2)
       - Changed to using list_next_rcu() rather than rcu_dereference() directly.
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad25f5cb
    • David Howells's avatar
      rxrpc: Use refcount_t rather than atomic_t · a0575429
      David Howells authored
      Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0575429
    • David Howells's avatar
      rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc · 33912c26
      David Howells authored
      Allow the list of in-use local UDP endpoints in the current network
      namespace to be viewed in /proc.
      
      To aid with this, the endpoint list is converted to an hlist and RCU-safe
      manipulation is used so that the list can be read with only the RCU
      read lock held.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33912c26
    • David S. Miller's avatar
      Merge branch 'ipa-next' · 0598cec9
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: a few more small items
      
      This series consists of three small sets of changes.  Version 2 adds
      a patch that avoids a warning that occurs when handling a modem
      crash (I unfortunately didn't notice it earlier).  All other patches
      are the same--just rebased.
      
      The first three patches allow a few endpoint features to be
      specified.  At this time, currently-defined endpoints retain the
      same configuration, but when the monitor functionality is added in
      the next cycle these options will be required.
      
      The fourth patch simply removes an unused function, explaining also
      why it would likely never be used.
      
      The fifth patch is new.  It counts the number of modem TX endpoints
      and uses it to determine how many TREs a transaction needs when
      when handling a modem crash.  It is needed to avoid exceeding the
      limited number of commands imposed by the last four patches.
      
      And the last four patches refactor code related to IPA immediate
      commands, eliminating an unused field and then simplifying and
      removing some unneeded code.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0598cec9
    • Alex Elder's avatar
      net: ipa: use data space for command opcodes · a224bd4b
      Alex Elder authored
      The 64-bit data field in a transaction is not used for commands.
      And the opcode array is *only* used for commands.  They're
      (currently) the same size; save a little space in the transaction
      structure by enclosing the two fields in a union.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a224bd4b
    • Alex Elder's avatar
      net: ipa: remove command info pool · 8797972a
      Alex Elder authored
      The ipa_cmd_info structure now contains only one field, and it's an
      enumerated type whose values all fit in 8 bits.  Currently we'll
      never use more than 8 TREs in a command transaction, and we can
      represent that number of command opcodes in the same space as a 64
      bit pointer to an ipa_cmd_info structure.
      
      Define IPA_COMMAND_TRANS_TRE_MAX as the maximum number of TREs that
      can be in a command transaction.  Replace the info pointer in a
      transaction with a fixed-size array named cmd_opcode[] of that many
      bytes.  Store the opcode in this array when adding a command TRE to
      a transaction, as was done previously for the info array.  This
      makes the ipa_cmd_info unused, so get rid of it.
      
      When committing an immediate command transaction, use the channel's
      Boolean command flag to determine whether to fill in the opcode,
      which will be taken (as before) from the array in the transaction.
      
      This makes the command info pool unnecessary, so get rid of it.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8797972a
    • Alex Elder's avatar
      net: ipa: remove command direction argument · 4de284b7
      Alex Elder authored
      We no longer use the direction argument for gsi_trans_cmd_add(), so
      get rid of it in its definition, and in its seven callers.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4de284b7
    • Alex Elder's avatar
      net: ipa: get rid of ipa_cmd_info->direction · 7ffba3bd
      Alex Elder authored
      The direction field of the ipa_cmd_info structure is set, but never
      used.  It seems it might have been used for the DMA_SHARED_MEM
      immediate command, but the DIRECTION flag is set based on the value
      of the passed-in direction flag there.
      
      Anyway, remove this unused field from the ipa_cmd_info structure.
      This is done as a separate patch to make it very obvious that it's
      not required.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ffba3bd
    • Alex Elder's avatar
      net: ipa: count the number of modem TX endpoints · 2091c79a
      Alex Elder authored
      In ipa_endpoint_modem_exception_reset_all(), a high estimate was
      made of the number of endpoints that need their status register
      updated.  We only used what was needed, so the high estimate didn't
      matter much.
      
      However the next few patches are going to limit the number of
      commands in a single transaction, and the overestimate would exceed
      that.  So count the number of modem TX endpoints at initialization
      time, and use it in ipa_endpoint_modem_exception_reset_all().
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2091c79a
    • Alex Elder's avatar
      net: ipa: kill gsi_trans_commit_wait_timeout() · d15180b4
      Alex Elder authored
      Since the beginning gsi_trans_commit_wait_timeout() has existed to
      provide a way to allow waiting a limited time for a transaction
      to complete.  But that function has never been used.
      
      In fact, there is no use for this function, because a transaction
      committed to hardware should *always* complete.  The only reason it
      might not complete is if there were a hardware failure, or perhaps a
      system configuration error.
      
      Furthermore, if a timeout ever did occur, the IPA hardware would be
      in an indeterminate state, from which there is no recovery.  It
      would require some sort of complete IPA reset, and would require the
      participation of the modem, and at this time there is no such
      sequence defined.
      
      So get rid of the definition of gsi_trans_commit_wait_timeout(), and
      update a few comments accordingly.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d15180b4
    • Alex Elder's avatar
      net: ipa: specify RX aggregation time limit in config data · beb90cba
      Alex Elder authored
      Don't assume that a 500 microsecond time limit should be used for
      all receive endpoints that support aggregation.  Instead, specify
      the time limit to use in the configuration data.
      
      Set a 500 microsecond limit for all existing RX endpoints, as before.
      
      Checking for overflow for the time limit field is a bit complicated.
      Rather than duplicate a lot of code in ipa_endpoint_data_valid_one(),
      call WARN() if any value is found to be too large when encoding it.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      beb90cba
    • Alex Elder's avatar
      net: ipa: support hard aggregation limits · 3cebb7c2
      Alex Elder authored
      Add a new flag for AP receive endpoints that indicates whether
      a "hard limit" is used as a criterion for closing aggregation.
      Add comments explaining the difference between "hard" and "soft"
      aggregation limits.
      
      Pass a flag to ipa_aggr_size_kb() so it computes the proper
      aggregation size value whether using hard or soft limits.  Move
      that function earlier in "ipa_endpoint.c" so it can be used
      without a forward-reference.
      
      Update ipa_endpoint_data_valid_one() so it validates endpoints whose
      data indicate a hard aggregation limit is used, and so it reports
      set aggregation flags for endpoints without aggregation enabled.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3cebb7c2
    • Alex Elder's avatar
      net: ipa: make endpoint HOLB drop configurable · 153213f0
      Alex Elder authored
      Add a new Boolean flag for RX endpoints defining whether HOLB drop
      is initially enabled or disabled for the endpoint.  All existing AP
      endpoints should have HOLB drop disabled.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      153213f0
    • Julia Lawall's avatar
      qed: fix typos in comments · 60f243ad
      Julia Lawall authored
      Spelling mistakes (triple letters) in comments.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      60f243ad
    • Julia Lawall's avatar
      nfp: flower: fix typo in comment · b993e72c
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Reviewed-by: default avatarNiklas Söderlund <niklas.soderlund@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b993e72c
    • Julia Lawall's avatar
      net: marvell: prestera: fix typo in comment · 878e2eb2
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      878e2eb2
    • Julia Lawall's avatar
      cirrus: cs89x0: fix typo in comment · 3f660c18
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f660c18
    • Julia Lawall's avatar
      net: qed: fix typos in comments · cc4e7fa5
      Julia Lawall authored
      Spelling mistakes (triple letters) in comments.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc4e7fa5
    • Julia Lawall's avatar
      net/mlx5: fix typo in comment · b0ea505b
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0ea505b
    • Julia Lawall's avatar
      net: mvpp2: fix typo in comment · e34be16b
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e34be16b
    • Julia Lawall's avatar
      net: sparx5: switchdev: fix typo in comment · 1f36a72a
      Julia Lawall authored
      Spelling mistake (triple letters) in comment.
      Detected with the help of Coccinelle.
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f36a72a
  2. 21 May, 2022 12 commits
  3. 20 May, 2022 3 commits