1. 10 Jan, 2023 12 commits
    • net/mlx5e: TC, Restore pkt rate policing support · c09502d5
      Oz Shlomo authored
      The offending commit removed the support for all packet rate metering.
      Restore the pkt rate metering support by removing the restriction.
      
      Fixes: 3603f266 ("net/mlx5e: TC, allow meter jump control action")
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, ignore match level for post meter rules · 2414c9b7
      Oz Shlomo authored
      The post meter table only matches on reg_c5. As such, the inner/outer
      match levels are irrelevant for the match criteria. The cited patch only
      sets the outer criteria to none, thus setting the inner match level for
      encapsulated packets. This caused rules with police action on tunnel
      devices to not find an existing flow group for the match criteria, thus
      failing to offload the rule.
      
      Set both the inner and outer match levels to none for post_meter rules.
      
      Fixes: 0d8c38d4 ("net/mlx5e: TC, init post meter rules with branching attributes")
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPoIB, Fix child PKEY interface stats on rx path · b5e23931
      Dragos Tatulea authored
      The current code always does the accounting using the
      stats from the parent interface (linked in the rq). This
      doesn't work when there are child interfaces configured.
      
      Fix this behavior by always using the stats from the child
      interface priv. This will also work for parent only
      interfaces: the child (netdev) and parent netdev (rq->netdev)
      will point to the same thing.
      
      Fixes: be98737a ("net/mlx5e: Use dynamic per-channel allocations in stats")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPoIB, Block PKEY interfaces with less rx queues than parent · 31c70bfe
      Dragos Tatulea authored
      A user is able to configure an arbitrary number of rx queues when
      creating an interface via netlink. This doesn't work for child PKEY
      interfaces because the child interface uses the parent receive channels.
      
      Although the child shares the parent's receive channels, the number of
      rx queues is important for the channel_stats array: the parent's rx
      channel index is used to access the child's channel_stats. So the array
      has to be at least as large as the parent's rx queue size for the
      counting to work correctly and to prevent out-of-bounds accesses.
      
      This patch checks for the mentioned scenario and returns an error when
      trying to create the interface. The error is propagated to the user.
      
      Fixes: be98737a ("net/mlx5e: Use dynamic per-channel allocations in stats")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
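The check described in commit 31c70bfe above can be modelled with a short Python sketch. This is not the driver code: `Parent`, `create_child`, `account_rx` and the choice of `EINVAL` are assumptions made for illustration. The commit message says only that creation fails with an error when the child has fewer rx queues than the parent.

```python
import errno


class Parent:
    """Stand-in for the parent IPoIB netdev, which owns the receive channels."""

    def __init__(self, num_rx_queues):
        self.num_rx_queues = num_rx_queues


def create_child(parent, child_rx_queues):
    """Return the child's per-channel stats array, or a negative errno.

    The child's stats array is indexed by the parent's rx channel index,
    so it must have at least as many slots as the parent has rx queues.
    """
    if child_rx_queues < parent.num_rx_queues:
        return -errno.EINVAL  # assumed error code, not taken from the source
    return [0] * child_rx_queues


def account_rx(child_stats, parent_channel_ix):
    """Count one received packet against the child, using the parent's index."""
    child_stats[parent_channel_ix] += 1
```

With a parent that has 8 rx queues, a child requesting 4 is rejected, while a child with 8 can safely be indexed by any parent channel index from 0 to 7.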
    • net/mlx5e: IPoIB, Block queue count configuration when sub interfaces are present · 806a8df7
      Dragos Tatulea authored
      PKEY sub interfaces share the receive queues with the parent interface.
      While setting the sub interface queue count is not supported, it is
      currently possible to change the number of queues of the parent interface.
      Thus we can end up with inconsistent queue sizes between the parent and its
      sub interfaces.
      
      This change disallows setting the queue count on the parent interface when
      sub interfaces are present.
      
      This is achieved by introducing an explicit reference to the parent netdev
      in the mlx5i_priv of the child interface. An additional counter is also
      required on the parent side to detect when sub interfaces are attached and
      for proper cleanup.
      
      The rtnl lock is taken during the ethtool op and the sub interface
      ndo_init/uninit ops. There is no race here around counting the sub
      interfaces, reading the sub interfaces and setting the number of
      channels. The ASSERT_RTNL was added to document that.
      
      Fixes: be98737a ("net/mlx5e: Use dynamic per-channel allocations in stats")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
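The bookkeeping described in commit 806a8df7 above (a parent reference in the child, plus a counter on the parent) might look roughly like the following Python model. All names here (`ParentPriv`, `ChildPriv`, `num_sub_interfaces`, `set_channels`) and the `EINVAL` return value are hypothetical. In the kernel, all three operations run under the rtnl lock, which this sketch does not model.

```python
import errno


class ParentPriv:
    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.num_sub_interfaces = 0  # hypothetical counter name


class ChildPriv:
    def __init__(self):
        self.parent = None  # explicit reference to the parent


def child_init(child, parent):
    """Models the sub interface init op: attach and count."""
    child.parent = parent
    parent.num_sub_interfaces += 1


def child_uninit(child):
    """Models the sub interface uninit op: detach and uncount."""
    child.parent.num_sub_interfaces -= 1
    child.parent = None


def set_channels(parent, count):
    """Models the ethtool op: refuse while sub interfaces are attached."""
    if parent.num_sub_interfaces > 0:
        return -errno.EINVAL  # assumed error code
    parent.num_channels = count
    return 0
```

The counter is what lets the parent detect attached sub interfaces cheaply. Once the last child is removed, changing the queue count is allowed again.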
    • net/mlx5e: Verify dev is present for fix features ndo · ab4b01bf
      Roy Novich authored
      The native NIC port net device instance is being used as the uplink
      representor.  While profiles are being changed, private resources are
      not available, but the fix features ndo does not check whether the
      netdev is present. Add driver protection to verify that private
      resources are ready.
      
      Fixes: 7a9fb35e ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
      Signed-off-by: Roy Novich <royno@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix command stats access after free · da2e552b
      Moshe Shemesh authored
      A command may fail while the driver is reloading, since the driver
      can't accept FW commands until the command interface is reinitialized.
      Such a command failure is logged to the command stats. This results in
      a NULL pointer access, as the command stats structure is freed and
      reallocated during mlx5 devlink reload (see kernel log below).
      
      Fix it by making command stats statically allocated on driver probe.
      
      Kernel log:
      [ 2394.808802] BUG: unable to handle kernel paging request at 000000000002a9c0
      [ 2394.810610] PGD 0 P4D 0
      [ 2394.811811] Oops: 0002 [#1] SMP NOPTI
      ...
      [ 2394.815482] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
      ...
      [ 2394.829505] Call Trace:
      [ 2394.830667]  _raw_spin_lock_irq+0x23/0x26
      [ 2394.831858]  cmd_status_err+0x55/0x110 [mlx5_core]
      [ 2394.833020]  mlx5_access_reg+0xe7/0x150 [mlx5_core]
      [ 2394.834175]  mlx5_query_port_ptys+0x78/0xa0 [mlx5_core]
      [ 2394.835337]  mlx5e_ethtool_get_link_ksettings+0x74/0x590 [mlx5_core]
      [ 2394.836454]  ? kmem_cache_alloc_trace+0x140/0x1c0
      [ 2394.837562]  __rh_call_get_link_ksettings+0x33/0x100
      [ 2394.838663]  ? __rtnl_unlock+0x25/0x50
      [ 2394.839755]  __ethtool_get_link_ksettings+0x72/0x150
      [ 2394.840862]  duplex_show+0x6e/0xc0
      [ 2394.841963]  dev_attr_show+0x1c/0x40
      [ 2394.843048]  sysfs_kf_seq_show+0x9b/0x100
      [ 2394.844123]  seq_read+0x153/0x410
      [ 2394.845187]  vfs_read+0x91/0x140
      [ 2394.846226]  ksys_read+0x4f/0xb0
      [ 2394.847234]  do_syscall_64+0x5b/0x1a0
      [ 2394.848228]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 34f46ae0 ("net/mlx5: Add command failures data to debugfs")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Keep mod hdr actions after mod hdr alloc · 5e72f3f1
      Ariel Levkovich authored
      When offloading TC NIC rule which has mod_hdr action, the
      mod_hdr actions list is freed upon mod_hdr allocation.
      
      In the new format of handling multi table actions and CT in
      particular, the mod_hdr actions list is still relevant when
      setting the pre and post rules and therefore, freeing the list
      may cause adding rules which don't set the FTE_ID.
      
      Therefore, the mod_hdr actions list needs to be kept for the
      pre/post flows as well, and freeing it should be left to these
      handlers.
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: check attr pointer validity before dereferencing it · e0bf81bf
      Ariel Levkovich authored
      The attr pointer was checked for validity only after it had already
      been dereferenced. Check it before dereferencing it.
      
      Fixes: cb0d54cb ("net/mlx5e: Fix wrong source vport matching on tunnel rule")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix 'stack frame size exceeds limit' error in dr_rule · 17b3222e
      Yevgeny Kliteynik authored
      If the kernel configuration asks the compiler to check for a frame limit
      of 1K, dr_rule_create_rule_nic exceeds this limit:
          "stack frame size (1184) exceeds limit (1024)"
      
      Fix this by checking the configured frame limit and using the
      optimization STE array only when the stack size warning is set to the
      usual 2K (or larger).
      
      Fixes: b9b81e1e ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
    • Revert "r8169: disable detection of chip version 36" · 2ea26b4d
      Heiner Kallweit authored
      This reverts commit 42666b2c.
      
      This chip version seems to be very rare, but it exists in consumer
      devices, see linked report.
      
      Link: https://stackoverflow.com/questions/75049473/cant-setup-a-wired-network-in-archlinux-fresh-install
      Fixes: 42666b2c ("r8169: disable detection of chip version 36")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/42e9674c-d5d0-a65a-f578-e5c74f244739@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: act_mpls: Fix warning during failed attribute validation · 9e17f992
      Ido Schimmel authored
      The 'TCA_MPLS_LABEL' attribute is of 'NLA_U32' type, but has a
      validation type of 'NLA_VALIDATE_FUNCTION'. This is an invalid
      combination according to the comment above 'struct nla_policy':
      
      "
      Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN:
         NLA_BINARY           Validation function called for the attribute.
         All other            Unused - but note that it's a union
      "
      
      This can trigger the warning [1] in nla_get_range_unsigned() when
      validation of the attribute fails. Despite being of 'NLA_U32' type, the
      associated 'min'/'max' fields in the policy are negative as they are
      aliased by the 'validate' field.
      
      Fix by changing the attribute type to 'NLA_BINARY' which is consistent
      with the above comment and all other users of NLA_POLICY_VALIDATE_FN().
      As a result, move the length validation to the validation function.
      
      No regressions in MPLS tests:
      
       # ./tdc.py -f tc-tests/actions/mpls.json
       [...]
       # echo $?
       0
      
      [1]
      WARNING: CPU: 0 PID: 17743 at lib/nlattr.c:118
      nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117
      Modules linked in:
      CPU: 0 PID: 17743 Comm: syz-executor.0 Not tainted 6.1.0-rc8 #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
      RIP: 0010:nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117
      [...]
      Call Trace:
       <TASK>
       __netlink_policy_dump_write_attr+0x23d/0x990 net/netlink/policy.c:310
       netlink_policy_dump_write_attr+0x22/0x30 net/netlink/policy.c:411
       netlink_ack_tlv_fill net/netlink/af_netlink.c:2454 [inline]
       netlink_ack+0x546/0x760 net/netlink/af_netlink.c:2506
       netlink_rcv_skb+0x1b7/0x240 net/netlink/af_netlink.c:2546
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6109
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0x38f/0x500 net/socket.c:2482
       ___sys_sendmsg net/socket.c:2536 [inline]
       __sys_sendmsg+0x197/0x230 net/socket.c:2565
       __do_sys_sendmsg net/socket.c:2574 [inline]
       __se_sys_sendmsg net/socket.c:2572 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2572
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Link: https://lore.kernel.org/netdev/CAO4mrfdmjvRUNbDyP0R03_DrD_eFCLCguz6OxZ2TYRSv0K9gxA@mail.gmail.com/
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reported-by: Wei Chen <harperchen1110@gmail.com>
      Tested-by: Wei Chen <harperchen1110@gmail.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/20230107171004.608436-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
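The aliasing that commit 9e17f992 above describes can be illustrated outside the kernel. The sketch below assumes a simplified layout in which a 64-bit little-endian function pointer shares storage with two signed 16-bit `min`/`max` fields at the start of the union. The pointer value is a made-up, kernel-text-like address, not one taken from the report.

```python
import struct


def aliased_min_max(validate_ptr):
    """Reinterpret the low 4 bytes of a 64-bit pointer as two s16 values.

    Models reading 'min' and 'max' from a union that actually holds a
    validation function pointer (simplified layout, little-endian).
    """
    raw = struct.pack("<Q", validate_ptr)
    return struct.unpack_from("<hh", raw)


# Hypothetical address of a validation function.
HYPOTHETICAL_FN_ADDR = 0xFFFFFFFF81A3C5D0

range_min, range_max = aliased_min_max(HYPOTHETICAL_FN_ADDR)
print(range_min, range_max)
```

Depending on the pointer bits, the reinterpreted values can come out negative, which is not meaningful for an unsigned 32-bit attribute. Per the commit message, this is the condition that triggers the warning in nla_get_range_unsigned().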
  2. 09 Jan, 2023 7 commits
  3. 07 Jan, 2023 3 commits
    • Merge tag 'rxrpc-fixes-20230107' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 571f3dd0
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Fix race between call connection, data transmit and call disconnect
      
      Here are patches to fix an oops[1] caused by a race between call
      connection, initial packet transmission and call disconnection which
      results in something like:
      
              kernel BUG at net/rxrpc/peer_object.c:413!
      
      when the syzbot test is run.  The problem is that the connection procedure
      is effectively split across two threads and can get expanded by taking an
      interrupt, thereby adding the call to the peer error distribution list
      *after* it has been disconnected (say by the rxrpc socket shutting down).
      
      The easiest solution is to look at the fourth set of I/O thread
      conversion/SACK table expansion patches that didn't get applied[2] and take
      from it those patches that move call connection and disconnection into the
      I/O thread.  Moving these things into the I/O thread means that the
      sequencing is managed by all being done in the same thread - and the race
      can no longer happen.
      
      This is preferable to introducing an extra lock as adding an extra lock
      would make the I/O thread have to wait for the app thread in yet another
      place.
      
      The changes can be considered as a number of logical parts:
      
       (1) Move all of the call state changes into the I/O thread.
      
       (2) Make client connection ID space per-local endpoint so that the I/O
           thread doesn't need locks to access it.
      
       (3) Move actual abort generation into the I/O thread and clean it up.  If
           sendmsg or recvmsg want to cause an abort, they have to delegate it.
      
       (4) Offload the setting up of the security context on a connection to the
           thread of one of the apps that's starting a call.  We don't want to be
           doing any sort of crypto in the I/O thread.
      
       (5) Connect calls (ie. assign them to channel slots on connections) in the
           I/O thread.  Calls are set up by sendmsg/kafs and passed to the I/O
           thread to connect.  Connections are allocated in the I/O thread after
           this.
      
       (6) Disconnect calls in the I/O thread.
      
      I've also added a patch for an unrelated bug that cropped up during
      testing, whereby a race can occur between an incoming call and socket
      shutdown.
      
      Note that whilst this fixes the original syzbot bug, another bug may get
      triggered if this one is fixed:
      
              INFO: rcu detected stall in corrupted
              rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T
              rcu: blocking rcu_node structures (internal RCU debug):
      
      It doesn't look like this should be anything to do with rxrpc, though, as I've
      tested an additional patch[3] that removes practically all the RCU usage
      from rxrpc and it still occurs.  It seems likely that it is being caused by
      something in the tunnelling setup that the syzbot test does, but there's
      not enough info to go on.  It also seems unlikely to be anything to do with
      the afs driver as the test doesn't use that.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix incoming call setup race · 42f229c3
      David Howells authored
      An incoming call can race with rxrpc socket destruction, leading to a
      leaked call.  This may result in an oops when the call timer eventually
      expires:
      
         BUG: kernel NULL pointer dereference, address: 0000000000000874
         RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50
         Call Trace:
          <IRQ>
          try_to_wake_up+0x59/0x550
          ? __local_bh_enable_ip+0x37/0x80
          ? rxrpc_poke_call+0x52/0x110 [rxrpc]
          ? rxrpc_poke_call+0x110/0x110 [rxrpc]
          ? rxrpc_poke_call+0x110/0x110 [rxrpc]
          call_timer_fn+0x24/0x120
      
      with a warning in the kernel log looking something like:
      
         rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)!
      
      incurred during rmmod of rxrpc.  The 1061d is the call flags:
      
         RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED,
         IS_SERVICE, RELEASED
      
      but no DISCONNECTED flag (0x800), so it's an incoming (service) call and
      it's still connected.
      
      The race appears to be that:
      
       (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state
           and allocates a call - then pauses, possibly for an interrupt.
      
       (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer,
           discards the prealloc and releases all calls attached to the socket.
      
       (3) rxrpc_new_incoming_call() resumes, launching the new call, including
           its timer and attaching it to the socket.
      
      Fix this by read-locking local->services_lock to access the AF_RXRPC socket
      providing the service rather than RCU in rxrpc_new_incoming_call().
      There's no real need to use RCU here as local->services_lock is only
      write-locked by the socket side in two places: when binding and when
      shutting down.
      
      Fixes: 5e6ef4f1 ("rxrpc: Make the I/O thread take over the call and local processor work")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
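The idea behind the fix in commit 42f229c3 above, making "check the service socket and attach the call" atomic with respect to socket shutdown, can be sketched in Python. This is a simplified model under stated assumptions: a plain mutex stands in for the kernel's reader/writer `services_lock`, the whole of the release path runs under that lock, and the class and function names are illustrative rather than the rxrpc API.

```python
import threading


class Local:
    """Stand-in for the local endpoint that holds the service pointer."""

    def __init__(self):
        # Plain mutex standing in for the read/write services_lock.
        self.services_lock = threading.Lock()
        self.service = None


class ServiceSocket:
    def __init__(self):
        self.closed = False
        self.calls = []


def bind_service(local, sock):
    with local.services_lock:
        local.service = sock


def release_sock(local, sock):
    """Shut the socket down: mark closed, null the service, release calls."""
    with local.services_lock:
        sock.closed = True
        local.service = None
        released = list(sock.calls)
        sock.calls.clear()
    return released


def new_incoming_call(local, call_id):
    """Attach a new call only if the service socket is still live."""
    with local.services_lock:
        sock = local.service
        if sock is None or sock.closed:
            return None
        sock.calls.append(call_id)
        return call_id
```

Because the check and the attach happen under the same lock that shutdown takes, a call can no longer be attached to a socket after its calls have been released.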
    • octeontx2-af: Fix LMAC config in cgx_lmac_rx_tx_enable · b4e9b876
      Angela Czubak authored
      PF netdev can request AF to enable or disable reception and transmission
      on the assigned CGX::LMAC. Instead of enabling or disabling only
      reception and transmission, the current code also enables/disables the
      LMAC itself. This patch fixes this issue.
      
      Fixes: 1435f66a ("octeontx2-af: CGX Rx/Tx enable/disable mbox handlers")
      Signed-off-by: Angela Czubak <aczubak@marvell.com>
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230105160107.17638-1-hkelam@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
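The behaviour that commit b4e9b876 above aims for, toggling only the Rx/Tx enable bits and leaving the LMAC enable bit alone, amounts to a read-modify-write with a narrower mask. The Python sketch below uses illustrative bit positions and names (`LMAC_EN`, `DATA_PKT_RX_EN`, `DATA_PKT_TX_EN`). They are assumptions for the example and are not taken from the source.

```python
# Illustrative bit positions in a 64-bit config register (assumed).
DATA_PKT_TX_EN = 1 << 53
DATA_PKT_RX_EN = 1 << 54
LMAC_EN = 1 << 55


def rx_tx_enable(cfg, enable):
    """Return cfg with only the Rx/Tx enable bits set or cleared."""
    mask = DATA_PKT_RX_EN | DATA_PKT_TX_EN
    if enable:
        return cfg | mask
    return cfg & ~mask
```

Starting from a register value with only `LMAC_EN` set, enabling and then disabling Rx/Tx returns the original value, so the LMAC itself stays enabled throughout.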
  4. 06 Jan, 2023 18 commits
    • tipc: fix unexpected link reset due to discovery messages · c244c092
      Tung Nguyen authored
      This unexpected behavior is observed:
      
      node 1                    | node 2
      ------                    | ------
      link is established       | link is established
      reboot                    | link is reset
      up                        | send discovery message
      receive discovery message |
      link is established       | link is established
      send discovery message    |
                                | receive discovery message
                                | link is reset (unexpected)
                                | send reset message
      link is reset             |
      
      It is due to delayed re-discovery as described in function
      tipc_node_check_dest(): "this link endpoint has already reset
      and re-established contact with the peer, before receiving a
      discovery message from that node."
      
      However, commit 598411d7 changed the condition for calling
      tipc_node_link_down(), which used to be the acceptance of a new media
      address.
      
      This commit fixes this by restoring the old and correct behavior.
      
      Fixes: 598411d7 ("tipc: make resetting of links non-atomic")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Move client call connection to the I/O thread · 9d35d880
      David Howells authored
      Move the connection setup of client calls to the I/O thread so that a whole
      load of locking and barrierage can be eliminated.  This necessitates the
      app thread waiting for connection to complete before it can begin
      encrypting data.
      
      This also completes the fix for a race that exists between call connection
      and call disconnection whereby the data transmission code adds the call to
      the peer error distribution list after the call has been disconnected (say
      by the rxrpc socket getting closed).
      
      The fix is to complete the process of moving call connection, data
      transmission and call disconnection into the I/O thread and thus forcibly
      serialising them.
      
      Note that the issue may predate the overhaul to an I/O thread model that
      was included in the merge window for v6.2, but the timing is very much
      changed by the change given below.
      
      Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item")
      Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Move the client conn cache management to the I/O thread · 0d6bf319
      David Howells authored
      Move the management of the client connection cache to the I/O thread rather
      than managing it from the namespace as an aggregate across all the local
      endpoints within the namespace.
      
      This will allow a load of locking to be got rid of in a future patch as
      only the I/O thread will be looking at this.
      
      The downside is that the total number of cached connections on the system
      can get higher because the limit is now per-local rather than per-netns.
      We can, however, keep the number of client conns in use across the entire
      netns and use that to reduce the expiration time of idle connections.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove call->state_lock · 96b4059f
      David Howells authored
      All the setters of call->state are now in the I/O thread and thus the state
      lock is now unnecessary.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Move call state changes from recvmsg to I/O thread · 93368b6b
      David Howells authored
      Move the call state changes that are made in rxrpc_recvmsg() to the I/O
      thread.  This means that, thenceforth, only the I/O thread does this and
      the call state lock can be removed.
      
      This requires the Rx phase to be ended when the last packet is received,
      not when it is processed.
      
      Since this now changes the rxrpc call state to SUCCEEDED before we've
      consumed all the data from it, rxrpc_kernel_check_life() mustn't say the
      call is dead until the recvmsg queue is empty (unless the call has failed).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Move call state changes from sendmsg to I/O thread · 2d689424
      David Howells authored
      Move all the call state changes that are made in rxrpc_sendmsg() to the I/O
      thread.  This is a step towards removing the call state lock.
      
      This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and
      RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is
      decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is
      added to ->tx_sendmsg by sendmsg().
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Wrap accesses to get call state to put the barrier in one place · d41b3f5b
      David Howells authored
      Wrap accesses to get the state of a call from outside of the I/O thread in
      a single place so that the barrier needed to order wrt the error code and
      abort code is in just that place.
      
      Also use a barrier when setting the call state and again when reading the
      call state such that the auxiliary completion info (error code, abort code)
      can be read without taking a read lock on the call state lock.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Split out the call state changing functions into their own file · 0b9bb322
      David Howells authored
      Split out the functions that change the state of an rxrpc call into their
      own file.  The idea is to remove anything to do with changing the state
      of a call directly from the rxrpc sendmsg() and recvmsg() paths and have
      all that done in the I/O thread only, with the ultimate aim of removing the
      state lock entirely.  Moving the code out of sendmsg.c and recvmsg.c makes
      that easier to manage.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters · 1bab27af
      David Howells authored
      Use the information now stored in struct rxrpc_call to configure the
      connection bundle and thence the connection, rather than using the
      rxrpc_conn_parameters struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Offload the completion of service conn security to the I/O thread · 2953d3b8
      David Howells authored
      Offload the completion of the challenge/response cycle on a service
      connection to the I/O thread.  After the RESPONSE packet has been
      successfully decrypted and verified by the work queue, offloading the
      changing of the call states to the I/O thread makes iteration over the
      conn's channel list simpler.
      
      Do this by marking the RESPONSE skbuff and putting it onto the receive
      queue for the I/O thread to collect.  We put it on the front of the queue
      as we've already received the packet for it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Make the set of connection IDs per local endpoint · f06cb291
      David Howells authored
      Make the set of connection IDs per local endpoint so that endpoints don't
      cause each other's connections to get dismissed.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Tidy up abort generation infrastructure · 57af281e
      David Howells authored
      Tidy up the abort generation infrastructure in the following ways:
      
       (1) Create an enum and string mapping table to list the reasons an abort
           might be generated in tracing.
      
       (2) Replace the 3-char string with the values from (1) in the places that
           use that to log the abort source.  This gets rid of a memcpy() in the
           tracepoint.
      
       (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
           and use values from (1) to indicate the trace reason.
      
       (4) Always make a call to an abort function at the point of the abort
           rather than stashing the values into variables and using goto to get
           to a place where it is reported.  The C optimiser will collapse the calls
           together as appropriate.  The abort functions return a value that can
           be returned directly if appropriate.
      
      Note that this extends into afs also at the points where that generates an
      abort.  To aid with this, the afs sources need to #define
      RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
      because they don't have access to the rxrpc internal structures that some
      of the tracepoints make use of.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      57af281e
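Points (1) and (4) of the abort tidy-up can be sketched in plain C as follows. This is only an illustration, not the kernel code: every `demo_` name, the reason list, and the `DEMO_EPROTO` value are hypothetical, and the real enum and string table live in the rxrpc tracing header. The sketch generates the enum and its string table from one list, so the two cannot drift apart, and has the abort helper return the error so the caller can return it directly instead of jumping to a shared label.

```c
#include <stdio.h>

#define DEMO_EPROTO 71	/* hypothetical stand-in for the kernel's EPROTO */

/* Single list of abort reasons (hypothetical names and strings). */
#define DEMO_ABORT_REASONS(X)			\
	X(demo_abort_call_timeout, "TIMEOUT")	\
	X(demo_abort_bad_seq,      "BAD-SEQ")	\
	X(demo_eproto_short_pkt,   "SHORT-PKT")

/* Expand the list once as enum values... */
enum demo_abort_reason {
#define X(sym, str) sym,
	DEMO_ABORT_REASONS(X)
#undef X
	demo_abort__nr
};

/* ...and once as the matching string table used when logging. */
static const char *const demo_abort_names[] = {
#define X(sym, str) [sym] = str,
	DEMO_ABORT_REASONS(X)
#undef X
};

static const char *demo_abort_name(enum demo_abort_reason why)
{
	if ((unsigned int)why < (unsigned int)demo_abort__nr)
		return demo_abort_names[why];
	return "UNKNOWN";
}

/* Log the abort where it happens and hand back the error to return. */
static int demo_abort(enum demo_abort_reason why, int abort_code, int error)
{
	fprintf(stderr, "abort code=%d why=%s\n", abort_code,
		demo_abort_name(why));
	return error;
}

/* Example caller: no stashed variables, no goto to a reporting label. */
static int demo_check_packet(unsigned int len)
{
	if (len < 4)
		return demo_abort(demo_eproto_short_pkt, 1, -DEMO_EPROTO);
	return 0;
}
```

Here `demo_check_packet(2)` logs the reason and returns `-DEMO_EPROTO`, while a packet of sufficient length returns 0 without logging.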
    • David Howells's avatar
      rxrpc: Clean up connection abort · a00ce28b
      David Howells authored
      Clean up connection abort, using the connection state_lock to gate access
      to change that state, and use an rxrpc_call_completion value to indicate
      the difference between local and remote aborts as these can be pasted
      directly into the call state.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      a00ce28b
    • David Howells's avatar
      rxrpc: Implement a mechanism to send an event notification to a connection · f2cce89a
      David Howells authored
      Provide a means by which an event notification can be sent to a connection
      such that the I/O thread can pick it up and handle it rather than
      doing it in a separate workqueue.
      
      This is then used to move the deferred final ACK of a call into the I/O
      thread rather than a separate work queue as part of the drive to do all
      transmission from the I/O thread.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      f2cce89a
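One way to picture the notification mechanism described above is a per-connection word of pending event bits that any context may set and that only the I/O thread drains. The sketch below uses C11 atomics and is an assumption-laden illustration rather than the kernel implementation: the `demo_` types, the event names, and the counters standing in for real work are all hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event numbers; each one is a bit in conn->events. */
enum demo_conn_event {
	DEMO_EV_FINAL_ACK = 0,
	DEMO_EV_ABORT     = 1,
};

struct demo_conn {
	atomic_uint  events;		/* pending event bits */
	unsigned int final_acks_sent;	/* stand-in for sending the ACK */
	unsigned int aborts_seen;	/* stand-in for abort handling */
};

/*
 * May be called from any context.  Returns true if no event was pending
 * before, i.e. the caller should wake the I/O thread.
 */
static bool demo_post_conn_event(struct demo_conn *conn,
				 enum demo_conn_event ev)
{
	unsigned int old = atomic_fetch_or(&conn->events, 1u << ev);

	return old == 0;
}

/* Called only from the I/O thread: take all pending bits and act on them. */
static void demo_process_conn_events(struct demo_conn *conn)
{
	unsigned int pending = atomic_exchange(&conn->events, 0);

	if (pending & (1u << DEMO_EV_FINAL_ACK))
		conn->final_acks_sent++;
	if (pending & (1u << DEMO_EV_ABORT))
		conn->aborts_seen++;
}
```

Because the poster only sets a bit and the handling runs in a single thread, the work itself (such as the deferred final ACK) needs no extra locking against other transmitters in this model.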
    • David Howells's avatar
      rxrpc: Only disconnect calls in the I/O thread · 03fc55ad
      David Howells authored
      Only perform call disconnection in the I/O thread to reduce the locking
      requirement.
      
      This is the first part of a fix for a race that exists between call
      connection and call disconnection whereby the data transmission code adds
      the call to the peer error distribution list after the call has been
      disconnected (say by the rxrpc socket getting closed).
      
      The fix is to complete the process of moving call connection, data
      transmission and call disconnection into the I/O thread and thus forcibly
      serialising them.
      
      Note that the issue may predate the overhaul to an I/O thread model that
      was included in the merge window for v6.2, but the timing is very much
      changed by the change given below.
      
      Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item")
      Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      03fc55ad
    • David Howells's avatar
      rxrpc: Only set/transmit aborts in the I/O thread · a343b174
      David Howells authored
      Only set the abort call completion state in the I/O thread and only
      transmit ABORT packets from there.  rxrpc_abort_call() can then be made to
      actually send the packet.
      
      Further, ABORT packets should only be sent if the call has been exposed to
      the network (i.e. at least one attempted DATA transmission has occurred for
      it).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      a343b174
    • David Howells's avatar
      rxrpc: Separate call retransmission from other conn events · 30df927b
      David Howells authored
      Call rxrpc_conn_retransmit_call() directly from rxrpc_input_packet()
      rather than calling it via connection event handling.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      30df927b
    • David Howells's avatar
      rxrpc: Make the local endpoint hold a ref on a connected call · 5040011d
      David Howells authored
      Make the local endpoint and its I/O thread hold a reference on a connected
      call until that call is disconnected.  Without this, we're reliant on
      either the AF_RXRPC socket to hold a ref (which is dropped when the call is
      released) or a queued work item to hold a ref (the work item is being
      replaced with the I/O thread).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      5040011d
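The reference-holding rule in this commit can be illustrated with a small single-threaded sketch. It is not the kernel code: the `demo_` names are hypothetical, the counter is a plain int rather than a kernel refcount, and "freeing" is modelled by a flag. The point it shows is that the endpoint takes its own reference when the call is connected and drops it only on disconnection, so the call survives the socket releasing its reference first.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_call {
	int  ref;		/* plain counter; sketch is single-threaded */
	bool connected;
	bool freed;		/* stand-in for actually freeing the call */
};

static void demo_get_call(struct demo_call *call)
{
	call->ref++;
}

static void demo_put_call(struct demo_call *call)
{
	assert(call->ref > 0);
	if (--call->ref == 0)
		call->freed = true;
}

/* On connection, the local endpoint takes its own ref on the call. */
static void demo_connect_call(struct demo_call *call)
{
	demo_get_call(call);
	call->connected = true;
}

/* The endpoint's ref is dropped only when the call is disconnected. */
static void demo_disconnect_call(struct demo_call *call)
{
	if (!call->connected)
		return;
	call->connected = false;
	demo_put_call(call);
}
```

With a call that starts with one reference held by the socket, connecting raises the count to two; releasing the socket's reference leaves the call alive until `demo_disconnect_call()` drops the last one.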