1. 19 Oct, 2023 18 commits
  2. 18 Oct, 2023 22 commits
    • net: skb_find_text: Ignore patterns extending past 'to' · c4eee56e
      Phil Sutter authored
      Assume that the caller's 'to' offset really represents an upper boundary
      for the pattern search, so patterns extending past this offset are to be
      rejected.

      The old behaviour was also inconsistent when it comes to fragmentation
      (or otherwise non-linear skbs): if the pattern started between the
      'from' and 'to' offsets but extended into the next fragment, it was not
      found if the 'to' offset was still within the current fragment.
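
      A minimal sketch of the new rule, assuming a flat byte buffer instead of
      a (possibly fragmented) skb and a plain memcmp() scan instead of the
      kernel textsearch backends: a match is accepted only if it lies entirely
      inside [from, to). The name find_text_bounded() is hypothetical, and
      returning UINT_MAX for "not found" is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Simplified model of a bounded text search over a flat buffer.
 * Returns the match offset relative to 'from', or UINT_MAX if no match lies
 * entirely inside [from, to). Patterns that would extend past 'to' are
 * rejected.
 */
static unsigned int find_text_bounded(const char *buf, unsigned int len,
                                      unsigned int from, unsigned int to,
                                      const char *pat, unsigned int patlen)
{
    assert(from <= to && to <= len);

    /* A match starting at i covers [i, i + patlen), so it must fit before 'to'. */
    for (unsigned int i = from; i < to && to - i >= patlen; i++) {
        if (memcmp(buf + i, pat, patlen) == 0)
            return i - from;
    }
    return UINT_MAX;
}
```

      With "xxabcyy" and the pattern "abc", a 'to' of 5 still finds the match
      at offset 2 (it ends exactly at 'to'), while a 'to' of 4 rejects it.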
      
      Test the new behaviour in a kselftest using iptables' string match.
      Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Fixes: f72b948d ("[NET]: skb_find_text ignores to argument")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nf-next-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 37fb1c81
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      netfilter next pull request 2023-10-18
      
      This series contains initial netfilter skb drop_reason support, from
      myself.
      
      First few patches fix up a few spots to make sure we won't trip
      when followup patches embed error numbers in the upper bits
      (we already do this in some places).
      
      Then, nftables and bridge netfilter get converted to call kfree_skb_reason
      directly to let tooling pinpoint exact location of packet drops,
      rather than the existing NF_DROP catchall in nf_hook_slow().
      
      I would like to eventually convert all netfilter modules, but as some
      callers cannot deal with NF_STOLEN (notably act_ct), more preparation
      work is needed for this.
      
      Last patch gets rid of an ugly 'de-const' cast in nftables.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ethtool-forced-speed' · 810799a0
      David S. Miller authored
      Paul Greenwalt says:
      
      ====================
      ethtool: Add link mode maps for forced speeds
      
      The following patch set was initially a part of [1]. As the purpose of the
      original series was to add support for new hardware to the Intel ice
      driver, the refactoring of advertised link modes mapping was extracted to a
      new set.

      The patch set adds a common mechanism for mapping Ethtool forced speeds
      to Ethtool supported link modes, which can be used in driver code.
      
      [1] https://lore.kernel.org/netdev/20230823180633.2450617-1-pawel.chmielewski@intel.com
      
      Changelog:
      v4->v5:
      Separated ethtool and qede changes into two patches, fixed indentation,
      and moved ethtool_forced_speed_maps_init() from ioctl.c to ethtool.h
      
      v3->v4:
      Moved the macro for setting fields into the common header file
      
      v2->v3:
      Fixed whitespaces, added missing line at end of file
      
      v1->v2:
      Fixed formatting, typo, moved declaration of iterator to loop line.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Refactor finding advertised link speed · 982b0192
      Pawel Chmielewski authored
      Refactor ice_get_link_ksettings to use the forced speed to link modes
      mapping.
      
      Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Refactor qede_forced_speed_maps_init() · a5b65cd2
      Paul Greenwalt authored
      Refactor qede_forced_speed_maps_init() to use the common implementation
      ethtool_forced_speed_maps_init().
      
      The qede driver was compile tested only.
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Add forced speed to supported link modes maps · 26c5334d
      Paul Greenwalt authored
      The need to map Ethtool forced speeds to Ethtool supported link modes is
      common among drivers. To support this, add a common structure for forced
      speed maps and a function to init them. This solution was originally
      introduced in commit 1d4e4ecc ("qede: populate supported link modes
      maps on module init") for the qede driver.
      
      ethtool_forced_speed_maps_init() should be called during driver init
      with an array of struct ethtool_forced_speed_map to populate the mapping.
      
      Definitions for maps themselves are left in the driver code, as the sets
      of supported link modes may vary between the devices.
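
      To illustrate the idea of a shared forced-speed map, here is a simplified
      userspace model, not the kernel API: struct forced_speed_map,
      forced_speed_maps_init() and forced_speed_caps() are hypothetical names,
      a uint64_t stands in for the kernel's link-mode bitmap, and the bit
      numbers a driver would supply are arbitrary. The map definitions stay
      with the caller; the common code only folds them into masks once.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per forced speed: a list of link-mode bit numbers that init
 * folds into a mask (a uint64_t models the kernel link-mode bitmap). */
struct forced_speed_map {
    uint32_t speed;          /* e.g. 10000 for 10 Gb/s */
    uint64_t caps;           /* filled in by forced_speed_maps_init() */
    const uint32_t *cap_arr; /* link-mode bit numbers for this speed */
    size_t arr_size;
};

/* Fold each map's bit-number array into its caps mask (done once at init). */
static void forced_speed_maps_init(struct forced_speed_map *maps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        maps[i].caps = 0;
        for (size_t j = 0; j < maps[i].arr_size; j++)
            maps[i].caps |= UINT64_C(1) << maps[i].cap_arr[j];
    }
}

/* Look up the link modes for a forced speed; returns 0 if unsupported. */
static uint64_t forced_speed_caps(const struct forced_speed_map *maps,
                                  size_t n, uint32_t speed)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].speed == speed)
            return maps[i].caps;
    return 0;
}
```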
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: de-constify set commit ops function argument · 25600167
      Florian Westphal authored
      The set backend using this already has to work around it via an ugly
      cast; don't spread this pattern.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: bridge: convert br_netfilter to NF_DROP_REASON · cf8b7c1a
      Florian Westphal authored
      errno is 0 because these hooks are called from prerouting and forward.
      There is no socket that the errno would ever be propagated to.
      
      Other netfilter modules (e.g. nf_nat, conntrack, ...) can be converted
      in a similar way.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: make nftables drops visible in net dropmonitor · e0d45931
      Florian Westphal authored
      net_dropmonitor currently blames core.c:nf_hook_slow.
      Add an NF_DROP_REASON() helper and use it in nft_do_chain().

      The helper releases the skb, so the exact drop location becomes
      available. Calling code will observe the NF_STOLEN verdict
      instead.

      Adjust nf_hook_slow so we can embed an errno value with
      NF_STOLEN verdicts, just like we do for NF_DROP.
      
      After this, drops in nftables can be attributed either to a rule
      or to the chain policy.
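
      A userspace model of the flow described above, with hypothetical names
      throughout: the helper frees the packet at the drop site, records where
      that happened, and returns NF_STOLEN with an errno in the upper bits,
      while the hook runner acts only on the low verdict byte. The verdict
      values and the 0xff verdict mask match the uapi netfilter header; the
      8-bit errno shift follows the "errno << 8 | NF_STOLEN" wording used in
      this series and is an assumption here.

```c
#include <stdlib.h>

/* Verdict values and low-byte verdict mask as in the uapi netfilter header. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };
#define NF_VERDICT_MASK 0x000000ffu

/* Assumed errno placement, per the "errno << 8 | NF_STOLEN" wording. */
#define VERDICT_ERRNO_SHIFT 8

struct pkt { int len; };

/* Hypothetical stand-in for the drop location a tracing tool would report. */
static const char *last_drop_site;

/* NF_DROP_REASON()-style helper: free the packet right here, so the drop
 * site is known, and tell the caller it was consumed (NF_STOLEN). */
static unsigned int drop_with_reason(struct pkt *p, const char *site,
                                     unsigned int err)
{
    last_drop_site = site;
    free(p);
    return (err << VERDICT_ERRNO_SHIFT) | NF_STOLEN;
}

/* Hook-runner model: extract the errno, then act on the verdict byte only.
 * Returns 1 to continue, 0 if the packet is gone, -1 if the caller must drop it. */
static int run_verdict(unsigned int verdict, unsigned int *err)
{
    *err = verdict >> VERDICT_ERRNO_SHIFT;

    switch (verdict & NF_VERDICT_MASK) {
    case NF_ACCEPT:
        return 1;
    case NF_STOLEN:
        return 0;   /* already freed by the helper; must not be touched */
    default:
        return -1;  /* NF_DROP (and anything else in this model) */
    }
}
```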
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_nat: mask out non-verdict bits when checking return value · 35c038b0
      Florian Westphal authored
      Same as the previous change: we need to mask out the non-verdict bits,
      as upcoming patches may embed an errno value in NF_STOLEN verdicts too.

      NF_DROP could already carry an errno this way, but not all called
      functions make use of that.

      Checks that only test ret vs NF_ACCEPT are fine; the 'errno parts'
      are always 0 for those.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts · 6291b3a6
      Florian Westphal authored
      This function calls helpers that can return nf-verdicts, but then
      those get converted to -1/0, as that's what the caller expects.

      Theoretically NF_DROP could have an errno number set in the upper 24
      bits of the return value. Or any of those helpers could return
      NF_STOLEN, which would result in use-after-free.

      This is fine as-is, since the called functions don't do this yet.

      But it's better to avoid possible future problems if the upcoming
      patchset to add NF_DROP_REASON() support gains further users, so remove
      the 0/-1 translation from the picture and pass the verdicts down to
      the caller.
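
      A minimal model of the hazard, using hypothetical names and a packet that
      merely counts how often it is freed: with the old 0/-1 translation, a
      helper that already consumed the packet (NF_STOLEN) is reported as -1,
      and a caller that drops on -1 frees it a second time. Passing the verdict
      through lets the caller free only on NF_DROP.

```c
/* Verdict values and verdict mask as in the uapi netfilter header. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };
#define NF_VERDICT_MASK 0x000000ffu

/* Hypothetical packet that only counts frees, so a double free is visible. */
struct pkt { int frees; };

static void pkt_free(struct pkt *p) { p->frees++; }

/* A helper that consumes the packet itself and reports NF_STOLEN. */
static unsigned int helper_steals(struct pkt *p)
{
    pkt_free(p);
    return NF_STOLEN;
}

/* Old convention: collapse any verdict to 0 (accept) or -1 (error). */
static int update_old(struct pkt *p)
{
    return helper_steals(p) == NF_ACCEPT ? 0 : -1;
}

/* New convention: hand the verdict to the caller unchanged. */
static unsigned int update_new(struct pkt *p)
{
    return helper_steals(p);
}

/* Old caller: -1 means "drop", so it frees the packet, a second time here. */
static void caller_old(struct pkt *p)
{
    if (update_old(p) < 0)
        pkt_free(p);
}

/* New caller: only an NF_DROP verdict obliges it to free the packet. */
static void caller_new(struct pkt *p)
{
    if ((update_new(p) & NF_VERDICT_MASK) == NF_DROP)
        pkt_free(p);
}
```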
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_tables: mask out non-verdict bits when checking return value · 4d26ab00
      Florian Westphal authored
      nftables trace infra must mask out the non-verdict bit parts of the
      return value, else followup changes that 'return errno << 8 | NF_STOLEN'
      will cause breakage.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: xt_mangle: only check verdict part of return value · e15e5027
      Florian Westphal authored
      These checks assume that the caller only returns NF_DROP without
      any errno embedded in the upper bits.
      
      This is fine right now, but follow-up patches will start to propagate
      such errors to allow kfree_skb_drop_reason() in the called functions;
      those would then return 'errno << 8 | NF_STOLEN'.

      To not break things, we have to mask those parts out.
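
      A small sketch of the difference between checking the raw return value
      and checking only its verdict byte. may_touch_packet() and
      may_touch_packet_unmasked() are hypothetical, and the errno placement
      follows the 'errno << 8 | NF_STOLEN' wording above.

```c
#include <stdbool.h>

/* Verdict values and verdict mask as in the uapi netfilter header. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };
#define NF_VERDICT_MASK 0x000000ffu

/* Raw comparison: misclassifies NF_STOLEN once errno bits ride along. */
static bool may_touch_packet_unmasked(unsigned int ret)
{
    return ret != NF_DROP && ret != NF_STOLEN;
}

/* Masked comparison: only the low verdict byte decides. */
static bool may_touch_packet(unsigned int ret)
{
    unsigned int verdict = ret & NF_VERDICT_MASK;

    return verdict != NF_DROP && verdict != NF_STOLEN;
}
```

      With the unmasked check, a stolen packet carrying an errno would be
      treated as still owned by the caller, which is the breakage the masking
      avoids.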
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • Merge branch 'devlink-deadlock' · a0a86022
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      devlink: fix a deadlock when taking devlink instance lock while holding RTNL lock
      
      devlink_port_fill() is sometimes called with the RTNL lock held.
      When putting the nested port function devlink instance attrs, the
      current code takes the nested devlink instance lock. In that case the
      lock ordering is wrong.

      Patch #1 is a dependency of patch #2.
      Patch #2 converts the peernet2id_alloc() call to rely on RCU so it can be
               called without the devlink instance lock.
      Patch #3 takes a device reference for the devlink instance, making sure
               that the device does not disappear before devlink_release() is called.
      Patch #4 benefits from the preparations done in patches #2 and #3 and
               removes the problematic nested devlink lock acquisition.
      Patches #5-#7 improve documentation to reflect this issue so it is
                    avoided in the future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: document devlink_rel_nested_in_notify() function · 5d77371e
      Jiri Pirko authored
      Add documentation for devlink_rel_nested_in_notify() describing the
      devlink instance locking consequences.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devlink: add a note about RTNL lock into locking section · bb11cf9b
      Jiri Pirko authored
      Add a note describing the order in which the RTNL lock and the devlink
      instance lock are taken.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devlink: add nested instance section · b6f23b31
      Jiri Pirko authored
      Add a section about nested devlink instances, describing the helpers
      and lock ordering.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: don't take instance lock for nested handle put · b5f4e371
      Jiri Pirko authored
      Lockdep reports following issue:
      
      WARNING: possible circular locking dependency detected
      ------------------------------------------------------
      devlink/8191 is trying to acquire lock:
      ffff88813f32c250 (&devlink->lock_key#14){+.+.}-{3:3}, at: devlink_rel_devlink_handle_put+0x11e/0x2d0
      
      but task is already holding lock:
      ffffffff8511eca8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20

      which lock already depends on the new lock.

      the existing dependency chain (in reverse order) is:

      -> #3 (rtnl_mutex){+.+.}-{3:3}:
             lock_acquire+0x1c3/0x500
             __mutex_lock+0x14c/0x1b20
             register_netdevice_notifier_net+0x13/0x30
             mlx5_lag_add_mdev+0x51c/0xa00 [mlx5_core]
             mlx5_load+0x222/0xc70 [mlx5_core]
             mlx5_init_one_devl_locked+0x4a0/0x1310 [mlx5_core]
             mlx5_init_one+0x3b/0x60 [mlx5_core]
             probe_one+0x786/0xd00 [mlx5_core]
             local_pci_probe+0xd7/0x180
             pci_device_probe+0x231/0x720
             really_probe+0x1e4/0xb60
             __driver_probe_device+0x261/0x470
             driver_probe_device+0x49/0x130
             __driver_attach+0x215/0x4c0
             bus_for_each_dev+0xf0/0x170
             bus_add_driver+0x21d/0x590
             driver_register+0x133/0x460
             vdpa_match_remove+0x89/0xc0 [vdpa]
             do_one_initcall+0xc4/0x360
             do_init_module+0x22d/0x760
             load_module+0x51d7/0x6750
             init_module_from_file+0xd2/0x130
             idempotent_init_module+0x326/0x5a0
             __x64_sys_finit_module+0xc1/0x130
             do_syscall_64+0x3d/0x90
             entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      -> #2 (mlx5_intf_mutex){+.+.}-{3:3}:
             lock_acquire+0x1c3/0x500
             __mutex_lock+0x14c/0x1b20
             mlx5_register_device+0x3e/0xd0 [mlx5_core]
             mlx5_init_one_devl_locked+0x8fa/0x1310 [mlx5_core]
             mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
             devlink_reload+0x203/0x380
             devlink_nl_cmd_reload+0xb84/0x10e0
             genl_family_rcv_msg_doit+0x1cc/0x2a0
             genl_rcv_msg+0x3c9/0x670
             netlink_rcv_skb+0x12c/0x360
             genl_rcv+0x24/0x40
             netlink_unicast+0x435/0x6f0
             netlink_sendmsg+0x7a0/0xc70
             sock_sendmsg+0xc5/0x190
             __sys_sendto+0x1c8/0x290
             __x64_sys_sendto+0xdc/0x1b0
             do_syscall_64+0x3d/0x90
             entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      -> #1 (&dev->lock_key#8){+.+.}-{3:3}:
             lock_acquire+0x1c3/0x500
             __mutex_lock+0x14c/0x1b20
             mlx5_init_one_devl_locked+0x45/0x1310 [mlx5_core]
             mlx5_devlink_reload_up+0x147/0x170 [mlx5_core]
             devlink_reload+0x203/0x380
             devlink_nl_cmd_reload+0xb84/0x10e0
             genl_family_rcv_msg_doit+0x1cc/0x2a0
             genl_rcv_msg+0x3c9/0x670
             netlink_rcv_skb+0x12c/0x360
             genl_rcv+0x24/0x40
             netlink_unicast+0x435/0x6f0
             netlink_sendmsg+0x7a0/0xc70
             sock_sendmsg+0xc5/0x190
             __sys_sendto+0x1c8/0x290
             __x64_sys_sendto+0xdc/0x1b0
             do_syscall_64+0x3d/0x90
             entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      -> #0 (&devlink->lock_key#14){+.+.}-{3:3}:
             check_prev_add+0x1af/0x2300
             __lock_acquire+0x31d7/0x4eb0
             lock_acquire+0x1c3/0x500
             __mutex_lock+0x14c/0x1b20
             devlink_rel_devlink_handle_put+0x11e/0x2d0
             devlink_nl_port_fill+0xddf/0x1b00
             devlink_port_notify+0xb5/0x220
             __devlink_port_type_set+0x151/0x510
             devlink_port_netdevice_event+0x17c/0x220
             notifier_call_chain+0x97/0x240
             unregister_netdevice_many_notify+0x876/0x1790
             unregister_netdevice_queue+0x274/0x350
             unregister_netdev+0x18/0x20
             mlx5e_vport_rep_unload+0xc5/0x1c0 [mlx5_core]
             __esw_offloads_unload_rep+0xd8/0x130 [mlx5_core]
             mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core]
             mlx5_esw_offloads_unload_rep+0x85/0xc0 [mlx5_core]
             mlx5_eswitch_unload_sf_vport+0x41/0x90 [mlx5_core]
             mlx5_devlink_sf_port_del+0x120/0x280 [mlx5_core]
             genl_family_rcv_msg_doit+0x1cc/0x2a0
             genl_rcv_msg+0x3c9/0x670
             netlink_rcv_skb+0x12c/0x360
             genl_rcv+0x24/0x40
             netlink_unicast+0x435/0x6f0
             netlink_sendmsg+0x7a0/0xc70
             sock_sendmsg+0xc5/0x190
             __sys_sendto+0x1c8/0x290
             __x64_sys_sendto+0xdc/0x1b0
             do_syscall_64+0x3d/0x90
             entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      other info that might help us debug this:

      Chain exists of:
        &devlink->lock_key#14 --> mlx5_intf_mutex --> rtnl_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(mlx5_intf_mutex);
                                     lock(rtnl_mutex);
        lock(&devlink->lock_key#14);
      
      The problem is taking the devlink instance lock of a nested instance
      when RTNL is already held.

      To fix this, don't take the devlink instance lock when putting the
      nested handle. Instead, rely on the preparations done by the previous
      two patches to be able to access the device pointer and obtain the
      netns id without the devlink instance lock held.
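
      The lockdep chain above implies that elsewhere the devlink instance lock
      is taken before RTNL, so a path already holding RTNL must not take it.
      A userspace sketch of that rule with pthread mutexes, under that
      assumption: all names are hypothetical, and a C11 atomic stands in for
      the RCU and device-reference preparations that let the put path read
      what it needs without the instance lock.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Stand-ins for the two locks in the report (names are hypothetical). */
static pthread_mutex_t devlink_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Value the nested-handle put needs, published so it can be read
 * without devlink_lock. */
static _Atomic int nested_netns_id = -1;

/* Allowed order: instance lock first, then RTNL (as in the reload path). */
static void reload_path(int new_id)
{
    pthread_mutex_lock(&devlink_lock);
    pthread_mutex_lock(&rtnl);
    atomic_store_explicit(&nested_netns_id, new_id, memory_order_release);
    pthread_mutex_unlock(&rtnl);
    pthread_mutex_unlock(&devlink_lock);
}

/* Fixed notifier path: runs under RTNL and must NOT take devlink_lock,
 * so it reads the published value lock-free instead. */
static int port_notify_under_rtnl(void)
{
    int id;

    pthread_mutex_lock(&rtnl);
    id = atomic_load_explicit(&nested_netns_id, memory_order_acquire);
    pthread_mutex_unlock(&rtnl);
    return id;
}
```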
      
      Fixes: c137743b ("devlink: introduce object and nested devlink relationship infra")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: take device reference for devlink object · a3806872
      Jiri Pirko authored
      In preparation for allowing access to the device pointer without the
      devlink instance lock held, make sure the device pointer is usable until
      devlink_release() is called.
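
      A minimal refcount model of this change, assuming hypothetical
      dev_get()/dev_put() helpers in place of the kernel's device reference
      counting and assuming the reference is taken at allocation: the
      devlink-like object drops its reference only in its release function, so
      its device pointer stays valid even after the driver drops its own.

```c
#include <stdlib.h>

/* Hypothetical refcounted device model (stands in for struct device). */
struct dev { int refs; int freed; };

static struct dev *dev_get(struct dev *d) { d->refs++; return d; }
static void dev_put(struct dev *d) { if (--d->refs == 0) d->freed = 1; }

/* Hypothetical devlink-like object holding a device pointer. */
struct dl { struct dev *dev; };

/* Take the device reference when the object is created ... */
static struct dl *dl_alloc(struct dev *d)
{
    struct dl *dl = malloc(sizeof(*dl));

    if (dl)
        dl->dev = dev_get(d);
    return dl;
}

/* ... and drop it only at release, so dl->dev stays valid until then. */
static void dl_release(struct dl *dl)
{
    dev_put(dl->dev);
    free(dl);
}
```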
      
      Fixes: c137743b ("devlink: introduce object and nested devlink relationship infra")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: call peernet2id_alloc() with net pointer under RCU read lock · c503bc7d
      Jiri Pirko authored
      peernet2id_alloc() may be called locklessly with a peer net pointer
      obtained in an RCU critical section, and makes sure to return the ns ID
      if the net namespace is not being removed concurrently. Benefit from the
      read_pnet_rcu() helper addition: use it to obtain the net pointer under
      RCU read lock and pass it to peernet2id_alloc() to get the ns ID.
      
      Fixes: c137743b ("devlink: introduce object and nested devlink relationship infra")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: treat possible_net_t net pointer as an RCU one and add read_pnet_rcu() · 2034d90a
      Jiri Pirko authored
      Annotate the net pointer stored in the possible_net_t structure as
      an RCU pointer. Change the access helpers to treat it as such.
      Introduce a read_pnet_rcu() helper to allow callers to dereference
      the net pointer under RCU read lock.
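
      A rough userspace analogue, assuming C11 atomics stand in for RCU:
      publishing the pointer with a release store and reading it with an
      acquire load captures the ordering side of rcu_assign_pointer() and
      rcu_dereference(), but not grace periods or rcu_read_lock() sections.
      possible_net, write_pnet_model() and read_pnet_rcu_model() are
      hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct net { int id; };

/* Stand-in for possible_net_t with an atomically accessed net pointer. */
typedef struct { struct net *_Atomic net; } possible_net;

/* Publish: the release store orders initialization of *net before it. */
static void write_pnet_model(possible_net *pnet, struct net *net)
{
    atomic_store_explicit(&pnet->net, net, memory_order_release);
}

/* Read side; in the kernel the caller would hold rcu_read_lock(). */
static struct net *read_pnet_rcu_model(possible_net *pnet)
{
    return atomic_load_explicit(&pnet->net, memory_order_acquire);
}
```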
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ee2a35fe
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2023-10-10
      
      1) Adham Faris, Increase max supported channels number to 256
      
      2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes
      
      3) Shay Drory, Replace global mlx5_intf_lock with
         HCA devcom component lock
      
      4) Wei Zhang, Optimize SF creation flow
      
      During SF creation, the HCA state changes from INVALID to
      IN_USE step by step. Accordingly, FW sends vhca events to
      the driver to report each state change asynchronously.
      Each vhca event is critical because all related SW/FW
      operations are triggered by it.

      Currently there is only a single mlx5 general event handler,
      which handles not only vhca events but many other events.
      This creates a huge bottleneck because all events are forced
      to be handled serially.

      Moreover, all SFs share the same table_lock, so SFs created in
      parallel inevitably contend with each other.
      
      This series will solve this issue by:
      
      1. A dedicated vhca event handler is introduced to eliminate
         the mutual impact with other mlx5 events.
      2. The vhca event handler uses as many work queues as the max
         number of FW command threads, to fully utilize FW capability.
      3. The SF active work logic is redesigned to completely remove
         table_lock.
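
      One way to picture a dedicated handler with several work queues is to
      key each event to a queue by its function id, so events for one SF stay
      ordered while different SFs proceed in parallel. This selection rule,
      the MAX_QUEUES bound and all names below are assumptions for
      illustration, not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 32

/* Hypothetical vhca state-change event. */
struct vhca_event { uint16_t function_id; uint8_t new_state; };

struct dispatcher {
    unsigned int nqueues;             /* e.g. max FW command threads */
    unsigned int pending[MAX_QUEUES]; /* events queued per work queue */
};

/* Same function_id always maps to the same queue (assumed policy). */
static unsigned int pick_queue(const struct vhca_event *ev, unsigned int nqueues)
{
    return ev->function_id % nqueues;
}

/* Queue an event and return the index of the work queue it went to. */
static unsigned int dispatch(struct dispatcher *d, const struct vhca_event *ev)
{
    assert(d->nqueues > 0 && d->nqueues <= MAX_QUEUES);

    unsigned int q = pick_queue(ev, d->nqueues);
    d->pending[q]++;
    return q;
}
```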
      
      With above optimization, SF creation time is reduced by 25%,
      i.e. from 80s to 60s when creating 100 SFs.
      
      Patches summary:
      
      Patch 1 - implement a dedicated vhca event handler with as many work
                queues as max FW cmd threads.
      Patch 2 - remove table_lock by redesigning the SF active work
                logic.
      
      * tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: Allow IPsec soft/hard limits in bytes
        net/mlx5e: Increase max supported channels number to 256
        net/mlx5e: Preparations for supporting larger number of channels
        net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's
        net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh()
        net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs
        net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code
        net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code
        net/mlx5: fix config name in Kconfig parameter documentation
        net/mlx5: Remove unused declaration
        net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock
        net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom
        net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
        net/mlx5: Redesign SF active work to remove table_lock
        net/mlx5: Parallelize vhca event handling
      ====================
      
      Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>