1. 13 Jun, 2024 5 commits
  2. 12 Jun, 2024 13 commits
  3. 11 Jun, 2024 8 commits
    • bpftool: Query only cgroup-related attach types · 98b303c9
      Kenta Tada authored
      When CONFIG_NETKIT=y,
      bpftool-cgroup shows an error even if the cgroup's path is correct:
      
      $ bpftool cgroup tree /sys/fs/cgroup
      CgroupPath
      ID       AttachType      AttachFlags     Name
      Error: can't query bpf programs attached to /sys/fs/cgroup: No such device or address
      
      From strace and kernel tracing, I found that netkit returned ENXIO, which made this command fail.
      I think this AttachType (BPF_NETKIT_PRIMARY) is not relevant to cgroups.
      
      bpftool-cgroup should query only cgroup-related attach types.
      
      v2->v3:
        - removed an unnecessary check
      
      v1->v2:
        - used an array of cgroup attach types
      
      Signed-off-by: Kenta Tada <tadakentaso@gmail.com>
      Reviewed-by: Quentin Monnet <qmo@kernel.org>
      Link: https://lore.kernel.org/r/20240607111704.6716-1-tadakentaso@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
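The bpftool change above can be illustrated with a small user-space model. This is only a sketch: the enum values, `mock_query()` and both loop functions are hypothetical stand-ins (the real attach types live in `<linux/bpf.h>`, and the real query goes through the kernel), and the "old" loop is a reading of the failure the message describes, not a copy of bpftool's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in attach types; the real enum is defined in <linux/bpf.h>. */
enum attach_type {
	ATTACH_CGROUP_INET_INGRESS,
	ATTACH_CGROUP_INET_EGRESS,
	ATTACH_CGROUP_DEVICE,
	ATTACH_NETKIT_PRIMARY,	/* not a cgroup attach type */
	ATTACH_TYPE_MAX,
};

/* Only the types that make sense when querying a cgroup. */
static const enum attach_type cgroup_attach_types[] = {
	ATTACH_CGROUP_INET_INGRESS,
	ATTACH_CGROUP_INET_EGRESS,
	ATTACH_CGROUP_DEVICE,
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical query: fails with -ENXIO for the netkit type only. */
static int mock_query(enum attach_type type)
{
	assert(type < ATTACH_TYPE_MAX);
	return type == ATTACH_NETKIT_PRIMARY ? -ENXIO : 0;
}

/* Modeled old behaviour: walk every attach type, stop at the first error. */
static int query_all_types(void)
{
	for (int t = 0; t < ATTACH_TYPE_MAX; t++) {
		int err = mock_query((enum attach_type)t);

		if (err)
			return err;
	}
	return 0;
}

/* Modeled new behaviour: walk only the cgroup-related types. */
static int query_cgroup_types(void)
{
	for (size_t i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++) {
		int err = mock_query(cgroup_attach_types[i]);

		if (err)
			return err;
	}
	return 0;
}
```

In this model `query_all_types()` returns `-ENXIO` because it reaches the netkit type, while `query_cgroup_types()` returns 0 because the array never contains it.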
    • Merge branch 'intel-wired-lan-driver-updates-2024-06-03' · bb678f01
      Jakub Kicinski authored
      Jacob Keller says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-06-03
      
      This series includes miscellaneous improvements for the ice driver as well as a
      cleanup to the Makefiles for all Intel net drivers.
      
      Andy fixes all of the Intel net driver Makefiles to use the documented
      '*-y' syntax for specifying object files to link into kernel driver
      modules, rather than the '*-objs' syntax which works but is documented as
      reserved for user-space host programs.
      
      Jacob has a cleanup to refactor rounding logic in the ice driver into a
      common roundup_u64 helper function.
      
      Michal Schmidt replaces irq_set_affinity_hint() with
      irq_update_affinity_hint(), which behaves better with user-applied affinity
      settings.
      
      v2: https://lore.kernel.org/r/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com
      v1: https://lore.kernel.org/r/20240603-next-2024-06-03-intel-next-batch-v1-0-e0523b28f325@intel.com
      ====================
      
      Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-0-d1470cee3347@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: use irq_update_affinity_hint() · dee55767
      Michal Schmidt authored
      irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint()
      instead. This removes the side-effect of actually applying the affinity.
      
      The driver does not really need to worry about spreading its IRQs across
      CPUs. The core code already takes care of that.
      On the contrary, when the driver applies affinities by itself, it breaks
      the users' expectations:
       1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
          order to prevent IRQs from being moved to certain CPUs that run a
          real-time workload.
       2. ice reconfigures VSIs at runtime due to a MIB change
          (ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the
          affinity in ice_vsi_req_irq_msix().
       3. ice has no idea about irqbalance's config, so it may move an IRQ to
          a banned CPU. The real-time workload suffers unacceptable latency.
      
      I am not sure if updating the affinity hints is at all useful, because
      irqbalance has ignored them since 2016 ([1]), but at least it's harmless.
      
      This ice change is similar to i40e commit d34c54d1 ("i40e: Use
      irq_update_affinity_hint()").
      
      [1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfdd
      
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: add and use roundup_u64 instead of open coding equivalent · 1d4ce389
      Jacob Keller authored
      In ice_ptp_cfg_clkout(), the ice driver needs to round a current time value,
      specified in nanoseconds, up to the next whole second. It implements this
      using div64_u64, because the time value is a u64. It could use div_u64
      instead, since NSEC_PER_SEC fits in 32 bits.
      
      Ideally this would be implemented directly with roundup(), but that can't
      work on all platforms: 64-bit division must go through the dedicated
      macros and functions, both because of platform restrictions and to ensure
      that the most appropriate and fastest instructions are used.
      
      The kernel doesn't currently provide any 64-bit equivalents for doing
      roundup. Attempting to use roundup() on a 32-bit platform will result in a
      link failure due to not having a direct 64-bit division.
      
      The closest equivalent for this is DIV64_U64_ROUND_UP, which does a
      division always rounding up. However, this only computes the division, and
      forces use of the div64_u64 in cases where the divisor is a 32bit value and
      could make use of div_u64.
      
      Introduce DIV_U64_ROUND_UP based on div_u64, and then use it to implement
      roundup_u64 which takes a u64 input value and a u32 rounding value.
      
      The name roundup_u64 matches the naming scheme of div_u64, and future
      patches could implement roundup64_u64 if they need to round by a multiple
      that does not fit in 32 bits.
      
      Replace the logic in ice_ptp.c which does this equivalent with the newly
      added roundup_u64.
      
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-2-d1470cee3347@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
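The rounding described in the roundup_u64 commit can be sketched in user space as follows. This is an assumption-laden illustration, not the kernel implementation: `div_u64()` here is a plain C stand-in for the kernel helper of the same name, the round-up step is written as an inline function rather than the `DIV_U64_ROUND_UP` macro the message mentions, and overflow of `ll + d - 1` is not handled.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* User-space stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static inline uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Divide, always rounding up. Assumes ll + d - 1 does not overflow. */
static inline uint64_t div_u64_round_up(uint64_t ll, uint32_t d)
{
	assert(d != 0);
	return div_u64(ll + d - 1, d);
}

/* Round x up to the next multiple of y (x unchanged if already a multiple). */
static inline uint64_t roundup_u64(uint64_t x, uint32_t y)
{
	return div_u64_round_up(x, y) * y;
}
```

For example, `roundup_u64(1500000000ULL, NSEC_PER_SEC)` yields 2000000000, the start of the next second, while a value that is already a whole number of seconds is returned unchanged.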
    • net: intel: Use *-y instead of *-objs in Makefile · a2fe35df
      Andy Shevchenko authored
      The *-objs suffix is reserved for (user-space) host programs, while
      the *-y suffix is normally used for kernel drivers (although *-objs works
      for that purpose for now).
      
      Let's correct the old usages of *-objs in Makefiles.
      
      Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-1-d1470cee3347@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • isdn: add missing MODULE_DESCRIPTION() macros · 2ebb87f4
      Jeff Johnson authored
      make allmodconfig && make W=1 C=1 reports:
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcpci.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcmulti.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcsusb.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/avmfritz.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/speedfax.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNinfineon.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/w6692.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/netjet.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNipac.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNisar.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_core.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_dsp.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/l1oip.o
      
      Add the missing invocations of the MODULE_DESCRIPTION() macro.
      
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Link: https://lore.kernel.org/r/20240607-md-drivers-isdn-v1-1-81fb7001bc3a@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b1156532
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2024-06-06
      
      We've added 54 non-merge commits during the last 10 day(s) which contain
      a total of 50 files changed, 1887 insertions(+), 527 deletions(-).
      
      The main changes are:
      
      1) Add a user space notification mechanism via epoll when a struct_ops
         object is getting detached/unregistered, from Kui-Feng Lee.
      
      2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
         tests, from Geliang Tang.
      
      3) Add BTF field (type and string fields, right now) iterator support
         to libbpf instead of using existing callback-based approaches,
         from Andrii Nakryiko.
      
      4) Extend BPF selftests for the latter with a new btf_field_iter
         selftest, from Alan Maguire.
      
      5) Add new kfuncs for a generic, open-coded bits iterator,
         from Yafang Shao.
      
      6) Fix BPF selftests' kallsyms_find() helper under kernels configured
         with CONFIG_LTO_CLANG_THIN, from Yonghong Song.
      
      7) Remove a bunch of unused structs in BPF selftests,
         from David Alan Gilbert.
      
      8) Convert test_sockmap section names into names understood by libbpf
         so it can deduce program type and attach type, from Jakub Sitnicki.
      
      9) Extend libbpf with the ability to configure log verbosity
         via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.
      
      10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
          in nested VMs, from Song Liu.
      
      11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
          optimization, from Xiao Wang.
      
      12) Enable BPF programs to declare arrays and struct fields with kptr,
          bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
        selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
        selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
        selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
        selftests/bpf: Add start_test helper in bpf_tcp_ca
        selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
        libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
        selftests/bpf: Add btf_field_iter selftests
        selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
        libbpf: Remove callback-based type/string BTF field visitor helpers
        bpftool: Use BTF field iterator in btfgen
        libbpf: Make use of BTF field iterator in BTF handling code
        libbpf: Make use of BTF field iterator in BPF linker code
        libbpf: Add BTF field iterator
        selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
        selftests/bpf: Fix bpf_cookie and find_vma in nested VM
        selftests/bpf: Test global bpf_list_head arrays.
        selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
        selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
        bpf: limit the number of levels of a nested struct type.
        bpf: look into the types of the fields of a struct type recursively.
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-next-2024-06-07' of... · 93d4e8bb
      Jakub Kicinski authored
      Merge tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
      
      Kalle Valo says:
      
      ====================
      wireless-next patches for v6.11
      
      The first "new features" pull request for v6.11 with changes both in
      stack and in drivers. Nothing out of the ordinary, except that we have
      two conflicts this time:
      
      net/mac80211/cfg.c
        https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au
      
      drivers/net/wireless/microchip/wilc1000/netdev.c
        https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au
      
      Major changes:
      
      cfg80211/mac80211
       * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers
      
      wilc1000
       * read MAC address during probe to make it visible to user space
      
      iwlwifi
       * bump FW API to 91 for BZ/SC devices
       * report 64-bit radiotap timestamp
       * enable P2P low latency by default
       * handle Transmit Power Envelope (TPE) advertised by AP
       * start using guard()
      
      rtlwifi
       * RTL8192DU support
      
      ath12k
       * remove unsupported tx monitor handling
       * channel 2 in 6 GHz band support
       * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support
       * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA)
         support
       * dynamic VLAN support
       * add panic handler for resetting the firmware state
      
      ath10k
       * add qcom,no-msa-ready-indicator Device Tree property
       * LED support for various chipsets
      
      * tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits)
        wifi: ath12k: add hw_link_id in ath12k_pdev
        wifi: ath12k: add panic handler
        wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity()
        wifi: brcm80211: remove unused structs
        wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type)
        wifi: ath12k: do not process consecutive RDDM event
        dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example
        wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup()
        wifi: rtlwifi: handle return value of usb init TX/RX
        wifi: rtlwifi: Enable the new rtl8192du driver
        wifi: rtlwifi: Add rtl8192du/sw.c
        wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg
        wifi: rtlwifi: Add rtl8192du/dm.{c,h}
        wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h}
        wifi: rtlwifi: Add rtl8192du/rf.{c,h}
        wifi: rtlwifi: Add rtl8192du/trx.{c,h}
        wifi: rtlwifi: Add rtl8192du/phy.{c,h}
        wifi: rtlwifi: Add rtl8192du/hw.{c,h}
        wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU
        wifi: rtlwifi: Add rtl8192du/table.{c,h}
        ...
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      
      ====================
      
      Link: https://lore.kernel.org/r/20240607093517.41394C2BBFC@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 10 Jun, 2024 14 commits
    • Merge branch 'fix-changing-dsa-conduit' · 2ba6d157
      David S. Miller authored
      Marek Behún says:
      
      ====================
      Fix changing DSA conduit
      
      This series fixes an issue in the DSA code related to the host interface UC
      address installed into the port FDB and the port conduit address database
      when live-changing the port conduit.
      
      The first patch refactors/deduplicates the installation/uninstallation
      of the interface's MAC address and the second patch fixes the issue.
      
      Cover letter for v1 and v2:
        https://patchwork.kernel.org/project/netdevbpf/cover/20240429163627.16031-1-kabel@kernel.org/
        https://patchwork.kernel.org/project/netdevbpf/cover/20240502122922.28139-1-kabel@kernel.org/
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: update the unicast MAC address when changing conduit · eef8e906
      Marek Behún authored
      When changing DSA user interface conduit while the user interface is up,
      DSA exhibits different behavior in comparison to when the interface is
      down. This different behavior concerns the primary unicast MAC address
      stored in the port standalone FDB and in the conduit device UC database.
      
      If we put a switch port down while changing the conduit with
        ip link set sw0p0 down
        ip link set sw0p0 type dsa conduit conduit1
        ip link set sw0p0 up
      we delete the address in dsa_user_close() and install the (possibly
      different) address in dsa_user_open().
      
      But when changing the conduit on the fly, the old address is not
      deleted and the new one is not installed.
      
      Since we explicitly want to support live-changing the conduit, uninstall
      the old address before calling dsa_port_assign_conduit() and install the
      (possibly different) new address after the call.
      
      Because conduit change might also trigger address change (the user
      interface is supposed to inherit the conduit interface MAC address if no
      address is defined in hardware (dp->mac is a zero address)), move the
      eth_hw_addr_inherit() call from dsa_user_change_conduit() to
      dsa_port_change_conduit(), just before installing the new address.
      
      Although this is in theory a flaw in DSA core, it need not be
      backported, since there is currently no DSA driver that can be affected
      by this. The only DSA driver that supports changing conduit is felix,
      and, as explained by Vladimir Oltean [1]:
      
        There are 2 reasons why with felix the bug does not manifest itself.
      
        First is because both the 'ocelot' and the alternate 'ocelot-8021q'
        tagging protocols have the 'promisc_on_conduit = true' flag. So the
        unicast address doesn't have to be in the conduit's RX filter -
        neither the old nor the new conduit.
      
        Second, dsa_user_host_uc_install() theoretically leaves behind host
        FDB entries installed towards the wrong (old) CPU port. But in
        felix_fdb_add(), we treat any FDB entry requested towards any CPU port
        as if it was a multicast FDB entry programmed towards _all_ CPU ports.
        For that reason, it is installed towards the port mask of the PGID_CPU
        port group ID:
      
      	if (dsa_port_is_cpu(dp))
      		port = PGID_CPU;
      
      Therefore no Fixes tag for this change.
      
      [1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/
      
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Tested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
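The ordering that the conduit commit describes (uninstall the old address, switch conduit, install the new address) can be modeled with a toy state machine. Everything below is hypothetical: the structs, the function names and the `up` check are illustrative assumptions, the MAC inheritance step is not modeled, and none of this is the DSA core code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy conduit: only remembers whether the port's unicast address
 * is currently installed in its RX filter. */
struct toy_conduit {
	bool has_port_addr;
};

struct toy_port {
	struct toy_conduit *conduit;
	bool up;
};

static void toy_uc_install(struct toy_port *p)
{
	p->conduit->has_port_addr = true;
}

static void toy_uc_uninstall(struct toy_port *p)
{
	p->conduit->has_port_addr = false;
}

/* Modeled pre-fix behaviour: only the conduit pointer is swapped. */
static void toy_change_conduit_old(struct toy_port *p, struct toy_conduit *next)
{
	assert(next != NULL);
	p->conduit = next;
}

/* Modeled post-fix behaviour: uninstall before the swap, install after. */
static void toy_change_conduit_new(struct toy_port *p, struct toy_conduit *next)
{
	assert(next != NULL);
	if (p->up)
		toy_uc_uninstall(p);
	p->conduit = next;
	if (p->up)
		toy_uc_install(p);
}
```

With a port that is up, the old variant leaves the address on the previous conduit and never adds it to the new one; the new variant moves it, matching what the down/change/up sequence already achieved.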
    • net: dsa: deduplicate code adding / deleting the port address to fdb · 77f75412
      Marek Behún authored
      The sequence
        if (dsa_switch_supports_uc_filtering(ds))
          dsa_port_standalone_host_fdb_add(dp, addr, 0);
        if (!ether_addr_equal(addr, conduit->dev_addr))
          dev_uc_add(conduit, addr);
      is executed both in dsa_user_open() and dsa_user_set_mac_addr().
      
      Its reverse is executed both in dsa_user_close() and
      dsa_user_set_mac_addr().
      
      Refactor these sequences into new functions dsa_user_host_uc_install()
      and dsa_user_host_uc_uninstall().
      
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
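The refactoring in the deduplication commit can be pictured as a pair of helpers wrapping the quoted sequence and its reverse. The sketch below is a user-space model with made-up types and counters (`model_port`, `model_conduit`, `addr_equal()` and both `model_*` helpers are assumptions); it mirrors the two conditions from the message but does not reproduce the kernel functions or their signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

struct model_conduit {
	unsigned char dev_addr[ETH_ALEN];
	int uc_entries;		/* addresses added to the conduit's UC list */
};

struct model_port {
	struct model_conduit *conduit;
	bool uc_filtering;	/* switch supports unicast filtering */
	int host_fdb_entries;	/* standalone host FDB entries */
};

static bool addr_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/* The quoted sequence, gathered in one helper. */
static void model_host_uc_install(struct model_port *p, const unsigned char *addr)
{
	assert(p->conduit != NULL);
	if (p->uc_filtering)
		p->host_fdb_entries++;
	if (!addr_equal(addr, p->conduit->dev_addr))
		p->conduit->uc_entries++;
}

/* Its reverse, undoing the two steps in the opposite order. */
static void model_host_uc_uninstall(struct model_port *p, const unsigned char *addr)
{
	assert(p->conduit != NULL);
	if (!addr_equal(addr, p->conduit->dev_addr))
		p->conduit->uc_entries--;
	if (p->uc_filtering)
		p->host_fdb_entries--;
}
```

Callers that previously open-coded both conditions (the open, close and set-MAC paths named in the message) would then call one helper each, which keeps the install and uninstall paths symmetric.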
    • Merge branch 'rtnetlink-rtnl_lock' · 395059c5
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      rtnetlink: move rtnl_lock handling out of af_netlink
      
      With the changes done in commit 5b4b62a1 ("rtnetlink: make
      the "split" NLM_DONE handling generic") we can also move the
      rtnl locking out of af_netlink.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netlink: remove the cb_mutex "injection" from netlink core · 5fbf57a9
      Jakub Kicinski authored
      Back in 2007, in commit af65bdfc ("[NETLINK]: Switch cb_lock spinlock
      to mutex and allow to override it") netlink core was extended to allow
      subsystems to replace the dump mutex lock with their own lock.
      
      The mechanism was used by rtnetlink to take rtnl_lock but it isn't
      sufficiently flexible for other users. Over the 17 years since
      it was added no other user appeared. Since rtnetlink needs conditional
      locking now, and doesn't use it either, axe this feature completely.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: move rtnl_lock handling out of af_netlink · 5380d64f
      Jakub Kicinski authored
      Now that we have an intermediate layer of code for handling
      rtnl-level netlink dump quirks, we can move the rtnl_lock
      taking there.
      
      For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
      avoid taking rtnl_lock just to generate NLM_DONE, once again.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: hellcreek: Replace kernel.h with what is used · c917b26e
      Andy Shevchenko authored
      kernel.h is included solely for some other existing headers.
      Include them directly and get rid of kernel.h.
      
      While at it, sort headers alphabetically for easier maintenance.
      
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-up-pin-tw-timer' · a9522664
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: tcp: un-pin tw timer
      
      Changes since previous iteration:
       - Patch 1: update a comment, I copied Eric's v7 RvB tag.
       - Patch 2: move bh off/on into hashdance_schedule and get rid of
         comment mentioning pinned tw timer.
         I did not copy Eric's RvB tag over from v7 because of the change.
       - Patch 3 is unchanged, so I kept Eric's RvB tag.
      
      This is v8 of the series where the tw_timer is un-pinned to get rid of
      interference in isolated-CPU setups.
      
      First patch makes necessary preparations, existing code relies on
      TIMER_PINNED to avoid races.
      
      Second patch un-pins the TW timer. Could be folded into the first one,
      but it might help wrt. bisection.
      
      Third patch is a minor cleanup to move a helper from .h to the only
      remaining compilation unit.
      
      Tested with iperf3 and stress-ng socket mode.
      ====================
      
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: move inet_twsk_schedule helper out of header · f81d0dd2
      Florian Westphal authored
      It's no longer used outside inet_timewait_sock.c, so move it there.
      
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp: un-pin the tw_timer · c75ad7c7
      Florian Westphal authored
      After the previous patch, even if the timer fires immediately on another CPU,
      the context that schedules the timer now holds the ehash spinlock, so the
      timer cannot reap the tw socket until the ehash lock is released.
      
      BH disable is moved into hashdance_schedule.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp/dccp: prepare for tw_timer un-pinning · b334b924
      Valentin Schneider authored
      The TCP timewait timer is proving to be problematic for setups where
      scheduler CPU isolation is achieved at runtime via cpusets (as opposed to
      statically via isolcpus=domains).
      
      What happens there is a CPU goes through tcp_time_wait(), arming the
      time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer
      fires, causing interference for the now-isolated CPU. This is conceptually
      similar to the issue described in commit e02b9312 ("workqueue: Unbind
      kworkers before sending them to exit()")
      
      Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash
      lock held. Expand the lock's critical section from inet_twsk_kill() to
      inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of
      the timer. IOW, this prevents the following race:
      
      			     tcp_time_wait()
      			       inet_twsk_hashdance()
        inet_twsk_deschedule_put()
          del_timer_sync()
      			       inet_twsk_schedule()
      
      Thanks to Paolo Abeni for suggesting to leverage the ehash lock.
      
      This also restores a comment from commit ec94c269 ("tcp/dccp: avoid
      one atomic operation for timewait hashdance") as inet_twsk_hashdance() had
      a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing.
      
      inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize
      with inet_twsk_hashdance_schedule().
      
      To ease possible regression search, actual un-pin is done in next patch.
      
      Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-acl-fixes' · 8d466c8f
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: ACL fixes
      
      Ido Schimmel writes:
      
      Patches #1-#3 fix various spelling mistakes I noticed while working on
      the code base.
      
      Patch #4 fixes a general protection fault by bailing out when the error
      occurs and warning.
      
      Patch #5 fixes the warning.
      
      Patch #6 fixes ACL scale regression and firmware errors.
      
      See the commit messages for more info.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors · 75d8d7a6
      Ido Schimmel authored
      ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
      newer ASICs can share the same mask if their masks only differ in up to
      8 consecutive bits. For example, consider the following filters:
      
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop
      
      The second filter can use the same mask as the first (dst_ip/24) with a
      delta of 1 bit.
      
      However, the above only works because the two filters have different
      values in the common unmasked part (dst_ip/24). When entries have the
      same value in the common unmasked part they create undesired collisions
      in the device since many entries now have the same key. This leads to
      firmware errors such as [1] and to a reduced scale.
      
      Fix by adjusting the hash table key to only include the value in the
      common unmasked part. That is, without including the delta bits. That
      way the driver will detect the collision during filter insertion and
      spill the filter into the circuit TCAM (C-TCAM).
      
      Add a test case that fails without the fix and adjust existing cases
      that check C-TCAM spillage according to the above limitation.
      
      [1]
      mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
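The mask-sharing rule and the adjusted hash key from the A-TCAM commit can be worked through with plain IPv4 prefixes. This is a simplified sketch under stated assumptions: it only models contiguous prefix masks, all function names are hypothetical, and the real driver operates on encoded key blocks rather than 32-bit addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DELTA_BITS 8

/* Build a host-order IPv4 address from its four octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Netmask for a prefix length of 0..32, host byte order. */
static uint32_t prefix_mask(unsigned int len)
{
	assert(len <= 32);
	return len == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - len));
}

/* A longer prefix may reuse a shorter prefix's mask if it adds at most
 * MAX_DELTA_BITS bits (for prefixes the added bits are always consecutive). */
static bool can_share_mask(unsigned int parent_len, unsigned int len)
{
	return len >= parent_len && len - parent_len <= MAX_DELTA_BITS;
}

/* Modeled post-fix key: the value under the shared mask only,
 * with the delta bits left out. */
static uint32_t shared_key(uint32_t addr, unsigned int parent_len)
{
	return addr & prefix_mask(parent_len);
}
```

Using the filters from the message, 192.0.2.0/24 and 198.51.100.128/25 share the /24 mask with a 1-bit delta and produce different keys, so they coexist. A hypothetical pair such as 192.0.2.0/24 and 192.0.2.128/25 produces the same key, which is the kind of collision the driver is said to detect at insertion time and resolve by spilling the filter into the C-TCAM.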
    • mlxsw: spectrum_acl_erp: Fix object nesting warning · 97d833ce
      Ido Schimmel authored
      ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
      (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
      contain more ACLs (i.e., tc filters), but the number of masks in each
      region (i.e., tc chain) is limited.
      
      In order to mitigate the effects of the above limitation, the device
      allows filters to share a single mask if their masks only differ in up
      to 8 consecutive bits. For example, dst_ip/25 can be represented using
      dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
      number of masks being used (and therefore does not support mask
      aggregation), but can contain a limited number of filters.
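      
      The sharing rule above can be sketched as a small check on integer
      masks. This is an illustration under assumptions, not the device's or
      the driver's logic: prefix_mask() and can_share_mask() are
      hypothetical helpers, and masks are modelled as 32-bit IPv4 values.

```python
MAX_DELTA_BITS = 8  # from the text: up to 8 consecutive bits may differ


def prefix_mask(prefix_len: int) -> int:
    """Return the 32-bit IPv4 mask for a prefix length as an integer."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def can_share_mask(parent: int, child: int) -> bool:
    """True if child equals parent plus a delta of limited width.

    The differing bits must all fall inside a window of at most
    MAX_DELTA_BITS consecutive bit positions.
    """
    if parent & ~child:
        return False  # parent has bits the child lacks
    delta = parent ^ child
    if delta == 0:
        return False  # identical masks: no delta to express
    lowest = (delta & -delta).bit_length()   # position of lowest set bit
    highest = delta.bit_length()             # position of highest set bit
    return highest - lowest + 1 <= MAX_DELTA_BITS


# dst_ip/25 on top of dst_ip/24: a delta of 1 bit.
one_bit = can_share_mask(prefix_mask(24), prefix_mask(25))    # True
# dst_ip/25 on top of dst_ip/16: the delta would span 9 bits.
nine_bits = can_share_mask(prefix_mask(16), prefix_mask(25))  # False
```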
      
      The driver uses the "objagg" library to perform the mask aggregation by
      passing it objects that consist of the filter's mask and whether the
      filter is to be inserted into the A-TCAM or the C-TCAM since filters in
      different TCAMs cannot share a mask.
      
      The set of created objects is dependent on the insertion order of the
      filters and is not necessarily optimal. Therefore, the driver will
      periodically ask the library to compute a more optimal set ("hints") by
      looking at all the existing objects.
      
      When the library asks the driver whether two objects can be aggregated,
      the driver only compares the provided masks and ignores the A-TCAM /
      C-TCAM indication. This is the right thing to do since the goal is to
      move as many filters as possible to the A-TCAM. The driver also forbids
      two identical masks from being aggregated since this can only happen if
      one was intentionally put in the C-TCAM to avoid a conflict in the
      A-TCAM.
      
      The above can result in the following set of hints:
      
      H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
      H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
      
      After getting the hints from the library, the driver will start migrating
      filters from one region to another while consulting the computed hints
      and instructing the device to perform a lookup in both regions during
      the transition.
      
      Assuming a filter with mask X is being migrated into the A-TCAM in the
      new region, the hints lookup will return H1. Since H2 is the parent of
      H1, the library will try to find the object associated with it and
      create it if necessary, in which case another (recursive) hints lookup
      will be performed. This hints lookup for {mask Y, A-TCAM} will return
      either H2 or H3, since the driver passes the library an object
      comparison function that ignores the A-TCAM / C-TCAM indication.
      
      This can eventually lead to nested objects which are not supported by
      the library [1].
      
      Fix by removing the object comparison function from both the driver and
      the library as the driver was the only user. That way the lookup will
      only return exact matches.
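      
      The difference between the two lookups can be modelled with the hint
      set listed earlier. This is a sketch only: the Obj and Hint types and
      both lookup functions are hypothetical and are not the objagg API.

```python
from typing import NamedTuple, Optional


class Obj(NamedTuple):
    mask: str   # symbolic mask name: "X", "Y" or "Z"
    tcam: str   # "A" for A-TCAM, "C" for C-TCAM


class Hint(NamedTuple):
    name: str
    obj: Obj
    parent: Optional[str]   # name of the parent hint, if any


# The hint set from the text: H1 -> H2 and H3 -> H4.
HINTS = {
    "H1": Hint("H1", Obj("X", "A"), "H2"),
    "H2": Hint("H2", Obj("Y", "A"), None),
    "H3": Hint("H3", Obj("Y", "C"), "H4"),
    "H4": Hint("H4", Obj("Z", "A"), None),
}


def lookup_ignoring_tcam(obj: Obj) -> list:
    """Model of the old lookup: only the mask is compared."""
    return [h.name for h in HINTS.values() if h.obj.mask == obj.mask]


def lookup_exact(obj: Obj) -> list:
    """Model of the fixed lookup: mask and TCAM indication must match."""
    return [h.name for h in HINTS.values() if h.obj == obj]


wanted = Obj("Y", "A")                       # the parent needed for H1
old_candidates = lookup_ignoring_tcam(wanted)
new_candidates = lookup_exact(wanted)
```

      In this model the old lookup can yield H3 as well as H2. H3 has a
      parent of its own (H4), so using it as the parent of H1's object
      would produce the nesting that the library does not support, while
      the exact lookup only yields H2.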
      
      I do not have a reproducer that triggers the issue both reliably and
      quickly, but before the fix the issue would reproduce within several
      minutes, whereas with the fix it does not reproduce in over an hour.
      
      Note that the current usefulness of the hints is limited because they
      include the C-TCAM indication and represent aggregation that cannot
      actually happen. This will be addressed in net-next.
      
      [1]
      WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
      Modules linked in:
      CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
      Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
      [...]
      Call Trace:
       <TASK>
       __objagg_obj_get+0x2bb/0x580
       objagg_obj_get+0xe/0x80
       mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
       mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97d833ce