1. 21 Feb, 2023 16 commits
    • Merge tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 05b953a5
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2023-02-15
      
      1) From Gal, Tariq and Parav: a few cleanups for the mlx5 driver.
      
      2) From Vlad: Allow offloading of ct 'new' match based on [1]
      
      [1] https://lore.kernel.org/netdev/20230201163100.1001180-1-vladbu@nvidia.com/
      
      * tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: RX, Remove doubtful unlikely call
        net/mlx5e: Fix outdated TLS comment
        net/mlx5e: Remove unused function mlx5e_sq_xmit_simple
        net/mlx5e: Allow offloading of ct 'new' match
        net/mlx5e: Implement CT entry update
        net/mlx5: Simplify eq list traversal
        net/mlx5e: Remove redundant page argument in mlx5e_xdp_handle()
        net/mlx5e: Remove redundant page argument in mlx5e_xmit_xdp_buff()
        net/mlx5e: Switch to using napi_build_skb()
      ====================
      
      Link: https://lore.kernel.org/r/20230218090513.284718-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      05b953a5
    • Merge branch 'net-sched-cls_api-support-hardware-miss-to-tc-action' · 981f4045
      Jakub Kicinski authored
      Paul Blakey says:
      
      ====================
      net/sched: cls_api: Support hardware miss to tc action
      
      This series adds support for hardware miss to instruct tc to continue execution
      in a specific tc action instance on a filter's action list. The mlx5 driver patch
      (besides the refactors) shows its usage instead of using just chain restore.
      
      Currently a filter's action list must be executed all together or
      not at all, as drivers are only able to tell tc to continue executing from a
      specific tc chain, and not from a specific filter/action.
      
      This is troublesome with regards to action CT, where new connections should
      be sent to software (via tc chain restore), and established connections can
      be handled in hardware.
      
      Checking for new connections is done when executing the ct action in hardware
      (by checking the packet's tuple against known established tuples).
      But if there is a packet modification (pedit) action before action CT and the
      checked tuple is a new connection, hardware will need to revert the previous
      packet modifications before sending it back to software so it can
      re-match the same tc filter in software and re-execute its CT action.
      
      The following is an example configuration of stateless nat
      on the mlx5 driver that was not supported before this patchset:
      
       #Setup corresponding mlx5 VFs in namespaces
       $ ip netns add ns0
       $ ip netns add ns1
       $ ip link set dev enp8s0f0v0 netns ns0
       $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
       $ ip link set dev enp8s0f0v1 netns ns1
       $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
      
       #Setup tc arp and ct rules on mlx5 VF representors
       $ tc qdisc add dev enp8s0f0_0 ingress
       $ tc qdisc add dev enp8s0f0_1 ingress
       $ ifconfig enp8s0f0_0 up
       $ ifconfig enp8s0f0_1 up
      
       #Original side
       $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
          ct_state -trk ip_proto tcp dst_port 8888 \
            action pedit ex munge tcp dport set 5001 pipe \
            action csum ip tcp pipe \
            action ct pipe \
            action goto chain 1
       $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
          ct_state +trk+est \
            action mirred egress redirect dev enp8s0f0_1
       $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
          ct_state +trk+new \
            action ct commit pipe \
            action mirred egress redirect dev enp8s0f0_1
       $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
            action mirred egress redirect dev enp8s0f0_1
      
       #Reply side
       $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
            action mirred egress redirect dev enp8s0f0_0
       $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
          ct_state -trk ip_proto tcp \
            action ct pipe \
            action pedit ex munge tcp sport set 8888 pipe \
            action csum ip tcp pipe \
            action mirred egress redirect dev enp8s0f0_0
      
       #Run traffic
       $ ip netns exec ns1 iperf -s -p 5001&
       $ sleep 2 #wait for iperf to fully open
       $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
      
       #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
       $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
              Sent hardware 9310116832 bytes 6149672 pkt
              Sent hardware 9310116832 bytes 6149672 pkt
              Sent hardware 9310116832 bytes 6149672 pkt
      
      A new connection executing the first filter in hardware will first have
      its dst port rewritten to the new port, and then the ct action is executed.
      Because this is a new connection, hardware will need to send it back
      to software, on chain 0, to execute the first filter again in software.
      The dst port needs to be reverted, otherwise it won't re-match the old
      dst port in the first filter. Because of that, the mlx5 driver currently
      rejects offloading the above action ct rule.
      
      This series adds support for hardware partially executing a filter's action list,
      and letting tc software continue processing in the specific action instance
      where hardware left off (in the above case, after the "action pedit ex munge tcp
      dport..." of the first rule), allowing support for scenarios such as the above.
      ====================
      
      Link: https://lore.kernel.org/r/20230217223620.28508-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      981f4045
    • net/mlx5e: TC, Set CT miss to the specific ct action instance · 67027828
      Paul Blakey authored
      Currently, CT misses restore the missed chain on the tc skb extension so
      tc will continue from the relevant chain. Instead, restore the CT action's
      miss cookie on the extension, which will instruct tc to continue from
      this specific CT action instance on the relevant filter's action list.
      
      Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
      this miss mapping instead of the current chain miss object (CHAIN_MISS)
      for CT action misses.
      
      To restore this new miss mapping value, add a RX restore rule for each
      such mapping value.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      67027828
    • net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG · 235ff07d
      Paul Blakey authored
      This reg usage is always a mapped object, not necessarily
      containing chain info.
      
      Rename to properly convey what it stores.
      This patch doesn't change any functionality.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      235ff07d
    • net/mlx5: Refactor tc miss handling to a single function · 93a1ab2c
      Paul Blakey authored
      Move tc miss handling code to en_tc.c, and remove
      duplicate code.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      93a1ab2c
    • net/mlx5: Kconfig: Make tc offload depend on tc skb extension · 03a283cd
      Paul Blakey authored
      Tc skb extension is a basic requirement for using tc
      offload to support correct restoration on action miss.
      
      Depend on it.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      03a283cd
    • net/sched: flower: Support hardware miss to tc action · 606c7c43
      Paul Blakey authored
      To support hardware miss to tc action for actions on the flower
      classifier, implement the required getting of filter actions,
      and set up the filter exts (actions) miss by giving it the filter's
      handle and actions.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      606c7c43
    • net/sched: flower: Move filter handle initialization earlier · 08a0063d
      Paul Blakey authored
      To support miss to action during hardware offload the filter's
      handle is needed when setting up the actions (tcf_exts_init()),
      and before offloading.
      
      Move filter handle initialization earlier.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      08a0063d
    • net/sched: cls_api: Support hardware miss to tc action · 80cd22c3
      Paul Blakey authored
      For drivers to support partial offload of a filter's action list,
      add support for action miss to specify an action instance to
      continue from in sw.
      
      CT action in particular can't be fully offloaded, as new connections
      need to be handled in software. This imposes other limitations on
      the actions that can be offloaded together with the CT action, such
      as packet modifications.
      
      Assign each action on a filter's action list a unique miss_cookie
      which drivers can then use to fill action_miss part of the tc skb
      extension. On getting back this miss_cookie, find the action
      instance with relevant cookie and continue classifying from there.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      80cd22c3
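The miss_cookie lookup described in the commit above can be pictured with a small user-space sketch. This is not the kernel implementation: `struct demo_action`, `demo_acts`, the cookie values, and both helper functions are hypothetical names invented for illustration, and the fallback of restarting from the top when a cookie is unknown is an assumption of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one action instance on a filter's action list. */
struct demo_action {
	uint64_t miss_cookie;	/* unique per action instance (values made up) */
	const char *name;
};

/* Action list modelled on the first filter of the stateless NAT example. */
static const struct demo_action demo_acts[] = {
	{ 0x101, "pedit" },
	{ 0x102, "csum" },
	{ 0x103, "ct" },
	{ 0x104, "goto chain 1" },
};
#define DEMO_NUM_ACTS (sizeof(demo_acts) / sizeof(demo_acts[0]))

/* Return the index of the action carrying miss_cookie, or -1 if none does. */
static int demo_find_resume_index(const struct demo_action *acts, size_t n,
				  uint64_t miss_cookie)
{
	for (size_t i = 0; i < n; i++)
		if (acts[i].miss_cookie == miss_cookie)
			return (int)i;
	return -1;
}

/*
 * Number of actions software still has to run when resuming at resume_index.
 * Assumption of this sketch: an unknown cookie (-1) restarts from the top.
 */
static size_t demo_actions_left(size_t n, int resume_index)
{
	assert(resume_index < (int)n);
	return resume_index < 0 ? n : n - (size_t)resume_index;
}
```

If hardware ran the pedit and csum actions and then missed in ct, reporting the ct action's cookie lets software resume at index 2 without reverting the earlier packet modification.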
    • net/sched: Rename user cookie and act cookie · db4b4902
      Paul Blakey authored
      struct tc_action->act_cookie is a user defined cookie,
      and the related struct flow_action_entry->act_cookie is
      used as a handle similar to struct flow_cls_offload->cookie.
      
      Rename tc_action->act_cookie to user_cookie, and
      flow_action_entry->act_cookie to cookie so their names
      would better fit their usage.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      db4b4902
    • Merge tag 'ieee802154-for-net-next-2023-02-20' of... · 871489dd
      Jakub Kicinski authored
      Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154-next 2023-02-20
      
      Miquel Raynal built upon his earlier work and introduced two new
      features into the ieee802154 stack: beaconing to announce existing
      PANs, and passive scanning to discover the beacons and associated
      PANs. The matching changes to the userspace configuration tool
      have been posted as well and will be released together with the
      kernel release.
      
      Arnd Bergmann and Dmitry Torokhov worked on converting the
      at86rf230 and cc2520 drivers away from the unused platform_data
      usage and towards the new gpiod API. (I had to add a revert as
      Dmitry found a regression on an already pushed tree on my side).
      
      Changes since v1 (pull request 2023-02-02)
      - Netlink API extack and NLA_POLICY* usage as suggested by Jakub
      - Removed always true condition found by kernel test robot
      - Simplify device removal with running background job for scanning
      - Fix problems with beacon sending in some cases by using the MLME
        tx path
      
      * tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
        ieee802154: Drop device trackers
        mac802154: Fix an always true condition
        mac802154: Send beacons using the MLME Tx path
        ieee802154: Change error code on monitor scan netlink request
        ieee802154: Convert scan error messages to extack
        ieee802154: Use netlink policies when relevant on scan parameters
        ieee802154: at86rf230: switch to using gpiod API
        ieee802154: at86rf230: drop support for platform data
        Revert "at86rf230: convert to gpio descriptors"
        cc2520: move to gpio descriptors
        mac802154: Avoid superfluous endianness handling
        at86rf230: convert to gpio descriptors
        mac802154: Handle basic beaconing
        ieee802154: Add support for user beaconing requests
        mac802154: Handle passive scanning
        mac802154: Add MLME Tx locked helpers
        mac802154: Prepare forcing specific symbol duration
        ieee802154: Introduce a helper to validate a channel
        ieee802154: Define a beacon frame header
        ieee802154: Add support for user scanning requests
      ====================
      
      Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      871489dd
    • sfc: fix builds without CONFIG_RTC_LIB · 5f22c3b6
      Alejandro Lucero authored
      Add an embarrassingly missed semicolon plus an embarrassingly missed
      parenthesis, which broke the kernel build when CONFIG_RTC_LIB is not set,
      like the one reported with the ia64 config.
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/
      Fixes: 14743ddd ("sfc: add devlink info support for ef100")
      Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
      Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5f22c3b6
    • sfc: clean up some inconsistent indentings · 5feeaba1
      Yang Li authored
      Fix some indentation and remove the warning below:
      drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5feeaba1
    • net/mlx4_en: Introduce flexible array to silence overflow warning · f8f185e3
      Kees Cook authored
      The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
      memcpy() warning on ppc64 platform:
      
      In function ‘fortify_memcpy_chk’,
          inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
          inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
          inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
      ./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
      attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
      [-Werror=attribute-warning]
        513 |                         __write_overflow_field(p_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The same behaviour can be reproduced on x86 by using "__always_inline" instead of
      "inline" for skb_copy_from_linear_data() in skbuff.h.
      
      The call here copies data into the inlined tx descriptor, which has 104
      bytes (MAX_INLINE) of space for data payload. In this case "spc" is known
      at compile time, but the destination is used with hidden knowledge
      (the real structure of the destination is different from what the compiler
      can see). That causes the fortify warning, because the compiler can check
      bounds, but the real bounds are different. "spc" can't be bigger than
      64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into the inlined
      tx descriptor. The fact that "inl" points into the inlined tx descriptor is
      determined earlier in mlx4_en_xmit().
      
      Avoid confusing the compiler with "inl + 1" constructions to get past
      the inl header by introducing a flexible array "data" to the struct so
      that the compiler can see that we are not dealing with an array of inl
      structs, but rather, arbitrary data following the structure. There are
      no changes to the structure layout reported by pahole, and the resulting
      machine code is actually smaller.
      Reported-by: Josef Oskera <joskera@redhat.com>
      Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
      Fixes: f68f2ff9 ("fortify: Detect struct member overflows in memcpy() at compile-time")
      Cc: Yishai Hadas <yishaih@nvidia.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f8f185e3
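The difference between reaching the payload through `inl + 1` and through a flexible array member can be shown outside the kernel. The sketch below is only an illustration under stated assumptions: `struct demo_inline_hdr` and the two helpers are hypothetical and do not reproduce the real mlx4 structure or its field names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline-segment header; NOT the real mlx4 layout. */
struct demo_inline_hdr {
	uint32_t byte_count;
	uint8_t data[];		/* flexible array member: payload follows the header */
};

/* Allocate a header with room for payload_cap bytes of trailing payload. */
static struct demo_inline_hdr *demo_alloc_inline(size_t payload_cap)
{
	return malloc(sizeof(struct demo_inline_hdr) + payload_cap);
}

/*
 * Copy len bytes into the payload. Writing through hdr->data tells the
 * compiler the destination is arbitrary trailing data, whereas "hdr + 1"
 * would look like a pointer to a second header-sized object.
 */
static void demo_copy_inline(struct demo_inline_hdr *hdr, const void *src,
			     size_t len)
{
	hdr->byte_count = (uint32_t)len;
	memcpy(hdr->data, src, len);
}
```

Adding the flexible array member leaves the header size unchanged in this sketch, and `hdr->data` refers to the same address that `hdr + 1` would, which matches the commit's observation that the structure layout does not change.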
    • net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). · be9832c2
      Kuniyuki Iwashima authored
      Commit 2c02d41d ("net/ulp: prevent ULP without clone op from entering
      the LISTEN status") guarantees that all ULP listeners have clone() op, so
      we no longer need to test it in inet_clone_ulp().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      be9832c2
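The reasoning in the commit above is that an invariant enforced when a socket enters LISTEN makes a later check unnecessary. A minimal user-space sketch of that pattern follows; every name in it (`demo_sock`, `demo_ulp_ops`, `demo_listen`, `demo_clone_ulp`) and the `-1` error value are hypothetical and do not mirror the kernel's actual structures or return codes.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock;

/* Hypothetical ULP operations table with an optional clone hook. */
struct demo_ulp_ops {
	void (*clone)(struct demo_sock *child);
};

struct demo_sock {
	const struct demo_ulp_ops *ulp;
	int listening;
	int cloned;
};

static void demo_mark_cloned(struct demo_sock *child)
{
	child->cloned = 1;
}

static const struct demo_ulp_ops demo_ulp_with_clone = { .clone = demo_mark_cloned };
static const struct demo_ulp_ops demo_ulp_without_clone = { .clone = NULL };

/* Refuse to listen when a ULP is attached but has no clone hook. */
static int demo_listen(struct demo_sock *sk)
{
	if (sk->ulp && !sk->ulp->clone)
		return -1;	/* hypothetical error value */
	sk->listening = 1;
	return 0;
}

/*
 * Clone the ULP from a listening parent. Because demo_listen() already
 * rejected ULPs without a clone hook, no NULL test on ->clone is needed.
 */
static void demo_clone_ulp(const struct demo_sock *parent,
			   struct demo_sock *child)
{
	assert(parent->listening);
	child->ulp = parent->ulp;
	if (child->ulp)
		child->ulp->clone(child);
}
```

The check lives in one place, at the state transition, so every later consumer of a listening socket can rely on it.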
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · ee8d72a1
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-02-17
      
      We've added 64 non-merge commits during the last 7 day(s) which contain
      a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
      
      The main changes are:
      
      1) Add a rbtree data structure following the "next-gen data structure"
         precedent set by recently-added linked-list, that is, by using
         kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
      
      2) Add a new benchmark for hashmap lookups to BPF selftests,
         from Anton Protopopov.
      
      3) Fix bpf_fib_lookup to only return valid neighbors and add an option
         to skip the neigh table lookup, from Martin KaFai Lau.
      
      4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
         accounting for container environments, from Yafang Shao.
      
      5) Batch of ice multi-buffer and driver performance fixes,
         from Alexander Lobakin.
      
      6) Fix a bug in determining whether global subprog's argument is
         PTR_TO_CTX, which is based on type names which breaks kprobe progs,
         from Andrii Nakryiko.
      
      7) Prep work for future -mcpu=v4 LLVM option which includes usage of
         BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
         from Eduard Zingerman.
      
      8) More prep work for later building selftests with Memory Sanitizer
         in order to detect usages of undefined memory, from Ilya Leoshkevich.
      
      9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
         dereference via sendmsg(), from Maciej Fijalkowski.
      
      10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
      
      11) Fix BPF memory allocator in combination with BPF hashtab where it could
          corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
      
      12) Fix LoongArch BPF JIT to always use 4 instructions for function
          address so that instruction sequences don't change between passes,
          from Hengqi Chen.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
        selftests/bpf: Add bpf_fib_lookup test
        bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
        riscv, bpf: Add bpf trampoline support for RV64
        riscv, bpf: Add bpf_arch_text_poke support for RV64
        riscv, bpf: Factor out emit_call for kernel and bpf context
        riscv: Extend patch_text for multiple instructions
        Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
        selftests/bpf: Add global subprog context passing tests
        selftests/bpf: Convert test_global_funcs test to test_loader framework
        bpf: Fix global subprog context argument resolution logic
        LoongArch, bpf: Use 4 instructions for function address in JIT
        bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
        bpf: Disable bh in bpf_test_run for xdp and tc prog
        xsk: check IFF_UP earlier in Tx path
        Fix typos in selftest/bpf files
        selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
        samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
        bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
        libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
        libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ee8d72a1
  2. 20 Feb, 2023 24 commits