Commits · d1fabc68f8e0541d41657096dc713cb01775652d · Kirill Smelkov / linux

21 Feb, 2023 29 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d1fabc68

Jakub Kicinski authored Feb 21, 2023

Per-next-PR merge.

net/smc/af_smc.c
b5dd4d69 ("net/smc: llc_conf_mutex refactor, replace it with rw_semaphore")
e40b801b ("net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()")
https://lore.kernel.org/all/20230221124008.6303c330@canb.auug.org.au/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d1fabc68

page_pool: add a comment explaining the fragment counter usage · 4d4266e3

Ilias Apalodimas authored Feb 18, 2023

When reading the page_pool code the first impression is that keeping
two separate counters, one being the page refcnt and the other being
fragment pp_frag_count, is counter-intuitive.

However without that fragment counter we don't know when to reliably
destroy or sync the outstanding DMA mappings. So let's add a comment
explaining this part.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20230217222130.85205-1-ilias.apalodimas@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4d4266e3

net: ethtool: fix __ethtool_dev_mm_supported() implementation · a00da30c

Vladimir Oltean authored Feb 20, 2023

The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced.

Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/
Fixes: 04692c90 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a00da30c

ethtool: pse-pd: Fix double word in comments · 7ec07774

Bo Liu authored Feb 21, 2023

Remove the repeated word "for" in comments.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7ec07774

xsk: add linux/vmalloc.h to xsk.c · 951bce29

Xuan Zhuo authored Feb 21, 2023

Fix the failure of the compilation under the sh4.

Because we introduced remap_vmalloc_range() earlier, this has caused
the compilation failure on the sh4 platform. So this introduction of the
header file of linux/vmalloc.h.

config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230221/202302210041.kpPQLlNQ-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=9f78bf330a66cd400b3e00f370f597e9fa939207
        git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
        git fetch --no-tags net-next master
        git checkout 9f78bf33
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/

Fixes: 9f78bf33 ("xsk: support use vaddr as ring")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/
Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

951bce29

sefltests: netdevsim: wait for devlink instance after netns removal · f922c7b1

Jiri Pirko authored Feb 20, 2023

When devlink instance is put into network namespace and that network
namespace gets deleted, devlink instance is moved back into init_ns.
This is done as a part of cleanup_net() routine. Since cleanup_net()
is called asynchronously from workqueue, there is no guarantee that
the devlink instance move is done after "ip netns del" returns.

So fix this race by making sure that the devlink instance is present
before any other operation.
Reported-by: Amir Tzin <amirtz@nvidia.com>
Fixes: b74c37fd ("selftests: netdevsim: add tests for devlink reload with resources")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230220132336.198597-1-jiri@resnulli.usSigned-off-by: Paolo Abeni <pabeni@redhat.com>

f922c7b1

selftest: fib_tests: Always cleanup before exit · b60417a9

Roxana Nicolescu authored Feb 20, 2023

Usage of `set -e` before executing a command causes immediate exit
on failure, without cleanup up the resources allocated at setup.
This can affect the next tests that use the same resources,
leading to a chain of failures.

A simple fix is to always call cleanup function when the script exists.
This approach is already used by other existing tests.

Fixes: 1056691b ("selftests: fib_tests: Make test results more verbose")
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Link: https://lore.kernel.org/r/20230220110400.26737-2-roxana.nicolescu@canonical.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

b60417a9

net/mlx5e: Align IPsec ASO result memory to be as required by hardware · f2b6cfda

Leon Romanovsky authored Feb 19, 2023

Hardware requires an alignment to 64 bytes to return ASO data. Missing
this alignment caused to unpredictable results while ASO events were
generated.

Fixes: 8518d05b ("net/mlx5e: Create Advanced Steering Operation object for IPsec")
Reported-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/de0302c572b90c9224a72868d4e0d657b6313c4b.1676797613.git.leon@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f2b6cfda

Merge tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 05b953a5

Jakub Kicinski authored Feb 20, 2023

Saeed Mahameed says:

====================
mlx5-updates-2023-02-15

1) From Gal Tariq and Parav, Few cleanups for mlx5 driver.

2) From Vlad: Allow offloading of ct 'new' match based on [1]

[1] https://lore.kernel.org/netdev/20230201163100.1001180-1-vladbu@nvidia.com/

* tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: RX, Remove doubtful unlikely call
  net/mlx5e: Fix outdated TLS comment
  net/mlx5e: Remove unused function mlx5e_sq_xmit_simple
  net/mlx5e: Allow offloading of ct 'new' match
  net/mlx5e: Implement CT entry update
  net/mlx5: Simplify eq list traversal
  net/mlx5e: Remove redundant page argument in mlx5e_xdp_handle()
  net/mlx5e: Remove redundant page argument in mlx5e_xmit_xdp_buff()
  net/mlx5e: Switch to using napi_build_skb()
====================

Link: https://lore.kernel.org/r/20230218090513.284718-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

05b953a5

Merge branch 'net-sched-cls_api-support-hardware-miss-to-tc-action' · 981f4045

Jakub Kicinski authored Feb 20, 2023

Paul Blakey says:

====================
net/sched: cls_api: Support hardware miss to tc action

This series adds support for hardware miss to instruct tc to continue execution
in a specific tc action instance on a filter's action list. The mlx5 driver patch
(besides the refactors) shows its usage instead of using just chain restore.

Currently a filter's action list must be executed all together or
not at all as driver are only able to tell tc to continue executing from a
specific tc chain, and not a specific filter/action.

This is troublesome with regards to action CT, where new connections should
be sent to software (via tc chain restore), and established connections can
be handled in hardware.

Checking for new connections is done when executing the ct action in hardware
(by checking the packet's tuple against known established tuples).
But if there is a packet modification (pedit) action before action CT and the
checked tuple is a new connection, hardware will need to revert the previous
packet modifications before sending it back to software so it can
re-match the same tc filter in software and re-execute its CT action.

The following is an example configuration of stateless nat
on mlx5 driver that isn't supported before this patchet:

 #Setup corrosponding mlx5 VFs in namespaces
 $ ip netns add ns0
 $ ip netns add ns1
 $ ip link set dev enp8s0f0v0 netns ns0
 $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
 $ ip link set dev enp8s0f0v1 netns ns1
 $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up

 #Setup tc arp and ct rules on mxl5 VF representors
 $ tc qdisc add dev enp8s0f0_0 ingress
 $ tc qdisc add dev enp8s0f0_1 ingress
 $ ifconfig enp8s0f0_0 up
 $ ifconfig enp8s0f0_1 up

 #Original side
 $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp dst_port 8888 \
      action pedit ex munge tcp dport set 5001 pipe \
      action csum ip tcp pipe \
      action ct pipe \
      action goto chain 1
 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+est \
      action mirred egress redirect dev enp8s0f0_1
 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+new \
      action ct commit pipe \
      action mirred egress redirect dev enp8s0f0_1
 $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_1

 #Reply side
 $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_0
 $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp \
      action ct pipe \
      action pedit ex munge tcp sport set 8888 pipe \
      action csum ip tcp pipe \
      action mirred egress redirect dev enp8s0f0_0

 #Run traffic
 $ ip netns exec ns1 iperf -s -p 5001&
 $ sleep 2 #wait for iperf to fully open
 $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888

 #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
 $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt

A new connection executing the first filter in hardware will first rewrite
the dst port to the new port, and then the ct action is executed,
because this is a new connection, hardware will need to be send this back
to software, on chain 0, to execute the first filter again in software.
The dst port needs to be reverted otherwise it won't re-match the old
dst port in the first filter. Because of that, currently mlx5 driver will
reject offloading the above action ct rule.

This series adds support for hardware partially executing a filter's action list,
and letting tc software continue processing in the specific action instance
where hardware left off (in the above case after the "action pedit ex munge tcp
dport... of the first rule") allowing support for scenarios such as the above.
====================

Link: https://lore.kernel.org/r/20230217223620.28508-1-paulb@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

981f4045

net/mlx5e: TC, Set CT miss to the specific ct action instance · 67027828

Paul Blakey authored Feb 18, 2023

Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from the
this specific CT action instance on the relevant filter's action list.

Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.

To restore this new miss mapping value, add a RX restore rule for each
such mapping value.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

67027828

net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG · 235ff07d

Paul Blakey authored Feb 18, 2023

This reg usage is always a mapped object, not necessarily
containing chain info.

Rename to properly convey what it stores.
This patch doesn't change any functionality.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

235ff07d

net/mlx5: Refactor tc miss handling to a single function · 93a1ab2c

Paul Blakey authored Feb 18, 2023

Move tc miss handling code to en_tc.c, and remove
duplicate code.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

93a1ab2c

net/mlx5: Kconfig: Make tc offload depend on tc skb extension · 03a283cd

Paul Blakey authored Feb 18, 2023

Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.

Depend on it.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

03a283cd

net/sched: flower: Support hardware miss to tc action · 606c7c43

Paul Blakey authored Feb 18, 2023

To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

606c7c43

net/sched: flower: Move filter handle initialization earlier · 08a0063d

Paul Blakey authored Feb 18, 2023

To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.

Move filter handle initialization earlier.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

08a0063d

net/sched: cls_api: Support hardware miss to tc action · 80cd22c3

Paul Blakey authored Feb 18, 2023

For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

80cd22c3

net/sched: Rename user cookie and act cookie · db4b4902

Paul Blakey authored Feb 18, 2023

struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

db4b4902

Merge tag 'ieee802154-for-net-next-2023-02-20' of... · 871489dd

Jakub Kicinski authored Feb 20, 2023

Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2023-02-20

Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.

Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).

Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
  tx path

* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  ieee802154: Drop device trackers
  mac802154: Fix an always true condition
  mac802154: Send beacons using the MLME Tx path
  ieee802154: Change error code on monitor scan netlink request
  ieee802154: Convert scan error messages to extack
  ieee802154: Use netlink policies when relevant on scan parameters
  ieee802154: at86rf230: switch to using gpiod API
  ieee802154: at86rf230: drop support for platform data
  Revert "at86rf230: convert to gpio descriptors"
  cc2520: move to gpio descriptors
  mac802154: Avoid superfluous endianness handling
  at86rf230: convert to gpio descriptors
  mac802154: Handle basic beaconing
  ieee802154: Add support for user beaconing requests
  mac802154: Handle passive scanning
  mac802154: Add MLME Tx locked helpers
  mac802154: Prepare forcing specific symbol duration
  ieee802154: Introduce a helper to validate a channel
  ieee802154: Define a beacon frame header
  ieee802154: Add support for user scanning requests
====================

Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

871489dd

sfc: fix builds without CONFIG_RTC_LIB · 5f22c3b6

Alejandro Lucero authored Feb 20, 2023

Add an embarrassingly missed semicolon plus and embarrassingly missed
parenthesis breaking kernel building when CONFIG_RTC_LIB is not set
like the one reported with ia64 config.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/
Fixes: 14743ddd ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5f22c3b6

sfc: clean up some inconsistent indentings · 5feeaba1

Yang Li authored Feb 20, 2023

Fix some indentngs and remove the warning below:
drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5feeaba1

net/mlx4_en: Introduce flexible array to silence overflow warning · f8f185e3

Kees Cook authored Feb 18, 2023

The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:

In function ‘fortify_memcpy_chk’,
    inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
    inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
    inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
  513 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Same behaviour on x86 you can get if you use "__always_inline" instead of
"inline" for skb_copy_from_linear_data() in skbuff.h

The call here copies data into inlined tx destricptor, which has 104
bytes (MAX_INLINE) space for data payload. In this case "spc" is known
in compile-time but the destination is used with hidden knowledge
(real structure of destination is different from that the compiler
can see). That cause the fortify warning because compiler can check
bounds, but the real bounds are different.  "spc" can't be bigger than
64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined
tx descriptor. The fact that "inl" points into inlined tx descriptor is
determined earlier in mlx4_en_xmit().

Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.
Reported-by: Josef Oskera <joskera@redhat.com>
Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
Fixes: f68f2ff9 ("fortify: Detect struct member overflows in memcpy() at compile-time")
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f8f185e3

net: lan966x: Fix possible deadlock inside PTP · 3a70e0d4

Horatiu Vultur authored Feb 17, 2023

When doing timestamping in lan966x and having PROVE_LOCKING
enabled the following warning is shown.

========================================================
WARNING: possible irq lock inversion dependency detected
6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Tainted: G                 N
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
c2609f50 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x16c/0x2e8
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (&lan966x->ptp_ts_id_lock){+.+.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lan966x->ptp_ts_id_lock);
                               local_irq_disable();
                               lock(_xmit_ETHER#2);
                               lock(&lan966x->ptp_ts_id_lock);
  <Interrupt>
    lock(_xmit_ETHER#2);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0: c1001e18 ((&ndev->rs_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x33c
 #1: c105e7c4 (rcu_read_lock){....}-{1:2}, at: ndisc_send_skb+0x134/0x81c
 #2: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x17c/0xc64
 #3: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x4c/0x1224
 #4: c3056174 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x354/0x1224

the shortest dependencies between 2nd lock and 1st lock:
 -> (&lan966x->ptp_ts_id_lock){+.+.}-{2:2} {
    HARDIRQ-ON-W at:
                      lock_acquire.part.0+0xb0/0x248
                      _raw_spin_lock+0x38/0x48
                      lan966x_ptp_irq_handler+0x164/0x2a8
                      irq_thread_fn+0x1c/0x78
                      irq_thread+0x130/0x278
                      kthread+0xec/0x110
                      ret_from_fork+0x14/0x28
    SOFTIRQ-ON-W at:
                      lock_acquire.part.0+0xb0/0x248
                      _raw_spin_lock+0x38/0x48
                      lan966x_ptp_irq_handler+0x164/0x2a8
                      irq_thread_fn+0x1c/0x78
                      irq_thread+0x130/0x278
                      kthread+0xec/0x110
                      ret_from_fork+0x14/0x28
    INITIAL USE at:
                     lock_acquire.part.0+0xb0/0x248
                     _raw_spin_lock_irqsave+0x4c/0x68
                     lan966x_ptp_txtstamp_request+0x128/0x1cc
                     lan966x_port_xmit+0x224/0x43c
                     dev_hard_start_xmit+0xa8/0x2f0
                     sch_direct_xmit+0x108/0x2e8
                     __dev_queue_xmit+0x41c/0x1224
                     packet_sendmsg+0xdb4/0x134c
                     __sys_sendto+0xd0/0x154
                     sys_send+0x18/0x20
                     ret_fast_syscall+0x0/0x1c
  }
  ... key      at: [<c174ba0c>] __key.2+0x0/0x8
  ... acquired at:
   _raw_spin_lock_irqsave+0x4c/0x68
   lan966x_ptp_txtstamp_request+0x128/0x1cc
   lan966x_port_xmit+0x224/0x43c
   dev_hard_start_xmit+0xa8/0x2f0
   sch_direct_xmit+0x108/0x2e8
   __dev_queue_xmit+0x41c/0x1224
   packet_sendmsg+0xdb4/0x134c
   __sys_sendto+0xd0/0x154
   sys_send+0x18/0x20
   ret_fast_syscall+0x0/0x1c

-> (_xmit_ETHER#2){+.-.}-{2:2} {
   HARDIRQ-ON-W at:
                    lock_acquire.part.0+0xb0/0x248
                    _raw_spin_lock+0x38/0x48
                    netif_freeze_queues+0x38/0x68
                    dev_deactivate_many+0xac/0x388
                    dev_deactivate+0x38/0x6c
                    linkwatch_do_dev+0x70/0x8c
                    __linkwatch_run_queue+0xd4/0x1e8
                    linkwatch_event+0x24/0x34
                    process_one_work+0x284/0x744
                    worker_thread+0x28/0x4bc
                    kthread+0xec/0x110
                    ret_from_fork+0x14/0x28
   IN-SOFTIRQ-W at:
                    lock_acquire.part.0+0xb0/0x248
                    _raw_spin_lock+0x38/0x48
                    sch_direct_xmit+0x16c/0x2e8
                    __dev_queue_xmit+0x41c/0x1224
                    ip6_finish_output2+0x5f4/0xc64
                    ndisc_send_skb+0x4cc/0x81c
                    addrconf_rs_timer+0xb0/0x2f8
                    call_timer_fn+0xb4/0x33c
                    expire_timers+0xb4/0x10c
                    run_timer_softirq+0xf8/0x2a8
                    __do_softirq+0xd4/0x5fc
                    __irq_exit_rcu+0x138/0x17c
                    irq_exit+0x8/0x28
                    __irq_svc+0x90/0xbc
                    arch_cpu_idle+0x30/0x3c
                    default_idle_call+0x44/0xac
                    do_idle+0xc8/0x138
                    cpu_startup_entry+0x18/0x1c
                    rest_init+0xcc/0x168
                    arch_post_acpi_subsys_init+0x0/0x8
   INITIAL USE at:
                   lock_acquire.part.0+0xb0/0x248
                   _raw_spin_lock+0x38/0x48
                   netif_freeze_queues+0x38/0x68
                   dev_deactivate_many+0xac/0x388
                   dev_deactivate+0x38/0x6c
                   linkwatch_do_dev+0x70/0x8c
                   __linkwatch_run_queue+0xd4/0x1e8
                   linkwatch_event+0x24/0x34
                   process_one_work+0x284/0x744
                   worker_thread+0x28/0x4bc
                   kthread+0xec/0x110
                   ret_from_fork+0x14/0x28
 }
 ... key      at: [<c175974c>] netdev_xmit_lock_key+0x8/0x1c8
 ... acquired at:
   __lock_acquire+0x978/0x2978
   lock_acquire.part.0+0xb0/0x248
   _raw_spin_lock+0x38/0x48
   sch_direct_xmit+0x16c/0x2e8
   __dev_queue_xmit+0x41c/0x1224
   ip6_finish_output2+0x5f4/0xc64
   ndisc_send_skb+0x4cc/0x81c
   addrconf_rs_timer+0xb0/0x2f8
   call_timer_fn+0xb4/0x33c
   expire_timers+0xb4/0x10c
   run_timer_softirq+0xf8/0x2a8
   __do_softirq+0xd4/0x5fc
   __irq_exit_rcu+0x138/0x17c
   irq_exit+0x8/0x28
   __irq_svc+0x90/0xbc
   arch_cpu_idle+0x30/0x3c
   default_idle_call+0x44/0xac
   do_idle+0xc8/0x138
   cpu_startup_entry+0x18/0x1c
   rest_init+0xcc/0x168
   arch_post_acpi_subsys_init+0x0/0x8

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                 N 6.2.0-rc7-01749-gc54e1f7f7e36 #2786
Hardware name: Generic DT based system
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from mark_lock.part.0+0x59c/0x93c
 mark_lock.part.0 from __lock_acquire+0x978/0x2978
 __lock_acquire from lock_acquire.part.0+0xb0/0x248
 lock_acquire.part.0 from _raw_spin_lock+0x38/0x48
 _raw_spin_lock from sch_direct_xmit+0x16c/0x2e8
 sch_direct_xmit from __dev_queue_xmit+0x41c/0x1224
 __dev_queue_xmit from ip6_finish_output2+0x5f4/0xc64
 ip6_finish_output2 from ndisc_send_skb+0x4cc/0x81c
 ndisc_send_skb from addrconf_rs_timer+0xb0/0x2f8
 addrconf_rs_timer from call_timer_fn+0xb4/0x33c
 call_timer_fn from expire_timers+0xb4/0x10c
 expire_timers from run_timer_softirq+0xf8/0x2a8
 run_timer_softirq from __do_softirq+0xd4/0x5fc
 __do_softirq from __irq_exit_rcu+0x138/0x17c
 __irq_exit_rcu from irq_exit+0x8/0x28
 irq_exit from __irq_svc+0x90/0xbc
Exception stack(0xc1001f20 to 0xc1001f68)
1f20: ffffffff ffffffff 00000001 c011f840 c100e000 c100e000 c1009314 c1009370
1f40: c10f0c1a c0d5e564 c0f5da8c 00000000 00000000 c1001f70 c010f0bc c010f0c0
1f60: 600f0013 ffffffff
 __irq_svc from arch_cpu_idle+0x30/0x3c
 arch_cpu_idle from default_idle_call+0x44/0xac
 default_idle_call from do_idle+0xc8/0x138
 do_idle from cpu_startup_entry+0x18/0x1c
 cpu_startup_entry from rest_init+0xcc/0x168
 rest_init from arch_post_acpi_subsys_init+0x0/0x8

Fix this by using spin_lock_irqsave/spin_lock_irqrestore also
inside lan966x_ptp_irq_handler.

Fixes: e85a96e4 ("net: lan966x: Add support for ptp interrupts")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230217210917.2649365-1-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3a70e0d4

net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). · be9832c2

Kuniyuki Iwashima authored Feb 17, 2023

Commit 2c02d41d ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

be9832c2

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · ee8d72a1

Jakub Kicinski authored Feb 20, 2023

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-17

We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).

The main changes are:

1) Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently-added linked-list, that is, by using
   kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.

2) Add a new benchmark for hashmap lookups to BPF selftests,
   from Anton Protopopov.

3) Fix bpf_fib_lookup to only return valid neighbors and add an option
   to skip the neigh table lookup, from Martin KaFai Lau.

4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
   accouting for container environments, from Yafang Shao.

5) Batch of ice multi-buffer and driver performance fixes,
   from Alexander Lobakin.

6) Fix a bug in determining whether global subprog's argument is
   PTR_TO_CTX, which is based on type names which breaks kprobe progs,
   from Andrii Nakryiko.

7) Prep work for future -mcpu=v4 LLVM option which includes usage of
   BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
   from Eduard Zingerman.

8) More prep work for later building selftests with Memory Sanitizer
   in order to detect usages of undefined memory, from Ilya Leoshkevich.

9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
   dereference via sendmsg(), from Maciej Fijalkowski.

10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.

11) Fix BPF memory allocator in combination with BPF hashtab where it could
    corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.

12) Fix LoongArch BPF JIT to always use 4 instructions for function
    address so that instruction sequences don't change between passes,
    from Hengqi Chen.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
  selftests/bpf: Add bpf_fib_lookup test
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
  riscv, bpf: Add bpf trampoline support for RV64
  riscv, bpf: Add bpf_arch_text_poke support for RV64
  riscv, bpf: Factor out emit_call for kernel and bpf context
  riscv: Extend patch_text for multiple instructions
  Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
  selftests/bpf: Add global subprog context passing tests
  selftests/bpf: Convert test_global_funcs test to test_loader framework
  bpf: Fix global subprog context argument resolution logic
  LoongArch, bpf: Use 4 instructions for function address in JIT
  bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
  bpf: Disable bh in bpf_test_run for xdp and tc prog
  xsk: check IFF_UP earlier in Tx path
  Fix typos in selftest/bpf files
  selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  ...
====================

Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ee8d72a1

MAINTAINERS: Add Miquel Raynal as additional maintainer for ieee802154 · 195d6cc9

Stefan Schmidt authored Feb 18, 2023

We are growing the maintainer team for ieee802154 to spread the load for
review and general maintenance. Miquel has been driving the subsystem
forward over the last year and we would like to welcome him as a
maintainer.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230218211317.284889-4-stefan@datenfreihafen.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

195d6cc9

MAINTAINERS: Switch maintenance for mrf24j40 driver over · 6b441772

Stefan Schmidt authored Feb 18, 2023

Alan Ott has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Alan for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-3-stefan@datenfreihafen.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6b441772

MAINTAINERS: Switch maintenance for mcr20a driver over · d1b4b411

Stefan Schmidt authored Feb 18, 2023

Xue Liu has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Xue Liu for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-2-stefan@datenfreihafen.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d1b4b411

MAINTAINERS: Switch maintenance for cc2520 driver over · c551c569

Stefan Schmidt authored Feb 18, 2023

Varka Bhadram has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Varka for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-1-stefan@datenfreihafen.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c551c569

20 Feb, 2023 11 commits

sched/topology: fix KASAN warning in hop_cmp() · 01bb11ad

Yury Norov authored Feb 16, 2023

Despite that prev_hop is used conditionally on cur_hop
is not the first hop, it's initialized unconditionally.

Because initialization implies dereferencing, it might happen
that the code dereferences uninitialized memory, which has been
spotted by KASAN. Fix it by reorganizing hop_cmp() logic.
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: cd7f5535 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/Y+7avK6V9SyAWsXi@yury-laptop/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

01bb11ad

net: bcmgenet: Support wake-up from s2idle · 3fcdf2df

Florian Fainelli authored Feb 17, 2023

When we suspend into s2idle we also need to enable the interrupt line
that generates the MPD and HFB interrupts towards the host CPU interrupt
controller (typically the ARM GIC or MIPS L1) to make it exit s2idle.

When we suspend into other modes such as "standby" or "mem" we engage a
power management state machine which will gate off the CPU L1 controller
(priv->irq0) and ungate the side band wake-up interrupt (priv->wol_irq).
It is safe to have both enabled as wake-up sources because they are
mutually exclusive given any suspend mode.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fcdf2df

scm: add user copy checks to put_cmsg() · 5f1eb1ff

Eric Dumazet authored Feb 17, 2023

This is a followup of commit 2558b803 ("net: use a bounce
buffer for copying skb->mark")

x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]

Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.

[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f1eb1ff

devlink: drop leftover duplicate/unused code · fce10282

Paolo Abeni authored Feb 17, 2023

The recent merge from net left-over some unused code in
leftover.c - nomen omen.

Just drop the unused bits.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fce10282

Merge tag 'linux-can-next-for-6.3-20230217' of... · f6aa90a7

David S. Miller authored Feb 20, 2023

Merge tag 'linux-can-next-for-6.3-20230217' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2023-02-17 - fixed

this is a pull request of 4 patches for net-next/master.

The first patch is by Yang Li and converts the ctucanfd driver to
devm_platform_ioremap_resource().

The last 3 patches are by Frank Jungclaus, target the esd_usb driver
and contains preparations for the upcoming support of the esd
CAN-USB/3 hardware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f6aa90a7

net: lan966x: Use automatic selection of VCAP rule actionset · 4d3e050b

Horatiu Vultur authored Feb 17, 2023

Since commit 81e164c4 ("net: microchip: sparx5: Add automatic
selection of VCAP rule actionset") the VCAP API has the capability to
select automatically the actionset based on the actions that are attached
to the rule. So it is not needed anymore to hardcode the actionset in the
driver, therefore it is OK to remove this.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d3e050b

Merge branch 'default_rps_mask-follow-up' · 38d711aa

David S. Miller authored Feb 20, 2023

Paolo Abeni says:

====================
net: default_rps_mask follow-up

The first patch namespacify the setting. In the common case, once
proper isolation is in place in the main namespace, forwarding
to/from each child netns will allways happen on the desidered CPUs.

Any additional RPS stage inside the child namespace will not provide
additional isolation and could hurt performance badly if picking a
CPU on a remote node.

The 2nd patch adds more self-tests coverage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

38d711aa

self-tests: more rps self tests · 3a7d84ea

Paolo Abeni authored Feb 17, 2023

Explicitly check for child netns and main ns independency
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a7d84ea

net: make default_rps_mask a per netns attribute · 50bcfe8d

Paolo Abeni authored Feb 17, 2023

That really was meant to be a per netns attribute from the beginning.

The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.

To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50bcfe8d

Merge tag 'wireless-next-2023-02-17' of... · e469b626

David S. Miller authored Feb 20, 2023

Merge tag 'wireless-next-2023-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.3

Third set of patches for v6.3. This time only a set of small fixes
submitted during the last day or two.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e469b626

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 1155a228

David S. Miller authored Feb 20, 2023

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add safeguard to check for NULL tupe in objects updates via
   NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.

2) Incorrect pointer check in the new destroy rule command,
   from Yang Yingliang.

3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
   from Florian Westphal.

4) Simplify seq_print_acct(), from Ilia Gavrilov.

5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
   from Julian Anastasov.

6) TCP connection enters CLOSE state in conntrack for locally
   originated TCP reset packet from the reject target,
   from Florian Westphal.

The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1155a228