Commits · 3ed247e789114c5bbb3380c3666eb819336b94e5 · Kirill Smelkov / linux

24 Aug, 2023 19 commits

igc: Add support for multiple in-flight TX timestamps · 3ed247e7

Vinicius Costa Gomes authored Jul 28, 2023

Add support for using the four sets of timestamping registers that
i225/i226 have available for TX.

In some workloads, where multiple applications request hardware
transmission timestamps, it was possible that some of those requests
were denied because the only in use register was already occupied.

This is also in preparation to future support for hardware
timestamping with multiple PTP domains. With multiple domains chances
of multiple TX timestamps being requested at the same time increase.

Before:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
               |          responses            |     TX timestamp offset (ns)
rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
1000       100   0.00%   0.00%   0.00% 100.00%       +1     +41     +73     13
1500       150   0.00%   0.00%   0.00% 100.00%       +9     +49     +87     15
2250       225   0.00%   0.00%   0.00% 100.00%       +9     +42     +79     13
3375       337   0.00%   0.00%   0.00% 100.00%      +11     +46     +81     13
5062       506   0.00%   0.00%   0.00% 100.00%       +7     +44     +80     13
7593       759   0.00%   0.00%   0.00% 100.00%       +9     +44     +79     12
11389     1138   0.00%   0.00%   0.00% 100.00%      +14     +51     +87     13
17083     1708   0.00%   0.00%   0.00% 100.00%       +1     +41     +80     14
25624     2562   0.00%   0.00%   0.00% 100.00%      +11     +50   +5107     51
38436     3843   0.00%   0.00%   0.00% 100.00%       -2     +36   +7843     38
57654     5765   0.00%   0.00%   0.00% 100.00%       +4     +42  +10503     69
86481     8648   0.00%   0.00%   0.00% 100.00%      +11     +54   +5492     65
129721   12972   0.00%   0.00%   0.00% 100.00%      +31   +2680   +6942   2606
194581   16384  16.79%   0.00%   0.87%  82.34%      +73   +4444  +15879   3116
291871   16384  35.05%   0.00%   1.53%  63.42%     +188   +5381  +17019   3035
437806   16384  54.95%   0.00%   2.55%  42.50%     +233   +6302  +13885   2846

After:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
               |          responses            |     TX timestamp offset (ns)
rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
1000       100   0.00%   0.00%   0.00% 100.00%      -20     +12     +43     13
1500       150   0.00%   0.00%   0.00% 100.00%      -23     +18     +57     14
2250       225   0.00%   0.00%   0.00% 100.00%       -2     +33     +67     13
3375       337   0.00%   0.00%   0.00% 100.00%       +1     +38     +76     13
5062       506   0.00%   0.00%   0.00% 100.00%       +9     +52     +93     14
7593       759   0.00%   0.00%   0.00% 100.00%      +11     +47     +82     13
11389     1138   0.00%   0.00%   0.00% 100.00%       -9     +27     +74     13
17083     1708   0.00%   0.00%   0.00% 100.00%      -13     +25     +66     14
25624     2562   0.00%   0.00%   0.00% 100.00%       -8     +28     +65     13
38436     3843   0.00%   0.00%   0.00% 100.00%      -13     +28     +69     13
57654     5765   0.00%   0.00%   0.00% 100.00%      -11     +32     +71     14
86481     8648   0.00%   0.00%   0.00% 100.00%       +2     +44     +83     14
129721   12972  15.36%   0.00%   0.35%  84.29%       -2   +2248  +22907   4252
194581   16384  42.98%   0.00%   1.98%  55.04%       -4   +5278  +65039   5856
291871   16384  54.33%   0.00%   2.21%  43.46%       -3   +6306  +22608   5665

We can see that with 4 registers, as expected, we are able to handle a
increasing number of requests more consistently, but as soon as all
registers are in use, the decrease in quality of service happens in a
sharp step.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

3ed247e7

docs: netdev: recommend against --in-reply-to · 35b4b6d0

Jakub Kicinski authored Aug 23, 2023

It's somewhat unfortunate but with (my?) the current tooling
if people post new versions of a set in reply to an old version
managing the review queue gets difficult. So recommend against it.
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230823154922.1162644-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

35b4b6d0

net: generalize calculation of skb extensions length · 5d21d0a6

Thomas Weißschuh authored Aug 23, 2023

Remove the necessity to modify skb_ext_total_length() when new extension
types are added.
Also reduces the line count a bit.

With optimizations enabled the function is folded down to the same
constant value as before during compilation.
This has been validated on x86 with GCC 6.5.0 and 13.2.1.
Also a similar construct has been validated on godbolt.org with GCC 5.1.
In any case the compiler has to be able to evaluate the construct at
compile-time for the BUILD_BUG_ON() in skb_extensions_init().

Even if not evaluated at compile-time this function would only ever
be executed once at run-time, so the overhead would be very minuscule.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230823-skb_ext-simplify-v2-1-66e26cd66860@weissschuh.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5d21d0a6

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 57ce6427

Jakub Kicinski authored Aug 24, 2023

Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc8 ("ipv4: fix data-races around inet->inet_id")
  c274af22 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8 ("bonding: fix macvlan over alb bond support")
  f11e5bd1 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488 ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1a ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc8 ("ipv4: fix data-races around inet->inet_id")
  b09bde5c ("inet: move inet->mc_loop to inet->inet_frags")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

57ce6427

Merge tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b5cc3833

Linus Torvalds authored Aug 24, 2023

Pull networking fixes from Paolo Abeni:
 "Including fixes from wifi, can and netfilter.

  Fixes to fixes:

   - nf_tables:
       - GC transaction race with abort path
       - defer gc run if previous batch is still pending

  Previous releases - regressions:

   - ipv4: fix data-races around inet->inet_id

   - phy: fix deadlocking in phy_error() invocation

   - mdio: fix C45 read/write protocol

   - ipvlan: fix a reference count leak warning in ipvlan_ns_exit()

   - ice: fix NULL pointer deref during VF reset

   - i40e: fix potential NULL pointer dereferencing of pf->vf in
     i40e_sync_vsi_filters()

   - tg3: use slab_build_skb() when needed

   - mtk_eth_soc: fix NULL pointer on hw reset

  Previous releases - always broken:

   - core: validate veth and vxcan peer ifindexes

   - sched: fix a qdisc modification with ambiguous command request

   - devlink: add missing unregister linecard notification

   - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning

   - batman:
      - do not get eth header before batadv_check_management_packet
      - fix batadv_v_ogm_aggr_send memory leak

   - bonding: fix macvlan over alb bond support

   - mlxsw: set time stamp fields also when its type is MIRROR_UTC"

* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  selftests: bonding: add macvlan over bond testing
  selftest: bond: add new topo bond_topo_2d1c.sh
  bonding: fix macvlan over alb bond support
  rtnetlink: Reject negative ifindexes in RTM_NEWLINK
  netfilter: nf_tables: defer gc run if previous batch is still pending
  netfilter: nf_tables: fix out of memory error handling
  netfilter: nf_tables: use correct lock to protect gc_list
  netfilter: nf_tables: GC transaction race with abort path
  netfilter: nf_tables: flush pending destroy work before netlink notifier
  netfilter: nf_tables: validate all pending tables
  ibmveth: Use dcbf rather than dcbfl
  i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
  net/sched: fix a qdisc modification with ambiguous command request
  igc: Fix the typo in the PTM Control macro
  batman-adv: Hold rtnl lock during MTU update via netlink
  igb: Avoid starting unnecessary workqueues
  can: raw: add missing refcount for memory leak fix
  can: isotp: fix support for transmission of SF without flow control
  bnx2x: new flag for track HW resource allocation
  sfc: allocate a big enough SKB for loopback selftest packet
  ...

b5cc3833

Merge tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9f6708a6

Paolo Abeni authored Aug 24, 2023

Saeed Mahameed says:

====================
mlx5-updates-2023-08-22

1) Patches #1..#13 From Jiri:

The goal of this patchset is to make the SF code cleaner.

Benefit from previously introduced devlink_port struct containerization
to avoid unnecessary lookups in devlink port ops.

Also, benefit from the devlink locking changes and avoid unnecessary
reference counting.

2) Patches #14,#15:

Add ability to configure proto both UDP and TCP selectors in RX and TX
directions.

* tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Support IPsec upper TCP protocol selector
  net/mlx5e: Support IPsec upper protocol selector field offload for RX
  net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops
  net/mlx5: Check vhca_resource_manager capability in each op and add extack msg
  net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking
  net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly
  net/mlx5: Reduce number of vport lookups passing vport pointer instead of index
  net/mlx5: Embed struct devlink_port into driver structure
  net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops
  net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable()
  net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code
  net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFs
  net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()
  net/mlx5: Push out SF devlink port init and cleanup code to separate helpers
  net/mlx5: Rework devlink port alloc/free into init/cleanup
====================

Link: https://lore.kernel.org/all/20230823051012.162483-1-saeed@kernel.org/Signed-off-by: Paolo Abeni <pabeni@redhat.com>

9f6708a6

Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 8938fc0c

Paolo Abeni authored Aug 24, 2023

Florian Westphal says:

====================
netfilter updates for net

This PR contains nf_tables updates for your *net* tree.

First patch fixes table validation, I broke this in 6.4 when tracking
validation state per table, reported by Pablo, fixup from myself.

Second patch makes sure objects waiting for memory release have been
released, this was broken in 6.1, patch from Pablo Neira Ayuso.

Patch three is a fix-for-fix from previous PR: In case a transaction
gets aborted, gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.

Same for patch 4: gc list needs to use gc list lock, not destroy lock,
also from Pablo.

Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations, broken since feature
was added in 5.6, from myself.

Patch 6 fixes a double-free bug that was also added via previous PR:
We must not schedule gc work if the previous batch is still queued.

netfilter pull request 2023-08-23

* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: defer gc run if previous batch is still pending
  netfilter: nf_tables: fix out of memory error handling
  netfilter: nf_tables: use correct lock to protect gc_list
  netfilter: nf_tables: GC transaction race with abort path
  netfilter: nf_tables: flush pending destroy work before netlink notifier
  netfilter: nf_tables: validate all pending tables
====================

Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>

8938fc0c

Merge branch 'fix-macvlan-over-alb-bond-support' · b251610c

Paolo Abeni authored Aug 24, 2023

Hangbin Liu says:

====================
fix macvlan over alb bond support

Currently, the macvlan over alb bond is broken after commit
14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds").
Fix this and add relate tests.
====================

Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

b251610c

selftests: bonding: add macvlan over bond testing · 246af950

Hangbin Liu authored Aug 23, 2023

Add a macvlan over bonding test with mode active-backup, balance-tlb
and balance-alb.

]# ./bond_macvlan.sh
TEST: active-backup: IPv4: client->server                           [ OK ]
TEST: active-backup: IPv6: client->server                           [ OK ]
TEST: active-backup: IPv4: client->macvlan_1                        [ OK ]
TEST: active-backup: IPv6: client->macvlan_1                        [ OK ]
TEST: active-backup: IPv4: client->macvlan_2                        [ OK ]
TEST: active-backup: IPv6: client->macvlan_2                        [ OK ]
TEST: active-backup: IPv4: macvlan_1->macvlan_2                     [ OK ]
TEST: active-backup: IPv6: macvlan_1->macvlan_2                     [ OK ]
TEST: active-backup: IPv4: server->client                           [ OK ]
TEST: active-backup: IPv6: server->client                           [ OK ]
TEST: active-backup: IPv4: macvlan_1->client                        [ OK ]
TEST: active-backup: IPv6: macvlan_1->client                        [ OK ]
TEST: active-backup: IPv4: macvlan_2->client                        [ OK ]
TEST: active-backup: IPv6: macvlan_2->client                        [ OK ]
TEST: active-backup: IPv4: macvlan_2->macvlan_2                     [ OK ]
TEST: active-backup: IPv6: macvlan_2->macvlan_2                     [ OK ]
[...]
TEST: balance-alb: IPv4: client->server                             [ OK ]
TEST: balance-alb: IPv6: client->server                             [ OK ]
TEST: balance-alb: IPv4: client->macvlan_1                          [ OK ]
TEST: balance-alb: IPv6: client->macvlan_1                          [ OK ]
TEST: balance-alb: IPv4: client->macvlan_2                          [ OK ]
TEST: balance-alb: IPv6: client->macvlan_2                          [ OK ]
TEST: balance-alb: IPv4: macvlan_1->macvlan_2                       [ OK ]
TEST: balance-alb: IPv6: macvlan_1->macvlan_2                       [ OK ]
TEST: balance-alb: IPv4: server->client                             [ OK ]
TEST: balance-alb: IPv6: server->client                             [ OK ]
TEST: balance-alb: IPv4: macvlan_1->client                          [ OK ]
TEST: balance-alb: IPv6: macvlan_1->client                          [ OK ]
TEST: balance-alb: IPv4: macvlan_2->client                          [ OK ]
TEST: balance-alb: IPv6: macvlan_2->client                          [ OK ]
TEST: balance-alb: IPv4: macvlan_2->macvlan_2                       [ OK ]
TEST: balance-alb: IPv6: macvlan_2->macvlan_2                       [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

246af950

selftest: bond: add new topo bond_topo_2d1c.sh · 27aa43f8

Hangbin Liu authored Aug 23, 2023

Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
extra link.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

27aa43f8

bonding: fix macvlan over alb bond support · e74216b8

Hangbin Liu authored Aug 23, 2023

The commit 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.

To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.

So let's just do a partial revert for commit 14af9963 in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.

Fixes: 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

e74216b8

rtnetlink: Reject negative ifindexes in RTM_NEWLINK · 30188bd7

Ido Schimmel authored Aug 23, 2023

Negative ifindexes are illegal, but the kernel does not validate the
ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
the kernel generating a warning [1] when such an ifindex is specified.

Fix by rejecting negative ifindexes.

[1]
WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593
[...]
Call Trace:
 <TASK>
 register_netdevice+0x69a/0x1490 net/core/dev.c:10081
 br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552
 rtnl_newlink_create net/core/rtnetlink.c:3471 [inline]
 __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688
 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701
 rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
 netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368
 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
 sock_sendmsg_nosec net/socket.c:728 [inline]
 sock_sendmsg+0xd9/0x180 net/socket.c:751
 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538
 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592
 __sys_sendmsg+0x117/0x1e0 net/socket.c:2621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 38f7b870 ("[RTNETLINK]: Link creation API")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

30188bd7

Merge branch 'net-ethernet-mtk_eth_soc-improve-support-for-mt7988' · 23c167af

Jakub Kicinski authored Aug 23, 2023

Daniel Golle says:

====================
net: ethernet: mtk_eth_soc: improve support for MT7988

This series fixes and completes commit 445eb644 ("net: ethernet:
mtk_eth_soc: add basic support for MT7988 SoC") and also adds support
for using the in-SoC SRAM to previous MT7986 and MT7981 SoCs.
====================

Link: https://lore.kernel.org/r/cover.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

23c167af

net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988 · 2d75891e

Daniel Golle authored Aug 22, 2023

Systems having 4 GiB of RAM and more require DMA addressing beyond the
current 32-bit limit. Starting from MT7988 the hardware now supports
36-bit DMA addressing, let's use that new capability in the driver to
avoid running into swiotlb on systems with 4 GiB of RAM or more.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/95b919c98876c9e49761e44662e7c937479eecb8.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2d75891e

net: ethernet: mtk_eth_soc: add support for in-SoC SRAM · ebb1e4f9

Daniel Golle authored Aug 22, 2023

MT7981, MT7986 and MT7988 come with in-SoC SRAM dedicated for Ethernet
DMA rings. Support using the SRAM without breaking existing device tree
bindings, ie. only new SoC starting from MT7988 will have the SRAM
declared as additional resource in device tree. For MT7981 and MT7986
an offset on top of the main I/O base is used.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/e45e0f230c63ad58869e8fe35b95a2fb8925b625.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ebb1e4f9

net: ethernet: mtk_eth_soc: add reset bits for MT7988 · 88c1e6ef

Daniel Golle authored Aug 22, 2023

Add bits needed to reset the frame engine on MT7988.

Fixes: 445eb644 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/89b6c38380e7a3800c1362aa7575600717bc7543.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

88c1e6ef

net: ethernet: mtk_eth_soc: fix register definitions for MT7988 · cfb5677d

Daniel Golle authored Aug 22, 2023

More register macros need to be adjusted for the 3rd GMAC on MT7988.
Account for added bit in SYSCFG0_SGMII_MASK.

Fixes: 445eb644 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1c8da012e2ca80939906d85f314138c552139f0f.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

cfb5677d

net: fec: add exception tracing for XDP · e83fabb7

Wei Fang authored Aug 22, 2023

As we already added the exception tracing for XDP_TX, I think it is
necessary to add the exception tracing for other XDP actions, such
as XDP_REDIRECT, XDP_ABORTED and unknown error actions.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230822065255.606739-1-wei.fang@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e83fabb7

net: dm9051: Use PTR_ERR_OR_ZERO() to simplify code · 664c84c2

Yu Liao authored Aug 22, 2023

Use the standard error pointer macro to shorten the code and simplify.
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Link: https://lore.kernel.org/r/20230822021455.205101-2-liaoyu15@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

664c84c2

23 Aug, 2023 21 commits

Merge tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 93f5de5f

Linus Torvalds authored Aug 23, 2023

Pull ACPI fix from Rafael Wysocki:
 "Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro
  16 M work as intended (Hans de Goede)"

* tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M

93f5de5f

Merge tag 'platform-drivers-x86-v6.5-5' of... · a5e505a9

Linus Torvalds authored Aug 23, 2023

Merge tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Final set of three small fixes for 6.5"

* tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
  platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
  platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table

a5e505a9

platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications · 0848cab7

Shih-Yi Chen authored Aug 21, 2023

rshim console does not show all entries of dmesg.

Fixed by setting MLXBF_TM_TX_LWM_IRQ for every CONSOLE notification.
Signed-off-by: Shih-Yi Chen <shihyic@nvidia.com>
Reviewed-by: Liming Sung <limings@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20230821150627.26075-1-shihyic@nvidia.comReviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

0848cab7

netfilter: nf_tables: defer gc run if previous batch is still pending · 8e51830e

Florian Westphal authored Aug 22, 2023

Don't queue more gc work, else we may queue the same elements multiple
times.

If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.

The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.

The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.

Add a helper for this and skip the gc run in this case.

Fixes: f6c383b8 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

8e51830e

netfilter: nf_tables: fix out of memory error handling · 5e1be4cd

Florian Westphal authored Aug 22, 2023

Several instances of pipapo_resize() don't propagate allocation failures,
this causes a crash when fault injection is enabled for gfp_kernel slabs.

Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

5e1be4cd

netfilter: nf_tables: use correct lock to protect gc_list · 8357bc94

Pablo Neira Ayuso authored Aug 21, 2023

Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.

Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

8357bc94

netfilter: nf_tables: GC transaction race with abort path · 72034434

Pablo Neira Ayuso authored Aug 18, 2023

Abort path is missing a synchronization point with GC transactions. Add
GC sequence number hence any GC transaction losing race will be
discarded.

Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

72034434

netfilter: nf_tables: flush pending destroy work before netlink notifier · 2c9f0293

Pablo Neira Ayuso authored Aug 18, 2023

Destroy work waits for the RCU grace period then it releases the objects
with no mutex held. All releases objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.

However, netlink notifier might interfer with pending destroy work.
rcu_barrier() is not correct because objects are not release via RCU
callback. Flush destroy work before releasing objects from netlink
notifier path.

Fixes: d4bc8271 ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

2c9f0293

netfilter: nf_tables: validate all pending tables · 4b80ced9

Florian Westphal authored Aug 17, 2023

We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.

Moreover, we can't init table->validate to _SKIP when a table object
is allocated.

If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.

Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).

Fixes: 00c320f9 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

4b80ced9

ibmveth: Use dcbf rather than dcbfl · bfedba3b

Michael Ellerman authored Aug 23, 2023

When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.

dcbfl RA, RB is equivalent to dcbf RA, RB, 1.

Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfedba3b

i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters() · 9525a3c3

Andrii Staikov authored Aug 22, 2023

Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.

Fixes: c87c938f ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9525a3c3

bnxt: use the NAPI skb allocation cache · e3b3a879

Jakub Kicinski authored Aug 22, 2023

All callers of build_skb() (*) in bnxt are in NAPI context.
The budget checking is somewhat convoluted because in the shared
completion queue cases Rx packets are discarded by netpoll
by forcing an error (E). But that happens before skb allocation.
Only a call chain starting at __bnxt_poll_work() can lead to
an skb allocation and it checks budget (b).

* bnxt_rx_multi_page_skb
* bnxt_rx_skb
  ` bp->rx_skb_func
  * bnxt_tpa_end
    ` bnxt_rx_pkt
      E bnxt_force_rx_discard
      E bnxt_poll_nitroa0
      b __bnxt_poll_work

Use napi_build_skb() to take advantage of the skb cache.
In iperf tests with HW-GRO enabled it barely makes a difference
but in cases where HW-GRO is not as effective (or disabled) it
can give even a >10% boost (20.7Gbps -> 23.1Gbps).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3b3a879

net/sched: fix a qdisc modification with ambiguous command request · da71714e

Jamal Hadi Salim authored Aug 22, 2023

When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change  i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.

While at it, fix the code and comments to improve readability.

While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.

v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da71714e

net: dsa: rzn1-a5psw: remove redundant logs · 2e0c8ee2

Alexis Lothoré authored Aug 22, 2023

Remove debug logs in port vlan management, since there are already multiple
tracepoints defined for those operations in DSA
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e0c8ee2

net: Avoid address overwrite in kernel_connect · 0bdf3993

Jordan Rife authored Aug 21, 2023

BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.

A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.
Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0bdf3993

virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting · dae64749

Feng Liu authored Aug 21, 2023

The virtio_net driver currently deals with different versions and types
of virtio net headers, such as virtio_net_hdr_mrg_rxbuf,
virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies
on multiple type casts to convert memory between different structures,
potentially leading to bugs when there are changes in these structures.

Introduces the "struct skb_vnet_common_hdr" as a unifying header
structure using a union. With this approach, various virtio net header
structures can be converted by accessing different members of this
structure, thus eliminating the need for type casting and reducing the
risk of potential bugs.

For example following code:
static struct sk_buff *page_to_skb(struct virtnet_info *vi,
		struct receive_queue *rq,
		struct page *page, unsigned int offset,
		unsigned int len, unsigned int truesize,
		unsigned int headroom)
{
[...]
	struct virtio_net_hdr_mrg_rxbuf *hdr;
[...]
	hdr_len = vi->hdr_len;
[...]
ok:
	hdr = skb_vnet_hdr(skb);
	memcpy(hdr, hdr_p, hdr_len);
[...]
}

When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20
But the sizeof(*hdr) is 12,
memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr,
which make a potential risk of bug. And this risk can be avoided by
introducing struct skb_vnet_common_hdr.

Change log
v1->v2
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
feedback from Simon Horman <horms@kernel.org>
1. change to use net-next tree.
2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header.

v2->v3
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
1. fix typo in commit message.
2. add original struct virtio_net_hdr into union
3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf;
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dae64749

dp83640: Use list_for_each_entry() helper · 45f9cb6b

Jinjie Ruan authored Aug 21, 2023

Convert list_for_each() to list_for_each_entry() where applicable.

No functional changed.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45f9cb6b

Merge branch 'mlx4-aux-bus' · 5c42b66d

David S. Miller authored Aug 23, 2023

Petr Pavlu says:

====================
Convert mlx4 to use auxiliary bus

This series converts the mlx4 drivers to use auxiliary bus, similarly to
how mlx5 was converted [1]. The first 6 patches are preparatory changes,
the remaining 4 are the final conversion.

Initial motivation for this change was to address a problem related to
loading mlx4_en/mlx4_ib by mlx4_core using request_module_nowait(). When
doing such a load in initrd, the operation is asynchronous to any init
control and can get unexpectedly affected/interrupted by an eventual
root switch. Using an auxiliary bus leaves these module loads to udevd
which better integrates with systemd processing. [2]

General benefit is to get rid of custom interface logic and instead use
a common facility available for this task. An obvious risk is that some
new bug is introduced by the conversion.

Leon Romanovsky was kind enough to check for me that the series passes
their verification tests.

Changes since v2 [3]:
* Use 'void *' as the event param of mlx4_dispatch_event().

Changes since v1 [4]:
* Fix a missing definition of the err variable in mlx4_en_add().
* Remove not needed comments about the event type in mlx4_en_event()
  and mlx4_ib_event().

[1] https://lore.kernel.org/netdev/20201101201542.2027568-1-leon@kernel.org/
[2] https://lore.kernel.org/netdev/0a361ac2-c6bd-2b18-4841-b1b991f0635e@suse.com/
[3] https://lore.kernel.org/netdev/20230813145127.10653-1-petr.pavlu@suse.com/
[4] https://lore.kernel.org/netdev/20230804150527.6117-1-petr.pavlu@suse.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5c42b66d

mlx4: Delete custom device management logic · c138cdb8

Petr Pavlu authored Aug 21, 2023

After the conversion to use the auxiliary bus, the custom device
management is not needed anymore and can be deleted.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c138cdb8

mlx4: Connect the infiniband part to the auxiliary bus · 7d22b1cb

Petr Pavlu authored Aug 21, 2023

Use the auxiliary bus to perform device management of the infiniband
part of the mlx4 driver.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d22b1cb

mlx4: Connect the ethernet part to the auxiliary bus · eb93ae49

Petr Pavlu authored Aug 21, 2023

Use the auxiliary bus to perform device management of the ethernet part
of the mlx4 driver.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb93ae49