Commits · f771314d6b75181de7079c3c7d666293e4ed2b22 · Kirill Smelkov / linux

10 Jul, 2024 23 commits

idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ · f771314d

Alexander Lobakin authored Jun 20, 2024

Currently, all HW supporting idpf supports the singleq model, but none
of it advertises it by default, as splitq is supported and preferred
for multiple reasons. Still, this almost dead code often times adds
hotpath branches and redundant cacheline accesses.
While it can't currently be removed, add CONFIG_IDPF_SINGLEQ and build
the singleq code only when it's enabled manually. This corresponds to
-10 Kb of object code size and a good bunch of hotpath checks.
idpf_is_queue_model_split() works as a gate and compiles out to `true`
when the config option is disabled.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f771314d

idpf: merge singleq and splitq &net_device_ops · 14f662b4

Alexander Lobakin authored Jun 20, 2024

It makes no sense to have a second &net_device_ops struct (800 bytes of
rodata) with only one difference in .ndo_start_xmit, which can easily
be just one `if`. This `if` is a drop in the ocean and you won't see
any difference.
Define unified idpf_xmit_start(). The preparation for sending is the
same, just call either idpf_tx_splitq_frame() or idpf_tx_singleq_frame()
depending on the active model to actually map and send the skb.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

14f662b4

idpf: strictly assert cachelines of queue and queue vector structures · 5a816aae

Alexander Lobakin authored Jun 20, 2024

Now that the queue and queue vector structures are separated and laid
out optimally, group the fields as read-mostly, read-write, and cold
cachelines and add size assertions to make sure new features won't push
something out of its place and provoke perf regression.
Despite looking innocent, this gives up to 2% of perf bump on Rx.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

5a816aae

idpf: avoid bloating &idpf_q_vector with big %NR_CPUS · bf9bf704

Alexander Lobakin authored Jun 20, 2024

With CONFIG_MAXSMP, sizeof(cpumask_t) is 1 Kb. The queue vector
structure has them embedded, which means 1 additional Kb of not
really hotpath data.
We have cpumask_var_t, which is either an embedded cpumask or a pointer
for allocating it dynamically when it's big. Use it instead of plain
cpumasks and put &idpf_q_vector on a good diet.
Also remove redundant pointer to the interrupt name from the structure.
request_irq() saves it and free_irq() returns it on deinit, so that you
can free the memory.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

bf9bf704

idpf: split &idpf_queue into 4 strictly-typed queue structures · e4891e46

Alexander Lobakin authored Jun 20, 2024

Currently, sizeof(struct idpf_queue) is 32 Kb.
This is due to the 12-bit hashtable declaration at the end of the queue.
This HT is needed only for Tx queues when the flow scheduling mode is
enabled. But &idpf_queue is unified for all of the queue types,
provoking excessive memory usage.
The unified structure in general makes the code less effective via
suboptimal fields placement. You can't avoid that unless you make unions
each 2 fields. Even then, different field alignment etc., doesn't allow
you to optimize things to the limit.
Split &idpf_queue into 4 structures corresponding to the queue types:
RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion
queue). Place only needed fields there and shortcuts handy for hotpath.
Allocate the abovementioned hashtable dynamically and only when needed,
keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT
is used only for OOO completions, which aren't really hotpath anyway.
Note that this change must be done atomically, otherwise it's really
easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e4891e46

idpf: stop using macros for accessing queue descriptors · 66c27e3b

Alexander Lobakin authored Jun 20, 2024

In C, we have structures and unions.
Casting `void *` via macros is not only error-prone, but also looks
confusing and awful in general.
In preparation for splitting the queue structs, replace it with a
union and direct array dereferences.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

66c27e3b

libeth: add cacheline / struct layout assertion helpers · 62c88425

Alexander Lobakin authored Jun 20, 2024

Add helpers to assert struct field layout, a bit more crazy and
networking-specific than in <linux/cache.h>. They assume you have
3 CL-aligned groups (read-mostly, read-write, cold) in a struct
you want to assert, and nothing besides them.
For 64-bit with 64-byte cachelines, the assertions are as strict
as possible, as the size can then be easily predicted.
For the rest, make sure they don't cross the specified bound.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

62c88425

page_pool: use __cacheline_group_{begin, end}_aligned() · 39daa09d

Alexander Lobakin authored Jun 20, 2024

Instead of doing __cacheline_group_begin() __aligned(), use the new
__cacheline_group_{begin,end}_aligned(), so that it will take care
of the group alignment itself.
Also replace open-coded `4 * sizeof(long)` in two places with
a definition.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

39daa09d

cache: add __cacheline_group_{begin, end}_aligned() (+ couple more) · 2cb13dec

Alexander Lobakin authored Jun 20, 2024

__cacheline_group_begin(), unfortunately, doesn't align the group
anyhow. If it is wanted, then you need to do something like

__cacheline_group_begin(grp) __aligned(ALIGN)

which isn't really convenient nor compact.
Add the _aligned() counterparts to align the groups automatically to
either the specified alignment (optional) or ``SMP_CACHE_BYTES``.
Note that the actual struct layout will then be (on x64 with 64-byte CL):

struct x {
	u32 y;				// offset 0, size 4, padding 56
	__cacheline_group_begin__grp;	// offset 64, size 0
	u32 z;				// offset 64, size 4, padding 4
	__cacheline_group_end__grp;	// offset 72, size 0
	__cacheline_group_pad__grp;	// offset 72, size 0, padding 56
	u32 w;				// offset 128
};

The end marker is aligned to long, so that you can assert the struct
size more strictly, but the offset of the next field in the structure
will be aligned to the group alignment, so that the next field won't
fall into the group it's not intended to.

Add __LARGEST_ALIGN definition and LARGEST_ALIGN() macro.
__LARGEST_ALIGN is the value to which the compilers align fields when
__aligned_largest is specified. Sometimes, it might be needed to get
this value outside of variable definitions. LARGEST_ALIGN() is macro
which just aligns a value to __LARGEST_ALIGN.
Also add SMP_CACHE_ALIGN(), similar to L1_CACHE_ALIGN(), but using
``SMP_CACHE_BYTES`` instead of ``L1_CACHE_BYTES`` as the former
also accounts L2, needed in some cases.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

2cb13dec

Merge branch 'aquantia-phy-aqr115c' into main · ce2f84eb

David S. Miller authored Jul 10, 2024

Bartosz Golaszewski says:

====================
net: phy: aquantia: enable support for aqr115c

This series addesses two issues with the aqr115c PHY on Qualcomm
sa8775p-ride-r3 board and adds support for this PHY to the aquantia driver.

While the manufacturer calls the 2.5G PHY mode OCSGMII, we reuse the existing
2500BASEX mode in the kernel to avoid extending the uAPI.

It took me a while to resend because I noticed an issue with the PHY coming
out of suspend with no possible interfaces listed and tracked it to the
GLOBAL_CFG registers for different modes returning 0. A workaround has been
added to the series. Unfortunately the HPG doesn't mention a proper way of
doing it or even mention any such issue at all.

Changes since v2:
- add a patch that addresses an issue with GLOBAL_CFG registers returning 0
- reuse aqr113c_config_init() for aqr115c
- improve commit messages, give more details on the 2500BASEX mode reuse
Link to v2: https://lore.kernel.org/lkml/Zn4Nq1QvhjAUaogb@makrotopia.org/T/

Changes since v1:
- split out the PHY patches into their own series
- don't introduce new mode (OCSGMII) but use existing 2500BASEX instead
- split the wait-for-FW patch into two: one renaming and exporting the
  relevant function and the second using it before checking the FW ID
Link to v1: https://lore.kernel.org/linux-arm-kernel/20240619184550.34524-1-brgl@bgdev.pl/T/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce2f84eb

net: phy: aquantia: add support for aqr115c · 0ebc581f

Bartosz Golaszewski authored Jul 08, 2024

Add support for a new model to the Aquantia driver. This PHY supports
2.5 gigabit speeds. The PHY mode is referred to by the manufacturer as
Overclocked SGMII (OCSGMII) but this actually is just 2500BASEX without
in-band signalling so reuse the existing mode to avoid changing the
uAPI.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ebc581f

net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values · 708405f3

Bartosz Golaszewski authored Jul 08, 2024

When the PHY is first coming up (or resuming from suspend), it's
possible that although the FW status shows as running, we still see
zeroes in the GLOBAL_CFG set of registers and cannot determine available
modes. Since all models support 10M, add a poll and wait the config to
become available.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

708405f3

net: phy: aquantia: wait for FW reset before checking the vendor ID · ad649a1f

Bartosz Golaszewski authored Jul 08, 2024

Checking the firmware register before it complete the boot process makes
no sense, it will report 0 even if FW is available from internal memory.
Always wait for FW to boot before continuing or we'll unnecessarily try
to load it from nvmem/filesystem and fail.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad649a1f

net: phy: aquantia: rename and export aqr107_wait_reset_complete() · 66311732

Bartosz Golaszewski authored Jul 08, 2024

This function is quite generic in this driver and not limited to aqr107.
We will use it outside its current compilation unit soon so rename it
and declare it in the header.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

66311732

netxen_nic: Use {low,upp}er_32_bits() helpers · 40ab9e0d

Geert Uytterhoeven authored Jul 08, 2024

Use the existing {low,upp}er_32_bits() helpers instead of defining
custom variants.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/319d4a5313ac75f7bbbb6b230b6802b18075c3e0.1720430602.git.geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>

40ab9e0d

Merge branch 'mlx5-misc-patches-2023-07-08' · fe3e9489

Jakub Kicinski authored Jul 09, 2024

Tariq Toukan says:

====================
mlx5 misc patches 2023-07-08

This patchset contains features and small enhancements from the team
to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/20240708080025.1593555-1-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fe3e9489

net/mlx5e: CT: Initialize err to 0 to avoid warning · f1ac0b7d

Cosmin Ratiu authored Jul 08, 2024

It is theoretically possible to return bogus uninitialized values from
mlx5_tc_ct_entry_replace_rules, even though in practice this will never
be the case as the flow rule will be part of at least the regular ct
table or the ct nat table, if not both.

But to reduce noise, initialize err to 0.

Fixes: 49d37d05 ("net/mlx5: CT: Separate CT and CT-NAT tuple entries")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240708080025.1593555-11-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f1ac0b7d

net/mlx5e: SHAMPO, Add missing aggregate counter · 7204730b

Dragos Tatulea authored Jul 08, 2024

When the rx_hds_nodata_packets/bytes counters were added, the aggregate
counters were omitted. This patch adds them.

Fixes: e95c5b9e ("net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240708080025.1593555-10-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7204730b

net/mlx5: DR, Remove definer functions from SW Steering API · e829a331

Yevgeny Kliteynik authored Jul 08, 2024

No need to expose definer get/put functions as part of
SW Steering API - they are internal functions.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240708080025.1593555-9-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e829a331

Merge branch 'mlxsw-improvements' · 8ce2dddb

Jakub Kicinski authored Jul 09, 2024

Petr Machata says:

====================
mlxsw: Improvements

This patchset contains assortments of improvements to the mlxsw driver.
Please see individual patches for details.
====================

Link: https://patch.msgid.link/cover.1720447210.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8ce2dddb

mlxsw: pci: Lock configuration space of upstream bridge during reset · 0970836c

Ido Schimmel authored Jul 08, 2024

The driver triggers a "Secondary Bus Reset" (SBR) by calling
__pci_reset_function_locked() which asserts the SBR bit in the "Bridge
Control Register" in the configuration space of the upstream bridge for
2ms. This is done without locking the configuration space of the
upstream bridge port, allowing user space to access it concurrently.

Linux 6.11 will start warning about such unlocked resets [1][2]:

pcieport 0000:00:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x51c/0x6a0

Avoid the warning and the concurrent access by locking the configuration
space of the upstream bridge prior to the reset and unlocking it
afterwards.

[1] https://lore.kernel.org/all/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com/
[2] https://lore.kernel.org/all/20240531213150.GA610983@bhelgaas/Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/9937b0afdb50f2f2825945393c94c093c04a5897.1720447210.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0970836c

mlxsw: core_thermal: Report valid current state during cooling device registration · a22f3bc8

Ido Schimmel authored Jul 08, 2024

Commit 31a0fa00 ("thermal/debugfs: Pass cooling device state to
thermal_debug_cdev_add()") changed the thermal core to read the current
state of the cooling device as part of the cooling device's
registration. This is incompatible with the current implementation of
the cooling device operations in mlxsw, leading to initialization
failure with errors such as:

mlxsw_spectrum 0000:01:00.0: Failed to register cooling device
mlxsw_spectrum 0000:01:00.0: cannot register bus device

The reason for the failure is that when the get current state operation
is invoked the driver tries to derive the index of the cooling device by
walking a per thermal zone array and looking for the matching cooling
device pointer. However, the pointer is returned from the registration
function and therefore only set in the array after the registration.

The issue was later fixed by commit 1af89ded ("thermal: core: Do not
fail cdev registration because of invalid initial state") by not failing
the registration of the cooling device if it cannot report a valid
current state during registration, although drivers are responsible for
ensuring that this will not happen.

Therefore, make sure the driver is able to report a valid current state
for the cooling device during registration by passing to the
registration function a per cooling device private data that already has
the cooling device index populated.

While at it, call thermal_cooling_device_unregister() unconditionally
since the function returns immediately if the cooling device pointer is
NULL.
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/c823c4678b6b7afb902c35b3551c81a053afd110.1720447210.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a22f3bc8

mlxsw: Warn about invalid accesses to array fields · b45c76e5

Petr Machata authored Jul 08, 2024

A forgotten or buggy variable initialization can cause out-of-bounds access
to a register or other item array field. For an overflow, such access would
mangle adjacent parts of the register payload. For an underflow, due to all
variables being unsigned, the access would likely trample unrelated memory.
Since neither is correct, replace these accesses with accesses at the index
of 0, and warn about the issue.
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/b988fb265c2f6c1206fe12d5bfdcfa188b7672d1.1720447210.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b45c76e5

09 Jul, 2024 17 commits

Merge branch 'selftests-drv-net-rss_ctx-more-tests' · 746d684e

Jakub Kicinski authored Jul 09, 2024

Jakub Kicinski says:

====================
selftests: drv-net: rss_ctx: more tests

Add a few more tests for RSS.

v1: https://lore.kernel.org/all/20240705015725.680275-1-kuba@kernel.org/
====================

Link: https://patch.msgid.link/20240708213627.226025-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

746d684e

selftests: drv-net: rss_ctx: test flow rehashing without impacting traffic · 933048fe

Jakub Kicinski authored Jul 08, 2024

Some workloads may want to rehash the flows in response to an imbalance.
Most effective way to do that is changing the RSS key. Check that changing
the key does not cause link flaps or traffic disruption.

Disrupting traffic for key update is not incorrect, but makes the key
update unusable for rehashing under load.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-6-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

933048fe

selftests: drv-net: rss_ctx: check behavior of indirection table resizing · 7e3e5b0b

Jakub Kicinski authored Jul 08, 2024

Some devices dynamically increase and decrease the size of the RSS
indirection table based on the number of enabled queues.
When that happens driver must maintain the balance of entries
(preferably duplicating the smaller table).
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-5-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7e3e5b0b

selftests: drv-net: rss_ctx: test queue changes vs user RSS config · e2c9703d

Jakub Kicinski authored Jul 08, 2024

By default main RSS table should change to include all queues.
When user sets a specific RSS config the driver should preserve it,
even when queue count changes. Driver should refuse to deactivate
queues used in the user-set RSS config.

For additional contexts driver should still refuse to deactivate
queues in use. Whether the contexts should get resized like
context 0 when queue count increases is a bit unclear. I anticipate
most drivers today don't do that. Since main use case for additional
contexts is to set the indir table - it doesn't seem worthwhile to
care about behavior of the default table too much. Don't test that.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-4-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e2c9703d

selftests: drv-net: rss_ctx: factor out send traffic and check · 847aa551

Jakub Kicinski authored Jul 08, 2024

Wrap up sending traffic and checking in which queues it landed
in a helper.

The method used for testing is to send a lot of iperf traffic
and check which queues received the most packets. Those should
be the queues where we expect iperf to land - either because we
installed a filter for the port iperf uses, or we didn't and
expect it to use context 0.

Contexts get disjoint queue sets, but the main context (AKA context 0)
may receive some background traffic (noise).
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-3-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

847aa551

selftests: drv-net: rss_ctx: fix cleanup in the basic test · a0aab7d7

Jakub Kicinski authored Jul 08, 2024

The basic test may fail without resetting the RSS indir table.
Use the .exec() method to run cleanup early since we re-test
with traffic that returning to default state works.
While at it reformat the doc a tiny bit.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a0aab7d7

net: ti: icssg-prueth: add missing deps · d6947113

Guillaume La Roque authored Jul 08, 2024

Add missing dependency on NET_SWITCHDEV.

Fixes: abd5576b ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guillaume La Roque <glaroque@baylibre.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20240708-net-deps-v2-1-b22fb74da2a3@baylibre.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d6947113

dt-bindings: net: fsl,fman: add ptimer-handle property · 5618ced0

Frank Li authored Jul 08, 2024

Add ptimer-handle property to link to ptp-timer node handle.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: fman@1a00000: 'ptimer-handle' do not match any of the regexes: '^ethernet@[a-f0-9]+$', '^mdio@[a-f0-9]+$', '^muram@[a-f0-9]+$', '^phc@[a-f0-9]+$', '^port@[a-f0-9]+$', 'pinctrl-[0-9]+'
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20240708180949.1898495-2-Frank.Li@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5618ced0

dt-bindings: net: fsl,fman: allow dma-coherent property · dd84d831

Frank Li authored Jul 08, 2024

Add dma-coherent property to fix below warning.
arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: fman@1a00000: 'dma-coherent', 'ptimer-handle' do not match any of the regexes: '^ethernet@[a-f0-9]+$', '^mdio@[a-f0-9]+$', '^muram@[a-f0-9]+$', '^phc@[a-f0-9]+$', '^port@[a-f0-9]+$', 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/net/fsl,fman.yaml#Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20240708180949.1898495-1-Frank.Li@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dd84d831

net: tls: Pass union tls_crypto_context pointer to memzero_explicit · 0d9e699d

Simon Horman authored Jul 08, 2024

Pass union tls_crypto_context pointer, rather than struct
tls_crypto_info pointer, to memzero_explicit().

The address of the pointer is the same before and after.
But the new construct means that the size of the dereferenced pointer type
matches the size being zeroed. Which aids static analysis.

As reported by Smatch:

  .../tls_main.c:842 do_tls_setsockopt_conf() error: memzero_explicit() 'crypto_info' too small (4 vs 56)

No functional change intended.
Compile tested only.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240708-tls-memzero-v2-1-9694eaf31b79@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0d9e699d

selftests: forwarding: Make vxlan-bridge-1d pass on debug kernels · 3699e57a

Ido Schimmel authored Jul 07, 2024

The ageing time used by the test is too short for debug kernels and
results in entries being aged out prematurely [1].

Fix by increasing the ageing time.

The same change was done for the VLAN-aware version of the test in
commit dfbab740 ("selftests: forwarding: Make vxlan-bridge-1q pass
on debug kernels").

[1]
 # ./vxlan_bridge_1d.sh
 [...]
 # TEST: VXLAN: flood before learning                              [ OK ]
 # TEST: VXLAN: show learned FDB entry                             [ OK ]
 # TEST: VXLAN: learned FDB entry                                  [FAIL]
 # veth3: Expected to capture 0 packets, got 4.
 # RTNETLINK answers: No such file or directory
 # TEST: VXLAN: deletion of learned FDB entry                      [ OK ]
 # TEST: VXLAN: Ageing of learned FDB entry                        [FAIL]
 # veth3: Expected to capture 0 packets, got 2.
 [...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240707095458.2870260-1-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3699e57a

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 7b769adc

Paolo Abeni authored Jul 09, 2024

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   as well as dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal & Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.netSigned-off-by: Paolo Abeni <pabeni@redhat.com>

7b769adc

net: phy: microchip: lan937x: add support for 100BaseTX PHY · 870a1dbc

Oleksij Rempel authored Jul 06, 2024

Add support of 100BaseTX PHY build in to LAN9371 and LAN9372 switches.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://patch.msgid.link/20240706154201.1456098-1-o.rempel@pengutronix.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>

870a1dbc

udp: Remove duplicate included header file trace/events/udp.h · 0787ab20

Thorsten Blum authored Jul 06, 2024

Remove duplicate included header file trace/events/udp.h and the
following warning reported by make includecheck:

  trace/events/udp.h is included more than once

Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240706071132.274352-2-thorsten.blum@toblux.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

0787ab20

net: tn40xx: add per queue netdev-genl stats support · 6c2a4c2f

FUJITA Tomonori authored Jul 06, 2024

Add support for the netdev-genl per queue stats API.

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope":"queue"}'
[{'ifindex': 4,
  'queue-id': 0,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 266613,
  'rx-packets': 3325},
 {'ifindex': 4,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 142823367,
  'tx-packets': 2387}]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240706064324.137574-1-fujita.tomonori@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

6c2a4c2f

sctp: Fix typos and improve comments · 417d8818

Thorsten Blum authored Jul 04, 2024

Fix typos s/steam/stream/ and spell out Schedule/Unschedule in the
comments.

Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240704202558.62704-2-thorsten.blum@toblux.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

417d8818

l2tp: fix possible UAF when cleaning up tunnels · f8ad00f3

James Chapman authored Jul 04, 2024

syzbot reported a UAF caused by a race when the L2TP work queue closes a
tunnel at the same time as a userspace thread closes a session in that
tunnel.

Tunnel cleanup is handled by a work queue which iterates through the
sessions contained within a tunnel, and closes them in turn.

Meanwhile, a userspace thread may arbitrarily close a session via
either netlink command or by closing the pppox socket in the case of
l2tp_ppp.

The race condition may occur when l2tp_tunnel_closeall walks the list
of sessions in the tunnel and deletes each one.  Currently this is
implemented using list_for_each_safe, but because the list spinlock is
dropped in the loop body it's possible for other threads to manipulate
the list during list_for_each_safe's list walk.  This can lead to the
list iterator being corrupted, leading to list_for_each_safe spinning.
One sequence of events which may lead to this is as follows:

 * A tunnel is created, containing two sessions A and B.
 * A thread closes the tunnel, triggering tunnel cleanup via the work
   queue.
 * l2tp_tunnel_closeall runs in the context of the work queue.  It
   removes session A from the tunnel session list, then drops the list
   lock.  At this point the list_for_each_safe temporary variable is
   pointing to the other session on the list, which is session B, and
   the list can be manipulated by other threads since the list lock has
   been released.
 * Userspace closes session B, which removes the session from its parent
   tunnel via l2tp_session_delete.  Since l2tp_tunnel_closeall has
   released the tunnel list lock, l2tp_session_delete is able to call
   list_del_init on the session B list node.
 * Back on the work queue, l2tp_tunnel_closeall resumes execution and
   will now spin forever on the same list entry until the underlying
   session structure is freed, at which point UAF occurs.

The solution is to iterate over the tunnel's session list using
list_first_entry_not_null to avoid the possibility of the list
iterator pointing at a list item which may be removed during the walk.

Also, have l2tp_tunnel_closeall ref each session while it processes it
to prevent another thread from freeing it.

	cpu1				cpu2
	---				---
					pppol2tp_release()

	spin_lock_bh(&tunnel->list_lock);
	for (;;) {
		session = list_first_entry_or_null(&tunnel->session_list,
						   struct l2tp_session, list);
		if (!session)
			break;
		list_del_init(&session->list);
		spin_unlock_bh(&tunnel->list_lock);

 					l2tp_session_delete(session);

		l2tp_session_delete(session);
		spin_lock_bh(&tunnel->list_lock);
	}
	spin_unlock_bh(&tunnel->list_lock);

Calling l2tp_session_delete on the same session twice isn't a problem
per-se, but if cpu2 manages to destruct the socket and unref the
session to zero before cpu1 progresses then it would lead to UAF.

Reported-by: syzbot+b471b7c936301a59745b@syzkaller.appspotmail.com
Reported-by: syzbot+c041b4ce3a6dfd1e63e2@syzkaller.appspotmail.com
Fixes: d18d3f0a ("l2tp: replace hlist with simple list for per-tunnel session list")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://patch.msgid.link/20240704152508.1923908-1-jchapman@katalix.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

f8ad00f3