- 17 Jun, 2022 32 commits
-
-
Petr Machata authored
The scale tests currently test two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. However, the ability to allocate the resource does not mean that the resource actually works when passing traffic. For that, make it possible for a given scale test to also test traffic. The traffic test is only run on the positive leg of the scale test (there is no point trying to pass traffic when the expected outcome is that the resource will not be allocated). Traffic tests are opt-in; if a given test does not expose one, it is not run. To this end, delay the test cleanup until after the traffic test is run. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The scale of each resource is tested in the following manner: 1. The scale target is queried. 2. The test setup is prepared. 3. The test is invoked. In some cases, the occupancy of a resource changes as part of the second step, requiring the test to return a scale target that takes this change into account. Make this more robust by re-querying the scale target after the second step. Another possible solution is to swap the first and second steps, but when a test needs to be skipped (i.e., scale target is zero), the setup would have been in vain. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
With the mlxsw driver, configurations are offloaded only when a physical port is enslaved to the virtual device (e.g., to a bridge). In the 'mirror_gre_bridge_1q_lag' test, the bridge gets an address and route before there are ports in the bridge, which means that these configurations are not offloaded. Until now the test passed with the mlxsw driver even though the bridge's RIF is not in the hardware, because the ARP packets are trapped at layer 2 and also mirrored, so there is no real need for the RIF in hardware. The previous patch changed the traps 'ARP_REQUEST' and 'ARP_RESPONSE' to be done at layer 3 instead of layer 2. With this change the ARP packets are not trapped during the test, as the RIF is not in the hardware because of the order of configurations. Reorder the configurations so that they are offloaded; the test will then pass with the changed traps. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be created. The limit depends on the ASIC and FW revision, and mlxsw reads it from the FW. In order to communicate both how many RIFs there can be and how many are currently taken (i.e. occupancy), introduce a corresponding devlink resource. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
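To make the devlink-resource mechanics concrete, here is a minimal sketch of how a driver can register such a resource together with an occupancy callback using the generic devlink API; the resource name "rifs", the ID constant, and the occupancy helper are illustrative placeholders, not mlxsw's actual symbols.

```c
#include <net/devlink.h>

#define EXAMPLE_RESOURCE_ID_RIFS 1

static u64 example_rif_occ_get(void *priv)
{
	/* Would return the driver's count of currently allocated RIFs. */
	return 0;
}

static int example_register_rif_resource(struct devlink *devlink, u64 max_rifs)
{
	struct devlink_resource_size_params size_params;
	int err;

	/* The resource is not resizable: min == max == FW-reported limit. */
	devlink_resource_size_params_init(&size_params, max_rifs, max_rifs,
					  1, DEVLINK_RESOURCE_UNIT_ENTRY);

	err = devlink_resource_register(devlink, "rifs", max_rifs,
					EXAMPLE_RESOURCE_ID_RIFS,
					DEVLINK_RESOURCE_ID_PARENT_TOP,
					&size_params);
	if (err)
		return err;

	/* Let 'devlink resource show' report the current occupancy. */
	devlink_resource_occ_get_register(devlink, EXAMPLE_RESOURCE_ID_RIFS,
					  example_rif_occ_get, NULL);
	return 0;
}
```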
-
Petr Machata authored
In order to expose the number of RIFs as a resource, it will be handy to have the number of currently allocated RIFs available as a single number. Introduce such a counter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
Currently, the traps 'ARP_REQUEST' and 'ARP_RESPONSE' occur at layer 2. To allow the packets to be flooded, they are configured with the action 'MIRROR_TO_CPU', which means that the CPU receives a replica of the packet. Today, Spectrum ASICs also support trapping ARP packets at layer 3. This behavior is better: the packets can simply be trapped and there is no need to mirror them. An additional motivation is that with the layer 2 traps, the ARP packets are dropped in the router as they do not have an IP header, and are then counted as error packets, which might confuse users. Add the relevant traps for layer 3 and use them instead of the existing traps. There is no visible change to user space. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== tcp: final (?) round of mem pressure fixes While working on prior patch series (e10b02ee "Merge branch 'net-reduce-tcp_memory_allocated-inflation'"), I found that we could still have frozen TCP flows under memory pressure. I thought we had solved this in 2015, but the fix was not complete. v2: deal with zerocopy tx paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
The blamed commit only dealt with applications issuing small writes. The issue here is that we allow forcing memory scheduling for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy any payload into it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (the default on x86) and an initial skb->truesize of 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch, a sendmsg() sending more than 2816 bytes would either block forever (under persistent memory pressure) or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: 8e4d980a ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
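The 2816-byte figure above is just the difference between the two quoted defaults; a tiny standalone sketch of that arithmetic (values hard-coded from the message, not read from a live system):

```c
#include <stdio.h>

int main(void)
{
	int tcp_wmem0 = 4096;    /* default sysctl net.ipv4.tcp_wmem[0] on x86 */
	int skb_truesize = 1280; /* initial skb->truesize quoted in the message */

	/* Under memory pressure the socket may still queue up to tcp_wmem[0]
	 * bytes of truesize, so the payload that fits in the first skb is
	 * bounded by the difference. */
	printf("max payload copied under pressure: %d bytes\n",
	       tcp_wmem0 - skb_truesize);	/* 4096 - 1280 = 2816 */
	return 0;
}
```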
-
Eric Dumazet authored
The blamed commit only dealt with applications issuing small writes. The issue here is that we allow forcing memory scheduling for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy any payload into it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (the default on x86) and an initial skb->truesize of 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch, a sendmsg() sending more than 2816 bytes would either block forever (under persistent memory pressure) or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: 8e4d980a ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sk_forced_mem_schedule() has a bug similar to the ones fixed in commit 7c80b038 ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors"). While this bug has little chance to trigger in old kernels, we need to fix it before the following patch. Fixes: d83769a5 ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
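A rough sketch of the accounting pattern such a fix implies, charging only the shortfall beyond what is already forward-allocated; the struct, field, and helper names here are simplified stand-ins, not the kernel's actual definitions or the actual patch:

```c
#define EXAMPLE_PAGE_SIZE 4096

struct fake_sock {
	long fwd_alloc;          /* bytes already reserved for this socket */
	long memory_allocated;   /* pages charged against the protocol */
};

static long pages_for(long bytes)
{
	return (bytes + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}

static void forced_mem_schedule(struct fake_sock *sk, long size)
{
	long delta = size - sk->fwd_alloc;
	long amt;

	if (delta <= 0)          /* already covered, nothing to charge */
		return;

	amt = pages_for(delta);  /* charge only the shortfall, not the full size */
	sk->fwd_alloc += amt * EXAMPLE_PAGE_SIZE;
	sk->memory_allocated += amt;
}
```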
-
Jakub Kicinski authored
Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20220616041226.26996-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Raju Lakkaraju authored
Add support for Master-Slave configuration and state. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Raju Lakkaraju authored
Add SGMII access read and write functions. Add support for SGMII 1G and 2.5G for PCI11010/PCI11414 chips. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Raju Lakkaraju authored
Add support for Magic Packet detection with Secure-ON for PCI11010/PCI11414 chips. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Raju Lakkaraju authored
Add support for the LAN743x common register dump. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alvin Šipraga says: ==================== net: dsa: realtek: rtl8365mb: improve handling of PHY modes This series introduces some minor cleanup of the driver and improves the handling of PHY interface modes to break the assumption that CPU ports are always over an external interface, and the assumption that user ports are always using an internal PHY. ==================== Link: https://lore.kernel.org/r/20220615225116.432283-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alvin Šipraga authored
Realtek switches in the rtl8365mb family always have at least one port with a so-called external interface, supporting PHY interface modes such as RGMII or SGMII. The purpose of this patch is to improve the driver's handling of these ports. A new struct rtl8365mb_chip_info is introduced together with a static array of such structs. An instance of this struct is added for each supported switch, distinguished by its chip ID and version. Embedded in each chip_info struct is an array of struct rtl8365mb_extint, describing the external interfaces available. This is more specific than the old rtl8365mb_extint_port_map, which was only valid for switches with up to 6 ports. The struct rtl8365mb_extint also contains a bitmask of supported PHY interface modes, which allows the driver to distinguish which ports support RGMII. This corrects a previous mistake in the driver whereby it was assumed that any port with an external interface supports RGMII. This is not actually the case: for example, the RTL8367S has two external interfaces, only the second of which supports RGMII. The first supports only SGMII and HSGMII. This new design will make it easier to add support for other interface modes. Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported capabilities based on the external interface properties described above. This addresses Vladimir's point in the linked thread that the capabilities are not actually a function of the DSA port type: Although most typical applications will treat the ports with internal PHY as user ports, there is no actual hardware limitation preventing one from using them as a CPU port. Equally, ports with external interface(s) may well be treated as user ports, even though it is typical to use those ports as CPU ports. Link: https://lore.kernel.org/netdev/20220510192301.5djdt3ghoavxulhl@bang-olufsen.dk/ Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
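A rough sketch of the kind of data layout described above; struct names, fields, mode bits, and the example values are illustrative, not the driver's actual definitions:

```c
#include <stdint.h>

#define EXTINT_MODES_RGMII  (1U << 0)
#define EXTINT_MODES_SGMII  (1U << 1)
#define EXTINT_MODES_HSGMII (1U << 2)

struct extint_desc {
	unsigned int port;        /* switch port wired to this external interface */
	unsigned int id;          /* external interface index within the chip */
	uint32_t supported_modes; /* bitmask of EXTINT_MODES_* this interface supports */
};

struct chip_info {
	uint32_t chip_id;
	uint32_t chip_ver;
	const char *name;
	const struct extint_desc *extints;
	unsigned int num_extints;
};

/* Example entry: a chip with two external interfaces where only the
 * second one supports RGMII (as described for the RTL8367S above).
 * Port numbers and ID/version values are made up for illustration. */
static const struct extint_desc example_extints[] = {
	{ .port = 6, .id = 0,
	  .supported_modes = EXTINT_MODES_SGMII | EXTINT_MODES_HSGMII },
	{ .port = 7, .id = 1,
	  .supported_modes = EXTINT_MODES_RGMII },
};

static const struct chip_info example_chips[] = {
	{ .chip_id = 0x6367, .chip_ver = 0x00A0, .name = "example-8367S",
	  .extints = example_extints, .num_extints = 2 },
};
```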
-
Alvin Šipraga authored
The variable is just assigned the value of a macro, so it can be removed. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alvin Šipraga authored
The maximum number of ports is actually 11, according to two observations: 1. The highest port ID used in the vendor driver is 10. Since port IDs are indexed from 0, and since DSA follows the same numbering system, this means up to 11 ports are to be presumed. 2. The registers with port mask fields always amount to a maximum port mask of 0x7FF, corresponding to a maximum 11 ports. In view of this, I also deleted the comment. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
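A quick sanity check of the mask arithmetic in point 2 (0x7FF is an 11-bit mask, i.e. ports 0 through 10):

```c
#include <stdio.h>

int main(void)
{
	unsigned int port_mask = 0x7FF;

	/* Each set bit corresponds to one port ID, starting at 0. */
	printf("ports covered by 0x%X: %d\n",
	       port_mask, __builtin_popcount(port_mask));	/* prints 11 */
	return 0;
}
```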
-
Alvin Šipraga authored
There is no real need for this variable: the line change interrupt mask is sufficiently masked out when getting linkup_ind and linkdown_ind in the interrupt handler. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alvin Šipraga authored
The official name of this switch is RTL8367RB-VB, not RTL8367RB. There is also an RTL8367RB-VC which is rather different. Change the name of the CHIP_ID/_VER macros for reasons of consistency. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: more multi-channel event ring work This series makes a little more progress toward supporting multiple channels with a single event ring. The first removes the assumption that consecutive events are associated with the same RX channel. The second derives the channel associated with an event from the event itself, and the next does a small cleanup enabled by that. The fourth causes updates to occur for every event processed (rather than once). And the final patch does a little more rework to make TX completion have more in common with RX completion. ==================== Link: https://lore.kernel.org/r/20220615165929.5924-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Move the processing done for TX channels in gsi_channel_update() into gsi_evt_ring_rx_update(). The called function is called for both RX and TX channels, so rename it to be gsi_evt_ring_update(). As a result, this code no longer assumes events in an event ring are associated with just one channel. Because all events in a ring are handled in that function, we can move the call to gsi_trans_move_complete() there, and can ring the event ring doorbell there as well after all new events in the ring have been processed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
When an RX transaction completes, we update the trans->len field to contain the actual number of bytes received. This is done in a loop in gsi_evt_ring_rx_update(). Change that function so it checks the data transfer direction recorded in the transaction, and only updates trans->len for RX transfers. Then call it unconditionally. This means events for TX endpoints will run through the loop without otherwise doing anything, but this will change shortly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
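A simplified sketch of the per-event loop behaviour described, where only RX transfers pick up the received length; the types and helper below are stand-ins, not the GSI driver's actual structures:

```c
enum dir { DIR_TX, DIR_RX };

struct fake_trans {
	enum dir direction;
	unsigned int len;	/* bytes actually transferred (RX only) */
};

struct fake_event {
	struct fake_trans *trans;
	unsigned int byte_count;	/* length reported by the hardware event */
};

static void evt_ring_update(struct fake_event *events, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		struct fake_trans *trans = events[i].trans;

		/* Only RX transfers carry a meaningful received length;
		 * TX events pass through the loop without updating it. */
		if (trans->direction == DIR_RX)
			trans->len = events[i].byte_count;
	}
}
```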
-
Alex Elder authored
The only reason the event ring's channel pointer is needed in gsi_evt_ring_rx_update() is so we can get at its GSI pointer. We can pass the GSI pointer as an argument, along with the event ring ID, and thereby avoid using the event ring channel pointer. This is another step toward no longer assuming an event ring services a single channel. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Change gsi_channel_trans_map() so it derives the channel used from the transaction. Pass the index of the *first* TRE used by the transaction, and have the called function account for the fact that the last one used is what's important. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
In gsi_evt_ring_rx_update(), use gsi_event_trans() repeatedly to find the transaction associated with an event, rather than assuming consecutive events are associated with the same channel. This removes the only caller of gsi_trans_pool_next(), so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Rasmus Villemoes says: ==================== dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. This series adds a device tree binding for an nvmem cell which can be populated during production with a suitable value calibrated for each board, and corresponding support in the driver. The second patch adds a trivial phy wrapper for dev_err_probe(), used in the third. ==================== Link: https://lore.kernel.org/r/20220614084612.325229-1-linux@rasmusvillemoes.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rasmus Villemoes authored
We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the (factory calibrated) reset value or using one of the two boolean properties to set it to the min/max value - are too coarse. Implement support for the newly added binding allowing device tree to specify an nvmem cell containing an appropriate value for this specific board. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
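For illustration, a sketch of how a PHY driver might read such a board-specific calibration value from an nvmem cell using the standard nvmem consumer API; the cell name "io_impedance_ctrl", the function name, and the value handling are assumptions for this example, and the actual register write is omitted:

```c
#include <linux/err.h>
#include <linux/nvmem-consumer.h>
#include <linux/phy.h>
#include <linux/slab.h>
#include <linux/string.h>

static int example_read_io_impedance(struct phy_device *phydev, u32 *val)
{
	struct device *dev = &phydev->mdio.dev;
	struct nvmem_cell *cell;
	size_t len;
	void *buf;

	/* Cell name is an assumption taken from the series title. */
	cell = devm_nvmem_cell_get(dev, "io_impedance_ctrl");
	if (IS_ERR(cell))
		return PTR_ERR(cell);	/* may be -EPROBE_DEFER */

	buf = nvmem_cell_read(cell, &len);
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	*val = 0;
	if (len > sizeof(*val))
		len = sizeof(*val);
	memcpy(val, buf, len);		/* calibration value written in production */
	kfree(buf);

	/* The driver would then range-check *val and program the
	 * IO_IMPEDANCE_CTRL field with it (omitted here). */
	return 0;
}
```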
-
Rasmus Villemoes authored
The dev_err_probe() function is quite useful to avoid boilerplate related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to simplify making use of that from phy drivers which otherwise use the phydev_* helpers. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
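A sketch of what such a wrapper might look like, assuming it simply forwards to dev_err_probe() with the PHY's underlying MDIO device; the macro name and the usage example below are illustrative and not necessarily the in-tree definition:

```c
#include <linux/err.h>
#include <linux/phy.h>

/* Hypothetical wrapper: forward to dev_err_probe() using the device
 * embedded in the PHY's MDIO device, so -EPROBE_DEFER is handled the
 * usual way (recorded for debugging, not printed as an error). */
#define example_phydev_err_probe(phydev, err, fmt, ...) \
	dev_err_probe(&(phydev)->mdio.dev, err, fmt, ##__VA_ARGS__)

/* Typical probe-path usage: return the error while logging only when
 * it is a real failure rather than a probe deferral. */
static int example_get_resource(struct phy_device *phydev)
{
	void *res = ERR_PTR(-EPROBE_DEFER);	/* placeholder lookup result */

	if (IS_ERR(res))
		return example_phydev_err_probe(phydev, PTR_ERR(res),
						"failed to get nvmem cell\n");
	return 0;
}
```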
-
Rasmus Villemoes authored
We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. There is no fixed mapping from register values to values in the range 35-70 ohms; it varies from chip to chip, and even that target range is approximate. So add a DT binding for an nvmem cell which can be populated during production with a value suitable for each specific board. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski authored
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 16 Jun, 2022 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Mostly driver fixes.

Current release - regressions:
 - Revert "net: Add a second bind table hashed by port and address", needs more work
 - amd-xgbe: use platform_irq_count(), static setup of IRQ resources had been removed from DT core
 - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation

Current release - new code bugs:
 - hns3: modify the ring param print info

Previous releases - always broken:
 - axienet: make the 64b addressable DMA depends on 64b architectures
 - iavf: fix issue with MAC address of VF shown as zero
 - ice: fix PTP TX timestamp offset calculation
 - usb: ax88179_178a needs FLAG_SEND_ZLP

Misc:
 - document some net.sctp.* sysctls"

* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  net: axienet: add missing error return code in axienet_probe()
  Revert "net: Add a second bind table hashed by port and address"
  net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
  ARM: dts: at91: ksz9477_evb: fix port/phy validation
  net: bgmac: Fix an erroneous kfree() in bgmac_remove()
  ice: Fix memory corruption in VF driver
  ice: Fix queue config fail handling
  ice: Sync VLAN filtering features for DVM
  ice: Fix PTP TX timestamp offset calculation
  mlxsw: spectrum_cnt: Reorder counter pools
  docs: networking: phy: Fix a typo
  amd-xgbe: Use platform_irq_count()
  octeontx2-vf: Add support for adaptive interrupt coalescing
  xilinx: Fix build on x86.
  net: axienet: Use iowrite64 to write all 64b descriptor pointers
  net: axienet: make the 64b addresable DMA depends on 64b archectures
  net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
  net: hns3: fix PF rss size initialization bug
  ...
-
Yang Yingliang authored
It should return an error code in the error path of axienet_probe(). Fixes: 00be43a7 ("net: axienet: make the 64b addresable DMA depends on 64b archectures") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Joanne Koong authored
This reverts: commit d5a42de8 ("net: Add a second bind table hashed by port and address") commit 538aaf9b ("selftests: Add test for timing a bind request to a port with a populated bhash entry") Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/ There are a few things that need to be fixed here: * Updating bhash2 in cases where the socket's rcv saddr changes * Adding bhash2 hashbucket locks Links to syzbot reports: https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/ https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/ Fixes: d5a42de8 ("net: Add a second bind table hashed by port and address") Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Haiyang Zhang says: ==================== net: mana: Add PF and XDP_REDIRECT support The patch set adds PF and XDP_REDIRECT support. ==================== Link: https://lore.kernel.org/r/1655238535-19257-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Haiyang Zhang authored
Add a handler for the XDP_REDIRECT return code from an XDP program. The packets will be flushed at the end of each RX/CQ NAPI poll cycle. ndo_xdp_xmit() is implemented by sharing the code in mana_xdp_tx(). Ethtool per-queue counters are added for XDP redirect and xmit operations. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
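A generic sketch of how an XDP_REDIRECT verdict is typically handled in a driver RX path and flushed at the end of a NAPI poll; apart from the core XDP helpers (bpf_prog_run_xdp(), xdp_do_redirect(), xdp_do_flush()), the function names are placeholders, not mana's actual symbols:

```c
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/netdevice.h>

static u32 example_run_xdp(struct net_device *ndev, struct bpf_prog *prog,
			   struct xdp_buff *xdp)
{
	u32 act = bpf_prog_run_xdp(prog, xdp);

	switch (act) {
	case XDP_REDIRECT:
		/* Queue the frame toward the target device/map entry;
		 * actual transmission happens when xdp_do_flush() runs. */
		if (xdp_do_redirect(ndev, xdp, prog))
			act = XDP_DROP;	/* redirect failed, drop instead */
		break;
	default:
		break;
	}
	return act;
}

/* At the end of the NAPI poll cycle, flush any frames queued by
 * XDP_REDIRECT so they are actually sent out. */
static void example_napi_poll_done(void)
{
	xdp_do_flush();
}
```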
-
Dexuan Cui authored
This minimal PF driver runs on bare metal. Currently Ethernet TX/RX works. SR-IOV management is not supported yet. Signed-off-by: Dexuan Cui <decui@microsoft.com> Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Christian 'Ansuel' Marangi authored
Some bootloaders may set the force speed regs even if the actual interface should use autonegotiation between the PCS and the PHY. This causes the interface to malfunction completely. To fix this, correctly reset the force speed regs if a fixed-link is not defined in the DTS. With a fixed-link node, correctly configure the forced speed regs to handle any misconfiguration by the bootloader. Reported-by: Mark Mentovai <mark@moxienet.com> Co-developed-by: Mark Mentovai <mark@moxienet.com> Signed-off-by: Mark Mentovai <mark@moxienet.com> Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20220614112228.1998-2-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
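A sketch of the decision described above, keyed off the standard of_phy_is_fixed_link() helper; the register-access functions are empty placeholders standing in for the driver's actual configuration code:

```c
#include <linux/of.h>
#include <linux/of_mdio.h>

/* Placeholder register helpers standing in for the driver's real code. */
static void example_clear_force_speed_regs(void)
{
	/* clear any force-speed bits left set by the bootloader */
}

static void example_write_force_speed_regs(void)
{
	/* program the forced speed taken from the fixed-link node */
}

static void example_setup_speed(struct device_node *np)
{
	if (!of_phy_is_fixed_link(np)) {
		/* PCS <-> PHY autoneg in use: a stale forced speed would
		 * break the link, so reset those registers. */
		example_clear_force_speed_regs();
	} else {
		/* fixed-link: always (re)program the forced speed so a
		 * bootloader misconfiguration cannot persist. */
		example_write_force_speed_regs();
	}
}
```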
-
Christian 'Ansuel' Marangi authored
The different GMAC IDs require different configuration based on the SoC and on the GMAC ID. Add these missing configurations, taken from the original driver. Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20220614112228.1998-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-