Commits · 8c3fdff2171c834df5fa5ff353b94ada2e5376ca · Kirill Smelkov / linux

04 Jun, 2024 11 commits

openvswitch: Move stats allocation to core · 8c3fdff2

Breno Leitao authored May 31, 2024

With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core instead
of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Move openvswitch driver to leverage the core allocation.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240531111552.3209198-1-leitao@debian.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>

8c3fdff2

Merge branch 'tcp-refactor-skb_cmp_decrypted-checks' · cd0057ad

Paolo Abeni authored Jun 04, 2024

Jakub Kicinski says:

====================
tcp: refactor skb_cmp_decrypted() checks

Refactor the input patch coalescing checks and wrap "EOR forcing"
logic into a helper. This will hopefully make the code easier to
follow. While at it throw some DEBUG_NET checks into skb_shift().
====================

Link: https://lore.kernel.org/r/20240530233616.85897-1-kuba@kernel.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>

cd0057ad

net: skb: add compatibility warnings to skb_shift() · 99b8add0

Jakub Kicinski authored May 30, 2024

According to current semantics we should never try to shift data
between skbs which differ on decrypted or pp_recycle status.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

99b8add0

tcp: add a helper for setting EOR on tail skb · 1be68a87

Jakub Kicinski authored May 30, 2024

TLS (and hopefully soon PSP will) use EOR to prevent skbs
with different decrypted state from getting merged, without
adding new tests to the skb handling. In both cases once
the connection switches to an "encrypted" state, all subsequent
skbs will be encrypted, so a single "EOR fence" is sufficient
to prevent mixing.

Add a helper for setting the EOR bit, to make this arrangement
more explicit.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

1be68a87

tcp: wrap mptcp and decrypted checks into tcp_skb_can_collapse_rx() · 07111530

Jakub Kicinski authored May 30, 2024

tcp_skb_can_collapse() checks for conditions which don't make
sense on input. Because of this we ended up sprinkling a few
pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls
on the input path. Group them in a new helper. This should make
it less likely that someone will check mptcp and not decrypted
or vice versa when adding new code.

This implicitly adds a decrypted check early in tcp_collapse().
AFAIU this will very slightly increase our ability to collapse
packets under memory pressure, not a real bug.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

07111530

Merge branch 'net-allow-dissecting-matching-tunnel-control-flags' · 2589d668

Paolo Abeni authored Jun 04, 2024

Davide Caratti says:

====================
net: allow dissecting/matching tunnel control flags

Ilya says: "for correct matching on decapsulated packets, we should match
on not only tunnel id and headers, but also on tunnel configuration flags
like TUNNEL_NO_CSUM and TUNNEL_DONT_FRAGMENT. This is done to distinguish
similar tunnels with slightly different configs. And it is important since
tunnel configuration is flow based, i.e. can be different for every packet,
even though the main tunnel port is the same."

 - patch 1 extends the kernel's flow dissector to extract these flags
   from the packet's tunnel metadata.
 - patch 2 extends TC flower to match on any combination of TUNNEL_NO_CSUM,
   TUNNEL_DONT_FRAGMENT, TUNNEL_OAM, TUNNEL_CRIT_OPT

v4:
 - fix kernel-doc warning in flow_dissector.h (thanks Jakub)

v3:
 - rebase on top of new uAPI bits and internals after commit 5832c4a7
   ("ip_tunnel: convert __be16 tunnel flags to bitmaps"). Use of network
   byte order is no more needed, since these bits match on metadata: convert
   netlink attributes to be u32.
 - also include TUNNEL_CRIT_OPT

v2:
 - use NL_REQ_ATTR_CHECK() where possible (thanks Jamal)
 - don't overwrite 'ret' in the error path of fl_set_key_flags()
====================

Link: https://lore.kernel.org/r/cover.1717088241.git.dcaratti@redhat.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

2589d668

net/sched: cls_flower: add support for matching tunnel control flags · 1d17568e

Davide Caratti authored May 30, 2024

extend cls_flower to match TUNNEL_FLAGS_PRESENT bits in tunnel metadata.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

1d17568e

flow_dissector: add support for tunnel control flags · 668b6a2e

Davide Caratti authored May 30, 2024

Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata.
This is a prerequisite for matching these control flags using TC flower.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

668b6a2e

net: count drops due to missing qdisc as dev->tx_drops · 4fdb6b60

Jakub Kicinski authored May 29, 2024

Catching and debugging missing qdiscs is pretty tricky. When qdisc
is deleted we replace it with a noop qdisc, which silently drops
all the packets. Since the noop qdisc has a single static instance
we can't count drops at the qdisc level. Count them as dev->tx_drops.

  ip netns add red
  ip link add type veth peer netns red
  ip            link set dev veth0 up
  ip -netns red link set dev veth0 up
  ip            a a dev veth0 10.0.0.1/24
  ip -netns red a a dev veth0 10.0.0.2/24
  ping -c 2 10.0.0.2
  #  2 packets transmitted, 2 received, 0% packet loss, time 1031ms
  ip -s link show dev veth0
  #  TX:  bytes packets errors dropped carrier collsns
  #        1314      17      0       0       0       0

  tc qdisc replace dev veth0 root handle 1234: mq
  tc qdisc replace dev veth0 parent 1234:1 pfifo
  tc qdisc del dev veth0 parent 1234:1
  ping -c 2 10.0.0.2
  #  2 packets transmitted, 0 received, 100% packet loss, time 1034ms
  ip -s link show dev veth0
  #  TX:  bytes packets errors dropped carrier collsns
  #        1314      17      0       3       0       0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240529162527.3688979-1-kuba@kernel.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>

4fdb6b60

r8152: Wake up the system if the we need a reset · 8c1d92a7

Douglas Anderson authored May 30, 2024

If we get to the end of the r8152's suspend() routine and we find that
the USB device is INACCESSIBLE then it means that some of our
preparation for suspend didn't take place. We need a USB reset to get
ourselves back in a consistent state so we can try again and that
can't happen during system suspend. Call pm_wakeup_event() to wake the
system up in this case.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f25.170a0220.8b5ad.1752@mx.google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8c1d92a7

r8152: If inaccessible at resume time, issue a reset · 4933b066

Douglas Anderson authored May 30, 2024

If we happened to get a USB transfer error during the transition to
suspend then the usb_queue_reset_device() that r8152_control_msg()
calls will get dropped on the floor. This is because
usb_lock_device_for_reset() (which usb_queue_reset_device() uses)
silently fails if it's called when a device is suspended or if too
much time passes.

Let's resolve this by resetting the device ourselves in r8152's
resume() function.

NOTE: due to timing, it's _possible_ that we could end up with two USB
resets: the one queued previously and the one called from the resume()
patch. This didn't happen in test cases I ran, though it's conceivably
possible. We can't easily know if this happened since
usb_queue_reset_device() can just silently drop the reset request. In
any case, it's not expected that this is a problem since the two
resets can't run at the same time (because of the device lock) and it
should be OK to reset the device twice. If somehow the double-reset
causes problems we could prevent resets from being queued up while
suspend is running.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f22.170a0220.8b5ad.1750@mx.google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4933b066

03 Jun, 2024 12 commits

Merge branch 'Felix-DSA-probing-cleanup' · 83042ce9

David S. Miller authored Jun 03, 2024

Vladimir Oltean says:

====================
Probing cleanup for the Felix DSA driver

This is a follow-up to Russell King's request for code consolidation
among felix_vsc9959, seville_vsc9953 and ocelot_ext, stated here:
https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Details are in individual patches. Testing was done on NXP LS1028A
(felix_vsc9959).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

83042ce9

net: dsa: ocelot: unexport felix_phylink_mac_ops and felix_switch_ops · a4303941

Vladimir Oltean authored May 30, 2024

Now that the common felix_register_switch() from the umbrella driver
is the only entity that accesses these data structures, we can remove
them from the list of the exported symbols.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4303941

net: dsa: ocelot: common probing code · efdbee7d

Vladimir Oltean authored May 30, 2024

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init code, which could be
made common [1].

[1]: https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Here, we take the following common steps:
- "felix" and "ds" structure allocation
- "felix", "ocelot" and "ds" basic structure initialization
- dsa_register_switch() call

and we make a common function out of them.

For every driver except felix_vsc9959, this is also the entire probing
procedure. For felix_vsc9959, we also need to do some PCI-specific
stuff, which can easily be reordered to be done before, and unwound on
failure.

We also have to convert the bus-specific platform_set_drvdata() and
pci_set_drvdata() calls into dev_set_drvdata(). But this should have no
impact on the behavior.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efdbee7d

net: dsa: ocelot: use ds->num_tx_queues = OCELOT_NUM_TC for all models · 4ca54dd9

Vladimir Oltean authored May 30, 2024

Russell King points out that seville_vsc9953 populates
felix->info->num_tx_queues = 8, but this doesn't make it all the way
into ds->num_tx_queues (which is how the user interface netdev queues
get allocated) [1].

[1]: https://lore.kernel.org/all/20240415160150.yejcazpjqvn7vhxu@skbuf/

When num_tx_queues=0 for seville, this is implicitly converted to 1 by
dsa_user_create(), and this is good enough for basic operation for a
switch port. The tc qdisc offload layer works with netdev TX queues,
so for QoS offload we need to pretend we have multiple TX queues. The
VSC9953, like ocelot_ext, doesn't export QoS offload, so it doesn't
really matter. But we can definitely set num_tx_queues=8 for all
switches.

The felix->info->num_tx_queues construct itself seems unnecessary.
It was introduced by commit de143c0e ("net: dsa: felix: Configure
Time-Aware Scheduler via taprio offload") at a time when vsc9959
(LS1028A) was the only switch supported by the driver.

8 traffic classes, and 1 queue per traffic class, is a common
architectural feature of all switches in the family. So they could
all just set OCELOT_NUM_TC and be fine.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ca54dd9

net: dsa: ocelot: move devm_request_threaded_irq() to felix_setup() · 0367a177

Vladimir Oltean authored May 30, 2024

The current placement of devm_request_threaded_irq() is inconvenient.
It is between the allocation of the "felix" structure and
dsa_register_switch(), both of which we'd like to refactor into a
function that's common for all switches. But the IRQ is specific to
felix_vsc9959.

A closer inspection of the felix_irq_handler() code suggests that
it does things that depend on the data structures having been fully
initialized. For example, ocelot_get_txtstamp() takes
&port->tx_skbs.lock, which has only been initialized in
ocelot_init_port() which has not run yet.

It is not one of those IRQF_SHARED IRQs, so CONFIG_DEBUG_SHIRQ_FIXME
shouldn't apply here, and thus, it doesn't really matter, because in
practice, the IRQ will not be triggered so early. Nonetheless, it is a
good practice for the driver to be prepared for it to fire as soon as it
is requested.

Create a new felix->info method for running custom code for vsc9959 from
within felix_setup(), and move the request_irq() call there. The
ocelot_ext should have an IRQ as well, so this should be a step in the
right direction for that model (VSC7512) as well.

Some minor changes are made while moving the code. Casts from void *
aren't necessary, so drop them, and rename felix_irq_handler() to the
more specific vsc9959_irq_handler().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0367a177

net: dsa: ocelot: consistently use devres in felix_pci_probe() · 4510bbd3

Vladimir Oltean authored May 30, 2024

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the felix_vsc9959 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4510bbd3

net: dsa: ocelot: delete open coded status = "disabled" parsing · cc711c52

Vladimir Oltean authored May 30, 2024

Since commit 6fffbc7a ("PCI: Honor firmware's device disabled
status"), PCI device drivers with OF bindings no longer need this check.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc711c52

net: dsa: ocelot: use devres in seville_probe() · 90ee9a5b

Vladimir Oltean authored May 30, 2024

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the seville_vsc9953 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90ee9a5b

net: dsa: ocelot: use devres in ocelot_ext_probe() · 454cfffe

Vladimir Oltean authored May 30, 2024

Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.

[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/

Prepare the ground in the ocelot_ext driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

454cfffe

Merge branch 'net-smc-snd_buf-rcv_buf' · 93e30878

David S. Miller authored Jun 03, 2024

Guangguan Wang says:

====================
net/smc: Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.

In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.

The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

93e30878

net/smc: change SMCR_RMBE_SIZES from 5 to 15 · 2f4b101c

Guangguan Wang authored Jun 03, 2024

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.

In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.

The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4b101c

net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined · 3ac14b9d

Guangguan Wang authored Jun 03, 2024

SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
will be allocated in one piece of scatterlist. When the entries of
scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
commit 7c703e54 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
whether sg chain is supported. So, SMC-R's rmb buffer should be limited
by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
defined.

Fixes: a3fe3d01 ("net/smc: introduce sg-logic for RMBs")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ac14b9d

01 Jun, 2024 17 commits

af_unix: Remove dead code in unix_stream_read_generic(). · b5c08988

Kuniyuki Iwashima authored May 29, 2024

When splice() support was added in commit 2b514574 ("net:
af_unix: implement splice for stream af_unix sockets"), we had
to release unix_sk(sk)->readlock (current iolock) before calling
splice_to_pipe().

Due to the unlock, commit 73ed5d25 ("af-unix: fix use-after-free
with concurrent readers while splicing") added a safeguard in
unix_stream_read_generic(); we had to bump the skb refcount before
calling ->recv_actor() and then check if the skb was consumed by a
concurrent reader.

However, the pipe side locking was refactored, and since commit
25869262 ("skb_splice_bits(): get rid of callback"), we can
call splice_to_pipe() without releasing unix_sk(sk)->iolock.

Now, the skb is always alive after the ->recv_actor() callback,
so let's remove the unnecessary drop_skb logic.

This is mostly the revert of commit 73ed5d25 ("af-unix: fix
use-after-free with concurrent readers while splicing").
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240529144648.68591-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b5c08988

Merge branch... · 750ed239

Jakub Kicinski authored Jun 01, 2024

Merge branch 'lan78xx-enable-125-mhz-clk-and-auto-speed-configuration-for-lan7801-if-no-eeprom-is-detected'

Rengarajan says:

====================
lan78xx: Enable 125 MHz CLK and Auto Speed configuration for LAN7801 if NO EEPROM is detected

This patch series adds the support for 125 MHz clock, Auto speed and
auto duplex configuration for LAN7801 in the absence of EEPROM.
====================

Link: https://lore.kernel.org/r/20240529140256.1849764-1-rengarajan.s@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

750ed239

lan78xx: Enable Auto Speed and Auto Duplex configuration for LAN7801 if NO EEPROM is detected · 799f532d

Rengarajan S authored May 29, 2024

Enabled ASD/ADD configuration for LAN7801 in the absence of EEPROM.
After the lite reset these contents go back to defaults where ASD/
ADD is disabled. The check is already available for LAN7800.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://lore.kernel.org/r/20240529140256.1849764-3-rengarajan.s@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

799f532d

lan78xx: Enable 125 MHz CLK configuration for LAN7801 if NO EEPROM is detected · 5160b129

Rengarajan S authored May 29, 2024

The 125MHz and 25MHz clock configurations are enabled in the initialization
regardless of EEPROM (125MHz is needed for RGMII 1000Mbps operation). After
a lite reset (lan78xx_reset), these contents go back to defaults(all 0, so
no 125MHz or 25MHz clock).
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://lore.kernel.org/r/20240529140256.1849764-2-rengarajan.s@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5160b129

Merge branch 'net-ethernet-cortina-use-phylib-for-rx-and-tx-pause' · e58b43f2

Jakub Kicinski authored Jun 01, 2024

Linus Walleij says:

====================
net: ethernet: cortina: Use phylib for RX and TX pause

This patch series switches the Cortina Gemini ethernet
driver to use phylib to set up RX and TX pause for the
PHY.

v3: https://lore.kernel.org/r/20240513-gemini-ethernet-fix-tso-v3-0-b442540cc140@linaro.org
v2: https://lore.kernel.org/r/20240511-gemini-ethernet-fix-tso-v2-0-2ed841574624@linaro.org
v1: https://lore.kernel.org/r/20240509-gemini-ethernet-fix-tso-v1-0-10cd07b54d1c@linaro.org
====================

Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-0-16487ca4c2fe@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e58b43f2

net: ethernet: cortina: Implement .set_pauseparam() · dbdb0918

Linus Walleij authored May 29, 2024

The Cortina Gemini ethernet can very well set up TX or RX
pausing, so add this functionality to the driver in a
.set_pauseparam() callback. Essentially just call down to
phylib and let phylib deal with this, .adjust_link()
will respect the setting from phylib.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-3-16487ca4c2fe@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dbdb0918

net: ethernet: cortina: Use negotiated TX/RX pause · 15c22101

Linus Walleij authored May 29, 2024

Instead of directly poking into registers of the PHY, use
the existing function to query phylib about this directly.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-2-16487ca4c2fe@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

15c22101

net: ethernet: cortina: Rename adjust link callback · a967d3ce

Linus Walleij authored May 29, 2024

The callback passed to of_phy_get_and_connect() in the
Cortina Gemini driver is called "gmac_speed_set" which is
archaic, rename it to "gmac_adjust_link" following the
pattern of most other drivers.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240529-gemini-phylib-fixes-v4-1-16487ca4c2fe@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a967d3ce

Merge branch 'net-visibility-of-memory-limits-in-netns' · 5086e1b7

Jakub Kicinski authored Jun 01, 2024

Matteo Croce says:

====================
net: visibility of memory limits in netns

Some programs need to know the size of the network buffers to operate
correctly, export the following sysctls read-only in network namespaces:

- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
====================

Link: https://lore.kernel.org/r/20240530232722.45255-1-technoboy85@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5086e1b7

selftests: net: tests net.core.{r,w}mem_{default,max} sysctls in a netns · 5b5233fb

Matteo Croce authored May 31, 2024

Add a selftest which checks that the sysctl is present in a netns,
that the value is read from the init one, and that it's readonly.
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Link: https://lore.kernel.org/r/20240530232722.45255-3-technoboy85@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5b5233fb

net: make net.core.{r,w}mem_{default,max} namespaced · 19249c07

Matteo Croce authored May 31, 2024

The following sysctl are global and can't be read from a netns:

net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max

Make the following sysctl parameters available readonly from within a
network namespace, allowing a container to read them.
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20240530232722.45255-2-technoboy85@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

19249c07

bnxt_en: add timestamping statistics support · 165f8769

Vadim Fedorenko authored May 30, 2024

The ethtool_ts_stats structure was introduced earlier this year. Now
it's time to support this group of counters in more drivers.
This patch adds support to bnxt driver.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240530204751.99636-1-vadfed@meta.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

165f8769

Merge branch 'ice-introduce-eth56g-phy-model-for-e825c-products' · fc5570e0

Jakub Kicinski authored Jun 01, 2024

Jacob Keller says:

====================
ice: Introduce ETH56G PHY model for E825C products

E825C products have a different PHY model than E822, E823 and E810 products.
This PHY is ETH56G and its support is necessary to have functional PTP stack
for E825C products.

This series refactors the ice driver to add support for the new PHY model.

Karol introduces the ice_ptp_hw structure. This is used to replace some
hard-coded values relating to the PHY quad and port numbers, as well as to
hold the phy_model type.

Jacob refactors the driver code that converts between the ice_ptp_tmr_cmd
enumeration and hardware register values to better re-use logic and reduce
duplication when introducing another PHY type.

Sergey introduces functions to help enable and disable the Tx timestamp
interrupts. This makes the ice_ptp.c code more generic and encapsulates the
PHY specifics into ice_ptp_hw.c

Karol introduces helper functions to clear the valid bits for Tx and Rx
timestamps. This enables informing hardware to discard stale timestamps
after performing clock operations.

Sergey moves the Clock Generation Unit (CGU) logic out of the E822 specific
area of the ice_ptp_hw.c file as it will be re-used for other device PHY
models.

Jacob introduces a helper function for obtaining the base increment values,
moving this logic out of ice_ptp.c and into the ice_ptp_hw.c file to better
encapsulate hardware differences.

Sergey builds on these refactors to introduce the new ETH56G PHY model used
by the E825C products. This includes introducing the required helpers,
constants, and PHY model checks.

Karol simplifies the CGU logic by using anonymous structures, dropping an
unnecessary ".field" name for accessing the CGU data.

Michal Michalik updates the CGU logic to support the E825C hardware,
ensuring that the clock generation is configured properly.

Grzegorz Nitka adds support to read the NAC topology data from the device.
This is in preparation for supporting devices which combine two NACs
together, connecting all ports to the same clock source. This enables the
driver to determine if its operating on such a device, or if its operating
on the standard 1-NAC configuration.

Grzsecgorz Nitka adjusts the PTP initialization to prepare for the 2x50G
E825C devices, introducing special mapping for the PHY ports to prepare for
support of the 2-NAC devices.

With this, the ice driver is capable of handling PTP for the single-NAC
E825C devices. Complete support for the 2-NAC devices requirs some work on
how the ports connect to the clock owner. During review of this work, it
was pointed out that our existing use of auxiliary bus is disliked, and
Jiri requested that we change it. We are currently working on developing a
replacement solution for the auxiliary bus implementation and have dropped
the relevant changes out of this series. A future series will refactor the
port to clock connection, at which time we will finish the support for
2-NAC E825C devices.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================

Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-0-c082739bb6f6@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fc5570e0

ice: Adjust PTP init for 2x50G E825C devices · 4409ea17

Grzegorz Nitka authored May 28, 2024

>From FW/HW perspective, 2 port topology in E825C devices requires
merging of 2 port mapping internally and breakout mapping externally.
As a consequence, it requires different port numbering from PTP code
perspective.
For that topology, pf_id can not be used to index PTP ports. Even if
the 2nd port is identified as port with pf_id = 1, all PHY operations
need to be performed as it was port 2. Thus, special mapping is needed
for the 2nd port.
This change adds detection of 2x50G topology and applies 'custom'
mapping on the 2nd port.
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-next-2024-05-28-ptp-refactors-v1-11-c082739bb6f6@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>