Commits · ebddb1404900657b7f03a56ee4c34a9d218c4030 · Kirill Smelkov / linux

12 Dec, 2022 16 commits

net: move the nat function to nf_nat_ovs for ovs and tc · ebddb140

Xin Long authored Dec 08, 2022

There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebddb140

net: sched: update the nat flag for icmp error packets in ct_nat_execute · 0564c3e5

Xin Long authored Dec 08, 2022

In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0564c3e5

openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat · 2b85144a

Xin Long authored Dec 08, 2022

When it fails to allocate nat ext, the packet should be dropped, like
the memory allocation failures in other places in ovs_ct_nat().

This patch changes to return NF_DROP when fails to add nat ext before
doing NAT in ovs_ct_nat(), also it would keep consistent with tc
action ct' processing in tcf_ct_act_nat().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b85144a

openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat · 77959289

Xin Long authored Dec 08, 2022

Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77959289

openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute · bf14f492

Xin Long authored Dec 08, 2022

The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
    ovs_ct_lookup()
      __ovs_ct_lookup()
        ovs_ct_nat()
          ovs_ct_nat_execute()
    ovs_ct_commit()
      __ovs_ct_lookup()
        ovs_ct_nat()
          ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf14f492

myri10ge: use strscpy() to instead of strncpy() · a2d40ce7

Xu Panda authored Dec 09, 2022

The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2d40ce7

liquidio: use strscpy() to instead of strncpy() · f6b759f5

Xu Panda authored Dec 09, 2022

The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6b759f5

hns: use strscpy() to instead of strncpy() · 94d30e89

Xu Panda authored Dec 09, 2022

The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94d30e89

net: stmmac: Add check for taprio basetime configuration · 6d534ee0

Michael Sit Wei Hong authored Dec 08, 2022

Adds a boundary check to prevent negative basetime input from user
while configuring taprio.
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d534ee0

Merge branch 'tun-vnet-uso' · 93c60b59

David S. Miller authored Dec 12, 2022

Andrew Melnychenko says:

====================
TUN/VirtioNet USO features support.

Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
Technically they enable NETIF_F_GSO_UDP_L4
(and only if USO4 & USO6 are set simultaneously).
It allows the transmission of large UDP packets.

UDP Segmentation Offload (USO/GSO_UDP_L4) - ability to split UDP packets
into several segments. It's similar to UFO, except it doesn't use IP
fragmentation. The drivers may push big packets and the NIC will split
them(or assemble them in case of receive), but in the case of VirtioNet
we just pass big UDP to the host. So we are freeing the driver from doing
the unnecessary job of splitting. The same thing for several guests
on one host, we can pass big packets between guests.

Different features USO4 and USO6 are required for qemu where Windows
guests can enable disable USO receives for IPv4 and IPv6 separately.
On the other side, Linux can't really differentiate USO4 and USO6, for now.
For now, to enable USO for TUN it requires enabling USO4 and USO6 together.
In the future, there would be a mechanism to control UDP_L4 GSO separately.

New types for virtio-net already in virtio-net specification:
https://github.com/oasis-tcs/virtio-spec/issues/120

Test it WIP Qemu https://github.com/daynix/qemu/tree/USOv3

Changes since v4 & RFC:
 * Fixed typo and refactored.
 * Tun USO offload refactored.
 * Add support for guest-to-guest segmentation offload (thx Jason).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

93c60b59

drivers/net/virtio_net.c: Added USO support. · 418044e1

Andrew Melnychenko authored Dec 07, 2022

Now, it possible to enable GSO_UDP_L4("tx-udp-segmentation") for VirtioNet.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

418044e1

linux/virtio_net.h: Support USO offload in vnet header. · 860b7f27

Andrew Melnychenko authored Dec 07, 2022

Now, it's possible to convert USO vnet packets from/to skb.
Added support for GSO_UDP_L4 offload.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

860b7f27

uapi/linux/virtio_net.h: Added USO types. · 34061b34

Andrew Melnychenko authored Dec 07, 2022

Added new GSO type for USO: VIRTIO_NET_HDR_GSO_UDP_L4.
Feature VIRTIO_NET_F_HOST_USO allows to enable NETIF_F_GSO_UDP_L4.
Separated VIRTIO_NET_F_GUEST_USO4 & VIRTIO_NET_F_GUEST_USO6 features
required for Windows guests.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34061b34

driver/net/tun: Added features for USO. · 399e0827

Andrew Melnychenko authored Dec 07, 2022

Added support for USO4 and USO6.
For now, to "enable" USO, it's required to set both USO4 and USO6 simultaneously.
USO enables NETIF_F_GSO_UDP_L4.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

399e0827

uapi/linux/if_tun.h: Added new offload types for USO4/6. · b22bbdd1

Andrew Melnychenko authored Dec 07, 2022

Added 2 additional offlloads for USO(IPv4 & IPv6).
Separate offloads are required for Windows VM guests,
g.e. Windows may set USO rx only for IPv4.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b22bbdd1

udp: allow header check for dodgy GSO_UDP_L4 packets. · 1fd54773

Andrew Melnychenko authored Dec 07, 2022

Allow UDP_L4 for robust packets.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fd54773

10 Dec, 2022 9 commits

Merge tag 'ipsec-next-2022-12-09' of... · dd8b3a80

Jakub Kicinski authored Dec 09, 2022

Merge tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next 2022-12-09

1) Add xfrm packet offload core API.
   From Leon Romanovsky.

2) Add xfrm packet offload support for mlx5.
   From Leon Romanovsky and Raed Salem.

3) Fix a typto in a error message.
   From Colin Ian King.

* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
  xfrm: Fix spelling mistake "oflload" -> "offload"
  net/mlx5e: Open mlx5 driver to accept IPsec packet offload
  net/mlx5e: Handle ESN update events
  net/mlx5e: Handle hardware IPsec limits events
  net/mlx5e: Update IPsec soft and hard limits
  net/mlx5e: Store all XFRM SAs in Xarray
  net/mlx5e: Provide intermediate pointer to access IPsec struct
  net/mlx5e: Skip IPsec encryption for TX path without matching policy
  net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
  net/mlx5e: Improve IPsec flow steering autogroup
  net/mlx5e: Configure IPsec packet offload flow steering
  net/mlx5e: Use same coding pattern for Rx and Tx flows
  net/mlx5e: Add XFRM policy offload logic
  net/mlx5e: Create IPsec policy offload tables
  net/mlx5e: Generalize creation of default IPsec miss group and rule
  net/mlx5e: Group IPsec miss handles into separate struct
  net/mlx5e: Make clear what IPsec rx_err does
  net/mlx5e: Flatten the IPsec RX add rule path
  net/mlx5e: Refactor FTE setup code to be more clear
  net/mlx5e: Move IPsec flow table creation to separate function
  ...
====================

Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dd8b3a80

net: devlink: Add missing error check to devlink_resource_put() · 5fc11a40

Gavrilov Ilia authored Dec 08, 2022

When the resource size changes, the return value of the
'nla_put_u64_64bit' function is not checked. That has been fixed to avoid
rechecking at the next step.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Note that this is harmless, we'd error out at the next put().
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Link: https://lore.kernel.org/r/20221208082821.3927937-1-Ilia.Gavrilov@infotecs.ruSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5fc11a40

skbuff: Introduce slab_build_skb() · ce098da1

Kees Cook authored Dec 07, 2022

syzkaller reported:

  BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
  Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().

When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified (and use the "new" pointer so that
compiler hinting works correctly). Split this logic out into a new
interface, slab_build_skb(), but leave the original 0 checking for now
to catch any stragglers.

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d89 ("mm: Make ksize() a reporting-only function")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221208060256.give.994-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ce098da1

net: bcmgenet: Remove the unused function · 28d39503

Jiapeng Chong authored Dec 09, 2022

The function dmadesc_get_addr() is defined in the bcmgenet.c file, but
not called elsewhere, so remove this unused function.

drivers/net/ethernet/broadcom/genet/bcmgenet.c:120:26: warning: unused function 'dmadesc_get_addr'.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3401Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221209033723.32452-1-jiapeng.chong@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

28d39503

Merge branch 'mptcp-miscellaneous-cleanup' · 2b53d869

Jakub Kicinski authored Dec 09, 2022

Mat Martineau says:

====================
mptcp: Miscellaneous cleanup

Two code cleanup patches for the 6.2 merge window that don't change
behavior:

Patch 1 makes proper use of nlmsg_free(), as suggested by Jakub while
reviewing f8c9dfbd ("mptcp: add pm listener events").

Patch 2 clarifies success status in a few mptcp functions, which
prevents some smatch false positives.
====================

Link: https://lore.kernel.org/r/20221209004431.143701-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2b53d869

mptcp: return 0 instead of 'err' var · 03e7d28c

Matthieu Baerts authored Dec 08, 2022

When 'err' is 0, it looks clearer to return '0' instead of the variable
called 'err'.

The behaviour is then not modified, just a clearer code.

By doing this, we can also avoid false positive smatch warnings like
this one:

  net/mptcp/pm_netlink.c:1169 mptcp_pm_parse_pm_addr_attr() warn: missing error code? 'err'
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

03e7d28c

mptcp: use nlmsg_free instead of kfree_skb · 8b34b52c

Geliang Tang authored Dec 08, 2022

Use nlmsg_free() instead of kfree_skb() in pm_netlink.c.

The SKB's have been created by nlmsg_new(). The proper cleaning way
should then be done with nlmsg_free().

For the moment, nlmsg_free() is simply calling kfree_skb() so we don't
change the behaviour here.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8b34b52c

Merge tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · c80edd8d

Jakub Kicinski authored Dec 09, 2022

Saeed Mahameed says:

====================
mlx5-updates-2022-12-08

1) Support range match action in SW steering

Yevgeny Kliteynik says:
=======================

The following patch series adds support for a range match action in
SW Steering.

SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.

The following patch series add new type of action - Range Match, where the
user provides a field to be matched on and a range of values (min to max)
that will be considered as hit.

There are several new notions that were implemented in order to support
Range Match:
 - MATCH_RANGES Steering Table Entry (STE): the new STE type that allows
   matching the packets' fields on the range of values instead of a specific
   value.
 - Match Definer: this is a general FW object that defines which fields
   in the packet will be referenced by the mask and tag of each STE.
   Match definer ID is part of STE fields, and it defines how the HW needs
   to interpret the STE's mask/tag values.
   Till now SW steering used the definers that were managed by FW and
   implemented the STE layout as described by the HW spec.
   Now that we're adding a new type of STE, SW steering needs to also be
   able to define this new STE's layout, and this is do

=======================

2) From OZ add support for meter mtu offload
   2.1: Refactor the code to allow both metering and range post actions as a
        pre-step for adding police mtu offload support.
   2.2: Instantiate mtu green/red flow tables with a single match-all rule.
        Add the green/red actions to the hit/miss table accordingly
   2.3: Initialize the meter object with the TC police mtu parameter.
        Use the hardware range match action feature.

3) From MaorD, support routes with more than 2 nexthops in multipath

4) Michael and Or, improve and extend vport representor counters.

* tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Expose steering dropped packets counter
  net/mlx5: Refactor and expand rep vport stat group
  net/mlx5e: multipath, support routes with more than 2 nexthops
  net/mlx5e: TC, add support for meter mtu offload
  net/mlx5e: meter, add mtu post meter tables
  net/mlx5e: meter, refactor to allow multiple post meter tables
  net/mlx5: DR, Add support for range match action
  net/mlx5: DR, Add function that tells if STE miss addr has been initialized
  net/mlx5: DR, Some refactoring of miss address handling
  net/mlx5: DR, Manage definers with refcounts
  net/mlx5: DR, Handle FT action in a separate function
  net/mlx5: DR, Rework is_fw_table function
  net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object
  net/mlx5: fs, add match on ranges API
  net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object
====================

Link: https://lore.kernel.org/r/20221209001420.142794-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c80edd8d

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 043cd1e2

Jakub Kicinski authored Dec 09, 2022

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-12-08 (ice)

Jacob Keller says:

This series of patches primarily consists of changes to fix some corner
cases that can cause Tx timestamp failures. The issues were discovered and
reported by Siddaraju DH and primarily affect E822 hardware, though this
series also includes some improvements that affect E810 hardware as well.

The primary issue is regarding the way that E822 determines when to generate
timestamp interrupts. If the driver reads timestamp indexes which do not
have a valid timestamp, the E822 interrupt tracking logic can get stuck.
This is due to the way that E822 hardware tracks timestamp index reads
internally. I was previously unaware of this behavior as it is significantly
different in E810 hardware.

Most of the fixes target refactors to ensure that the ice driver does not
read timestamp indexes which are not valid on E822 hardware. This is done by
using the Tx timestamp ready bitmap register from the PHY. This register
indicates what timestamp indexes have outstanding timestamps waiting to be
captured.

Care must be taken in all cases where we read the timestamp registers, and
thus all flows which might have read these registers are refactored. The
ice_ptp_tx_tstamp function is modified to consolidate as much of the logic
relating to these registers as possible. It now handles discarding stale
timestamps which are old or which occurred after a PHC time update. This
replaces previously standalone thread functions like the periodic work
function and the ice_ptp_flush_tx_tracker function.

In addition, some minor cleanups noticed while writing these refactors are
included.

The remaining patches refactor the E822 implementation to remove the
"bypass" mode for timestamps. The E822 hardware has the ability to provide a
more precise timestamp by making use of measurements of the precise way that
packets flow through the hardware pipeline. These measurements are known as
"Vernier" calibration. The "bypass" mode disables many of these measurements
in favor of a faster start up time for Tx and Rx timestamping. Instead, once
these measurements were captured, the driver tries to reconfigure the PHY to
enable the vernier calibrations.

Unfortunately this recalibration does not work. Testing indicates that the
PHY simply remains in bypass mode without the increased timestamp precision.
Remove the attempt at recalibration and always use vernier mode. This has
one disadvantage that Tx and Rx timestamps cannot begin until after at least
one packet of that type goes through the hardware pipeline. Because of this,
further refactor the driver to separate Tx and Rx vernier calibration.
Complete the Tx and Rx independently, enabling the appropriate type of
timestamp as soon as the relevant packet has traversed the hardware
pipeline. This was reported by Milena Olech.

Note that although these might be considered "bug fixes", the required
changes in order to appropriately resolve these issues is large. Thus it
does not feel suitable to send this series to net.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: reschedule ice_ptp_wait_for_offset_valid during reset
ice: make Tx and Rx vernier offset calibration independent
ice: only check set bits in ice_ptp_flush_tx_tracker
ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp
ice: cleanup allocations in ice_ptp_alloc_tx_tracker
ice: protect init and calibrating check in ice_ptp_request_ts
ice: synchronize the misc IRQ when tearing down Tx tracker
ice: check Tx timestamp memory register for ready timestamps
ice: handle discarding old Tx requests in ice_ptp_tx_tstamp
ice: always call ice_ptp_link_change and make it void
ice: fix misuse of "link err" with "link status"
ice: Reset TS memory for all quads
ice: Remove the E822 vernier "bypass" logic
ice: Use more generic names for ice_ptp_tx fields
====================

Link: https://lore.kernel.org/r/20221208213932.1274143-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

043cd1e2

09 Dec, 2022 15 commits

net: openvswitch: Add support to count upcall packets · 1933ea36

wangchuanlei authored Dec 06, 2022

Add support to count upall packets, when kmod of openvswitch
upcall to count the number of packets for upcall succeed and
failed, which is a better way to see how many packets upcalled
on every interfaces.
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1933ea36

rhashtable: Allow rhashtable to be used from irq-safe contexts · e47877c7

Tejun Heo authored Dec 06, 2022

rhashtable currently only does bh-safe synchronization making it impossible
to use from irq-safe contexts. Switch it to use irq-safe synchronization to
remove the restriction.

v2: Update the lock functions to return the ulong flags value and unlock
    functions to take the value directly instead of passing around the
    pointer. Suggested by Linus.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e47877c7

Merge branch 'net-sched-retpoline' · b602d003

David S. Miller authored Dec 09, 2022

Pedro Tammela says:

====================
net/sched: retpoline wrappers for tc

In tc all qdics, classifiers and actions can be compiled as modules.
This results today in indirect calls in all transitions in the tc hierarchy.
Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
indirect calls. For newer Intel cpus with IBRS the extra cost is
nonexistent, but AMD Zen cpus and older x86 cpus still go through the
retpoline thunk.

Known built-in symbols can be optimized into direct calls, thus
avoiding the retpoline thunk. So far, tc has not been leveraging this
build information and leaving out a performance optimization for some
CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
with direct calls when known modules are compiled as built-in as an
opt-in optimization.

We measured these changes in one AMD Zen 4 cpu (Retpoline), one AMD Zen 3 cpu (Retpoline),
one Intel 10th Gen CPU (IBRS), one Intel 3rd Gen cpu (Retpoline) and one
Intel Xeon CPU (IBRS) using pktgen with 64b udp packets. Our test setup is a
dummy device with clsact and matchall in a kernel compiled with every
tc module as built-in.  We observed a 3-8% speed up on the retpoline CPUs,
when going through 1 tc filter, and a 60-100% speed up when going through 100 filters.
For the IBRS cpus we observed a 1-2% degradation in both scenarios, we believe
the extra branches check introduced a small overhead therefore we added
a static key that bypasses the wrapper on kernels not using the retpoline mitigation,
but compiled with CONFIG_RETPOLINE.

1 filter:
CPU        | before (pps) | after (pps) | diff
R9 7950X   | 5914980      | 6380227     | +7.8%
R9 5950X   | 4237838      | 4412241     | +4.1%
R9 5950X   | 4265287      | 4413757     | +3.4%   [*]
i5-3337U   | 1580565      | 1682406     | +6.4%
i5-10210U  | 3006074      | 3006857     | +0.0%
i5-10210U  | 3160245      | 3179945     | +0.6%   [*]
Xeon 6230R | 3196906      | 3197059a     | +0.0%
Xeon 6230R | 3190392      | 3196153     | +0.01%  [*]

100 filters:
CPU        | before (pps) | after (pps) | diff
R9 7950X   | 373598       | 820396      | +119.59%
R9 5950X   | 313469       | 633303      | +102.03%
R9 5950X   | 313797       | 633150      | +101.77% [*]
i5-3337U   | 127454       | 211210      | +65.71%
i5-10210U  | 389259       | 381765      | -1.9%
i5-10210U  | 408812       | 412730      | +0.9%    [*]
Xeon 6230R | 415420       | 406612      | -2.1%
Xeon 6230R | 416705       | 405869      | -2.6%    [*]

[*] In these tests we ran pktgen with clone set to 1000.

On the 7950x system we also tested the impact of filters if iteration order
placement varied, first by compiling a kernel with the filter under test being
the first one in the static iteration and then repeating it with being last (of 15 classifiers existing today).
We saw a difference of +0.5-1% in pps between being the first in the iteration vs being the last.
Therefore we order the classifiers and actions according to relevance per our current thinking.

v5->v6:
- Address Eric Dumazet suggestions

v4->v5:
- Rebase

v3->v4:
- Address Eric Dumazet suggestions

v2->v3:
- Address suggestions by Jakub, Paolo and Eric
- Dropped RFC tag (I forgot to add it on v2)

v1->v2:
- Fix build errors found by the bots
- Address Kuniyuki Iwashima suggestions

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b602d003

net/sched: avoid indirect classify functions on retpoline kernels · 9f3101dc

Pedro Tammela authored Dec 06, 2022

Expose the necessary tc classifier functions and wire up cls_api to use
direct calls in retpoline kernels.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f3101dc

net/sched: avoid indirect act functions on retpoline kernels · 871cf386

Pedro Tammela authored Dec 06, 2022

Expose the necessary tc act functions and wire up act_api to use
direct calls in retpoline kernels.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

871cf386

net/sched: add retpoline wrapper for tc · 7f0e8102

Pedro Tammela authored Dec 06, 2022

On kernels using retpoline as a spectrev2 mitigation,
optimize actions and filters that are compiled as built-ins into a direct call.

On subsequent patches we expose the classifiers and actions functions
and wire up the wrapper into tc.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f0e8102

net/sched: move struct action_ops definition out of ifdef · 2a7d228f

Pedro Tammela authored Dec 06, 2022

The type definition should be visible even in configurations not using
CONFIG_NET_CLS_ACT.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a7d228f

xfrm: Fix spelling mistake "oflload" -> "offload" · abe2343d

Colin Ian King authored Dec 07, 2022

There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

abe2343d

Merge branch 'mlx5 IPsec packet offload support (Part II)' · 1de8fda4

Steffen Klassert authored Dec 09, 2022

Leon Romanovsky says:

============
This is second part with implementation of packet offload.
============
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

1de8fda4

net: phy: remove redundant "depends on" lines · 0bdff115

Randy Dunlap authored Dec 06, 2022

Delete a few lines of "depends on PHYLIB" since they are inside
an "if PHYLIB / endif # PHYLIB" block, i.e., they are redundant
and the other 50+ drivers there don't use "depends on PHYLIB"
since it is not needed.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Link: https://lore.kernel.org/r/20221207044257.30036-1-rdunlap@infradead.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0bdff115

net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP · b534dc46

Willem de Bruijn authored Dec 07, 2022

Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
write_seq sockets instead of snd_una.

This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.

Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.

On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
racy approach).

write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.

The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.
Reported-by: Sotirios Delimanolis <sotodel@meta.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b534dc46

Merge branch 'fix-possible-deadlock-during-wed-attach' · ecd6df3c

Jakub Kicinski authored Dec 08, 2022

Lorenzo Bianconi says:

====================
fix possible deadlock during WED attach

Fix a possible deadlock in mtk_wed_attach if mtk_wed_wo_init routine fails.
Check wo pointer is properly allocated before running mtk_wed_wo_reset() and
mtk_wed_wo_deinit().
====================

Link: https://lore.kernel.org/r/cover.1670421354.git.lorenzo@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ecd6df3c

net: ethernet: mtk_wed: fix possible deadlock if mtk_wed_wo_init fails · 587585e1

Lorenzo Bianconi authored Dec 07, 2022

Introduce __mtk_wed_detach() in order to avoid a deadlock in
mtk_wed_attach routine if mtk_wed_wo_init fails since both
mtk_wed_attach and mtk_wed_detach run holding hw_lock mutex.

Fixes: 4c5de09e ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

587585e1

net: ethernet: mtk_wed: fix some possible NULL pointer dereferences · c79e0af5

Lorenzo Bianconi authored Dec 07, 2022

Fix possible NULL pointer dereference in mtk_wed_detach routine checking
wo pointer is properly allocated before running mtk_wed_wo_reset() and
mtk_wed_wo_deinit().
Even if it is just a theoretical issue at the moment check wo pointer is
not NULL in mtk_wed_mcu_msg_update.
Moreover, honor mtk_wed_mcu_send_msg return value in mtk_wed_wo_reset()

Fixes: 79968444 ("net: ethernet: mtk_wed: introduce wed wo support")
Fixes: 4c5de09e ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c79e0af5

nfp: Fix spelling mistake "tha" -> "the" · 3df96774

Colin Ian King authored Dec 07, 2022

There is a spelling mistake in a nn_dp_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20221207094312.2281493-1-colin.i.king@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3df96774