Commits · d9a6f0d0df1899ff9086a57abc600e414f4b8cdd · Kirill Smelkov / linux

07 Sep, 2022 32 commits

netfilter: conntrack: prepare tcp_in_window for ternary return value · d9a6f0d0

Florian Westphal authored Aug 26, 2022

tcp_in_window returns true if the packet is in window and false if it is
not.

If its outside of window, packet will be treated as INVALID.

There are corner cases where the packet should still be tracked, because
rulesets may drop or log such packets, even though they can occur during
normal operation, such as overly delayed acks.

In extreme cases, connection may hang forever because conntrack state
differs from real state.

There is no retransmission for ACKs.

In case of ACK loss after conntrack processing, its possible that a
connection can be stuck because the actual retransmits are considered
stale ("SEQ is under the lower bound (already ACKed data
retransmitted)".

The problem is made worse by carrier-grade-nat which can also result
in stale packets from old connections to get treated as 'recent' packets
in conntrack (it doesn't support tcp timestamps at this time).

Prepare tcp_in_window() to return an enum that tells the desired
action (in-window/accept, bogus/drop).

A third action (accept the packet as in-window, but do not change
state) is added in a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>

d9a6f0d0

Merge branch 'macsec-offload-mlx5' · 016eb590

David S. Miller authored Sep 07, 2022

Saeed Mahameed says:

====================
Introduce MACsec skb_metadata_dst and mlx5 macsec offload

v1->v2:
   - attach mlx5 implementation patches.

This patchset introduces MACsec skb_metadata_dst to lay the ground
for MACsec HW offload.

MACsec is an IEEE standard (IEEE 802.1AE) for MAC security.
It defines a way to establish a protocol independent connection
between two hosts with data confidentiality, authenticity and/or
integrity, using GCM-AES. MACsec operates on the Ethernet layer and
as such is a layer 2 protocol, which means it’s designed to secure
traffic within a layer 2 network, including DHCP or ARP requests.

Linux has a software implementation of the MACsec standard and
HW offloading support.
The offloading is re-using the logic, netlink API and data
structures of the existing MACsec software implementation.

For Tx:
In the current MACsec offload implementation, MACsec interfaces shares
the same MAC address by default.
Therefore, HW can't distinguish from which MACsec interface the traffic
originated from.

MACsec stack will use skb_metadata_dst to store the SCI value, which is
unique per MACsec interface, skb_metadat_dst will be used later by the
offloading device driver to associate the SKB with the corresponding
offloaded interface (SCI) to facilitate HW MACsec offload.

For Rx:
Like in the Tx changes, if there are more than one MACsec device with
the same MAC address as in the packet's destination MAC, the packet will
be forward only to one of the devices and not neccessarly to the desired one.

Offloading device driver sets the MACsec skb_metadata_dst sci
field with the appropriaate Rx SCI for each SKB so the MACsec rx handler
will know to which port to divert those skbs, instead of wrongly solely
relaying on dst MAC address comparison.

1) patch 1,2, Add support to skb_metadata_dst in MACsec code:
net/macsec: Add MACsec skb_metadata_dst Tx Data path support
net/macsec: Add MACsec skb_metadata_dst Rx Data path support

2) patch 3, Move some MACsec driver code for sharing with various
drivers that implements offload:
net/macsec: Move some code for sharing with various drivers that
implements offload

3) The rest of the patches introduce mlx5 implementation for macsec
offloads TX and RX via steering tables.
  a) TX, intercept skbs with macsec offlad mark in skb_metadata_dst and mark
the descriptor for offload.
  b) RX, intercept offloaded frames and prepare the proper
skb_metadata_dst to mark offloaded rx frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

016eb590

net/mlx5e: Add support to configure more than one macsec offload device · 99d4dc66

Lior Nahmanson authored Sep 05, 2022

Add the ability to add up to 16 MACsec offload interfaces
over the same physical interface
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99d4dc66

net/mlx5e: Add MACsec stats support for Rx/Tx flows · 807a1b76

Lior Nahmanson authored Sep 05, 2022

Add the following statistics:
RX successfully decrypted MACsec packets:
macsec_rx_pkts : Number of packets decrypted successfully
macsec_rx_bytes : Number of bytes decrypted successfully

Rx dropped MACsec packets:
macsec_rx_pkts_drop : Number of MACsec packets dropped
macsec_rx_bytes_drop : Number of MACsec bytes dropped

TX successfully encrypted MACsec packets:
macsec_tx_pkts : Number of packets encrypted/authenticated successfully
macsec_tx_bytes : Number of bytes encrypted/authenticated successfully

Tx dropped MACsec packets:
macsec_tx_pkts_drop : Number of MACsec packets dropped
macsec_tx_bytes_drop : Number of MACsec bytes dropped

The above can be seen using:
ethtool -S <ifc> |grep macsec
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

807a1b76

net/mlx5e: Add MACsec offload SecY support · 5a39816a

Lior Nahmanson authored Sep 05, 2022

Add offload support for MACsec SecY callbacks - add/update/delete.
add_secy is called when need to create a new MACsec interface.
upd_secy is called when source MAC address or tx SC was changed.
del_secy is called when need to destroy the MACsec interface.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a39816a

net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst · b7c9400c

Lior Nahmanson authored Sep 05, 2022

MACsec driver need to distinguish to which offload device the MACsec
is target to, in order to handle them correctly.
This can be done by attaching a metadata_dst to a SKB with a SCI,
when there is a match on MACsec rule.
To achieve that, there is a map between fs_id to SCI, so for each RX SC,
there is a unique fs_id allocated when creating RX SC.
fs_id passed to device driver as metadata for packets that passed Rx
MACsec offload to aid the driver to retrieve the matching SCI.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7c9400c

net/mlx5e: Add MACsec RX steering rules · 3b20949c

Lior Nahmanson authored Sep 05, 2022

Rx flow steering consists of two flow tables (FTs).

The first FT (crypto table) have one default miss rule so non MACsec
offloaded packets bypass the MACSec tables.
All others flow table entries (FTEs) are divided to two equal groups
size, both of them are for MACsec packets:
The first group is for MACsec packets which contains SCI field in the
SecTAG header.
The second group is for MACsec packets which doesn't contain SCI,
where need to match on the source MAC address (only if the SCI
is built from default MACsec port).
Destination MAC address, ethertype and some of SecTAG fields
are also matched for both groups.
In case of match, invoke decrypt action on the packet.
For each MACsec Rx offloaded SA two rules are created: one with SCI
and one without SCI.

The second FT (check table) has two fixed rules:
One rule is for verifying that the previous offload actions were
finished successfully.
In this case, need to decap the SecTAG header and forward the packet
for further processing.
Another default rule for dropping packets that failed in the previous
decrypt actions.

The MACsec FTs are created on demand when the first MACsec rule is
added and destroyed when the last MACsec rule is deleted.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b20949c

net/mlx5: Add MACsec Rx tables support to fs_core · 15d187e2

Lior Nahmanson authored Sep 05, 2022

Add new namespace for MACsec RX flows.
Encrypted MACsec packets should be first decrypted and stripped
from MACsec header and then continues with the kernel's steering
pipeline.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15d187e2

net/mlx5e: Add MACsec offload Rx command support · aae3454e

Lior Nahmanson authored Sep 05, 2022

Add a support for Connect-X MACsec offload Rx SA & SC commands:
add, update and delete.

SCs are created on demend and aren't limited by number and unique by SCI.
Each Rx SA must be associated with Rx SC according to SCI.

Follow-up patches will implement the Rx steering.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aae3454e

net/mlx5e: Implement MACsec Tx data path using MACsec skb_metadata_dst · 9515978e

Lior Nahmanson authored Sep 05, 2022

MACsec driver marks Tx packets for device offload using a dedicated
skb_metadata_dst which holds a 64 bits SCI number.
A previously set rule will match on this number so the correct SA is used
for the MACsec operation.
As device driver can only provide 32 bits of metadata to flow tables,
need to used a mapping from 64 bit to 32 bits marker or id,
which is can be achieved by provide a 32 bit unique flow id in the
control path, and used a hash table to map 64 bit to the unique id in the
datapath.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9515978e

net/mlx5e: Add MACsec TX steering rules · e467b283

Lior Nahmanson authored Sep 05, 2022

Tx flow steering consists of two flow tables (FTs).

The first FT (crypto table) has two fixed rules:
One default miss rule so non MACsec offloaded packets bypass the MACSec
tables, another rule to make sure that MACsec key exchange (MKE) traffic
passes unencrypted as expected (matched of ethertype).
On each new MACsec offload flow, a new MACsec rule is added.
This rule is matched on metadata_reg_a (which contains the id of the
flow) and invokes the MACsec offload action on match.

The second FT (check table) has two fixed rules:
One rule for verifying that the previous offload actions were
finished successfully and packet need to be transmitted.
Another default rule for dropping packets that were failed in the
offload actions.

The MACsec FTs should be created on demand when the first MACsec rule is
added and destroyed when the last MACsec rule is deleted.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e467b283

net/mlx5: Add MACsec Tx tables support to fs_core · ee534d7f

Lior Nahmanson authored Sep 05, 2022

Changed EGRESS_KERNEL namespace to EGRESS_IPSEC and add new
namespace for MACsec TX.
This namespace should be the last namespace for transmitted packets.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee534d7f

net/mlx5: Add MACsec offload Tx command support · 8ff0ac5b

Lior Nahmanson authored Sep 05, 2022

This patch adds support for Connect-X MACsec offload Tx SA commands:
add, update and delete.

In Connect-X MACsec, a Security Association (SA) is added or deleted
via allocating a HW context of an encryption/decryption key and
a HW context of a matching SA (MACsec object).

When new SA is added:
- Use a separate crypto key HW context.
- Create a separate MACsec context in HW to include the SA properties.

Introduce a new compilation flag MLX5_EN_MACSEC for it.

Follow-up patches will implement the Tx steering.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ff0ac5b

net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures · 8385c51f

Lior Nahmanson authored Sep 05, 2022

Add MACsec offload related IFC structs, layouts and enumerations.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8385c51f

net/mlx5: Generalize Flow Context for new crypto fields · e227ee99

Lior Nahmanson authored Sep 05, 2022

In order to support MACsec offload (and maybe some other crypto features
in the future), generalize flow action parameters / defines to be used by
crypto offlaods other than IPsec.
The following changes made:
ipsec_obj_id field at flow action context was changed to crypto_obj_id,
intreduced a new crypto_type field where IPsec is the default zero type
for backward compatibility.
Action ipsec_decrypt was changed to crypto_decrypt.
Action ipsec_encrypt was changed to crypto_encrypt.

IPsec offload code was updated accordingly for backward compatibility.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e227ee99

net/mlx5: Removed esp_id from struct mlx5_flow_act · d1b2234b

Lior Nahmanson authored Sep 05, 2022

esp_id is no longer in used
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1b2234b

net/macsec: Move some code for sharing with various drivers that implements offload · b1671253

Lior Nahmanson authored Sep 05, 2022

Move some MACsec infrastructure like defines and functions,
in order to avoid code duplication for future drivers which
implements MACsec offload.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1671253

net/macsec: Add MACsec skb_metadata_dst Rx Data path support · 860ead89

Lior Nahmanson authored Sep 05, 2022

Like in the Tx changes, if there are more than one MACsec device with
the same MAC address as in the packet's destination MAC, the packet will
be forward only to this device and not neccessarly to the desired one.

Offloading device drivers will mark offloaded MACsec SKBs with the
corresponding SCI in the skb_metadata_dst so the macsec rx handler will
know to which port to divert those skbs, instead of wrongly solely
relaying on dst MAC address comparison.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

860ead89

net/macsec: Add MACsec skb_metadata_dst Tx Data path support · 0a28bfd4

Lior Nahmanson authored Sep 05, 2022

In the current MACsec offload implementation, MACsec interfaces shares
the same MAC address by default.
Therefore, HW can't distinguish from which MACsec interface the traffic
originated from.

MACsec stack will use skb_metadata_dst to store the SCI value, which is
unique per Macsec interface, skb_metadat_dst will be used by the
offloading device driver to associate the SKB with the corresponding
offloaded interface (SCI).
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a28bfd4

Merge branch 'netlink-be-policy' · da7d8e65
David S. Miller authored Sep 07, 2022
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
da7d8e65

netfilter: nft_payload: reject out-of-range attributes via policy · e7af210e

Florian Westphal authored Sep 05, 2022

Now that nla_policy allows range checks for bigendian data make use of
this to reject such attributes.  At this time, reject happens later
from the init or select_ops callbacks, but its prone to errors.

In the future, new attributes can be handled via NLA_POLICY_MAX_BE
and exiting ones can be converted one by one.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7af210e

netlink: introduce NLA_POLICY_MAX_BE · 08724ef6

Florian Westphal authored Sep 05, 2022

netlink allows to specify allowed ranges for integer types.
Unfortunately, nfnetlink passes integers in big endian, so the existing
NLA_POLICY_MAX() cannot be used.

At the moment, nfnetlink users, such as nf_tables, need to resort to
programmatic checking via helpers such as nft_parse_u32_check().

This is both cumbersome and error prone.  This adds NLA_POLICY_MAX_BE
which adds range check support for BE16, BE32 and BE64 integers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

08724ef6

Merge branch 'sfc-ptp' · 98ba8108

David S. Miller authored Sep 07, 2022

Edward Cree says:

====================
sfc: add support for PTP over IPv6 and 802.3
Most recent cards (8000 series and newer) had enough hardware support
for this, but it was not enabled in the driver. The transmission of PTP
packets over these protocols was already added in commit bd4a2697
("sfc: use hardware tx timestamps for more than PTP"), but receiving
them was already unsupported so synchronization didn't happen.

These patches add support for timestamping received packets over
IPv6/UPD and IEEE802.3.

v2: fixed weird indentation in efx_ptp_init_filter
v3: fixed bug caused by usage of htons in PTP_EVENT_PORT definition.
    It was used in more places, where htons was used too, so using it
    2 times leave it again in host order. I didn't detected it in my
    tests because it only affected if timestamping through the MC, but
    the model I used do it through the MAC. Detected by kernel test
    robot <lkp@intel.com>
v4: removed `inline` specifiers from 2 local functions
v5: restored deleted comment with useful explanation about packets
    reordering. Deleted useless whitespaces.
====================
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98ba8108

sfc: support PTP over Ethernet · e4616f64

Íñigo Huguet authored Sep 05, 2022

The previous patch add support for PTP over IPv6/UDP (only for 8000
series and newer) and this one add support for PTP over 802.3.

Tested: sync as master and as slave is correct with ptp4l. PTP over IPv4
and IPv6 still works fine.
Suggested-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4616f64

sfc: support PTP over IPv6/UDP · 621918c4

Íñigo Huguet authored Sep 05, 2022

commit bd4a2697 ("sfc: use hardware tx timestamps for more than
PTP") added support for hardware timestamping on TX for cards of the
8000 series and newer, in an effort to provide support for other
transports other than IPv4/UDP.

However, timestamping was still not working on RX for these other
transports. This patch add support for PTP over IPv6/UDP.

Tested: sync as master and as slave is correct using ptp4l from linuxptp
package, both with IPv4 and IPv6.
Suggested-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

621918c4

sfc: allow more flexible way of adding filters for PTP · 313aa13a

Íñigo Huguet authored Sep 05, 2022

In preparation for the support of PTP over IPv6/UDP and Ethernet in next
patches, allow a more flexible way of adding and removing RX filters for
PTP. Right now, only 2 filters are allowed, which are the ones needed
for PTP over IPv4/UDP.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

313aa13a

net: dsa: LAN9303: Add basic support for LAN9354 · 13248b97

Jerry Ray authored Sep 02, 2022

Adding support for the LAN9354 device by allowing it to use
the LAN9303 DSA driver.  These devices have the same underlying
access and control methods and from a feature set point of view
the LAN9354 is a superset of the LAN9303.

The MDIO access method has been tested on a SAMA5D3-EDS board
with a LAN9354 RMII daughter card.

While the SPI access method should also be the same, it has not
been tested and as such is not included at this time.
Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13248b97

net: dsa: LAN9303: Add early read to sync · 732f374e

Jerry Ray authored Sep 02, 2022

Add initial BYTE_ORDER read to sync the 32-bit accesses over the 16-bit
mdio bus to improve driver robustness.

The lan9303 expects two mdio read transactions back-to-back to read a
32-bit register. The first read transaction causes the other half of the
32-bit register to get latched.  The subsequent read returns the latched
second half of the 32-bit read. The BYTE_ORDER register is an exception to
this rule. As it is a constant value, there is no need to latch the second
half. We read this register first in case there were reads during the boot
loader process that might have occurred prior to this driver taking over
ownership of accessing this device.

This patch has been tested on the SAMA5D3-EDS with a LAN9303 RMII daughter
card.
Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

732f374e

net: dsa: microchip: add regmap_range for KSZ9896 chip · 6674e7fd

Romain Naour authored Sep 02, 2022

Add register validation for KSZ9896.
Signed-off-by: Romain Naour <romain.naour@skf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6674e7fd

net: dsa: microchip: ksz9477: remove 0x033C and 0x033D addresses from regmap_access_tables · 3a8b8ea6

Romain Naour authored Sep 02, 2022

According to the KSZ9477S datasheet, there is no global register
at 0x033C and 0x033D addresses.
Signed-off-by: Romain Naour <romain.naour@skf.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a8b8ea6

net: dsa: microchip: add KSZ9896 to KSZ9477 I2C driver · 13767525

Romain Naour authored Sep 02, 2022

Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the
ksz9477 driver. The KSZ9896 supports both SPI (already in) and I2C.
Signed-off-by: Romain Naour <romain.naour@skf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13767525

net: dsa: microchip: add KSZ9896 switch support · 2eb3ff3c

Romain Naour authored Sep 02, 2022

Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the
ksz9477 driver.

Although the KSZ9896 is already listed in the device tree binding
documentation since a1c0ed24 (dt-bindings: net: dsa: document
additional Microchip KSZ9477 family switches) the chip id
(0x00989600) is not recognized by ksz_switch_detect() and rejected
by the driver.

The KSZ9896 is similar to KSZ9897 but has only one configurable
MII/RMII/RGMII/GMII cpu port.
Signed-off-by: Romain Naour <romain.naour@skf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2eb3ff3c

06 Sep, 2022 4 commits

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 2786bcff

Paolo Abeni authored Sep 06, 2022

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-09-05

The following pull-request contains BPF updates for your *net-next* tree.

We've added 106 non-merge commits during the last 18 day(s) which contain
a total of 159 files changed, 5225 insertions(+), 1358 deletions(-).

There are two small merge conflicts, resolve them as follows:

1) tools/testing/selftests/bpf/DENYLIST.s390x

  Commit 27e23836 ("selftests/bpf: Add lru_bug to s390x deny list") in
  bpf tree was needed to get BPF CI green on s390x, but it conflicted with
  newly added tests on bpf-next. Resolve by adding both hunks, result:

  [...]
  lru_bug                                  # prog 'printk': failed to auto-attach: -524
  setget_sockopt                           # attach unexpected error: -524                                               (trampoline)
  cb_refs                                  # expected error message unexpected error: -524                               (trampoline)
  cgroup_hierarchical_stats                # JIT does not support calling kernel function                                (kfunc)
  htab_update                              # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
  [...]

2) net/core/filter.c

  Commit 1227c177 ("net: Fix data-races around sysctl_[rw]mem_(max|default).")
  from net tree conflicts with commit 29003875 ("bpf: Change bpf_setsockopt(SOL_SOCKET)
  to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from
  bpf-next tree, result:

  [...]
	if (getopt) {
		if (optname == SO_BINDTODEVICE)
			return -EINVAL;
		return sk_getsockopt(sk, SOL_SOCKET, optname,
				     KERNEL_SOCKPTR(optval),
				     KERNEL_SOCKPTR(optlen));
	}

	return sk_setsockopt(sk, SOL_SOCKET, optname,
			     KERNEL_SOCKPTR(optval), *optlen);
  [...]

The main changes are:

1) Add any-context BPF specific memory allocator which is useful in particular for BPF
   tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov.

2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort
   to reuse the existing core socket code as much as possible, from Martin KaFai Lau.

3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector
   with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani.

4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo.

5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF
   integration with the rstat framework, from Yosry Ahmed.

6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev.

7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao.

8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF
   backend, from James Hilliard.

9) Fix verifier helper permissions and reference state management for synchronous
   callbacks, from Kumar Kartikeya Dwivedi.

10) Add support for BPF selftest's xskxceiver to also be used against real devices that
    support MAC loopback, from Maciej Fijalkowski.

11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet.

12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu.

13) Various minor misc improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits)
  bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
  bpf: Remove usage of kmem_cache from bpf_mem_cache.
  bpf: Remove prealloc-only restriction for sleepable bpf programs.
  bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
  bpf: Remove tracing program restriction on map types
  bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
  bpf: Add percpu allocation support to bpf_mem_alloc.
  bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
  bpf: Adjust low/high watermarks in bpf_mem_cache
  bpf: Optimize call_rcu in non-preallocated hash map.
  bpf: Optimize element count in non-preallocated hash map.
  bpf: Relax the requirement to use preallocated hash maps in tracing progs.
  samples/bpf: Reduce syscall overhead in map_perf_test.
  selftests/bpf: Improve test coverage of test_maps
  bpf: Convert hash map to bpf_mem_alloc.
  bpf: Introduce any context BPF specific memory allocator.
  selftest/bpf: Add test for bpf_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
  bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  ...
====================

Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.netSigned-off-by: Paolo Abeni <pabeni@redhat.com>

2786bcff

net: moxa: fix endianness-related issues from 'sparse' · 03fdb11d

Sergei Antonov authored Sep 02, 2022

Sparse checker found two endianness-related issues:

.../moxart_ether.c:34:15: warning: incorrect type in assignment (different base types)
.../moxart_ether.c:34:15:    expected unsigned int [usertype]
.../moxart_ether.c:34:15:    got restricted __le32 [usertype]

.../moxart_ether.c:39:16: warning: cast to restricted __le32

Fix them by using __le32 type instead of u32.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20220902125037.1480268-1-saproj@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

03fdb11d

net: ftmac100: fix endianness-related issues from 'sparse' · 9df696b3

Sergei Antonov authored Sep 02, 2022

Sparse found a number of endianness-related issues of these kinds:

.../ftmac100.c:192:32: warning: restricted __le32 degrades to integer

.../ftmac100.c:208:23: warning: incorrect type in assignment (different base types)
.../ftmac100.c:208:23:    expected unsigned int rxdes0
.../ftmac100.c:208:23:    got restricted __le32 [usertype]

.../ftmac100.c:249:23: warning: invalid assignment: &=
.../ftmac100.c:249:23:    left side has type unsigned int
.../ftmac100.c:249:23:    right side has type restricted __le32

.../ftmac100.c:527:16: warning: cast to restricted __le32

Change type of some fields from 'unsigned int' to '__le32' to fix it.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220902113749.1408562-1-saproj@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

9df696b3

net: lan966x: Extend lan966x with RGMII support · d5edc797

Horatiu Vultur authored Sep 02, 2022

Extend lan966x with RGMII support. The MAC supports all RGMII_* modes.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220902111548.614525-1-horatiu.vultur@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

d5edc797

05 Sep, 2022 4 commits

r8169: remove not needed net_ratelimit() check · 96efd6d0

Heiner Kallweit authored Sep 03, 2022

We're not in a hot path and don't want to miss this message,
therefore remove the net_ratelimit() check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96efd6d0

netlink: Bounds-check struct nlmsgerr creation · 710d21fd

Kees Cook authored Sep 02, 2022

In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:

  memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: syzbot <syzkaller@googlegroups.com>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.orgSigned-off-by: David S. Miller <davem@davemloft.net>

710d21fd

Merge branch 'bpf-allocator' · 274052a2

Daniel Borkmann authored Sep 05, 2022

Alexei Starovoitov says:

====================
Introduce any context BPF specific memory allocator.

Tracing BPF programs can attach to kprobe and fentry. Hence they run in
unknown context where calling plain kmalloc() might not be safe. Front-end
kmalloc() with per-cpu cache of free elements. Refill this cache asynchronously
from irq_work.

Major achievements enabled by bpf_mem_alloc:
- Dynamically allocated hash maps used to be 10 times slower than fully
  preallocated. With bpf_mem_alloc and subsequent optimizations the speed
  of dynamic maps is equal to full prealloc.
- Tracing bpf programs can use dynamically allocated hash maps. Potentially
  saving lots of memory. Typical hash map is sparsely populated.
- Sleepable bpf programs can used dynamically allocated hash maps.

Future work:
- Expose bpf_mem_alloc as uapi FD to be used in dynptr_alloc, kptr_alloc
- Convert lru map to bpf_mem_alloc
- Further cleanup htab code. Example: htab_use_raw_lock can be removed.

Changelog:

v5->v6:
- Debugged the reason for selftests/bpf/test_maps ooming in a small VM that BPF CI is using.
  Added patch 16 that optimizes the usage of rcu_barrier-s between bpf_mem_alloc and
  hash map. It drastically improved the speed of htab destruction.

v4->v5:
- Fixed missing migrate_disable in hash tab free path (Daniel)
- Replaced impossible "memory leak" with WARN_ON_ONCE (Martin)
- Dropped sysctl kernel.bpf_force_dyn_alloc patch (Daniel)
- Added Andrii's ack
- Added new patch 15 that removes kmem_cache usage from bpf_mem_alloc.
  It saves memory, speeds up map create/destroy operations
  while maintains hash map update/delete performance.

v3->v4:
- fix build issue due to missing local.h on 32-bit arch
- add Kumar's ack
- proposal for next steps from Delyan:
https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com/

v2->v3:
- Rewrote the free_list algorithm based on discussions with Kumar. Patch 1.
- Allowed sleepable bpf progs use dynamically allocated maps. Patches 13 and 14.
- Added sysctl to force bpf_mem_alloc in hash map even if pre-alloc is
  requested to reduce memory consumption. Patch 15.
- Fix: zero-fill percpu allocation
- Single rcu_barrier at the end instead of each cpu during bpf_mem_alloc destruction

v2 thread:
https://lore.kernel.org/bpf/20220817210419.95560-1-alexei.starovoitov@gmail.com/

v1->v2:
- Moved unsafe direct call_rcu() from hash map into safe place inside bpf_mem_alloc. Patches 7 and 9.
- Optimized atomic_inc/dec in hash map with percpu_counter. Patch 6.
- Tuned watermarks per allocation size. Patch 8
- Adopted this approach to per-cpu allocation. Patch 10.
- Fully converted hash map to bpf_mem_alloc. Patch 11.
- Removed tracing prog restriction on map types. Combination of all patches and final patch 12.

v1 thread:
https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/

LWN article:
https://lwn.net/Articles/899274/
====================

Link: https://lore.kernel.org/r/Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

274052a2

bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc. · 9f2c6e96

Alexei Starovoitov authored Sep 02, 2022

User space might be creating and destroying a lot of hash maps. Synchronous
rcu_barrier-s in a destruction path of hash map delay freeing of hash buckets
and other map memory and may cause artificial OOM situation under stress.
Optimize rcu_barrier usage between bpf hash map and bpf_mem_alloc:
- remove rcu_barrier from hash map, since htab doesn't use call_rcu
  directly and there are no callback to wait for.
- bpf_mem_alloc has call_rcu_in_progress flag that indicates pending callbacks.
  Use it to avoid barriers in fast path.
- When barriers are needed copy bpf_mem_alloc into temp structure
  and wait for rcu barrier-s in the worker to let the rest of
  hash map freeing to proceed.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220902211058.60789-17-alexei.starovoitov@gmail.com

9f2c6e96