Commits · cb675afcddbbeb2bfa6596e3bc236bc026cd425f · Kirill Smelkov / linux

03 Apr, 2023 27 commits

net: dsa: mt7530: introduce separate MDIO driver · cb675afc

Daniel Golle authored Apr 03, 2023

Split MT7530 switch driver into a common part and a part specific
for MDIO connected switches and multi-chip modules.
Move MDIO-specific functions to newly introduced mt7530-mdio.c while
keeping the common parts in mt7530.c.
Introduce new Kconfig symbol CONFIG_NET_DSA_MT7530_MDIO which is
implied by CONFIG_NET_DSA_MT7530.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb675afc

net: dsa: mt7530: split-off common parts from mt7531_setup · 7f54cc97

Daniel Golle authored Apr 03, 2023

MT7988 shares a significant part of the setup function with MT7531.
Split-off those parts into a shared function which is going to be used
also by mt7988_setup.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f54cc97

net: dsa: mt7530: introduce mt7530_remove_common helper function · 720d7363

Daniel Golle authored Apr 03, 2023

Move commonly used parts from mt7530_remove into new
mt7530_remove_common helper function which will be used by both,
mt7530_remove and the to-be-introduced mt7988_remove.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

720d7363

net: dsa: mt7530: introduce mt7530_probe_common helper function · 37c9c0d8

Daniel Golle authored Apr 03, 2023

Move commonly used parts from mt7530_probe into new mt7530_probe_common
helper function which will be used by both, mt7530_probe and the
to-be-introduced mt7988_probe.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

37c9c0d8

net: dsa: mt7530: move p5_intf_modes() function to mt7530.c · 25d15dee

Daniel Golle authored Apr 03, 2023

In preparation of splitting mt7530.c into a driver for MDIO-connected
as well as MDIO-accessed built-in switches on one hand and MMIO-accessed
built-in switches move the p5_inft_modes() function from mt7530.h to
mt7530.c. The function is only needed there and will trigger a compiler
warning about a defined but unused function otherwise when including
mt7530.h in the to-be-introduced bus-specific drivers.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

25d15dee

net: dsa: mt7530: introduce mutex helpers · 1557c679

Daniel Golle authored Apr 03, 2023

As the MDIO bus lock only needs to be involved if actually operating
on an MDIO-connected switch we will need to skip locking for built-in
switches which are accessed via MMIO.
Create helper functions which simplify that upcoming change.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

1557c679

net: dsa: mt7530: move SGMII PCS creation to mt7530_probe function · 6de28522

Daniel Golle authored Apr 03, 2023

Move creating the SGMII PCS from mt753x_setup() to the more appropriate
mt7530_probe() function.
This is done also in preparation of moving all functions related to
MDIO-connected MT753x switches to a separate module.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6de28522

net: dsa: mt7530: use regmap to access switch register space · a08c0455

Daniel Golle authored Apr 03, 2023

Use regmap API to access the switch register space.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a08c0455

net: dsa: mt7530: use unlocked regmap accessors · 1bd099c4

Daniel Golle authored Apr 03, 2023

Instead of wrapping the locked register accessor functions, use the
unlocked variants and add locking wrapper functions to let regmap
handle the locking.

This is a preparation towards being able to always use regmap to
access switch registers instead of open-coded accessor functions.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bd099c4

net: dsa: mt7530: refactor SGMII PCS creation · 9ecc0016

Daniel Golle authored Apr 03, 2023

Instead of macro templates use a dedidated function and allocated
regmap_config when creating the regmaps for the pcs-mtk-lynxi
instances.
This is in preparation to switching to use unlocked regmap accessors
and have regmap's locking API handle locking for us.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ecc0016

net: dsa: mt7530: make some noise if register read fails · b6f56cdd

Daniel Golle authored Apr 03, 2023

Simply returning the negative error value instead of the read value
doesn't seem like a good idea. Return 0 instead and add WARN_ON_ONCE(1)
so this kind of error will not go unnoticed.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6f56cdd

Merge branch 'phy-smsc-edpd-tunable' · 56b029dd

David S. Miller authored Apr 03, 2023

Heiner Kallweit says:

====================
net: phy: smsc: add support for edpd tunable

This adds support for the EDPD PHY tunable.
Per default EDPD is disabled in interrupt mode, the tunable can be used
to override this, e.g. if the link partner doesn't use EDPD.
The interval to check for energy can be chosen between 1000ms and
2000ms. Note that this value consists of the 1000ms phylib interval
for state machine runs plus the time to wait for energy being detected.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

56b029dd

net: phy: smsc: enable edpd tunable support · 3c4c3b3e

Heiner Kallweit authored Apr 02, 2023

Enable EDPD PHY tunable support for all drivers using
lan87xx_read_status.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c4c3b3e

net: phy: smsc: add support for edpd tunable · 657de1cf

Heiner Kallweit authored Apr 02, 2023

This adds support for the EDPD PHY tunable.
Per default EDPD is disabled in interrupt mode, the tunable can be used
to override this, e.g. if the link partner doesn't use EDPD.
The interval to check for energy can be chosen between 1000ms and
2000ms. Note that this value consists of the 1000ms phylib interval
for state machine runs plus the time to wait for energy being detected.

v2:
- consider that phylib core holds phydev->lock when calling the
  phy tunable hooks
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

657de1cf

net: phy: smsc: prepare for making edpd wait period configurable · 1ce65869

Heiner Kallweit authored Apr 02, 2023

Add a member edpd_max_wait_ms to the private data structure in preparation
of making the wait period configurable by supporting the edpd phy tunable.

v2:
- rename constant to EDPD_MAX_WAIT_DFLT_MS
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ce65869

net: phy: smsc: add flag edpd_mode_set_by_user · a6205110

Heiner Kallweit authored Apr 02, 2023

Add flag edpd_mode_set_by_user in preparation of adding edpd phy tunable
support. This flag will allow users to override the default behavior
of edpd being disabled if interrupt mode is used.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6205110

net: phy: smsc: clear edpd_enable if interrupt mode is used · d56417ad

Heiner Kallweit authored Apr 02, 2023

Clear edpd_enable if interupt mode is used, this avoids
having to check for PHY_POLL multiple times.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d56417ad

net: phy: smsc: add helper smsc_phy_config_edpd · 89946e31

Heiner Kallweit authored Apr 02, 2023

Add helper smsc_phy_config_edpd() and explicitly clear bit
MII_LAN83C185_EDPWRDOWN is edpd_enable isn't set.
Boot loader may have left whatever value.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

89946e31

net: phy: smsc: rename flag energy_enable · fc281d78

Heiner Kallweit authored Apr 02, 2023

Rename the flag to edpd_enable, as we're not enabling energy but
edpd (energy detect power down) mode. In addition change the
type to a bit field member in preparation of adding further flags.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc281d78

Merge branch 'dsa_master_ioctl-notifier' · 858e5b06

David S. Miller authored Apr 03, 2023

Vladimir Oltean says:

====================
net: Convert dsa_master_ioctl() to netdev notifier

This is preparatory work in order for Maxim Georgiev to be able to start
the API conversion process of hardware timestamping from ndo_eth_ioctl()
to ndo_hwtstamp_set():
https://lore.kernel.org/netdev/20230331045619.40256-1-glipus@gmail.com/

In turn, Maxim Georgiev's work is a preparation so that Köry Maincent is
able to make the active hardware timestamping layer selectable by user
space.
https://lore.kernel.org/netdev/20230308135936.761794-1-kory.maincent@bootlin.com/

So, quite some dependency chain.

Before this patch set, DSA prevented the conversion of any networking
driver from the ndo_eth_ioctl() API to the ndo_hwtstamp_set() API,
because it wanted to validate the hwtstamping settings on the DSA
master, and it was only coded up to do this using the old API.

After this patch set, a new netdev notifier exists, which does not
depend on anything that would constitute the "soon-to-be-legacy" API,
but rather, it uses a newly introduced struct kernel_hwtstamp_config,
and it doesn't issue any ioctl at all, being thus compatible both with
ndo_eth_ioctl(), and with the not-yet-introduced, but now possible,
ndo_hwtstamp_set().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

858e5b06

net: create a netdev notifier for DSA to reject PTP on DSA master · 88c0a6b5

Vladimir Oltean authored Apr 02, 2023

The fact that PTP 2-step TX timestamping is broken on DSA switches if
the master also timestamps the same packets is documented by commit
f685e609 ("net: dsa: Deny PTP on master if switch supports it").
We attempt to help the users avoid shooting themselves in the foot by
making DSA reject the timestamping ioctls on an interface that is a DSA
master, and the switch tree beneath it contains switches which are aware
of PTP.

The only problem is that there isn't an established way of intercepting
ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network
stack by creating a struct dsa_netdevice_ops with overlaid function
pointers that are manually checked from the relevant call sites. There
used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only
one left.

There is an ongoing effort to migrate driver-visible hardware timestamping
control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set()
model, but DSA actively prevents that migration, since dsa_master_ioctl()
is currently coded to manually call the master's legacy ndo_eth_ioctl(),
and so, whenever a network device driver would be converted to the new
API, DSA's restrictions would be circumvented, because any device could
be used as a DSA master.

The established way for unrelated modules to react on a net device event
is via netdevice notifiers. So we create a new notifier which gets
called whenever there is an attempt to change hardware timestamping
settings on a device.

Finally, there is another reason why a netdev notifier will be a good
idea, besides strictly DSA, and this has to do with PHY timestamping.

With ndo_eth_ioctl(), all MAC drivers must manually call
phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP,
otherwise they must pass this ioctl to the PHY driver via
phy_mii_ioctl().

With the new ndo_hwtstamp_set() API, it will be desirable to simply not
make any calls into the MAC device driver when timestamping should be
performed at the PHY level.

But there exist drivers, such as the lan966x switch, which need to
install packet traps for PTP regardless of whether they are the layer
that provides the hardware timestamps, or the PHY is. That would be
impossible to support with the new API.

The proposal there, too, is to introduce a netdev notifier which acts as
a better cue for switching drivers to add or remove PTP packet traps,
than ndo_hwtstamp_set(). The one introduced here "almost" works there as
well, except for the fact that packet traps should only be installed if
the PHY driver succeeded to enable hardware timestamping, whereas here,
we need to deny hardware timestamping on the DSA master before it
actually gets enabled. This is why this notifier is called "PRE_", and
the notifier that would get used for PHY timestamping and packet traps
would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for
example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing.

In expectation of future netlink UAPI, we also pass a non-NULL extack
pointer to the netdev notifier, and we make DSA populate it with an
informative reason for the rejection. To avoid making it go to waste, we
make the ioctl-based dev_set_hwtstamp() create a fake extack and print
the message to the kernel log.

Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88c0a6b5

net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreq · ff6ac4d0

Vladimir Oltean authored Apr 02, 2023

dsa_master_ioctl() is in the process of getting converted to a different
API, where we won't have access to a struct ifreq * anymore, but rather,
to a struct kernel_hwtstamp_config.

Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this
creates a difficult situation where we have to make up such a dummy
pointer.

The conversion is a bit messy, because it forces a "good" implementation
of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user()
because of the NULL ifr->ifr_data pointer. However, it works, and it is
only a transient step until ds->ops->port_hwtstamp_get() gets converted
to the new API which passes struct kernel_hwtstamp_config and does not
call copy_to_user().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff6ac4d0

net: add struct kernel_hwtstamp_config and make net_hwtstamp_validate() use it · c4bffeaa

Vladimir Oltean authored Apr 02, 2023

Jakub Kicinski suggested that we may want to add new UAPI for
controlling hardware timestamping through netlink in the future, and in
that case, we will be limited to the struct hwtstamp_config that is
currently passed in fixed binary format through the SIOCGHWTSTAMP and
SIOCSHWTSTAMP ioctls. It would be good if new kernel code already
started operating on an extensible kernel variant of that structure,
similar in concept to struct kernel_ethtool_coalesce vs struct
ethtool_coalesce.

Since struct hwtstamp_config is in include/uapi/linux/net_tstamp.h, here
we introduce include/linux/net_tstamp.h which shadows that other header,
but also includes it, so that existing includers of this header work as
before. In addition to that, we add the definition for the kernel-only
structure, and a helper which translates all fields by manual copying.
I am doing a manual copy in order to not force the alignment (or type)
of the fields of struct kernel_hwtstamp_config to be the same as of
struct hwtstamp_config, even though now, they are the same.

Link: https://lore.kernel.org/netdev/20230330223519.36ce7d23@kernel.org/Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4bffeaa

net: move copy_from_user() out of net_hwtstamp_validate() · d5d5fd8f

Vladimir Oltean authored Apr 02, 2023

The kernel will want to start using the more meaningful struct
hwtstamp_config pointer in more places, so move the copy_from_user() at
the beginning of dev_set_hwtstamp() in order to get to that, and pass
this argument to net_hwtstamp_validate().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5d5fd8f

net: promote SIOCSHWTSTAMP and SIOCGHWTSTAMP ioctls to dedicated handlers · 4ee58e1e

Vladimir Oltean authored Apr 02, 2023

DSA does not want to intercept all ioctls handled by dev_eth_ioctl(),
only SIOCSHWTSTAMP. This can be seen from commit f685e609 ("net:
dsa: Deny PTP on master if switch supports it"). However, the way in
which the dsa_ndo_eth_ioctl() is called would suggest otherwise.

Split the handling of SIOCSHWTSTAMP and SIOCGHWTSTAMP ioctls into
separate case statements of dev_ifsioc(), and make each one call its own
sub-function. This also removes the dsa_ndo_eth_ioctl() call from
dev_eth_ioctl(), which from now on exclusively handles PHY ioctls.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ee58e1e

net: simplify handling of dsa_ndo_eth_ioctl() return code · 1193db2a

Vladimir Oltean authored Apr 02, 2023

In the expression "x == 0 || x != -95", the term "x == 0" does not
change the expression's logical value, because 0 != -95, and so,
if x is 0, the expression would still be true by virtue of the second
term. If x is non-zero, the expression depends on the truth value of
the second term anyway. As such, the first term is redundant and can
be deleted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1193db2a

net: don't abuse "default" case for unknown ioctl in dev_ifsioc() · 00d521b3

Vladimir Oltean authored Apr 02, 2023

The "switch (cmd)" block from dev_ifsioc() gained a bit too much
unnecessary manual handling of "cmd" in the "default" case, starting
with the private ioctls.

Clean that up by using the "ellipsis" gcc extension, adding separate
cases for the rest of the ioctls, and letting the default case only
return -EINVAL.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00d521b3

02 Apr, 2023 7 commits

net: alteon: remove unused len variable · 51aaa682

Tom Rix authored Mar 31, 2023

clang with W=1 reports
drivers/net/ethernet/alteon/acenic.c:2438:10: error: variable
  'len' set but not used [-Werror,-Wunused-but-set-variable]
                int i, len = 0;
                       ^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51aaa682

Merge branch 'mlxsw-transceiver-trip-points' · f85b8824

David S. Miller authored Apr 02, 2023

Petr Machata says:

====================
mlxsw: Use static trip points for transceiver modules

Ido Schimmel writes:

See patch #1 for motivation and implementation details.

Patches #2-#3 are simple cleanups as a result of the changes in the
first patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f85b8824

mlxsw: core_thermal: Simplify transceiver module get_temp() callback · cc19439f

Ido Schimmel authored Mar 31, 2023

The get_temp() callback of a thermal zone associated with a transceiver
module no longer needs to read the temperature thresholds of the module.
Therefore, simplify the callback by only reading the temperature.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc19439f

mlxsw: core_thermal: Make mlxsw_thermal_module_init() void · c1536d85

Ido Schimmel authored Mar 31, 2023

The function can no longer fail so make it void and remove the
associated error path.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1536d85

mlxsw: core_thermal: Use static trip points for transceiver modules · 5601ef91

Ido Schimmel authored Mar 31, 2023

The driver registers a thermal zone for each transceiver module and
tries to set the trip point temperatures according to the thresholds
read from the transceiver. If a threshold cannot be read or if a
transceiver is unplugged, the trip point temperature is set to zero,
which means that it is disabled as far as the thermal subsystem is
concerned.

A recent change in the thermal core made it so that such trip points are
no longer marked as disabled, which lead the thermal subsystem to
incorrectly set the associated cooling devices to the their maximum
state [1]. A fix to restore this behavior was merged in commit
f1b80a38 ("thermal: core: Restore behavior regarding invalid trip
points"). However, the thermal maintainer suggested to not rely on this
behavior and instead always register a valid array of trip points [2].

Therefore, create a static array of trip points with sane defaults
(suggested by Vadim) and register it with the thermal zone of each
transceiver module. User space can choose to override these defaults
using the thermal zone sysfs interface since these files are writeable.

Before:

 $ cat /sys/class/thermal/thermal_zone11/type
 mlxsw-module11
 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
 65000
 75000
 80000

After:

 $ cat /sys/class/thermal/thermal_zone11/type
 mlxsw-module11
 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
 55000
 65000
 80000

Also tested by reverting commit f1b80a38 ("thermal: core: Restore
behavior regarding invalid trip points") and making sure that the
associated cooling devices are not set to their maximum state.

[1] https://lore.kernel.org/linux-pm/ZA3CFNhU4AbtsP4G@shredder/
[2] https://lore.kernel.org/linux-pm/f78e6b70-a963-c0ca-a4b2-0d4c6aeef1fb@linaro.org/Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5601ef91

net: minor reshuffle of napi_struct · dd2d6604

Jakub Kicinski authored Mar 30, 2023

napi_id is read by GRO and drivers to mark skbs, and it currently
sits at the end of the structure, in a mostly unused cache line.
Move it up into a hole, and separate the clearly control path
fields from the important ones.

Before:

struct napi_struct {
	struct list_head           poll_list;            /*     0    16 */
	long unsigned int          state;                /*    16     8 */
	int                        weight;               /*    24     4 */
	int                        defer_hard_irqs_count; /*    28     4 */
	long unsigned int          gro_bitmask;          /*    32     8 */
	int                        (*poll)(struct napi_struct *, int); /*    40     8 */
	int                        poll_owner;           /*    48     4 */

	/* XXX 4 bytes hole, try to pack */

	struct net_device *        dev;                  /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct gro_list            gro_hash[8];          /*    64   192 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	struct sk_buff *           skb;                  /*   256     8 */
	struct list_head           rx_list;              /*   264    16 */
	int                        rx_count;             /*   280     4 */

	/* XXX 4 bytes hole, try to pack */

	struct hrtimer             timer;                /*   288    64 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
	struct list_head           dev_list;             /*   352    16 */
	struct hlist_node          napi_hash_node;       /*   368    16 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	unsigned int               napi_id;              /*   384     4 */

	/* XXX 4 bytes hole, try to pack */

	struct task_struct *       thread;               /*   392     8 */

	/* size: 400, cachelines: 7, members: 17 */
	/* sum members: 388, holes: 3, sum holes: 12 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 16 bytes */
};

After:

struct napi_struct {
	struct list_head           poll_list;            /*     0    16 */
	long unsigned int          state;                /*    16     8 */
	int                        weight;               /*    24     4 */
	int                        defer_hard_irqs_count; /*    28     4 */
	long unsigned int          gro_bitmask;          /*    32     8 */
	int                        (*poll)(struct napi_struct *, int); /*    40     8 */
	int                        poll_owner;           /*    48     4 */

	/* XXX 4 bytes hole, try to pack */

	struct net_device *        dev;                  /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct gro_list            gro_hash[8];          /*    64   192 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	struct sk_buff *           skb;                  /*   256     8 */
	struct list_head           rx_list;              /*   264    16 */
	int                        rx_count;             /*   280     4 */
	unsigned int               napi_id;              /*   284     4 */
	struct hrtimer             timer;                /*   288    64 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
	struct task_struct *       thread;               /*   352     8 */
	struct list_head           dev_list;             /*   360    16 */
	struct hlist_node          napi_hash_node;       /*   376    16 */

	/* size: 392, cachelines: 7, members: 17 */
	/* sum members: 388, holes: 1, sum holes: 4 */
	/* paddings: 1, sum paddings: 4 */
	/* forced alignments: 1 */
	/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd2d6604

i40e: Add support for VF to specify its primary MAC address · ceb29474

Sylwester Dziedziuch authored Mar 30, 2023

Currently in the i40e driver there is no implementation of different
MAC address handling depending on whether it is a legacy or primary.
Introduce new checks for VF to be able to specify its primary MAC
address based on the VIRTCHNL_ETHER_ADDR_PRIMARY type.

Primary MAC address are treated differently compared to legacy
ones in a scenario where:
1. If a unicast MAC is being added and it's specified as
VIRTCHNL_ETHER_ADDR_PRIMARY, then replace the current
default_lan_addr.addr.
2. If a unicast MAC is being deleted and it's type
is specified as VIRTCHNL_ETHER_ADDR_PRIMARY, then zero the
hw_lan_addr.addr.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceb29474

01 Apr, 2023 1 commit

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · d74aab2c

Jakub Kicinski authored Mar 31, 2023

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-03-30 (documentation, ice)

This series contains updates to driver documentation and the ice driver.

Tony removes links and addresses related to the out-of-tree driver from the
Intel ethernet driver documentation.

Jake removes a comment that is no longer valid to the ice driver.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: remove comment about not supporting driver reinit
  Documentation/eth/intel: Remove references to SourceForge
  Documentation/eth/intel: Update address for driver support
====================

Link: https://lore.kernel.org/r/20230330165935.2503604-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d74aab2c

31 Mar, 2023 5 commits

Merge tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 54fd494a

Jakub Kicinski authored Mar 31, 2023

Florian Westphal says:

====================
netfilter updates for net-next

1. No need to disable BH in nfnetlink proc handler, freeing happens
   via call_rcu.
2. Expose classid in nfetlink_queue, from Eric Sage.
3. Fix nfnetlink message description comments, from Matthieu De Beule.
4. Allow removal of offloaded connections via ctnetlink, from Paul Blakey.

* tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: ctnetlink: Support offloaded conntrack entry deletion
  netfilter: Correct documentation errors in nf_tables.h
  netfilter: nfnetlink_queue: enable classid socket info retrieval
  netfilter: nfnetlink_log: remove rcu_bh usage
====================

Link: https://lore.kernel.org/r/20230331104809.2959-1-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

54fd494a

dt-bindings: net: fec: add power-domains property · 99b3a769

Peng Fan authored Mar 28, 2023

Add optional power domains property
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230328061518.1985981-1-peng.fan@oss.nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

99b3a769

tcp: Refine SYN handling for PAWS. · ee05d90d

Kuniyuki Iwashima authored Mar 29, 2023

Our Network Load Balancer (NLB) [0] has multiple nodes with different
IP addresses, and each node forwards TCP flows from clients to backend
targets. NLB has an option to preserve the client's source IP address
and port when routing packets to backend targets. [1]

When a client connects to two different NLB nodes, they may select the
same backend target. Then, if the client has used the same source IP
and port, the two flows at the backend side will have the same 4-tuple.

While testing around such cases, I saw these sequences on the backend
target.

IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2819965599, win 62727, options [mss 8365,sackOK,TS val 1029816180 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3040695044, ack 2819965600, win 62643, options [mss 8961,sackOK,TS val 1224784076 ecr 1029816180,nop,wscale 7], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 1, win 491, options [nop,nop,TS val 1029816181 ecr 1224784076], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2681819307, win 62727, options [mss 8365,sackOK,TS val 572088282 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 1, win 490, options [nop,nop,TS val 1224794914 ecr 1029816181,nop,nop,sack 1 {4156821004:4156821005}], length 0

It seems to be working correctly, but the last ACK was generated by
tcp_send_dupack() and PAWSEstab was increased. This is because the
second connection has a smaller timestamp than the first one.

In this case, we should send a dup ACK in tcp_send_challenge_ack()
to increase the correct counter and rate-limit it properly.

Let's check the SYN flag after the PAWS tests to avoid adding unnecessary
overhead for most packets.

Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html [0]
Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee05d90d

macvlan: Fix mc_filter calculation · ae63ad9b

Herbert Xu authored Mar 29, 2023

On Wed, Mar 29, 2023 at 08:10:26AM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
>
> Here is the summary with links:
>   - [1/2] macvlan: Skip broadcast queue if multicast with single receiver
>     https://git.kernel.org/netdev/net-next/c/d45276e75e90
>   - [2/2] macvlan: Add netlink attribute for broadcast cutoff
>     https://git.kernel.org/netdev/net-next/c/954d1fa1ac93

Sorry, I made an error and posted my patches from an earlier
revision so a follow-up fix was missing:

---8<---
The bc_cutoff patch broke the calculation of mc_filter causing
some multicast packets to not make it through to the targeted
device.

Fix this by checking whether vlan is set instead of cutoff >= 0.

Also move the cutoff < 0 logic into macvlan_recompute_bc_filter
so that it doesn't change the mc_filter at all.

Fixes: d45276e7 ("macvlan: Skip broadcast queue if multicast with single receiver")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae63ad9b

Merge tag 'wireless-next-2023-03-30' of... · ce7928f7

Jakub Kicinski authored Mar 30, 2023

Merge tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Major stack changes:

 * TC offload support for drivers below mac80211
 * reduced neighbor report (RNR) handling for AP mode
 * mac80211 mesh fast-xmit and fast-rx support
 * support for another mesh A-MSDU format
   (seems nobody got the spec right)

Major driver changes:

Kalle moved the drivers that were just plain C files
in drivers/net/wireless/ to legacy/ and virtual/ dirs.

hwsim
 * multi-BSSID support
 * some FTM support

ath11k
 * MU-MIMO parameters support
 * ack signal support for management packets

rtl8xxxu
 * support for RTL8710BU aka RTL8188GU chips

rtw89
 * support for various newer firmware APIs

ath10k
 * enabled threaded NAPI on WCN3990

iwlwifi
 * lots of work for multi-link/EHT (wifi7)
 * hardware timestamping support for some devices/firwmares
 * TX beacon protection on newer hardware

* tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (181 commits)
  wifi: clean up erroneously introduced file
  wifi: iwlwifi: mvm: correctly use link in iwl_mvm_sta_del()
  wifi: iwlwifi: separate AP link management queues
  wifi: iwlwifi: mvm: free probe_resp_data later
  wifi: iwlwifi: bump FW API to 75 for AX devices
  wifi: iwlwifi: mvm: move max_agg_bufsize into host TLC lq_sta
  wifi: iwlwifi: mvm: send full STA during HW restart
  wifi: iwlwifi: mvm: rework active links counting
  wifi: iwlwifi: mvm: update mac config when assigning chanctx
  wifi: iwlwifi: mvm: use the correct link queue
  wifi: iwlwifi: mvm: clean up mac_id vs. link_id in MLD sta
  wifi: iwlwifi: mvm: fix station link data leak
  wifi: iwlwifi: mvm: initialize max_rc_amsdu_len per-link
  wifi: iwlwifi: mvm: use appropriate link for rate selection
  wifi: iwlwifi: mvm: use the new lockdep-checking macros
  wifi: iwlwifi: mvm: remove chanctx WARN_ON
  wifi: iwlwifi: mvm: avoid sending MAC context for idle
  wifi: iwlwifi: mvm: remove only link-specific AP keys
  wifi: iwlwifi: mvm: skip inactive links
  wifi: iwlwifi: mvm: adjust iwl_mvm_scan_respect_p2p_go_iter() for MLO
  ...
====================

Link: https://lore.kernel.org/r/20230330205612.921134-1-johannes@sipsolutions.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ce7928f7