Commits · 69d72587c17b48e928f558d360f1b3c0574bf4dd · Kirill Smelkov / linux

23 Oct, 2023 5 commits

geneve: use generic function for tunnel IPv6 route lookup · 69d72587

Beniamino Galvani authored Oct 20, 2023

The route lookup can be done now via generic function
udp_tunnel6_dst_lookup() to replace the custom implementation in
geneve_get_v6_dst().

This is similar to what already done for IPv4 in commit daa2ba7e
("geneve: use generic function for tunnel IPv4 route lookup").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

69d72587

ipv6: add new arguments to udp_tunnel6_dst_lookup() · 946fcfdb

Beniamino Galvani authored Oct 20, 2023

We want to make the function more generic so that it can be used by
other UDP tunnel implementations such as geneve and vxlan. To do that,
add the following arguments:

 - source and destination UDP port;
 - ifindex of the output interface, needed by vxlan;
 - the tos, because in some cases it is not taken from struct
   ip_tunnel_info (for example, when it's inherited from the inner
   packet);
 - the dst cache, because not all tunnel types (e.g. vxlan) want to
   use the one from struct ip_tunnel_info.

With these parameters, the function no longer needs the full struct
ip_tunnel_info as argument and we can pass only the relevant part of
it (struct ip_tunnel_key).

This is similar to what already done for IPv4 in commit 72fc68c6
("ipv4: add new arguments to udp_tunnel_dst_lookup()").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

946fcfdb

ipv6: remove "proto" argument from udp_tunnel6_dst_lookup() · 7e937dcf

Beniamino Galvani authored Oct 20, 2023

The function is now UDP-specific, the protocol is always IPPROTO_UDP.

This is similar to what already done for IPv4 in commit 78f3655a
("ipv4: remove "proto" argument from udp_tunnel_dst_lookup()").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e937dcf

ipv6: rename and move ip6_dst_lookup_tunnel() · fc47e86d

Beniamino Galvani authored Oct 20, 2023

At the moment ip6_dst_lookup_tunnel() is used only by bareudp.
Ideally, other UDP tunnel implementations should use it, but to do so
the function needs to accept new parameters that are specific for UDP
tunnels, such as the ports.

Prepare for these changes by renaming the function to
udp_tunnel6_dst_lookup() and move it to file
net/ipv6/ip6_udp_tunnel.c.

This is similar to what already done for IPv4 in commit bf3fcbf7
("ipv4: rename and move ip_route_output_tunnel()").
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc47e86d

net: atm: Remove redundant check. · 92fc97ae

Gavrilov Ilia authored Oct 20, 2023

Checking the 'adev' variable is unnecessary,
because 'cdev' has already been checked earlier.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: 656d98b0 ("[ATM]: basic sysfs support for ATM devices")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

92fc97ae

22 Oct, 2023 9 commits

Merge branch 'bnxt_en-next' · 5e370403

David S. Miller authored Oct 22, 2023

Michael Chan says:

====================
bnxt_en: Update for net-next

The first 2 patches are fixes for the recently added hwmon changes.
The next 6 patches are enhancements to support ethtool lanes and
all the proper supported and advertised link modes.  Before these
patches, the driver was only supporting the link modes for copper
media.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5e370403

bnxt_en: extend media types to supported and autoneg modes · 5d4e1bf6

Edwin Peer authored Oct 20, 2023

The current driver code does not accurately report the supported and
advertised link modes.  It basically always assumes the media type
is copper for any particular speed.  Utilize the recently added link
mode mappings to accurately report fully qualified ethtool link modes for
advertised and supported speeds.

If the media type is known, we will report the supported link modes for
that media only.  If the media is not known, we will report all possible
supported link modes.  The user can now specify any supported link modes
(including NRZ and PAM4) to advertise for autoneg.  It used to only accept
copper NRZ modes.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d4e1bf6

bnxt_en: convert to linkmode_set_bit() API · 64d20aea

Edwin Peer authored Oct 20, 2023

Barring the BNXT_FW_TO_ETHTOOL speed macros, which will be removed
in the next patch, update code to use the newer API.

No functional change.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64d20aea

bnxt_en: Refactor NRZ/PAM4 link speed related logic · 5802e303

Michael Chan authored Oct 20, 2023

Refactor some NRZ/PAM4 link speed related logic into helper functions.
The NRZ and PAM4 link parameters are stored in separate structure fields.
The driver logic has to check whether it is in NRZ or PAM4 mode and
then use the appropriate field.

Refactor this logic into helper functions for better readability.
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5802e303

bnxt_en: refactor speed independent ethtool modes · 94c89e73

Edwin Peer authored Oct 20, 2023

A future patch in this series will change the algorithm used to
determine ethtool speed and media modes. Extract the handling of
the unrelated pause, autoneg modes into an independent function.
Also separate FEC handling out of bnxt_fw_to_ethtool_*_spds().

No functional change.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94c89e73

bnxt_en: support lane configuration via ethtool · d6263677

Edwin Peer authored Oct 20, 2023

Recent kernels support changing the number of link lanes via ethtool.
This is useful for determining the appropriate signal mode to use when
a given link speed can be achieved using different lane configurations.

Accept the ethtool lanes parameter when configuring forced speed.  If
there is no lanes parameter, select a default.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6263677

bnxt_en: add infrastructure to lookup ethtool link mode · ecdad2a6

Edwin Peer authored Oct 20, 2023

Add infrastructure to look up the enum ethtool_link_mode_bit_indices
from link information provided by the firmware.  The link speed,
signal mode, and media type returned by firmware will be used to
look up the ethtool link mode.

The immediate benefit is that once the link mode is determined, we can
now use ethtool_params_from_link_mode() to fill the basic ethtool
parameters including the number of lanes.  Lanes will be fully
supported in the next patch.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecdad2a6

bnxt_en: Fix invoking hwmon_notify_event · fd78ec3f

Kalesh AP authored Oct 20, 2023

FW sends the async event to the driver when the device temperature goes
above or below the threshold values.  Only notify hwmon if the
temperature is increasing to the next alert level, not when it is
decreasing.

Cc: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd78ec3f

bnxt_en: Do not call sleeping hwmon_notify_event() from NAPI · 55862094

Kalesh AP authored Oct 20, 2023

Defer hwmon_notify_event() to bnxt_sp_task() workqueue because
hwmon_notify_event() can try to acquire a mutex shown in the stack trace
below.  Modify bnxt_event_error_report() to return true if we need to
schedule bnxt_sp_task() to notify hwmon.

  __schedule+0x68/0x520
  hwmon_notify_event+0xe8/0x114
  schedule+0x60/0xe0
  schedule_preempt_disabled+0x28/0x40
  __mutex_lock.constprop.0+0x534/0x550
  __mutex_lock_slowpath+0x18/0x20
  mutex_lock+0x5c/0x70
  kobject_uevent_env+0x2f4/0x3d0
  kobject_uevent+0x10/0x20
  hwmon_notify_event+0x94/0x114
  bnxt_hwmon_notify_event+0x40/0x70 [bnxt_en]
  bnxt_event_error_report+0x260/0x290 [bnxt_en]
  bnxt_async_event_process.isra.0+0x250/0x850 [bnxt_en]
  bnxt_hwrm_handler.isra.0+0xc8/0x120 [bnxt_en]
  bnxt_poll_p5+0x150/0x350 [bnxt_en]
  __napi_poll+0x3c/0x210
  net_rx_action+0x308/0x3b0
  __do_softirq+0x120/0x3e0

Cc: Guenter Roeck <linux@roeck-us.net>
Fixes: a19b4801 ("bnxt_en: Event handler for Thermal event")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55862094

21 Oct, 2023 5 commits

octeon_ep: assert hardware structure sizes · e10f4019

Shinas Rasheed authored Oct 20, 2023

Clean up structure defines related to hardware data to be
asserted to fixed sizes, as padding is not allowed
by hardware.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e10f4019

net: dsa: mv88e6xxx: add an error code check in mv88e6352_tai_event_work · a792197f

Su Hui authored Oct 20, 2023

mv88e6xxx_tai_write() can return error code (-EOPNOTSUPP ...) if failed.
So check the value of 'ret' after calling mv88e6xxx_tai_write().
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a792197f

selftests: tc-testing: add test for 'rt' upgrade on hfsc · ee3d1228

Pedro Tammela authored Oct 19, 2023

Add a test to check if inner rt curves are upgraded to sc curves.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee3d1228

net: wwan: replace deprecated strncpy with strscpy · 75e7d0b2

Justin Stitt authored Oct 19, 2023

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We expect chinfo.name to be NUL-terminated based on its use with format
strings and sprintf:
rpmsg/rpmsg_char.c
165:            dev_err(dev, "failed to open %s\n", eptdev->chinfo.name);
368:    return sprintf(buf, "%s\n", eptdev->chinfo.name);

... and with strcmp():
|  static struct rpmsg_endpoint *qcom_glink_create_ept(struct rpmsg_device *rpdev,
|  						    rpmsg_rx_cb_t cb,
|  						    void *priv,
|  						    struct rpmsg_channel_info
|  									chinfo)
|  ...
|  const char *name = chinfo.name;
|  ...
|  		if (!strcmp(channel->name, name))

Since chinfo is initialized as such (just above the strscpy()):

|       struct rpmsg_channel_info chinfo = {
|               .src = rpwwan->rpdev->src,
|               .dst = RPMSG_ADDR_ANY,
|       };

... we know other members are zero-initialized. This means no
NUL-padding is required (as any NUL-byte assignments are redundant).

Considering the above, a suitable replacement is `strscpy` due to the
fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20231019-strncpy-drivers-net-wwan-rpmsg_wwan_ctrl-c-v2-1-ecf9b5a39430@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

75e7d0b2

pds_core: add an error code check in pdsc_dl_info_get · a1e4c334

Su Hui authored Oct 19, 2023

check the value of 'ret' after call 'devlink_info_version_stored_put'.
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231019083351.1526484-1-suhui@nfschina.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a1e4c334

20 Oct, 2023 21 commits

Merge branch 'ice-vf-resource-tracking' · 86a0348d

David S. Miller authored Oct 20, 2023

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2023-10-19 (ice, igb, ixgbe)

This series contains improvements to the ice driver related to VF MSI-X
resource tracking, as well as other minor cleanups.

Dan fixes code in igb and ixgbe where the conversion to list_for_each_entry
failed to account for logic which assumed a NULL pointer after iteration.

Jacob makes ice_get_pf_c827_idx static, and refactors ice_find_netlist_node
based on feedback that got missed before the function merged.

Michal adds a switch rule to drop all traffic received by an inactive LAG
port. He also implements ops to allow individual control of MSI-X vectors
for SR-IOV VFs.

Przemek removes some unused fields in struct ice_flow_entry, and modifies
the ice driver to cache the VF PCI device inside struct ice_vf rather than
performing lookup at run time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

86a0348d

ixgbe: fix end of loop test in ixgbe_set_vf_macvlan() · a41654c3

Dan Carpenter authored Oct 19, 2023

The list iterator in a list_for_each_entry() loop can never be NULL.
If the loop exits without hitting a break then the iterator points
to an offset off the list head and dereferencing it is an out of
bounds access.

Before we transitioned to using list_for_each_entry() loops, then
it was possible for "entry" to be NULL and the comments mention
this.  I have updated the comments to match the new code.

Fixes: c1fec890 ("ethernet/intel: Use list_for_each_entry() helper")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a41654c3

igb: Fix an end of loop test · 4690aea5

Dan Carpenter authored Oct 19, 2023

When we exit a list_for_each_entry() without hitting a break statement,
the list iterator isn't NULL, it just point to an offset off the
list_head.  In that situation, it wouldn't be too surprising for
entry->free to be true and we end up corrupting memory.

The way to test for these is to just set a flag.

Fixes: c1fec890 ("ethernet/intel: Use list_for_each_entry() helper")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4690aea5

ice: cleanup ice_find_netlist_node · 640a65f8

Jacob Keller authored Oct 19, 2023

The ice_find_netlist_node function was introduced in commit 8a3a565f
("ice: add admin commands to access cgu configuration"). Variations of this
function were reviewed concurrently on both intel-wired-lan[1][2], and
netdev [3][4]

[1]: https://lore.kernel.org/intel-wired-lan/20230913204943.1051233-7-vadim.fedorenko@linux.dev/
[2]: https://lore.kernel.org/intel-wired-lan/20230817000058.2433236-5-jacob.e.keller@intel.com/
[3]: https://lore.kernel.org/netdev/20230918212814.435688-1-anthony.l.nguyen@intel.com/
[4]: https://lore.kernel.org/netdev/20230913204943.1051233-7-vadim.fedorenko@linux.dev/

The variant I posted had a few changes due to review feedback which were
never incorporated into the DPLL series:

* Replace the references to ancient and long removed ICE_SUCCESS and
  ICE_ERR_DOES_NOT_EXIST status codes in the function comment.

* Return -ENOENT instead of -ENOTBLK, as a more common way to indicate that
  an entry doesn't exist.

* Avoid the use of memset() and use simple static initialization for the
  cmd variable.

* Use FIELD_PREP to assign the node_type_ctx.

* Remove an unnecessary local variable to keep track of rec_node_handle,
  just pass the node_handle pointer directly into ice_aq_get_netlist_node.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

640a65f8

ice: make ice_get_pf_c827_idx static · 67918b6b

Jacob Keller authored Oct 19, 2023

The ice_get_pf_c827_idx function is only called inside of ice_ptp_hw.c, so
there is no reason to export it. Mark it static and remove the declaration
from ice_ptp_hw.h
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: David S. Miller <davem@davemloft.net>

67918b6b

ice: manage VFs MSI-X using resource tracking · 4d38cb44

Michal Swiatkowski authored Oct 19, 2023

Track MSI-X for VFs using bitmap, by setting and clearing bitmap during
allocation and freeing.

Try to linearize irqs usage for VFs, by freeing them and allocating once
again. Do it only for VFs that aren't currently running.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d38cb44

ice: set MSI-X vector count on VF · 05c16687

Michal Swiatkowski authored Oct 19, 2023

Implement ops needed to set MSI-X vector count on VF.

sriov_get_vf_total_msix() should return total number of MSI-X that can
be used by the VFs. Return the value set by devlink resources API
(pf->req_msix.vf).

sriov_set_msix_vec_count() will set number of MSI-X on particular VF.
Disable VF register mapping, rebuild VSI with new MSI-X and queues
values and enable new VF register mapping.

For best performance set number of queues equal to number of MSI-X.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05c16687

ice: add bitmap to track VF MSI-X usage · ea4af9b4

Michal Swiatkowski authored Oct 19, 2023

Create a bitamp to track MSI-X usage for VFs. The bitmap has the size of
total MSI-X amount on device, because at init time the amount of MSI-X
used by VFs isn't known.

The bitmap is used in follow up patchset to provide a block of
continuous block of MSI-X indexes for each created VF.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea4af9b4

ice: implement num_msix field per VF · fe1c5ca2

Michal Swiatkowski authored Oct 19, 2023

Store the amount of MSI-X per VF instead of storing it in pf struct. It
is used to calculate number of q_vectors (and queues) for VF VSI.

This is necessary because with follow up changes the number of MSI-X can
be different between VFs. Use it instead of using pf->vf_msix value in
all cases.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe1c5ca2

ice: store VF's pci_dev ptr in ice_vf · 31642d28

Przemek Kitszel authored Oct 19, 2023

Extend struct ice_vf by vfdev.
Calculation of vfdev falls more nicely into ice_create_vf_entries().

Caching of vfdev enables simplification of ice_restore_all_vfs_msi_state().
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31642d28

ice: add drop rule matching on not active lport · 9dffb97d

Michal Swiatkowski authored Oct 19, 2023

Inactive LAG port should not receive any packets, as it can cause adding
invalid FDBs (bridge offload). Add a drop rule matching on inactive lport
in LAG.
Reviewed-by: Simon Horman <horms@kernel.org>
Co-developed-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dffb97d

ice: remove unused ice_flow_entry fields · 4cd7bc71

Przemek Kitszel authored Oct 19, 2023

Remove ::entry and ::entry_sz fields of &ice_flow_entry,
as they were never set.
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4cd7bc71

ethtool: untangle the linkmode and ethtool headers · 20c6e05b

Jakub Kicinski authored Oct 19, 2023

Commit 26c5334d ("ethtool: Add forced speed to supported link
modes maps") added a dependency between ethtool.h and linkmode.h.
The dependency in the opposite direction already exists so the
new code was inserted in an awkward place.

The reason for ethtool.h to include linkmode.h, is that
ethtool_forced_speed_maps_init() is a static inline helper.
That's not really necessary.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20c6e05b

net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams. · b4a11b20

Heng Guo authored Oct 19, 2023

Reproduce environment:
network with 3 VM linuxs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1500
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1500

Reproduce:
VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0.
scapy command:
send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1,
inter=1.000000)

Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates
Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates
Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
"ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase
from 7 to 8.

Issue description and patch:
IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS
("OutOctets") in ip_finish_output2().
According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but
not "OutRequests". "OutRequests" does not include any datagrams counted
in "ForwDatagrams".
ipSystemStatsOutOctets OBJECT-TYPE
    DESCRIPTION
           "The total number of octets in IP datagrams delivered to the
            lower layers for transmission.  Octets from datagrams
            counted in ipIfStatsOutTransmits MUST be counted here.
ipSystemStatsOutRequests OBJECT-TYPE
    DESCRIPTION
           "The total number of IP datagrams that local IP user-
            protocols (including ICMP) supplied to IP in requests for
            transmission.  Note that this counter does not include any
            datagrams counted in ipSystemStatsOutForwDatagrams.
So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add
IPSTATS_MIB_OUTREQUESTS for "OutRequests".
Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add
IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.

Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
  OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
  InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
  InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
  OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
  InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
  InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
----------------------------------------------------------------------
"ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3.
"OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428.
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: Kun Song <Kun.Song@windriver.com>
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4a11b20

Merge branch 'ksz886x-forced-link-modes' · 095c3ea6

David S. Miller authored Oct 20, 2023

Oleksij Rempel says:

====================
fix forced link mode for KSZ886X switches

changes v3:
- squash patch 1 and 2
- use genphy_config_aneg() instead of genphy_setup_forced()

changes v2:
- address kernel test robot warning
- change comment explaining clearing of KSZ886X_CTRL_FORCE_LINK bit
- s/PHY we create/PHY will create/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

095c3ea6

net: phy: micrel: Fix forced link mode for KSZ886X switches · 510f02fe

Oleksij Rempel authored Oct 19, 2023

Address a link speed detection issue in KSZ886X PHY driver when in
forced link mode. Previously, link partners like "ASIX AX88772B"
with KSZ8873 could fall back to 10Mbit instead of configured 100Mbit.

The issue arises as KSZ886X PHY continues sending Fast Link Pulses (FLPs)
even with autonegotiation off, misleading link partners in autoneg mode,
leading to incorrect link speed detection.

Now, when autonegotiation is disabled, the driver sets the link state
forcefully using KSZ886X_CTRL_FORCE_LINK bit. This action, beyond just
disabling autonegotiation, makes the PHY state more reliably detected by
link partners using parallel detection, thus fixing the link speed
misconfiguration.

With autonegotiation enabled, link state is not forced, allowing proper
autonegotiation process participation.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

510f02fe

net: dsa: microchip: ksz8: Enable MIIM PHY Control reg access · f600bb61

Oleksij Rempel authored Oct 19, 2023

Provide access to MIIM PHY Control register (Reg. 31) through
ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for
upcoming micrel.c patch to address forced link mode configuration.

Closes: https://lore.kernel.org/oe-kbuild-all/202310112224.iYgvjBUy-lkp@intel.com/Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f600bb61

Merge branch 'mlxsw-lag-table-allocation' · c0518571

David S. Miller authored Oct 20, 2023

Petr Machata says:

====================
mlxsw: Move allocation of LAG table to the driver

PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.

Within the PGT is placed a LAG table. That is a contiguous block of PGT
memory where each entry describes which ports are members of the
corresponding LAG port.

The PGT is split to two parts: one managed by the FW, and one managed by
the driver. Historically, the FW part included also the LAG table, referred
to as FW LAG mode. Giving the responsibility for placement of the LAG table
to the driver, referred to as SW LAG mode, makes the whole system more
flexible. The FW currently supports both FW and SW LAG modes. To shed
complexity, the FW should in the future only support SW LAG mode.

Hence this patchset, where support for placement of LAG is added to mlxsw.

There are FW versions out there that do not support SW LAG mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both modes of operation.

Another aspect is that at least on Spectrum-1, there are FW versions out
there that claim to support driver-placed LAG table, but then reject or
ignore configurations enabling the same. The driver thus has to have a say
in whether an attempt to configure SW LAG mode should even be done.

The feature is therefore expressed in terms of "does the driver prefer SW
LAG mode?", and "what LAG mode the PCI module managed to configure the FW
with". This is unlike current flood mode configuration, where the driver
can give a strict value, and that's what gets configured. But it gives a
chance to the driver to determine whether LAG mode should be enabled at
all.

The "does the driver prefer SW LAG mode?" bit is expressed as a boolean
lag_mode_prefer_sw. The reason for this is largely another feature that
will be introduced in a follow-up patchset: support for CFF flood mode. The
driver currently requires that the FW be configured with what is called
controlled flood mode. But on capable systems, CFF would be preferred. So
there are two values in flight: the preferred flood mode, and the fallback.
This could be expressed with an array of flood modes ordered by preference,
but that looks like an overkill in comparison. This flag/value model is
then reused for LAG mode as well, except the fallback value is absent and
implied to be FW, because there are no other values to choose from.

The patchset progresses as follows:

- Patches #1 to #5 adjust reg.h and cmd.h with new register fields,
  constants and remarks.

- Patches #6 and #7 add the ability to request SW LAG mode and to query the
  LAG mode that was actually negotiated. This is where the abovementioned
  lag_mode_prefer_sw flag is added.

- Patches #7 to #9 generalize PGT allocations to make it possible to
  allocate the LAG table, which is done in patch #10.

- In patch #11, toggle lag_mode_prefer_sw on Spectrum-2 and above, which
  makes the newly-added code live.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c0518571

mlxsw: spectrum: Set SW LAG mode on Spectrum>1 · b46c1f3f

Petr Machata authored Oct 19, 2023

On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW
responsibility for placement of the LAG table.

On Spectrum-1, some FW versions claim to support lag_mode field despite
quietly ignoring any settings made to that field. Thus refrain from
attempting to configure lag_mode on those systems at all.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b46c1f3f

mlxsw: spectrum: Allocate LAG table when in SW LAG mode · c6789725

Petr Machata authored Oct 19, 2023

In this patch, if the LAG mode is SW, allocate the LAG table and configure
SGCR to indicate where it was allocated.

We use the default "DDD" (for dynamic data duplication) layout of the LAG
table. In the DDD mode, the membership information for each LAG is copied
in 8 PGT entries. This is done for performance reasons. The LAG table then
needs to be allocated on an address aligned to 8. Deal with this by
moving the LAG init ahead so that the LAG table is allocated at address 0.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6789725

mlxsw: spectrum_pgt: Generalize PGT allocation · 8c893abd

Petr Machata authored Oct 19, 2023

PGT blocks are allocated through the function
mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows
which piece of PGT exactly they want to get. That was fine while the FID
code was the only client allocating blocks of PGT. However for SW-allocated
LAG table, there will be an additional client: mlxsw_sp_lag_init(). The
interface should therefore be changed to not require particular
coordinates, but to take just the requested size, allocate the block
wherever, and give back the PGT address.

In this patch, change the interface accordingly. Initialize FID family's
pgt_base from the result of the PGT allocation (note that mlxsw makes a
copy of the family structure, so what gets initialized is not actually the
global structure). Drop the now-unnecessary pgt_base initializations and
the corresponding defines.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c893abd