1. 20 Sep, 2022 40 commits
    • net: sfp: re-implement soft state polling setup · 8475c4b7
      Russell King (Oracle) authored
      Re-implement the decision making for soft state polling. Instead of
      generating the soft state mask in sfp_soft_start_poll() by looking at
      which GPIOs are available, record their availability in
      sfp_sm_mod_probe() in sfp->state_hw_mask.
      
      This will then allow us to clear bits in sfp->state_hw_mask in module
      specific quirks when the hardware signals should not be used, thereby
      allowing us to switch to using the software state polling.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
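The mask bookkeeping described in this commit can be sketched as follows. This is a minimal illustration, not the driver's code: the `EX_*` bit names and the `ex_*` helpers are hypothetical, and only the idea (record hardware availability once, let a quirk clear bits, poll in software for whatever is left) comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical state bits; the real driver has its own definitions. */
enum {
	EX_F_PRESENT    = 1u << 0,
	EX_F_LOS        = 1u << 1,
	EX_F_TX_FAULT   = 1u << 2,
	EX_F_TX_DISABLE = 1u << 3,
};

#define EX_NUM_SIGNALS 4

/* Record which signals have a usable GPIO (done once, at module probe). */
static unsigned int ex_hw_mask(const bool gpio_available[EX_NUM_SIGNALS])
{
	unsigned int mask = 0;

	for (int i = 0; i < EX_NUM_SIGNALS; i++)
		if (gpio_available[i])
			mask |= 1u << i;
	return mask;
}

/* A module quirk clears the bits of hardware signals that must not be used. */
static unsigned int ex_apply_quirk(unsigned int hw_mask, unsigned int unusable)
{
	return hw_mask & ~unusable;
}

/* Signals that are wanted but not covered by hardware are polled in software. */
static unsigned int ex_soft_mask(unsigned int wanted, unsigned int hw_mask)
{
	return wanted & ~hw_mask;
}
```

Keeping the availability in one mask means a quirk only has to clear a bit; the soft-polling decision then follows from the same expression in every case.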
    • dt-bindings: net: dsa: convert ocelot.txt to dt-schema · 7f32974b
      Vladimir Oltean authored
      Replace the free-form description of device tree bindings for VSC9959
      and VSC9953 with a YAML formatted dt-schema description. This contains
      more or less the same information, but reworded to be a bit more
      succinct.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20220913125806.524314-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-ipa-a-mix-of-cleanups' · 93ece9a6
      Jakub Kicinski authored
      Alex Elder says:
      
      ====================
      net: ipa: a mix of cleanups
      
      This series contains a set of cleanups done in preparation for a
      more substantive upcoming series that reworks how IPA registers
      and their fields are defined.
      
      The first eliminates about half of the possible GSI register
      constant symbols by removing offset definitions that are not
      currently required.
      
      The next two mainly rearrange code for some common enumerated types.
      
      The next one fixes two spots that reuse local variable names in
      inner scopes when defining offsets.
      
      The next adds some additional restrictions on the value held in a
      register.
      
      And the last one just fixes two field mask symbol names so they
      adhere to the common naming convention.
      ====================
      
      Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: fix two symbol names · dae4af6b
      Alex Elder authored
      All field mask symbols are defined with a "_FMASK" suffix, but
      EOT_COAL_GRANULARITY and DRBIP_ACL_ENABLE are defined without one.
      Fix that.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: update sequencer definition constraints · a14d5937
      Alex Elder authored
      Starting with IPA v4.5, replication is done differently from before,
      and as a result the "replication" portion of how the sequencer
      is specified must be zero.
      
      Add a check for the configuration data failing that requirement, and
      only update the sequencer type value when it's supported.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: don't reuse variable names · 9eefd2fb
      Alex Elder authored
      In ipa_endpoint_init_hdr(), as well as ipa_endpoint_init_hdr_ext(),
      a top-level automatic variable named "offset" is used to represent
      the offset of a register.
      
      However, deeper within each of those functions is *another*
      definition of a local variable with the same name, representing
      something else.  Scoping rules ensure the result is what was
      intended, but this variable name reuse is bad practice and makes
      the code confusing.
      
      Fix this by naming the inner variable "off".  Use "off" instead of
      "checksum_offset" in ipa_endpoint_init_cfg() for consistency.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: move and redefine ipa_version_valid() · 8b3cb084
      Alex Elder authored
      Move the definition of ipa_version_valid(), making it a static
      inline function defined together with the enumerated type in
      "ipa_version.h".  Define a new count value in the type.
      
      Rename the function to be ipa_version_supported(), and have it
      return true only if the IPA version supplied is explicitly supported
      by the driver.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
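A header-style sketch of the pattern this commit describes: an enumerated type with a trailing count value, and a static inline predicate that returns true only for explicitly supported versions. The `ex_` names and the particular versions listed are hypothetical and do not reflect the driver's actual version list.

```c
#include <stdbool.h>

/* Hypothetical version list; the trailing COUNT entry is not a version. */
enum ex_version {
	EX_VERSION_3_0,
	EX_VERSION_3_1,
	EX_VERSION_4_0,
	EX_VERSION_4_5,
	EX_VERSION_COUNT,	/* Last; number of defined versions */
};

/* True only for versions this (hypothetical) driver explicitly supports. */
static inline bool ex_version_supported(enum ex_version version)
{
	switch (version) {
	case EX_VERSION_3_1:
	case EX_VERSION_4_5:
		return true;
	default:
		return false;
	}
}
```

Listing supported versions explicitly means a newly defined enumerator is rejected until someone deliberately adds it to the switch.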
    • net: ipa: move the definition of gsi_ee_id · bb788de3
      Alex Elder authored
      Move the definition of the gsi_ee_id enumerated type out of "gsi.h"
      and into "ipa_version.h".  That latter header file isolates the
      definition of the ipa_version enumerated type, allowing it to be
      included in both IPA and GSI code.  We have the same requirement for
      gsi_ee_id, and moving it here makes it easier to get only that
      definition without everything else defined in "gsi.h".
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: don't define unneeded GSI register offsets · 5ea42858
      Alex Elder authored
      Each GSI execution environment (EE) is able to access many of the
      GSI registers associated with the other EEs.  A block of GSI
      registers is contained within a region of memory, and an EE's
      register offset can be determined by adding the register's base
      offset to the product of the EE ID and a fixed constant.
      
      Despite this possibility, the AP IPA code *never* accesses any GSI
      registers other than its own.  So there's no need to define the
      macros that compute register offsets for other EEs.
      
      Redefine the AP access macros to compute the offset the way the more
      general "any EE" macro would, and get rid of the unneeded macros.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
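The offset arithmetic described above (base offset plus EE ID times a fixed constant) can be sketched like this. The stride value, the EE IDs, and all `ex_`/`EX_` names are assumptions made for illustration; the real constants live in the GSI register headers.

```c
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define EX_EE_STRIDE	0x4000u		/* fixed per-EE constant */

enum ex_ee_id {
	EX_EE_AP    = 0,
	EX_EE_MODEM = 1,
};

/* General form: register base offset plus EE ID times the stride. */
static inline uint32_t ex_ee_reg_offset(uint32_t base, enum ex_ee_id ee)
{
	return base + (uint32_t)ee * EX_EE_STRIDE;
}

/* AP-only accessor, computed the same way as the general form. */
#define EX_AP_REG_OFFSET(base)	ex_ee_reg_offset((base), EX_EE_AP)
```

With only the AP accessor exposed, callers cannot accidentally address another EE's registers, while the computation stays identical to the general one.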
    • Merge branch 'net-ethernet-adi-add-adin1110-support' · 01544a27
      Paolo Abeni authored
      Alexandru Tachici says:
      
      ====================
      net: ethernet: adi: Add ADIN1110 support
      
      The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY
      designed for industrial Ethernet applications. It integrates
      an Ethernet PHY core with a MAC and all the associated analog
      circuitry, input and output clock buffering.
      
      ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers
      can be accessed through the MDIO MAC registers.
      We are registering an MDIO bus with custom read/write in order
      to let the PHY be discovered by the PAL. This will let
      the ADIN1100 Linux driver probe and take control of
      the PHY.
      
      The ADIN2111 is a low power, low complexity, two-port Ethernet
      switch with integrated 10BASE-T1L PHYs and one serial peripheral
      interface (SPI) port.
      
      The device is designed for industrial Ethernet applications using
      low power constrained nodes and is compliant with the IEEE 802.3cg-2019
      Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE).
      The switch supports various routing configurations between
      the two Ethernet ports and the SPI host port providing a flexible
      solution for line, daisy-chain, or ring network topologies.
      
      The ADIN2111 supports cable reach of up to 1700 meters with ultra
      low power consumption of 77 mW. The two PHY cores support the
      1.0 V p-p operating mode and the 2.4 V p-p operating mode defined
      in the IEEE 802.3cg standard.
      
      The device integrates the switch, two Ethernet physical layer (PHY)
      cores with a media access control (MAC) interface and all the
      associated analog circuitry, and input and output clock buffering.
      
      The device also includes internal buffer queues, the SPI and
      subsystem registers, as well as the control logic to manage the reset
      and clock control and hardware pin configuration.
      
      Access to the PHYs is exposed via an internal MDIO bus. Writes/reads
      can be performed by reading/writing to the ADIN2111 MDIO registers
      via SPI.
      
      On probe, for each port, a struct net_device is allocated and
      registered. When both ports are added to the same bridge, the driver
      will enable offloading of frame forwarding at the hardware level.
      
      The driver offers STP support. It operates normally in the forwarding
      state; in any of the other states, only frames with the 802.1d DA
      are passed to the host.
      
      When both ports of the ADIN2111 belong to the same SW bridge, a maximum
      of 12 FDB entries will be offloaded by the hardware and marked as such.
      ====================
      
      Link: https://lore.kernel.org/r/20220913122629.124546-1-andrei.tachici@stud.acs.upb.ro
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • dt-bindings: net: adin1110: Add docs · 9fd12e86
      Alexandru Tachici authored
      Add bindings for the ADIN1110/2111 MAC-PHY/SWITCH.
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: adi: Add ADIN1110 support · bc93e19d
      Alexandru Tachici authored
      The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY
      designed for industrial Ethernet applications. It integrates
      an Ethernet PHY core with a MAC and all the associated analog
      circuitry, input and output clock buffering.
      
      ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers
      can be accessed through the MDIO MAC registers.
      We are registering an MDIO bus with custom read/write in order
      to let the PHY be discovered by the PAL. This will let
      the ADIN1100 Linux driver probe and take control of
      the PHY.
      
      The ADIN2111 is a low power, low complexity, two-port Ethernet
      switch with integrated 10BASE-T1L PHYs and one serial peripheral
      interface (SPI) port.
      
      The device is designed for industrial Ethernet applications using
      low power constrained nodes and is compliant with the IEEE 802.3cg-2019
      Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE).
      The switch supports various routing configurations between
      the two Ethernet ports and the SPI host port providing a flexible
      solution for line, daisy-chain, or ring network topologies.
      
      The ADIN2111 supports cable reach of up to 1700 meters with ultra
      low power consumption of 77 mW. The two PHY cores support the
      1.0 V p-p operating mode and the 2.4 V p-p operating mode defined
      in the IEEE 802.3cg standard.
      
      The device integrates the switch, two Ethernet physical layer (PHY)
      cores with a media access control (MAC) interface and all the
      associated analog circuitry, and input and output clock buffering.
      
      The device also includes internal buffer queues, the SPI and
      subsystem registers, as well as the control logic to manage the reset
      and clock control and hardware pin configuration.
      
      Access to the PHYs is exposed via an internal MDIO bus. Writes/reads
      can be performed by reading/writing to the ADIN2111 MDIO registers
      via SPI.
      
      On probe, for each port, a struct net_device is allocated and
      registered. When both ports are added to the same bridge, the driver
      will enable offloading of frame forwarding at the hardware level.
      
      The driver offers STP support. It operates normally in the forwarding
      state; in any of the other states, only frames with the 802.1d DA
      are passed to the host.
      
      When both ports of the ADIN2111 belong to the same SW bridge, a maximum
      of 12 FDB entries will be offloaded by the hardware and marked as such.
      Co-developed-by: Lennart Franzen <lennart@lfdomain.com>
      Signed-off-by: Lennart Franzen <lennart@lfdomain.com>
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: phy: adin1100: add PHY IDs of adin1110/adin2111 · 875b718a
      Alexandru Tachici authored
      Add additional PHY IDs for the internal PHYs of adin1110 and adin2111.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-behavior' · cec9d59e
      Paolo Abeni authored
      Andrea Mayer says:
      
      ====================
      seg6: add NEXT-C-SID support for SRv6 End behavior
      
      The Segment Routing (SR) architecture is based on loose source routing.
      A list of instructions, called segments, can be added to the packet headers to
      influence the forwarding and processing of the packets in an SR enabled
      network.
      In SRv6 (Segment Routing over IPv6 data plane) [1], the segment identifiers
      (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried
      in the Segment Routing Header (SRH). A segment may correspond to a "behavior"
      that is executed by a node when the packet is received.
      The Linux kernel currently supports a large subset of the behaviors described
      in [2] (e.g., End, End.X, End.T and so on).
      
      Some SRv6 scenarios (e.g., traffic-engineering, fast-rerouting, VPN, mobile
      network backhaul) may require a large number of segments (i.e. up to 15).
      Therefore, reducing the size of the SID List is useful to minimize the impact
      on MTU (Maximum Transmission Unit) and to enable SRv6 on legacy hardware devices
      with limited processing power that can suffer from long IPv6 headers.
      
      Draft-ietf-spring-srv6-srh-compression [3] extends the SRv6 architecture by
      providing different mechanisms for the efficient representation (i.e.
      compression) of the SID List.
      
      The NEXT-C-SID mechanism described in [3] offers the possibility of encoding
      several SRv6 segments within a single 128 bit SID address. Such a SID address
      is called a Compressed SID Container. In this way, the length of the SID List
      can be drastically reduced. In some cases, the SRH can be omitted, as the IPv6
      Destination Address can carry the whole Segment List, using its compressed
      representation.
      
      The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2].
      The flavors represent additional operations that can modify or extend a subset
      of the existing behaviors.
      
      In this patchset we extend the SRv6 Subsystem in order to support the
      NEXT-C-SID mechanism.
      
      In detail, the patchset is made of:
       - patch 1/3: add netlink_ext_ack support in parsing SRv6 behavior attributes;
       - patch 2/3: add NEXT-C-SID support for SRv6 End behavior;
       - patch 3/3: add selftest for NEXT-C-SID in SRv6 End behavior.
      
      The corresponding iproute2 patch for supporting the NEXT-C-SID in SRv6 End
      behavior is provided in a separate patchset.
      
      Comments, improvements and suggestions are always appreciated.
      
      [1] - https://datatracker.ietf.org/doc/html/rfc8754
      [2] - https://datatracker.ietf.org/doc/html/rfc8986
      [3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
      
      ====================
      
      Link: https://lore.kernel.org/r/20220912171619.16943-1-andrea.mayer@uniroma2.it
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End behavior · 19d6356a
      Andrea Mayer authored
      This selftest is designed for testing the support of NEXT-C-SID flavor
      for SRv6 End behavior. It instantiates a virtual network composed of
      several nodes: hosts and SRv6 routers. Each node is realized using a
      network namespace that is properly interconnected to others through veth
      pairs.
      The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged
      by hosts for communicating with each other. Such routers i) apply
      different SRv6 Policies to the traffic received from connected hosts,
      considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID
      compression mechanism for encoding several SRv6 segments within a single
      128-bit SID address, referred to as a Compressed SID (C-SID) container.
      
      The NEXT-C-SID is provided as a "flavor" of the SRv6 End behavior,
      enabling it to properly process the C-SID containers. The correct
      execution of the enabled NEXT-C-SID SRv6 End behavior is verified
      through reachability tests carried out between hosts belonging to the
      same VPN.
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Acked-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • seg6: add NEXT-C-SID support for SRv6 End behavior · 848f3c0d
      Andrea Mayer authored
      The NEXT-C-SID mechanism described in [1] offers the possibility of
      encoding several SRv6 segments within a single 128 bit SID address. Such
      a SID address is called a Compressed SID (C-SID) container. In this way,
      the length of the SID List can be drastically reduced.
      
      A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address
      logically structured in three main blocks: i) Locator-Block; ii)
      Locator-Node Function; iii) Argument.
      
                              C-SID container
      +------------------------------------------------------------------+
      |     Locator-Block      |Loc-Node|            Argument            |
      |                        |Function|                                |
      +------------------------------------------------------------------+
      <--------- B -----------> <- NF -> <------------- A --------------->
      
         (i) The Locator-Block can be any IPv6 prefix available to the provider;
      
        (ii) The Locator-Node Function represents the node and the function to
             be triggered when a packet is received on the node;
      
       (iii) The Argument carries the remaining C-SIDs in the current C-SID
             container.
      
      The NEXT-C-SID mechanism relies on the "flavors" framework defined in
      [2]. The flavors represent additional operations that can modify or
      extend a subset of the existing behaviors.
      
      This patch introduces the support for flavors in SRv6 End behavior
      implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID
      flavor works as an End behavior but it is capable of processing the
      compressed SID List encoded in C-SID containers.
      
      An SRv6 End behavior with NEXT-C-SID flavor can be configured to support
      user-provided Locator-Block and Locator-Node Function lengths. In this
      implementation, such lengths must be evenly divisible by 8 (i.e. must be
      byte-aligned), otherwise the kernel informs the user about invalid
      values with a meaningful error code and message through netlink_ext_ack.
      
      If Locator-Block and/or Locator-Node Function lengths are not provided
      by the user during configuration of an SRv6 End behavior instance with
      NEXT-C-SID flavor, the kernel will choose their default values i.e.,
      32-bit Locator-Block and 16-bit Locator-Node Function.
      
      [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
      [2] - https://datatracker.ietf.org/doc/html/rfc8986
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
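The length rules stated in this commit (byte-aligned lengths, defaults of 32 and 16 bits, an error code plus a readable message on failure) can be sketched as below. This is not the kernel's implementation: the `ex_` names are hypothetical, a plain `const char **` stands in for netlink_ext_ack, treating 0 as "not provided" is an assumption, and so is the check that some bits must remain for the Argument.

```c
#include <errno.h>
#include <stddef.h>

#define EX_DEFAULT_BLOCK_BITS	32	/* default Locator-Block length */
#define EX_DEFAULT_FUNC_BITS	16	/* default Locator-Node Function length */
#define EX_SID_BITS		128	/* a SID is an IPv6 address */

struct ex_csid_cfg {
	unsigned int block_bits;
	unsigned int func_bits;
};

/*
 * Validate user-supplied lengths; 0 means "not provided, use the default".
 * On failure, return -EINVAL and point *msg at a human-readable reason.
 */
static int ex_csid_cfg_init(struct ex_csid_cfg *cfg, unsigned int block_bits,
			    unsigned int func_bits, const char **msg)
{
	if (!block_bits)
		block_bits = EX_DEFAULT_BLOCK_BITS;
	if (!func_bits)
		func_bits = EX_DEFAULT_FUNC_BITS;

	if (block_bits % 8) {
		*msg = "Locator-Block length must be a multiple of 8 bits";
		return -EINVAL;
	}
	if (func_bits % 8) {
		*msg = "Locator-Node Function length must be a multiple of 8 bits";
		return -EINVAL;
	}
	/* Assumption: at least some bits must be left for the Argument. */
	if (block_bits + func_bits >= EX_SID_BITS) {
		*msg = "Locator-Block and Locator-Node Function leave no Argument";
		return -EINVAL;
	}

	cfg->block_bits = block_bits;
	cfg->func_bits = func_bits;
	return 0;
}

/* The Argument occupies whatever remains of the 128-bit container. */
static unsigned int ex_csid_arg_bits(const struct ex_csid_cfg *cfg)
{
	return EX_SID_BITS - cfg->block_bits - cfg->func_bits;
}
```

With the defaults, the Argument is 128 - 32 - 16 = 80 bits wide, which is room for five further 16-bit C-SIDs in the same container.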
    • seg6: add netlink_ext_ack support in parsing SRv6 behavior attributes · e2a8ecc4
      Andrea Mayer authored
      An SRv6 behavior instance can be set up using mandatory and/or optional
      attributes.
      In the setup phase, each supplied attribute is parsed and processed. If
      the parsing operation fails, the creation of the behavior instance stops
      and an error number/code is reported to the user.  In many cases, it is
      challenging for the user to figure out exactly what happened by relying
      only on the error code.
      
      For this reason, we add the support for netlink_ext_ack in parsing SRv6
      behavior attributes. In this way, when an SRv6 behavior attribute is
      parsed and an error occurs, the kernel can send a message to the
      userspace describing the error through a meaningful text message in
      addition to the classic error code.
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net-next: gro: Fix use of skb_gro_header_slow · cb628a9a
      Richard Gobert authored
      In the cited commit, the function ipv6_gro_receive was accidentally
      changed to use skb_gro_header_slow, without attempting the fast path.
      Fix it.
      
      Fixes: 35ffb665 ("net: gro: skb_gro_header helper function")
      Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
      Link: https://lore.kernel.org/r/20220911184835.GA105063@debian
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
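The "fast path first, slow path as fallback" shape that this fix restores can be illustrated with a toy model. Nothing here uses the kernel's sk_buff API: `struct ex_pkt` and the `ex_` helpers are hypothetical stand-ins, and the slow path is modelled as a lookup in a complete copy of the packet.

```c
#include <stddef.h>

/* Hypothetical packet: a directly addressable head plus a full copy. */
struct ex_pkt {
	const unsigned char *head;	/* bytes readable without extra work */
	size_t head_len;
	const unsigned char *full;	/* stand-in for the slow pull path */
	size_t full_len;
	unsigned int slow_calls;	/* counts slow-path uses */
};

/* Slow path: always works if the bytes exist, but costs more. */
static const void *ex_header_slow(struct ex_pkt *p, size_t hlen, size_t off)
{
	p->slow_calls++;
	return (off + hlen <= p->full_len) ? p->full + off : NULL;
}

/* Try the fast path first; fall back only when the head is too short. */
static const void *ex_header(struct ex_pkt *p, size_t hlen, size_t off)
{
	if (off + hlen <= p->head_len)
		return p->head + off;
	return ex_header_slow(p, hlen, off);
}
```

Calling the slow helper unconditionally would still return correct data, which is why a regression of this kind affects performance rather than correctness.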
    • net/mlx5e: Ensure macsec_rule is always initialized in macsec_fs_{r,t}x_add_rule() · 2e50e9bf
      Nathan Chancellor authored
      Clang warns:
      
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
                if (err)
                    ^~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:598:9: note: uninitialized use occurs here
                return macsec_rule;
                      ^~~~~~~~~~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:2: note: remove the 'if' if its condition is always false
                if (err)
                ^~~~~~~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:523:38: note: initialize the variable 'macsec_rule' to silence this warning
                union mlx5e_macsec_rule *macsec_rule;
                                                    ^
                                                    = NULL
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
                if (err)
                    ^~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1215:9: note: uninitialized use occurs here
                return macsec_rule;
                      ^~~~~~~~~~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:2: note: remove the 'if' if its condition is always false
                if (err)
                ^~~~~~~~
        drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1118:38: note: initialize the variable 'macsec_rule' to silence this warning
                union mlx5e_macsec_rule *macsec_rule;
                                                    ^
                                                    = NULL
        2 errors generated.
      
      If macsec_fs_{r,t}x_ft_get() fail, macsec_rule will be uninitialized.
      Initialize it to NULL at the top of each function so that it cannot be
      used uninitialized.
      
      Fixes: e467b283 ("net/mlx5e: Add MACsec TX steering rules")
      Fixes: 3b20949c ("net/mlx5e: Add MACsec RX steering rules")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1706
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Link: https://lore.kernel.org/r/20220911085748.461033-1-nathan@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
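A reduced example of the pattern behind this warning and its fix: a function with a single exit label that returns a pointer, where an early error jumps to the label before the pointer has been assigned. The `ex_` names are hypothetical and the body is not taken from the mlx5 driver.

```c
#include <stdlib.h>

struct ex_rule {
	int id;
};

/* Stand-in for a table lookup that can fail; nonzero means error. */
static int ex_table_get(int fail)
{
	return fail ? -1 : 0;
}

/*
 * Every path reaches the single return, so the pointer is initialized to
 * NULL up front: an early "goto out" then returns NULL, not garbage.
 */
static struct ex_rule *ex_add_rule(int fail_get)
{
	struct ex_rule *rule = NULL;
	int err;

	err = ex_table_get(fail_get);
	if (err)
		goto out;

	rule = malloc(sizeof(*rule));
	if (!rule)
		goto out;
	rule->id = 1;

out:
	return rule;
}
```

Without the `= NULL` initializer, the first `goto out` would return an indeterminate pointer, which is the situation Clang's -Wsometimes-uninitialized reports.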
    • Merge branch 'dsa-changes-for-multiple-cpu-ports-part-4' · e8b9f0da
      Paolo Abeni authored
      Vladimir Oltean says:
      
      ====================
      DSA changes for multiple CPU ports (part 4)
      
      Those who have been following part 1:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/
      part 2:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/
      and part 3:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220819174820.3585002-1-vladimir.oltean@nxp.com/
      will know that I am trying to enable the second internal port pair from
      the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q".
      
      This series represents the final part of that effort. We have:
      
      - the introduction of new UAPI in the form of IFLA_DSA_MASTER, the
        iproute2 patch for which is here:
        https://patchwork.kernel.org/project/netdevbpf/patch/20220904190025.813574-1-vladimir.oltean@nxp.com/
      
      - preparation for LAG DSA masters in terms of suppressing some
        operations for masters in the DSA core that simply don't make sense
        when those masters are a bonding/team interface
      
      - handling all the net device events that occur between DSA and a
        LAG DSA master, including migration to a different DSA master when the
        current master joins a LAG, or the LAG gets destroyed
      
      - updating documentation
      
      - adding an implementation for NXP LS1028A, where things are insanely
        complicated due to hardware limitations. We have 2 tagging protocols:
      
        * the native "ocelot" protocol (NPI port mode). This does not support
          CPU ports in a LAG, and supports a single DSA master. The DSA master
          can be changed between eno2 (2.5G) and eno3 (1G), but all ports must
          be down during the changing process, and user ports assigned to the
          old DSA master will refuse to come up if the user requests that
          during a "transient" state.
      
        * the "ocelot-8021q" software-defined protocol, where the Ethernet
          ports connected to the CPU are not actually "god mode" ports as far
          as the hardware is concerned. So here, static assignment between
          user and CPU ports is possible by editing the PGID_SRC masks for
          the port-based forwarding matrix, and "CPU ports in a LAG" simply
          means "a LAG like any other".
      
      The series was regression-tested on LS1028A using the local_termination.sh
      kselftest, in most of the possible operating modes and tagging protocols.
      I have not done a detailed performance evaluation yet, but using LAG, it is
      possible to exceed the termination bandwidth of a single CPU port in an
      iperf3 test with multiple senders and multiple receivers.
      
      v1 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220830195932.683432-1-vladimir.oltean@nxp.com/
      
      Previous (older) RFC at:
      https://lore.kernel.org/netdev/20220523104256.3556016-1-olteanv@gmail.com/
      ====================
      
      Link: https://lore.kernel.org/r/20220911010706.2137967-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: felix: add support for changing DSA master · eca70102
      Vladimir Oltean authored
      Changing the DSA master means different things depending on the tagging
      protocol in use.
      
      For NPI mode ("ocelot" and "seville"), there is a single port which can
      be configured as NPI, but DSA only permits changing the CPU port
      affinity of user ports one by one. So changing a user port to a
      different NPI port globally changes what the NPI port is, and breaks the
      user ports still using the old one.
      
      To address this while still permitting the change of the NPI port,
      require that the user ports which are still affine to the old NPI port
      are down, and cannot be brought up until they are all affine to the same
      NPI port.
      
      The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user
      port can be freely assigned to one CPU port or to the other. This works
      by filtering host addresses towards both tag_8021q CPU ports, and then
      restricting the forwarding from a certain user port only to one of the
      two tag_8021q CPU ports.
      
      Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This
      works by enabling forwarding via PGID_SRC from a certain user port
      towards the logical port ID containing both tag_8021q CPU ports, but
      then restricting forwarding per packet, via the LAG hash codes in
      PGID_AGGR, to either one or the other.
      
      When we change the DSA master to a LAG device, DSA guarantees us that
      the LAG has at least one lower interface as a physical DSA master.
      But DSA masters can come and go as lowers of that LAG, and
      ds->ops->port_change_master() will not get called, because the DSA
      master is still the same (the LAG). So we need to hook into the
      ds->ops->port_lag_{join,leave} calls on the CPU ports and update the
      logical port ID of the LAG that user ports are assigned to.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • docs: net: dsa: update information about multiple CPU ports · 0773e3a8
      Vladimir Oltean authored
      DSA now supports multiple CPU ports. Explain the use cases that are
      covered, the new UAPI, the permitted degrees of freedom, the driver API,
      and remove some old "hanging fruits".
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: allow masters to join a LAG · acc43b7b
      Vladimir Oltean authored
      There are 2 ways in which a DSA user port may become handled by 2 CPU
      ports in a LAG:
      
      (1) its current DSA master joins a LAG
      
       ip link del bond0 && ip link add bond0 type bond mode 802.3ad
       ip link set eno2 master bond0
      
      When this happens, all user ports with "eno2" as DSA master get
      automatically migrated to "bond0" as DSA master.
      
      (2) it is explicitly configured as such by the user
      
       # Before, the DSA master was eno3
       ip link set swp0 type dsa master bond0
      
      The design of this configuration is that the LAG device dynamically
      becomes a DSA master through dsa_master_setup() when the first physical
      DSA master becomes a LAG slave, and stops being so through
      dsa_master_teardown() when the last physical DSA master leaves.
      
      A LAG interface is considered as a valid DSA master only if it contains
      existing DSA masters, and no other lower interfaces. Therefore, we
      mainly rely on method (1) to enter this configuration.
      
      Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when
      it becomes a standalone DSA master again. But the LAG master also has a
      dev->dsa_ptr, and this is actually duplicated from one of the physical
      LAG slaves, and therefore needs to be balanced when LAG slaves come and
      go.
      
      To the switch driver, putting DSA masters in a LAG is seen as putting
      their associated CPU ports in a LAG.
      
      We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG,
      by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      acc43b7b
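The lifecycle described in the commit above (the LAG becomes a DSA master when the first physical DSA master joins it, and stops being one when the last physical DSA master leaves) can be pictured as a count of enslaved physical masters. What follows is a minimal user-space sketch under stated assumptions, not the kernel code: `struct lag_dev`, `lag_master_join()` and `lag_master_leave()` are hypothetical names, and the `is_dsa_master` flag merely stands in for the effect of dsa_master_setup() and dsa_master_teardown().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bonding/team device. */
struct lag_dev {
	int num_phys_masters;	/* physical DSA masters currently enslaved */
	bool is_dsa_master;	/* models the setup/teardown state */
};

/* A physical DSA master becomes a LAG slave. */
static void lag_master_join(struct lag_dev *lag)
{
	if (lag->num_phys_masters++ == 0)
		lag->is_dsa_master = true;	/* first one in: set up */
}

/* A physical DSA master leaves the LAG. */
static void lag_master_leave(struct lag_dev *lag)
{
	if (lag->num_phys_masters == 0)
		return;				/* nothing to balance */
	if (--lag->num_phys_masters == 0)
		lag->is_dsa_master = false;	/* last one out: tear down */
}
```

Keeping the transition logic on the 0 to 1 and 1 to 0 edges is what makes intermediate joins and leaves cheap: only the first and last physical master change the LAG's role.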
    • net: dsa: propagate extack to port_lag_join · 2e359b00
      Vladimir Oltean authored
      Drivers could refuse to offload a LAG configuration for a variety of
      reasons, mainly having to do with its TX type. Additionally, since DSA
      masters may now also be LAG interfaces, and this will translate into a
      call to port_lag_join on the CPU ports, there may be extra restrictions
      there. Propagate the netlink extack to this DSA method in order for
      drivers to give a meaningful error message back to the user.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2e359b00
    • net: dsa: suppress device links to LAG DSA masters · 13eccc1b
      Vladimir Oltean authored
      These don't work (print a harmless error about the operation failing)
      and make little sense to have anyway, because when a LAG DSA master goes
      away, we will introduce logic to move our CPU port back to the first
      physical DSA master. So suppress these device links in preparation for
      adding support for LAG DSA masters.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      13eccc1b
    • net: dsa: suppress appending ethtool stats to LAG DSA masters · cfeb84a5
      Vladimir Oltean authored
      Similar to the discussion about tracking the admin/oper state of LAG DSA
      masters, we have the problem here that struct dsa_port *cpu_dp caches a
      single pair of orig_ethtool_ops and netdev_ops pointers.
      
      So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the
      dev->dsa_ptr of one of the physical DSA masters, we'd effectively
      overwrite what we cached from that physical netdev with what we
      replaced on the bonding interface.
      
      We don't need DSA ethtool stats on the bonding interface when it is used
      as DSA master; it's good enough to have them just on the physical DSA
      masters, so suppress this logic.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      cfeb84a5
    • net: dsa: don't keep track of admin/oper state on LAG DSA masters · 6e61b55c
      Vladimir Oltean authored
      We store information about the DSA master's state in
      cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a
      bijective association between a CPU port and a DSA master.
      
      However, when we have CPU ports in a LAG (and DSA masters in a LAG too),
      the way in which we set up things is that the physical DSA masters still
      have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device
      itself also has its dev->dsa_ptr pointing towards one of the CPU port
      structures (the first one).
      
      So logically speaking, that first cpu_dp can't keep track of both the
      physical master's admin/oper state, and of the bonding master's state.
      
      This isn't even needed; the reason why we keep track of the DSA master's
      state is to know when it is available for Ethernet-based register access.
      For that use case, we don't even need LAG; we just need to decide upon
      one of the physical DSA masters (if there is more than 1 available) and
      use that.
      
      This change suppresses dsa_tree_master_{admin,oper}_state_change() calls
      on LAG DSA masters (which will be supported in a future change), to
      allow the tracking of just physical DSA masters.
      
      Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/
      Suggested-by: Christian Marangi <ansuelsmth@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6e61b55c
    • net: dsa: allow the DSA master to be seen and changed through rtnetlink · 95f510d0
      Vladimir Oltean authored
      Some DSA switches have multiple CPU ports, which can be used to improve
      CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(),
      sets up only the first one, leading to suboptimal use of hardware.
      
      The desire is to not change the default configuration but to permit the
      user to create a dynamic mapping between individual user ports and the
      CPU port that they are served by, configurable through rtnetlink. It is
      also intended to permit load balancing between CPU ports, and in that
      case, the foreseen model is for the DSA master to be a bonding interface
      whose lowers are the physical DSA masters.
      
      To that end, we create a struct rtnl_link_ops for DSA user ports with
      the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that
      contains the ifindex of the newly desired DSA master.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      95f510d0
    • net: dsa: introduce dsa_port_get_master() · 8f6a19c0
      Vladimir Oltean authored
      There is a desire to support DSA masters in a LAG.
      
      That configuration is intended to work by simply enslaving the master to
      a bonding/team device. But the physical DSA master (the LAG slave) still
      has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical
      CPU port.
      
      However, we would like to be able to retrieve the LAG that's the upper
      of the physical DSA master. In preparation for that, introduce a helper
      called dsa_port_get_master() that replaces all occurrences of the
      dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will
      be made later within the helper itself.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8f6a19c0
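To illustrate why a single accessor helps, here is a rough user-space sketch of the pattern the commit above describes: callers stop open-coding dp->cpu_dp->master and go through one helper, so a LAG check can later be added in one place. The structures are simplified mirrors, and `port_get_master()` and the `lag_dev` field are assumptions made for illustration; they are not claimed to match the kernel's actual definitions.

```c
#include <stddef.h>

/* Simplified, hypothetical mirrors of the kernel structures. */
struct net_device { const char *name; };

struct dsa_port {
	struct dsa_port *cpu_dp;	/* CPU port serving this user port */
	struct net_device *master;	/* physical DSA master (CPU ports only) */
	struct net_device *lag_dev;	/* hypothetical: LAG upper, or NULL */
};

/* One accessor instead of open-coded dp->cpu_dp->master. */
static struct net_device *port_get_master(const struct dsa_port *dp)
{
	const struct dsa_port *cpu_dp = dp->cpu_dp;

	if (cpu_dp->lag_dev)		/* assumed LAG case, added "later" */
		return cpu_dp->lag_dev;
	return cpu_dp->master;
}
```

With every caller funnelled through the helper, the LAG versus non-LAG distinction becomes an internal detail rather than something each call site has to know about.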
    • net: introduce iterators over synced hw addresses · db01868b
      Vladimir Oltean authored
      Some network drivers use __dev_mc_sync()/__dev_uc_sync() and therefore
      program the hardware only with addresses with a non-zero sync_cnt.
      
      Some of the above drivers also need to save/restore the address
      filtering lists when certain events happen, and they need to walk
      through the struct net_device :: uc and struct net_device :: mc lists.
      But these lists contain unsynced addresses too.
      
      To keep the appearance of an elementary form of data encapsulation,
      provide iterators through these lists that only look at entries with a
      non-zero sync_cnt, instead of filtering entries out from device drivers.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      db01868b
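The idea in the commit above, an iterator that only visits entries with a non-zero sync_cnt so that drivers do not filter by hand, can be sketched in plain C. This is a simplified illustration under assumptions: `struct hw_addr`, `next_synced()`, `for_each_synced_addr()` and `count_synced()` are hypothetical names on a singly linked list, whereas the kernel's address lists and macros are structured differently.

```c
#include <stddef.h>

/* Hypothetical, simplified address entry; the kernel's list is richer. */
struct hw_addr {
	unsigned char addr[6];
	int sync_cnt;			/* non-zero once synced to hardware */
	struct hw_addr *next;
};

/* Return the first synced entry at or after 'ha', or NULL. */
static struct hw_addr *next_synced(struct hw_addr *ha)
{
	while (ha && ha->sync_cnt == 0)
		ha = ha->next;
	return ha;
}

/* Iterate over synced entries only; unsynced ones are skipped. */
#define for_each_synced_addr(ha, head) \
	for ((ha) = next_synced(head); (ha); (ha) = next_synced((ha)->next))

/* Example consumer: count the addresses a driver would save/restore. */
static int count_synced(struct hw_addr *head)
{
	struct hw_addr *ha;
	int n = 0;

	for_each_synced_addr(ha, head)
		n++;
	return n;
}
```

The filtering lives inside the iterator, so a driver's save/restore loop never inspects sync_cnt itself.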
    • Merge branch 'ice-l2tpv3-offload-support' · 42e53b44
      Paolo Abeni authored
      Tony Nguyen says:
      
      ====================
      ice: L2TPv3 offload support
      
      Wojciech Drewek says:
      
      Add support for dissecting L2TPv3 session id in flow dissector. Add support
      for this field in tc-flower and support offloading L2TPv3. Finally, add
      support for hardware offload of L2TPv3 packets based on session id in
      switchdev mode in ice driver.
      
      Example filter:
        # tc filter add dev $PF1 ingress prio 1 protocol ip \
            flower \
              ip_proto l2tp \
              l2tpv3_sid 1234 \
              skip_sw \
            action mirred egress redirect dev $VF1_PR
      
      Changes in iproute2 are required to use the new fields.
      
      The ICE COMMS DDP package is required to create a filter in ice.
      The COMMS DDP package contains profiles of more advanced protocols.
      Without the COMMS DDP package, hw offload will not work; however,
      sw offload will still work.
      ====================
      
      Link: https://lore.kernel.org/r/20220908171644.1282191-1-anthony.l.nguyen@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      42e53b44
    • ice: Add L2TPv3 hardware offload support · cd634549
      Marcin Szycik authored
      Add support for offloading packets based on L2TPv3 session id in switchdev
      mode.
      
      Example filter:
      tc filter add dev $PF1 ingress prio 1 protocol ip flower ip_proto l2tp \
          l2tpv3_sid 1234 skip_sw action mirred egress redirect dev $VF1_PR
      
      Changes in iproute2 are required to be able to specify l2tpv3_sid.
      
      ICE COMMS DDP package is required to create a filter as it contains L2TPv3
      profiles.
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      cd634549
    • flow_offload: Introduce flow_match_l2tpv3 · 2c1befac
      Wojciech Drewek authored
      Allow offloading of L2TPv3 filters by adding flow_rule_match_l2tpv3.
      Drivers can extract L2TPv3 specific fields from now on.
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2c1befac
    • net/sched: flower: Add L2TPv3 filter · 8b189ea0
      Wojciech Drewek authored
      Add support for matching on L2TPv3 session ID.
      The session ID can be specified only when ip_proto is
      set to IPPROTO_L2TP.
      
      Example filter:
        # tc filter add dev $PF1 ingress prio 1 protocol ip \
            flower \
              ip_proto l2tp \
              l2tpv3_sid 1234 \
              skip_sw \
            action mirred egress redirect dev $VF1_PR
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8b189ea0
    • flow_dissector: Add L2TPv3 dissectors · dda2fa08
      Wojciech Drewek authored
      Allow dissecting the L2TPv3-specific field, which is:
      - session ID (32 bits)
      
      L2TPv3 might be transported over IP or over UDP; this implementation
      covers only L2TPv3 over IP. An IP packet carries L2TPv3 when ip_proto
      is IPPROTO_L2TP (115).
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      dda2fa08
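As a rough picture of what dissecting the session ID involves for L2TPv3 carried directly over IP, the sketch below reads the 32-bit field that sits immediately after the IP header when the protocol number is 115. It is a user-space illustration only: `l2tpv3_ip_session_id()` and `L2TP_IP_PROTO` are hypothetical names, the caller is assumed to have already located the end of the IP header, and none of this is the flow dissector's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L2TP_IP_PROTO 115	/* value of IPPROTO_L2TP */

/*
 * 'payload' points just past the IP header. On success, store the
 * session ID in host byte order and return true.
 */
static bool l2tpv3_ip_session_id(uint8_t ip_proto, const uint8_t *payload,
				 size_t len, uint32_t *sid)
{
	if (ip_proto != L2TP_IP_PROTO || len < 4)
		return false;

	/* The session ID is the first 32 bits, in network byte order. */
	*sid = ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
	       ((uint32_t)payload[2] << 8) | (uint32_t)payload[3];
	return true;
}
```

Assembling the value byte by byte avoids unaligned loads and makes the byte-order conversion explicit, at the cost of a few shifts.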
    • uapi: move IPPROTO_L2TP to in.h · 65b32f80
      Wojciech Drewek authored
      IPPROTO_L2TP is currently defined in l2tp.h, but most IP protocols
      are defined in in.h. Move it there in order to keep the code clean.
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      65b32f80
    • octeon_ep: Remove useless casting value returned by vzalloc to structure · ed48cfed
      Jules Irenge authored
      coccinelle reports a warning
      
      WARNING: casting value returned by memory allocation
      function to (struct octep_rx_buffer *) is useless.
      
      To fix this the useless cast is removed.
      Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
      Link: https://lore.kernel.org/r/Yx+sr9o0uylXVcOl@playground
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ed48cfed
    • openvswitch: Change the return type for vport_ops.send function hook to int · 8bb7c4f8
      Nathan Huckleberry authored
      All usages of the vport_ops struct have the .send field set to
      dev_queue_xmit or internal_dev_recv.  Since most usages are set to
      dev_queue_xmit, the function hook should match the signature of
      dev_queue_xmit.
      
      The only call to vport_ops->send() is in net/openvswitch/vport.c and it
      throws away the return value.
      
      This mismatched return type breaks forward edge kCFI since the underlying
      function definition does not match the function hook definition.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1703
      Cc: llvm@lists.linux.dev
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20220913230739.228313-1-nhuck@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8bb7c4f8
    • net: wwan: t7xx: Fix return type of t7xx_ccmni_start_xmit · 73c99e26
      Nathan Huckleberry authored
      The ndo_start_xmit field in net_device_ops is expected to be of type
      netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).
      
      The mismatched return type breaks forward edge kCFI since the underlying
      function definition does not match the function hook definition.
      
      The return type of t7xx_ccmni_start_xmit should be changed from int to
      netdev_tx_t.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1703
      Cc: llvm@lists.linux.dev
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Link: https://lore.kernel.org/r/20220912214510.929070-1-nhuck@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      73c99e26
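The return-type fix in the commit above comes down to making a callback's definition agree exactly with the function pointer it is stored in, since a forward-edge CFI scheme compares the two types at the indirect call. The sketch below shows the matching case in user space. All names in it (`tx_status_t`, `struct dev_ops`, `demo_start_xmit`, `demo_ops`) are hypothetical stand-ins and not the kernel's netdev_tx_t or net_device_ops definitions.

```c
/* Hypothetical, simplified stand-ins for the kernel types. */
struct sk_buff;
struct net_device;

typedef enum { TX_OK = 0x00, TX_BUSY = 0x10 } tx_status_t;

struct dev_ops {
	tx_status_t (*start_xmit)(struct sk_buff *skb, struct net_device *dev);
};

/* Matches the hook's prototype exactly, return type included. */
static tx_status_t demo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return TX_OK;
}

/* No cast needed: the types agree, so a CFI type check can pass. */
static const struct dev_ops demo_ops = {
	.start_xmit = demo_start_xmit,
};
```

Had demo_start_xmit been declared to return int, the initializer would have an incompatible pointer type, and forcing it through with a cast is exactly the kind of mismatch a CFI check is meant to catch at run time.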
    • net: wwan: iosm: Fix return type of ipc_wwan_link_transmit · 0c9441c4
      Nathan Huckleberry authored
      The ndo_start_xmit field in net_device_ops is expected to be of type
      netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).
      
      The mismatched return type breaks forward edge kCFI since the underlying
      function definition does not match the function hook definition.
      
      The return type of ipc_wwan_link_transmit should be changed from int to
      netdev_tx_t.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1703
      Cc: llvm@lists.linux.dev
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Link: https://lore.kernel.org/r/20220912214455.929028-1-nhuck@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0c9441c4