1. 25 Feb, 2022 17 commits
    • Merge branch 'fdb-entries-on-dsa-lag-interfaces' · 53110c67
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      FDB entries on DSA LAG interfaces
      
      This work permits having static and local FDB entries on LAG interfaces
      that are offloaded by DSA ports. New API needs to be introduced in
      drivers. To maintain consistency with the bridging offload code, I've
      taken the liberty to reorganize the data structures added by Tobias in
      the DSA core a little bit.
      
      Tested on NXP LS1028A (felix switch). Would appreciate feedback/testing
      on other platforms too. Testing procedure was the one described here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
      
      with this script:
      
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      ip link del br0
      ip link add br0 type bridge && ip link set br0 up
      ip link set br0 arp off
      ip link set bond0 master br0 && ip link set bond0 up
      ip link set swp0 master br0 && ip link set swp0 up
      ip link set dev bond0 type bridge_slave flood off learning off
      bridge fdb add dev bond0 <mac address of other eno0> master static
      
      I'm noticing a problem in 'bridge fdb dump' with the 'self' entries, and
      I didn't solve this. On Ocelot, an entry learned on a LAG is reported as
      being on the first member port of it (so instead of saying 'self bond0',
      it says 'self swp1'). This is better than not seeing the entry at all,
      but when DSA queries for the FDBs on a port via ds->ops->port_fdb_dump,
      it never queries for FDBs on a LAG. Not clear what we should do there,
      we aren't in control of the ->ndo_fdb_dump of the bonding/team drivers.
      Alternatively, we could just consider the 'self' entries reported via
      ndo_fdb_dump as "better than nothing", and concentrate on the 'master'
      entries that are in sync with the bridge when packets are flooded to
      software.
      ====================
      
      Link: https://lore.kernel.org/r/20220223140054.3379617-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: support FDB entries on offloaded LAG interfaces · 961d8b69
      Vladimir Oltean authored
      This adds the logic in the Felix DSA driver and Ocelot switch library.
      For Ocelot switches, the DEST_IDX that is the output of the MAC table
      lookup is a logical port (equal to physical port, if no LAG is used, or
      a dynamically allocated number otherwise). The allocation we have in
      place for LAG IDs is different from DSA's, so we can't use that:
      - DSA allocates a contiguous range of LAG IDs starting from 1
      - Ocelot appears to require that physical ports and LAG IDs are in the
        same space of [0, num_phys_ports), and additionally, ports that aren't
        in a LAG must have physical port id == logical port id
      
      The implication is that an FDB entry towards a LAG might need to be
      deleted and reinstalled when the LAG ID changes.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
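
      The two Ocelot constraints listed in this commit can be illustrated
      with a minimal userspace sketch. It assumes one possible allocation
      that satisfies them: a LAG takes the number of its lowest-numbered
      member port. That choice, and the names assign_logical_port_ids,
      NUM_PHYS_PORTS and NO_BOND, are assumptions for illustration and are
      not taken from the commit.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define NUM_PHYS_PORTS 6 /* hypothetical port count */
      #define NO_BOND (-1)     /* hypothetical marker: port is standalone */

      /*
       * bond_of[port] is a hypothetical bond identifier per physical port, or
       * NO_BOND. On return, logical[port] is the lowest-numbered member of the
       * port's bond, or the port's own number when it is standalone.
       */
      static void assign_logical_port_ids(const int *bond_of, int *logical)
      {
          assert(bond_of != NULL && logical != NULL);

          for (int port = 0; port < NUM_PHYS_PORTS; port++) {
              logical[port] = port;
              if (bond_of[port] == NO_BOND)
                  continue;
              /* The first lower-numbered port in the same bond is its lowest member. */
              for (int other = 0; other < port; other++) {
                  if (bond_of[other] == bond_of[port]) {
                      logical[port] = other;
                      break;
                  }
              }
          }
      }
      ```

      Under this assumption, with ports 1 and 2 in one bond both get
      logical ID 1. Once port 1 leaves, port 2 falls back to logical ID 2,
      which is the kind of LAG ID change that would force an FDB entry
      pointing at the LAG to be deleted and reinstalled.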
    • net: dsa: support FDB events on offloaded LAG interfaces · e212fa7c
      Vladimir Oltean authored
      This change introduces support for installing static FDB entries towards
      a bridge port that is a LAG of multiple DSA switch ports, as well as
      support for filtering towards the CPU local FDB entries emitted for LAG
      interfaces that are bridge ports.
      
      Conceptually, host addresses on LAG ports are handled the same way as
      for plain bridge ports. FDB entries _towards_ a LAG, however, can't
      simply be replicated towards all member ports like we do for multicast
      or VLAN. Instead, we need a new driver API. Hardware usually considers a LAG to be a
      "logical port", and sets the entire LAG as the forwarding destination.
      The physical egress port selection within the LAG is made by hashing
      policy, as usual.
      
      To represent the logical port corresponding to the LAG, we pass by value
      a copy of the dsa_lag structure to all switches in the tree that have at
      least one port in that LAG.
      
      To illustrate why a refcounted list of FDB entries is needed in struct
      dsa_lag, it is enough to say that:
      - a LAG may be a bridge port and may therefore receive FDB events even
        while it isn't yet offloaded by any DSA interface
      - DSA interfaces may be removed from a LAG while that is a bridge port;
        we don't want FDB entries lingering around, but we don't want to
        remove entries that are still in use, either
      
      For all the cases below to work, the idea is to always keep an FDB entry
      on a LAG with a reference count equal to the number of DSA member ports. So:
      - if a port joins a LAG, it requests the bridge to replay the FDB, and
        the FDB entries get created, or their refcount gets bumped by one
      - if a port leaves a LAG, the FDB replay deletes the entries or
        decrements their refcount by one
      - if an FDB entry is installed towards a LAG with ports already present,
        that entry is created (if it doesn't exist) and its refcount is bumped
        by the number of ports already present in the LAG
      
      echo "Adding FDB entry to bond with existing ports"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond, then removing ports one by one"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link set swp1 nomaster
      ip link set swp2 nomaster
      ip link del br0
      ip link del bond0
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
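
      The refcounting rules described in this commit can be modelled in
      userspace roughly as follows. This is a sketch under stated
      assumptions, not the DSA implementation: struct lag_fdb_entry,
      struct lag_fdb_list, lag_fdb_find(), lag_fdb_add() and lag_fdb_del()
      are hypothetical names, and a plain singly linked list with an
      integer counter stands in for whatever the kernel uses.

      ```c
      #include <assert.h>
      #include <stdlib.h>
      #include <string.h>

      #define ETH_ALEN 6

      /* Hypothetical model of one refcounted FDB entry on a LAG. */
      struct lag_fdb_entry {
          unsigned char addr[ETH_ALEN];
          unsigned short vid;
          unsigned int refcount;
          struct lag_fdb_entry *next;
      };

      /* Hypothetical model of the per-LAG state: just the FDB list head. */
      struct lag_fdb_list {
          struct lag_fdb_entry *fdbs;
      };

      static struct lag_fdb_entry *lag_fdb_find(struct lag_fdb_list *lag,
                                                const unsigned char *addr,
                                                unsigned short vid)
      {
          for (struct lag_fdb_entry *e = lag->fdbs; e; e = e->next)
              if (e->vid == vid && !memcmp(e->addr, addr, ETH_ALEN))
                  return e;
          return NULL;
      }

      /* Called once per member port: create the entry or take a reference. */
      static int lag_fdb_add(struct lag_fdb_list *lag, const unsigned char *addr,
                             unsigned short vid)
      {
          struct lag_fdb_entry *e = lag_fdb_find(lag, addr, vid);

          if (e) {
              e->refcount++;
              return 0;
          }
          e = calloc(1, sizeof(*e));
          if (!e)
              return -1;
          memcpy(e->addr, addr, ETH_ALEN);
          e->vid = vid;
          e->refcount = 1; /* a real driver would program hardware here */
          e->next = lag->fdbs;
          lag->fdbs = e;
          return 0;
      }

      /* Called once per member port: drop a reference, free on the last one. */
      static int lag_fdb_del(struct lag_fdb_list *lag, const unsigned char *addr,
                             unsigned short vid)
      {
          for (struct lag_fdb_entry **pp = &lag->fdbs; *pp; pp = &(*pp)->next) {
              struct lag_fdb_entry *e = *pp;

              if (e->vid != vid || memcmp(e->addr, addr, ETH_ALEN))
                  continue;
              assert(e->refcount > 0);
              if (--e->refcount == 0) {
                  *pp = e->next; /* hardware removal would go here */
                  free(e);
              }
              return 0;
          }
          return -1; /* no such entry */
      }
      ```

      In this model, installing an entry towards a LAG that already has
      two member ports corresponds to calling lag_fdb_add() twice, and
      each port leaving the LAG corresponds to one lag_fdb_del() call, so
      the entry disappears only when the last member port is gone.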
    • net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_dev · 93c79823
      Vladimir Oltean authored
      When switchdev_handle_fdb_event_to_device() replicates an FDB event
      emitted for the bridge or for a LAG port and DSA offloads that, we
      should notify back to switchdev that the FDB entry on the original
      device is what was offloaded, not on the DSA slave devices that the
      event is replicated on.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_work · e35f12e9
      Vladimir Oltean authored
      By construction, the struct net_device *dev passed to
      dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work
      is always a DSA slave device.
      
      Therefore, it is redundant to pass struct dsa_switch and int port
      information in the deferred work structure. This can be retrieved at all
      times from the provided struct net_device via dsa_slave_to_port().
      
      For the same reason, we can drop the dsa_is_user_port() check in
      dsa_fdb_offload_notify().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device · ec638740
      Vladimir Oltean authored
      When the switchdev_handle_fdb_event_to_device() event replication helper
      was created, my original thought was that FDB events on LAG interfaces
      should most likely be special-cased, not just replicated towards all
      switchdev ports beneath that LAG. So this replication helper currently
      does not recurse through switchdev lower interfaces of LAG bridge ports,
      but rather calls the lag_mod_cb() if that was provided.
      
      No switchdev driver uses this helper for FDB events on LAG interfaces
      yet, so that was an assumption which was yet to be tested. It is
      certainly usable for that purpose, as my RFC series shows:
      
      https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
      
      however this approach is slightly convoluted because:
      
      - the switchdev driver gets a "dev" that isn't its own net device, but
        rather the LAG net device. It must call switchdev_lower_dev_find(dev)
        in order to get a handle of any of its own net devices (the ones that
        pass check_cb).
      
      - in order for FDB entries on LAG ports to be correctly refcounted per
        the number of switchdev ports beneath that LAG, we haven't escaped the
        need to iterate through the LAG's lower interfaces. Except that is now
        the responsibility of the switchdev driver, because the replication
        helper just stopped half-way.
      
      So, even though yes, FDB events on LAG bridge ports must be
      special-cased, in the end it's simpler to let switchdev_handle_fdb_*
      just iterate through the LAG port's switchdev lowers, and let the
      switchdev driver figure out that those physical ports are under a LAG.
      
      The switchdev_handle_fdb_event_to_device() helper takes a
      "foreign_dev_check" callback so it can figure out whether @dev can
      autonomously forward to @foreign_dev. DSA fills this method properly:
      if the LAG is offloaded by another port in the same tree as @dev, then
      it isn't foreign. If it is a software LAG, it is foreign - forwarding
      happens in software.
      
      Whether an interface is foreign or not decides whether the replication
      helper will go through the LAG's switchdev lowers or not. Since the
      lan966x doesn't properly fill this out, FDB events on software LAG
      uppers will get called. By changing lan966x_foreign_dev_check(), we can
      suppress them.
      
      DSA, on the other hand, will now start receiving FDB events for its
      offloaded LAG uppers, so it needs to return -EOPNOTSUPP for them, since
      we currently don't do the right thing for them.
      
      Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: create a dsa_lag structure · dedd6a00
      Vladimir Oltean authored
      The main purpose of this change is to create a data structure for a LAG
      as seen by DSA. This is similar to what we have for bridging - we pass a
      copy of this structure by value to ->port_lag_join and ->port_lag_leave.
      For now we keep the lag_dev, id and a reference count in it. Future
      patches will add a list of FDB entries for the LAG (these also need to
      be refcounted to work properly).
      
      The LAG structure is created using dsa_port_lag_create() and destroyed
      using dsa_port_lag_destroy(), just like we have for bridging.
      
      Because the dsa_lag itself is now refcounted, we can simplify
      dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
      the dst->lags array only as long as at least one port uses it. The
      refcounting logic inside those functions can be removed now - they are
      called only when we should perform the operation.
      
      dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
      structure instead of the lag_dev net_device.
      
      dsa_lag_foreach_port() now takes the dsa_lag structure as argument.
      
      dst->lags holds an array of dsa_lag structures.
      
      dsa_lag_map() now also saves the dsa_lag->id value, so that linear
      walking of dst->lags in drivers using dsa_lag_id() is no longer
      necessary. They can just look at lag.id.
      
      dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
      which can be used by drivers to get the LAG ID assigned by DSA to a
      given port.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
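
      The lifetime rule described in this commit (the LAG stays in the
      table only while at least one port uses it) can be sketched in
      userspace as below. All names here (struct lag_model, struct
      tree_model, struct port_model, port_lag_create(), port_lag_destroy(),
      MAX_LAGS) are hypothetical stand-ins, and failing when the table is
      full is a simplification made for this sketch.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      #define MAX_LAGS 4 /* hypothetical table size */

      /* Hypothetical stand-ins; field names loosely follow the commit text. */
      struct lag_model {
          const void *lag_dev;   /* identity of the bonding/team device */
          unsigned int id;       /* one-based, remembered at map time */
          unsigned int refcount; /* number of ports using this LAG */
      };

      struct tree_model {
          struct lag_model *lags[MAX_LAGS]; /* slot index is id - 1 */
      };

      struct port_model {
          struct lag_model *lag;
      };

      /* First port creates and maps the LAG; later ports only take a reference. */
      static int port_lag_create(struct tree_model *t, struct port_model *p,
                                 const void *lag_dev)
      {
          unsigned int free_slot = MAX_LAGS;
          struct lag_model *lag;

          for (unsigned int i = 0; i < MAX_LAGS; i++) {
              if (t->lags[i] && t->lags[i]->lag_dev == lag_dev) {
                  t->lags[i]->refcount++;
                  p->lag = t->lags[i];
                  return 0;
              }
              if (!t->lags[i] && free_slot == MAX_LAGS)
                  free_slot = i;
          }
          if (free_slot == MAX_LAGS)
              return -1; /* simplification: fail when the table is full */

          lag = calloc(1, sizeof(*lag));
          if (!lag)
              return -1;
          lag->lag_dev = lag_dev;
          lag->id = free_slot + 1;
          lag->refcount = 1;
          t->lags[free_slot] = lag;
          p->lag = lag;
          return 0;
      }

      /* Last port to leave unmaps and frees the LAG. */
      static void port_lag_destroy(struct tree_model *t, struct port_model *p)
      {
          struct lag_model *lag = p->lag;

          if (!lag)
              return;
          p->lag = NULL;
          assert(lag->refcount > 0);
          if (--lag->refcount > 0)
              return;
          t->lags[lag->id - 1] = NULL;
          free(lag);
      }
      ```

      Because the structure carries its own reference count, the map and
      unmap steps in this model run exactly once per LAG, on the first
      join and the last leave, and need no counting of their own.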
    • net: dsa: mv88e6xxx: use dsa_switch_for_each_port in mv88e6xxx_lag_sync_masks · b99dbdf0
      Vladimir Oltean authored
      Make the intent of the code more clear by using the dedicated helper for
      iterating over the ports of a switch.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: make LAG IDs one-based · 3d4a0a2a
      Vladimir Oltean authored
      The DSA LAG API will be changed to become more similar to the bridge
      data structures, where struct dsa_bridge holds an unsigned int num,
      which is generated by DSA and is one-based. We have a similar thing
      going with the DSA LAG, except that isn't stored anywhere, it is
      calculated dynamically by dsa_lag_id() by iterating through dst->lags.
      
      The idea of encoding an invalid (or not requested) LAG ID as zero for
      the purpose of simplifying checks in drivers means that the LAG IDs
      passed by DSA to drivers need to be one-based too. So back-and-forth
      conversion is needed when indexing the dst->lags array, as well as in
      drivers which assume a zero-based index.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
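
      The back-and-forth conversion mentioned in this commit might look
      like the following userspace sketch. The helpers lag_by_id(),
      lag_id_of() and hw_trunk_index(), and struct lag_slot, are
      hypothetical names chosen for illustration; only the convention
      itself (one-based IDs, zero meaning "no LAG") comes from the commit
      message.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical table entry; a NULL lag_dev marks a free slot. */
      struct lag_slot {
          const void *lag_dev;
      };

      /* One-based ID to slot. ID 0 means "no LAG", so it never indexes the array. */
      static struct lag_slot *lag_by_id(struct lag_slot *lags, unsigned int len,
                                        unsigned int id)
      {
          if (id == 0 || id > len)
              return NULL;
          return &lags[id - 1];
      }

      /* Slot to one-based ID, by linear search; returns 0 when not found. */
      static unsigned int lag_id_of(const struct lag_slot *lags, unsigned int len,
                                    const void *lag_dev)
      {
          assert(lag_dev != NULL);
          for (unsigned int i = 0; i < len; i++)
              if (lags[i].lag_dev == lag_dev)
                  return i + 1;
          return 0;
      }

      /* A driver whose hardware numbers trunks from zero converts back; -1 if none. */
      static int hw_trunk_index(unsigned int id)
      {
          return id ? (int)(id - 1) : -1;
      }
      ```

      The benefit of the convention in this model is that a caller can
      test "is this port in a LAG?" with a plain zero check on the ID,
      at the cost of an explicit "- 1" wherever an array or a zero-based
      hardware table is indexed.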
    • net: dsa: qca8k: rename references to "lag" as "lag_dev" · 066ce977
      Vladimir Oltean authored
      In preparation of converting struct net_device *dp->lag_dev into a
      struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
      all occurrences of the "lag" variable in qca8k to "lag_dev".
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: rename references to "lag" as "lag_dev" · e23eba72
      Vladimir Oltean authored
      In preparation of converting struct net_device *dp->lag_dev into a
      struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
      all occurrences of the "lag" variable in mv88e6xxx to "lag_dev".
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: rename references to "lag" as "lag_dev" · 46a76724
      Vladimir Oltean authored
      In preparation of converting struct net_device *dp->lag_dev into a
      struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
      all occurrences of the "lag" variable in the DSA core to "lag_dev".
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: asix: remove code duplicates in asix_mdio_read/write and asix_mdio_read/write_nopm · 89183b6e
      Oleksij Rempel authored
      These functions are mostly the same, except for one hard-coded "in_pm"
      variable. So, rework them to reduce maintenance overhead.
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20220223110633.3006551-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
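
      The general shape of such a de-duplication, one worker taking
      "in_pm" as a parameter plus two thin wrappers, can be sketched in
      userspace as follows. The transports xfer_normal() and xfer_nopm(),
      the counters, and the function names mdio_read_common(), mdio_read()
      and mdio_read_nopm() are all hypothetical; this is not the asix
      driver code.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical counters and transports standing in for the two paths. */
      static int normal_calls, nopm_calls;

      static int xfer_normal(int phy_id, int loc)
      {
          normal_calls++;
          return (phy_id << 8) | loc; /* fake register contents */
      }

      static int xfer_nopm(int phy_id, int loc)
      {
          nopm_calls++;
          return (phy_id << 8) | loc; /* same fake contents, different path */
      }

      /* Single worker: the only difference between the variants is in_pm. */
      static int mdio_read_common(int phy_id, int loc, bool in_pm)
      {
          assert(phy_id >= 0 && loc >= 0);
          return in_pm ? xfer_nopm(phy_id, loc) : xfer_normal(phy_id, loc);
      }

      static int mdio_read(int phy_id, int loc)
      {
          return mdio_read_common(phy_id, loc, false);
      }

      static int mdio_read_nopm(int phy_id, int loc)
      {
          return mdio_read_common(phy_id, loc, true);
      }
      ```

      Both wrappers keep their original signatures in this sketch, so
      existing callers would not need to change.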
    • net: marvell: prestera: Fix return value check in prestera_kern_fib_cache_find() · 37f40f81
      Yang Yingliang authored
      rhashtable_lookup_fast() returns a NULL pointer, not ERR_PTR(), so
      prestera_kern_fib_cache_find() can return fib_node directly.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220223084954.1771075-2-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: marvell: prestera: Fix return value check in prestera_fib_node_find() · d434ee9d
      Yang Yingliang authored
      rhashtable_lookup_fast() returns a NULL pointer, not ERR_PTR(), so
      prestera_fib_node_find() can return fib_node directly.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220223084954.1771075-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
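
      The point behind both prestera fixes, that an error-pointer check
      can never fire on a lookup that reports a miss as NULL, can be
      demonstrated with a small userspace sketch. is_err() below is a
      local re-implementation modelled on the kernel's error-pointer
      convention (the top 4095 address values), and table_lookup(),
      find_buggy() and find_fixed() are hypothetical helpers, not the
      driver's code.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define MAX_ERRNO 4095

      /* Userspace model: error pointers occupy the top MAX_ERRNO addresses. */
      static bool is_err(const void *ptr)
      {
          return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
      }

      struct fib_node {
          int key;
      };

      /* Hypothetical lookup that reports a miss as NULL, never as an error pointer. */
      static struct fib_node *table_lookup(struct fib_node *table, size_t len, int key)
      {
          for (size_t i = 0; i < len; i++)
              if (table[i].key == key)
                  return &table[i];
          return NULL;
      }

      /* Dead check: is_err(NULL) is false, so this branch is never taken. */
      static struct fib_node *find_buggy(struct fib_node *table, size_t len, int key)
      {
          struct fib_node *n = table_lookup(table, len, key);

          if (is_err(n))
              return NULL;
          return n;
      }

      /* Equivalent without the dead check: return the lookup result directly. */
      static struct fib_node *find_fixed(struct fib_node *table, size_t len, int key)
      {
          struct fib_node *n = table_lookup(table, len, key);

          assert(!is_err(n)); /* holds for NULL and for any valid entry pointer */
          return n;
      }
      ```

      In this model the two variants return the same pointer for every
      input; the fixed form simply drops a test that cannot succeed.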
    • net: sparx5: Support offloading of bridge port flooding flags · 06388a03
      Casper Andersson authored
      Though the SparX-5i can control IPv4/6 multicasts separately from non-IP
      multicasts, these are all muxed onto the bridge's BR_MCAST_FLOOD flag.
      Signed-off-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Link: https://lore.kernel.org/r/20220223082700.qrot7lepwqcdnyzw@wse-c0155
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
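
      The muxing described in this commit, one bridge multicast flag
      driving three separate hardware controls, can be sketched in
      userspace as below. The flag values, struct hw_flood and
      apply_flood_flags() are hypothetical; in particular the unicast and
      broadcast handling is an assumption added for completeness and is
      not described in the commit message.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical flag bits standing in for the bridge port flags. */
      #define FLAG_UC_FLOOD (1u << 0)
      #define FLAG_MC_FLOOD (1u << 1)
      #define FLAG_BC_FLOOD (1u << 2)

      /* Hypothetical per-port hardware state with separate multicast controls. */
      struct hw_flood {
          bool unicast;
          bool broadcast;
          bool mc_non_ip;
          bool mc_ipv4;
          bool mc_ipv6;
      };

      /* Apply only the flags selected by mask; val carries their new values. */
      static void apply_flood_flags(struct hw_flood *hw, unsigned int mask,
                                    unsigned int val)
      {
          assert(hw != NULL);

          if (mask & FLAG_UC_FLOOD)
              hw->unicast = !!(val & FLAG_UC_FLOOD);
          if (mask & FLAG_BC_FLOOD)
              hw->broadcast = !!(val & FLAG_BC_FLOOD);
          if (mask & FLAG_MC_FLOOD) {
              bool on = !!(val & FLAG_MC_FLOOD);

              /* One software flag fans out to all three hardware controls. */
              hw->mc_non_ip = on;
              hw->mc_ipv4 = on;
              hw->mc_ipv6 = on;
          }
      }
      ```

      The trade-off in this model is that the finer hardware granularity
      goes unused: IPv4, IPv6 and non-IP multicast flooding are always
      switched together.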
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · aaa25a2f
      Jakub Kicinski authored
      tools/testing/selftests/net/mptcp/mptcp_join.sh
        34aa6e3b ("selftests: mptcp: add ip mptcp wrappers")
      
        857898eb ("selftests: mptcp: add missing join check")
        6ef84b15 ("selftests: mptcp: more robust signal race test")
      https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/
      
      drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
      drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
        fb7e76ea ("net/mlx5e: TC, Skip redundant ct clear actions")
        c63741b4 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")
      
        09bf9792 ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
        84ba8062 ("net/mlx5e: Test CT and SAMPLE on flow attr")
        efe6f961 ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
        3b49a7ed ("net/mlx5e: TC, Reject rules with multiple CT actions")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 24 Feb, 2022 23 commits