1. 03 Mar, 2022 40 commits
    • Merge branch 'dsa-unicast-filtering' · 6fb8661c
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      DSA unicast filtering
      
      This series doesn't attempt anything extremely brave; it just changes
      the way in which standalone ports which support FDB isolation work.
      
      Up until now, DSA has recommended that switch drivers configure
      standalone ports in a separate VID/FID with learning disabled, and with
      the CPU port as the only destination, reached trivially via flooding.
      That works, except that standalone ports will deliver all packets to the
      CPU. We can leverage the hardware FDB as a MAC DA filter, and disable
      flooding towards the CPU port, to force the dropping of packets with
      unknown MAC DA.
      
      We handle port promiscuity by re-enabling flooding towards the CPU port.
      This is relevant because the bridge puts its automatic (learning +
      flooding) ports in promiscuous mode, and this makes some things work
      automagically, like for example bridging with a foreign interface.
      We don't delve yet into the territory of managing CPU flooding more
      aggressively while under a bridge.
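
      The filtering behaviour described above can be sketched as a toy
      model. This is only an illustration, not driver code: the class and
      method names are hypothetical, and the hardware FDB is reduced to a
      Python set.

```python
# Toy model of a standalone port's receive path (hypothetical names).
class StandalonePort:
    def __init__(self, dev_addr):
        # Host FDB: MAC DAs that must be delivered to the CPU port.
        self.host_fdb = {dev_addr}
        # Flooding of unknown MAC DAs towards the CPU starts disabled.
        self.cpu_flood = False

    def set_promisc(self, on):
        # Promiscuity is modelled as (re-)enabling flooding towards the CPU.
        self.cpu_flood = on

    def delivers_to_cpu(self, mac_da):
        if mac_da in self.host_fdb:
            return True        # known address: FDB hit
        return self.cpu_flood  # unknown address: only if flooding is on
```

      With flooding off, only addresses present in the host FDB reach the
      CPU; turning promiscuity on restores the old "deliver everything"
      behaviour.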
      
      The only switch driver that benefits from this work right now is the
      NXP LS1028A switch (felix). The others need to implement FDB isolation
      first, before DSA is going to install entries to the port's standalone
      database. Otherwise, these entries might collide with bridge FDB/MDB
      entries.
      
      This work was done mainly to have all the required features in place
      before somebody starts seriously architecting DSA support for multiple
      CPU ports. Otherwise it is much more difficult to bolt these features on
      top of multiple CPU ports.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: accept configuring bridge port flags on the NPI port · ac455209
      Vladimir Oltean authored
      In order for the Felix DSA driver to be able to turn on/off flooding
      towards its CPU port, we need to redirect calls on the NPI port to
      actually act upon the index in the analyzer block that corresponds to
      the CPU port module. This was never necessary until now because DSA
      (or the bridge) never called ocelot_port_bridge_flags() for the NPI
      port.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: stop clearing CPU flooding in felix_setup_tag_8021q · 0cc36980
      Vladimir Oltean authored
      felix_migrate_flood_to_tag_8021q_port() takes care of clearing the
      flooding bits on the old CPU port (which was the CPU port module), so
      manually clearing this bit from PGID_UC, PGID_MC, PGID_BC is redundant.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: start off with flooding disabled on the CPU port · 90897569
      Vladimir Oltean authored
      The driver probes with all ports as standalone, and it supports unicast
      filtering. So DSA will call port_fdb_add() for all necessary addresses
      on the current CPU port. We also handle migrations when the CPU port
      hardware resource changes (on tagging protocol change), so there should
      not be any unknown address that we have to receive while not promiscuous.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port · b903a6bd
      Vladimir Oltean authored
      When the tagging protocol changes from "ocelot" to "ocelot-8021q" or in
      reverse, the DSA promiscuity setting that was applied for the old CPU
      port must be transferred to the new one.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: migrate host FDB and MDB entries when changing tag proto · f9cef64f
      Vladimir Oltean authored
      The "ocelot" and "ocelot-8021q" tagging protocols make use of different
      hardware resources, and host FDB entries have different destination
      ports in the switch analyzer module, practically speaking.
      
      So when the user requests a tagging protocol change, the driver must
      migrate all host FDB and MDB entries from the NPI port (in fact CPU port
      module) towards the same physical port, but this time used as a regular
      port.
      
      It is pointless for the felix driver to keep a copy of the host
      addresses, when we can create and export DSA helpers for walking through
      the addresses that it already needs to keep on the CPU port, for
      refcounting purposes.
      
      felix_classify_db() is moved up to avoid a forward declaration.
      
      We pass "bool change" because dp->fdbs and dp->mdbs are uninitialized
      lists when felix_setup() first calls felix_set_tag_protocol(), so we
      need to avoid calling dsa_port_walk_fdbs() during probe time.
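
      The migration idea can be sketched roughly as follows. This is a
      simplified model under stated assumptions: the hardware table is a
      dict keyed by (address, VID), the destination names are made up, and
      migrate_host_entries() is a hypothetical helper, not the driver's
      function.

```python
def migrate_host_entries(hw_table, host_addrs, old_dest, new_dest):
    """Re-point every host (addr, vid) entry from old_dest to new_dest.

    host_addrs stands in for the list of addresses that DSA already
    keeps on the CPU port for refcounting, so the driver does not need
    its own copy.
    """
    for addr, vid in host_addrs:
        if hw_table.get((addr, vid)) == old_dest:
            hw_table[(addr, vid)] = new_dest
    return hw_table
```

      Entries that do not point at the old destination (for example
      regular forwarding entries) are left untouched.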
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: manage flooding on the CPU ports · 7569459a
      Vladimir Oltean authored
      DSA can treat IFF_PROMISC and IFF_ALLMULTI on standalone user ports as
      signifying whether packets with an unknown MAC DA will be received or
      not. Since known MAC DAs are handled by FDB/MDB entries, this means that
      promiscuity is analogous to including/excluding the CPU port from the
      flood domain of those packets.
      
      There are two ways to signal CPU flooding to drivers.
      
      The first (chosen here) is to synthesize a call to
      ds->ops->port_bridge_flags() for the CPU port, with a mask of
      BR_FLOOD | BR_MCAST_FLOOD. This has the effect of turning on egress
      flooding on the CPU port regardless of source.
      
      The alternative would be to create a new ds->ops->port_host_flood()
      which is called per user port. Some switches (sja1105) have a flood
      domain that is managed per {ingress port, egress port} pair, so it would
      make more sense for this kind of switch to not flood the CPU from port A
      if just port B requires it. Nonetheless, the sja1105 has other quirks
      that prevent it from making use of unicast filtering, and without a
      concrete user making use of this feature, I chose not to implement it.
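
      A minimal sketch of the first approach, assuming the CPU port floods
      as soon as any of its user ports asks for it. The function name is
      hypothetical and the BR_* bit positions are illustrative stand-ins
      for the kernel's definitions.

```python
IFF_PROMISC = 0x100
IFF_ALLMULTI = 0x200
BR_FLOOD = 1 << 6         # illustrative bit position
BR_MCAST_FLOOD = 1 << 11  # illustrative bit position

def cpu_flood_flags(user_port_flags):
    """Return (mask, val) for a synthesized port_bridge_flags() call."""
    val = 0
    for flags in user_port_flags:
        if flags & IFF_ALLMULTI:
            val |= BR_MCAST_FLOOD             # unknown multicast wanted
        if flags & IFF_PROMISC:
            val |= BR_FLOOD | BR_MCAST_FLOOD  # everything unknown wanted
    mask = BR_FLOOD | BR_MCAST_FLOOD
    return mask, val
```

      Because the result is a single setting for the CPU port, one
      promiscuous user port turns on flooding for traffic from all of them,
      which is the limitation discussed above for switches like sja1105.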
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: install the primary unicast MAC address as standalone port host FDB · 499aa9e1
      Vladimir Oltean authored
      To be able to safely turn off CPU flooding for standalone ports, we need
      to ensure that the dev_addr of each DSA slave interface is installed as
      a standalone host FDB entry for compatible switches.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: install secondary unicast and multicast addresses as host FDB/MDB · 5e8a1e03
      Vladimir Oltean authored
      In preparation for disabling flooding towards the CPU in standalone ports
      mode, identify the addresses requested by upper interfaces and use the
      new API for DSA FDB isolation to request the hardware driver to offload
      these as FDB or MDB objects. The objects belong to the user port's
      database, and are installed pointing towards the CPU port.
      
      Because dev_uc_add()/dev_mc_add() is VLAN-unaware, we offload to the
      port standalone database addresses with VID 0 (also VLAN-unaware).
      So this excludes switches with global VLAN filtering from supporting
      unicast filtering, because there, it is possible for a port of a switch
      to join a VLAN-aware bridge, and this changes the VLAN awareness of
      standalone ports, requiring VLAN-aware standalone host FDB entries.
      For the same reason, hellcreek, which requires VLAN awareness in
      standalone mode, is also exempted from unicast filtering.
      
      We create "standalone" variants of dsa_port_host_fdb_add() and
      dsa_port_host_mdb_add() (and the corresponding _del functions).
      
      We also create a separate work item type for handling deferred
      standalone host FDB/MDB entries compared to the switchdev one.
      This is done for the purpose of clarity - the procedure for offloading a
      bridge FDB entry is different than offloading a standalone one, and
      the switchdev event work handles only FDBs anyway, not MDBs.
      Deferral is needed for standalone entries because ndo_set_rx_mode runs
      in atomic context. We could probably optimize things a little by first
      queuing up all entries that need to be offloaded, and scheduling the
      work item just once, but the data structures that we can pass through
      __dev_uc_sync() and __dev_mc_sync() are limiting (there is nothing like
      a void *priv), so we'd have to keep the list of queued events somewhere
      in struct dsa_switch, and possibly a lock for it. Too complicated for
      now.
      
      Adding the address to the master is handled by dev_uc_sync(), adding it
      to the hardware is handled by __dev_uc_sync(). So this is the reason why
      dsa_port_standalone_host_fdb_add() does not call dev_uc_add(). Not that
      it had the rtnl_mutex anyway - ndo_set_rx_mode has it, but is atomic.
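
      The deferral scheme can be illustrated with a small model, assuming
      one queued work item per address as described above. All names here
      are hypothetical; the deque stands in for a kernel workqueue and the
      set for the switch's host FDB.

```python
from collections import deque

class DeferredHostAddrs:
    def __init__(self):
        self.workqueue = deque()  # one work item per address event
        self.hw_fdb = set()       # stands in for the hardware host FDB

    def rx_mode_sync(self, event, addr):
        # Atomic context: only record the request, never touch hardware.
        self.workqueue.append((event, addr))

    def run_work(self):
        # Process context: sleeping and programming hardware is allowed.
        while self.workqueue:
            event, addr = self.workqueue.popleft()
            if event == "add":
                self.hw_fdb.add(addr)
            else:
                self.hw_fdb.discard(addr)
```

      Nothing reaches the hardware table until the work items run, which is
      the price of calling into the switch driver from sleepable context.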
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespace · 68d6d71e
      Vladimir Oltean authored
      We are preparing to add API in port.c that adds FDB and MDB entries that
      correspond to the port's standalone database. Rename the existing
      methods to make it clear that the FDB and MDB entries offloaded come
      from the bridge database.
      
      Since the function names lengthen in dsa_slave_switchdev_event_work(),
      we place "addr" and "vid" in temporary variables, to shorten those.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: remove workarounds for changing master promisc/allmulti only while up · 35aae5ab
      Vladimir Oltean authored
      Lennert Buytenhek explains in commit df02c6ff ("dsa: fix master
      interface allmulti/promisc handling"), dated Nov 2008, that changing the
      promiscuity of interfaces that are down (here the master) is broken.
      
      This fact regarding promisc/allmulti has changed since commit
      b6c40d68 ("net: only invoke dev->change_rx_flags when device is UP")
      by Vlad Yasevich, dated Nov 2013.
      
      Therefore, DSA now has unnecessary complexity to handle master state
      transitions from down to up. In fact, syncing the unicast and multicast
      addresses can happen completely asynchronously to the administrative
      state changes.
      
      This change reduces that complexity by effectively fully reverting
      commit df02c6ff ("dsa: fix master interface allmulti/promisc
      handling").
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: add TTY for GNSS module for E810T device · 43113ff7
      Karol Kolacinski authored
      Add a new ice_gnss.c file for holding the basic GNSS module functions.
      If the device supports GNSS module, call the new ice_gnss_init and
      ice_gnss_release functions where appropriate.
      
      Implement basic functionality for reading the data from GNSS module
      using TTY device.
      
      Add I2C read AQ command. It is now required for controlling the external
      physical connectors via external I2C port expander on E810-T adapters.
      
      Future changes will introduce write functionality.
      Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
      Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com>
      Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfc-llcp-cleanups' · ef132dc4
      David S. Miller authored
      Krzysztof Kozlowski says:
      
      ====================
      nfc: llcp: few cleanups/improvements
      
      These are improvements that do not fix any observed issue; they simply look
      correct to me from a code point of view.
      
      Changes since v1
      ================
      1. Split from the fix.
      
      Testing
      =======
      Under QEMU only. The NFC/LLCP code was not really tested on a device.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent" · 44cd5765
      Krzysztof Kozlowski authored
      This reverts commit 17f7ae16.
      
      The commit brought a new socket state LLCP_DISCONNECTING, which was
      never set, only read, so the socket could never enter that state.
      
      Remove the dead code.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: protect nfc_llcp_sock_unlink() calls · a06b8044
      Krzysztof Kozlowski authored
      nfc_llcp_sock_link() is called in all paths (bind/connect) as a last
      action, still protected with lock_sock().  When cleaning up in
      llcp_sock_release(), call nfc_llcp_sock_unlink() in a mirrored way:
      earlier and still under the lock_sock().
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: use test_bit() · a7364912
      Krzysztof Kozlowski authored
      Use test_bit() instead of open-coding it, just like in other places
      touching the bitmap.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: use centralized exiting of bind on errors · 4dbbf673
      Krzysztof Kozlowski authored
      Coding style encourages centralized exiting of functions, so rewrite
      llcp_sock_bind() error paths to use such a pattern.  This reduces the
      duplicated cleanup code, makes the success path visually shorter and also
      cleans up on errors in the proper order (the reverse of
      initialization).
      
      No functional impact expected.
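
      The kernel pattern relies on goto labels, which have no direct
      Python equivalent. As a loose analogy only, contextlib.ExitStack
      gives the same "one exit path, undo in reverse order" behaviour. The
      step names below are hypothetical and do not mirror the real
      llcp_sock_bind() steps.

```python
from contextlib import ExitStack

def bind_sketch(fail_at=None):
    log = []
    err = None
    with ExitStack() as undo:
        for step in ("get_device", "get_local", "dup_service_name"):
            if step == fail_at:
                err = step
                break  # one exit path for every failure
            log.append("do " + step)
            # Register the matching cleanup; callbacks run last-in, first-out.
            undo.callback(log.append, "undo " + step)
        else:
            undo.pop_all()  # success: keep everything that was acquired
    return err, log
```

      A failure in the last step undoes the earlier two in reverse order,
      while the success path performs no cleanup at all.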
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: simplify llcp_sock_connect() error paths · ec10fd15
      Krzysztof Kozlowski authored
      The llcp_sock_connect() error paths were using a mix of central
      exit (goto) and cleanup.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: nullify llcp_sock->dev on connect() error paths · 13a3585b
      Krzysztof Kozlowski authored
      Nullify the llcp_sock->dev on llcp_sock_connect() error paths,
      symmetrically to the code llcp_sock_bind().  The non-NULL value of
      llcp_sock->dev is used in a few places to check whether the socket is
      still valid.
      
      There was no particular issue observed with missing NULL assignment in
      connect() error path, however a similar case - in the bind() error path
      - was triggerable.  That one was fixed in commit 4ac06a1e ("nfc:
      fix NULL ptr dereference in llcp_sock_getname() after failed connect"),
      so the change here seems logical as well.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-hw-counters-for-soft-devices' · ca0a53dc
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      HW counters for soft devices
      
      Petr says:
      
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as a VLAN. In this patch set, add the necessary
      infrastructure to allow exposing these statistics to the offloaded
      netdevice in question, and add mlxsw offload.
      
      Across HW platforms, the counter itself very likely constitutes a limited
      resource, and the act of counting may have a performance impact. Therefore
      this patch set makes the HW statistics collection opt-in and togglable from
      userspace on a per-netdevice basis.
      
      Additionally, HW devices may have various limiting conditions under which
      they can realize the counter. Therefore it is also possible to query
      whether the requested counter is realized by any driver. In TC parlance,
      which is to a degree reused in this patch set, two values are recognized:
      "request" tracks whether the user enabled collecting HW statistics, and
      "used" tracks whether any HW statistics are actually collected.
      
      In the past, this author has expressed the opinion that `a typical user
      doing "ip -s l sh", including various scripts, wants to see the full
      picture and not worry what's going on where'. While that would be nice,
      unfortunately it cannot work:
      
      - Packets that trap from the HW datapath to the SW datapath would be
        double counted.
      
        For a given netdevice, some traffic can be purely a SW artifact, and some
        may flow through the HW object corresponding to the netdevice. But some
        traffic can also get trapped to the SW datapath after bumping the HW
        counter. It is not clear how to make sure double-counting does not occur
        in the SW datapath in that case, while still making sure that possibly
        divergent SW forwarding path gets bumped as appropriate.
      
        So simply adding HW and SW stats may work roughly, most of the time, but
        there are scenarios where the result is nonsensical.
      
      - HW devices will have limitations as to what type of traffic they can
        count.
      
        In case of mlxsw, which is part of this patch set, there is no reasonable
        way to count all traffic going through a certain netdevice, such as a
        VLAN netdevice enslaved to a bridge. It is however very simple to count
        traffic flowing through an L3 object, such as a VLAN netdevice with an IP
        address.
      
        Similarly for physical netdevices, the L3 object at which the counter is
        installed is the subport carrying untagged traffic.
      
        These are not "just counters". It is important that the user understands
        what is being counted. It would be incorrect to conflate these statistics
        with another existing statistics suite.
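
      The double-counting problem from the first point can be made concrete
      with a small worked example. The packet counts are invented for
      illustration and the function name is hypothetical.

```python
def count_packets(hw_forwarded, trapped, sw_only):
    """Compare a naive HW+SW sum with the true packet count.

    hw_forwarded: packets handled entirely in the HW datapath
    trapped:      packets that bump the HW counter, then trap to SW
    sw_only:      packets that only ever exist in the SW datapath
    """
    hw_counter = hw_forwarded + trapped
    sw_counter = trapped + sw_only
    naive_total = hw_counter + sw_counter
    actual_total = hw_forwarded + trapped + sw_only
    return naive_total, actual_total

naive, actual = count_packets(hw_forwarded=100, trapped=10, sw_only=5)
print(naive, actual)  # prints "125 115"
```

      The naive sum overshoots by exactly the number of trapped packets,
      which neither counter can identify on its own.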
      
      To that end, this patch set introduces a statistics suite called "L3
      stats". This label should make it easy to understand what is being counted,
      and to decide whether a given device can or cannot implement this suite for
      some type of netdevice. At the same time, the code is written to make
      future extensions easy, should a device pop up that can implement a
      different flavor of statistics suite (say L2, or an address-family-specific
      suite).
      
      For example, using a work-in-progress iproute2[1], to turn on and then list
      the counters on a VLAN netdevice:
      
          # ip stats set dev swp1.200 l3_stats on
          # ip stats show dev swp1.200 group offload subgroup l3_stats
          56: swp1.200: group offload subgroup l3_stats on used on
              RX:  bytes packets errors dropped  missed   mcast
                       0       0      0       0       0       0
              TX:  bytes packets errors dropped carrier collsns
                       0       0      0       0       0       0
      
      The patchset progresses as follows:
      
      - Patch #1 is a cleanup.
      
      - In patch #2, remove the assumption that all LINK_OFFLOAD_XSTATS are
        dev-backed.
      
        The only attribute defined under the nest is currently
        IFLA_OFFLOAD_XSTATS_CPU_HIT. L3_STATS differs from CPU_HIT in that the
        driver that supplies the statistics is not the same as the driver that
        implements the netdevice. Make the code compatible with this in patch #2.
      
      - In patch #3, add the possibility to filter inside nests.
      
        The filter_mask field of RTM_GETSTATS header determines which
        top-level attributes should be included in the netlink response. This
        saves processing time by only including the bits that the user cares
        about instead of always dumping everything. This is doubly important
        for HW-backed statistics that would typically require a trip to the
        device to fetch the stats. In this patch, the UAPI is extended to
        allow filtering inside IFLA_STATS_LINK_OFFLOAD_XSTATS in particular,
        but the scheme is easily extensible to other nests as well.
      
      - In patch #4, propagate extack where we need it.
        In patch #5, make it possible to propagate errors from drivers to the
        user.
      
      - In patch #6, add the in-kernel APIs for keeping track of the new stats
        suite, and the notifiers that the core uses to communicate with the
        drivers.
      
      - In patch #7, add UAPI for obtaining the new stats suite.
      
      - In patch #8, add a new UAPI message, RTM_SETSTATS, which will carry
        the message to toggle the newly-added stats suite.
        In patch #9, add the toggle itself.
      
      At this point the core is ready for drivers to add support for the new
      stats suite.
      
      - In patches #10, #11 and #12, apply small tweaks to mlxsw code.
      
      - In patch #13, add support for L3 stats, which are realized as RIF
        counters.
      
      - Finally in patch #14, a selftest is added to the net/forwarding
        directory. Technically this is a HW-specific test, in that without a HW
        implementing the counters, it just will not pass. But devices that
        support L3 statistics at all are likely to be able to reuse this
        selftest, so it seems appropriate to put it in the general forwarding
        directory.
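
      The filtering scheme described for patch #3 can be sketched as plain
      bit arithmetic. The attribute numbers below are assumed to match the
      UAPI header and should be treated as illustrative; filter_bit() and
      wanted() are hypothetical helpers written for this sketch.

```python
# Assumed attribute numbers (illustrative).
IFLA_STATS_LINK_64 = 1
IFLA_STATS_LINK_OFFLOAD_XSTATS = 4
IFLA_OFFLOAD_XSTATS_CPU_HIT = 1
IFLA_OFFLOAD_XSTATS_HW_S_INFO = 2
IFLA_OFFLOAD_XSTATS_L3_STATS = 3

def filter_bit(attr):
    # Attribute N is selected by bit N-1 of the mask.
    return 1 << (attr - 1)

def wanted(mask, attr):
    return bool(mask & filter_bit(attr))

# Top level: ask only for the offload xstats nest.
top_mask = filter_bit(IFLA_STATS_LINK_OFFLOAD_XSTATS)
# Inside the nest: ask only for the metadata, not the HW counters.
nest_mask = filter_bit(IFLA_OFFLOAD_XSTATS_HW_S_INFO)
```

      With these masks, a dump would skip IFLA_OFFLOAD_XSTATS_L3_STATS and
      so avoid a trip to the device.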
      
      We also have a netdevsim implementation, and a corresponding selftest that
      verifies specifically some of the core code. We intend to contribute these
      later. Interested parties can take a look at the raw code at [2].
      
      [1] https://github.com/pmachata/iproute2/commits/soft_counters
      [2] https://github.com/pmachata/linux_mlxsw/commits/petrm_soft_counters_2
      
      v2:
      - Patch #3:
          - Do not declare strict_start_type at the new policies, since they are
            used with nla_parse_nested() (sans _deprecated).
          - Use NLA_POLICY_NESTED to declare what the nest contents should be
          - Use NLA_POLICY_MASK instead of BITFIELD32 for the filtering
            attribute.
      - Patch #6:
          - s/monotonous/monotonic/ in commit message
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      - Patch #7:
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      - Patch #8:
          - Do not declare strict_start_type at the new policies, since they are
            used with nla_parse_nested() (sans _deprecated).
      - Patch #13:
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: hw_stats_l3: Add a new test · ba95e793
      Petr Machata authored
      Add a test that verifies operation of L3 HW statistics.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add support for IFLA_OFFLOAD_XSTATS_L3_STATS · 8d0f7d3a
      Petr Machata authored
      Spectrum machines support L3 stats by binding a counter to a RIF, a
      hardware object representing a router interface. Recognize the netdevice
      notifier events, NETDEV_OFFLOAD_XSTATS_*, to support enablement,
      disablement, and reporting back to core.
      
      As a netdevice gains a RIF, if L3 stats are enabled, install the counters,
      and ping the core so that a userspace notification can be emitted.
      
      Similarly, as a netdevice loses a RIF, push the as-yet-unreported
      statistics to the core, so that they are not lost, and ping the core to
      emit userspace notification.
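
      The "push before losing the RIF" behaviour can be modelled as below.
      This is a sketch with hypothetical names: a single integer stands in
      for the whole statistics structure, and the core's saved total is a
      plain attribute.

```python
class L3StatsModel:
    def __init__(self):
        self.saved = 0   # total kept by the core across RIF lifetimes
        self.hw = None   # live HW counter; None while no RIF exists

    def gain_rif(self):
        self.hw = 0      # counter installed, starts from zero

    def count(self, packets):
        if self.hw is not None:
            self.hw += packets  # only counted while a RIF is bound

    def lose_rif(self):
        self.saved += self.hw   # push as-yet-unreported stats to the core
        self.hw = None

    def read(self):
        return self.saved + (self.hw or 0)
```

      Because the live counter is folded into the saved total before the
      RIF goes away, the value reported to userspace never goes backwards.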
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Extract classification of router-related events to a helper · c1de13f9
      Petr Machata authored
      Several more events are coming in the following patches, and extending the
      if statement is getting awkward. Instead, convert it to a switch.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Drop mlxsw_sp arg from counter alloc/free functions · 9834e246
      Petr Machata authored
      The mlxsw_sp reference is carried by the mlxsw_sp_rif object that is passed
      to these functions as well. Just deduce the former from the latter,
      and drop the explicit mlxsw_sp parameter. Adapt callers.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Fix packing of router interface counters · 8fe96f58
      Petr Machata authored
      The function mlxsw_reg_ritr_counter_pack() formats a register to configure
      a router interface (RIF) counter. The parameter `egress' determines whether
      an ingress or egress counter is to be configured. RITR, the register in
      question, has two sets of counter-related fields: one for ingress, one for
      egress. When setting values of the fields, the function sets the proper
      counter index field, but when setting the counter type, it always sets the
      egress field. Thus configuration of ingress counters is broken, and in fact
      an attempt to configure an ingress counter mangles a previously configured
      egress counter.
      
      This was never discovered, because there is currently no way to enable
      ingress counters on a router interface, only egress ones.
      
      Fix in an obvious way.
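
      The bug and its fix can be illustrated with a dict standing in for
      the register payload. The key names and the "BASIC"/"NO_COUNT"
      values are placeholders chosen for this sketch, not the real field
      definitions.

```python
def counter_pack_buggy(reg, index, enable, egress):
    set_type = "BASIC" if enable else "NO_COUNT"
    side = "egress" if egress else "ingress"
    reg[side + "_index"] = index
    reg["egress_type"] = set_type  # bug: always writes the egress field

def counter_pack_fixed(reg, index, enable, egress):
    set_type = "BASIC" if enable else "NO_COUNT"
    side = "egress" if egress else "ingress"
    reg[side + "_index"] = index
    reg[side + "_type"] = set_type  # type goes to the matching side
```

      With the buggy version, disabling an ingress counter after enabling
      an egress one silently overwrites the egress counter type.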
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS · 5fd0b838
      Petr Machata authored
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
      which should be carried by the RTM_SETSTATS message, and expresses a desire
      to toggle L3 offload xstats on or off.
      
      As part of the above, add an exported function rtnl_offload_xstats_notify()
      that drivers can use when they have installed or deinstalled the counters
      backing the HW stats.
      
      At this point, it is possible to enable, disable and query L3 offload
      xstats on netdevices. (However there is no driver actually implementing
      these.)
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add RTM_SETSTATS · 03ba3566
      Petr Machata authored
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. These stats are only accessible through RTM_GETSTATS, and
      therefore should be toggled by a RTM_SETSTATS message. Add it, and the
      necessary skeleton handler.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add UAPI for obtaining L3 offload xstats · 0e7788fd
      Petr Machata authored
      Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
      IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
      place in a HW router.
      
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Additionally, as a netdevice is configured, it may become or
      cease being suitable for binding of a HW counter. Both of these aspects
      need to be communicated to the userspace. To that end, add another child
      attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:
      
          - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
            - attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
              - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
                - {0,1} as u8
              - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
                - {0,1} as u8
      
      Thus this one attribute is a nest that can be used to carry information
      about various types of HW statistics, and indexing is very simply done by
      wrapping the information for a given statistics suite into the attribute
      that carries the suite in the RTM_GETSTATS query. At the same time, because
      _HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
      possible through filtering to request only the metadata about individual
      statistics suites, without having to hit the HW to get the actual counters.
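
The nest layout described above can be decoded with a few lines of attribute
parsing. This is a minimal sketch, not the kernel's or iproute2's code: the
numeric attribute values are assumed from include/uapi/linux/if_link.h and
should be checked against your headers, and the helper names (nla_put,
nla_parse, parse_hw_s_info) are made up for illustration.

```python
import struct

# Attribute numbers are assumptions taken from include/uapi/linux/if_link.h;
# verify them against your kernel headers before relying on them.
IFLA_OFFLOAD_XSTATS_L3_STATS = 3
IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST = 1
IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED = 2

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order
NLA_TYPE_MASK = 0x3FFF          # strips the NLA_F_NESTED / byte-order flags


def nla_align(n):
    """Round n up to the 4-byte netlink attribute alignment."""
    return (n + 3) & ~3


def nla_put(attr_type, payload):
    """Encode one attribute; nla_len covers header + payload, not padding."""
    length = NLA_HDR.size + len(payload)
    return NLA_HDR.pack(length, attr_type) + payload + b"\0" * (nla_align(length) - length)


def nla_parse(buf):
    """Yield (type, payload) for each attribute found in buf."""
    off = 0
    while off + NLA_HDR.size <= len(buf):
        length, attr_type = NLA_HDR.unpack_from(buf, off)
        if length < NLA_HDR.size or off + length > len(buf):
            raise ValueError("truncated attribute")
        yield attr_type & NLA_TYPE_MASK, buf[off + NLA_HDR.size:off + length]
        off += nla_align(length)


def parse_hw_s_info(nest_payload):
    """Map each statistics suite to its request/used flags."""
    info = {}
    for suite, body in nla_parse(nest_payload):
        fields = dict(nla_parse(body))
        info[suite] = {
            "request": fields[IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST][0] == 1,
            "used": fields[IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED][0] == 1,
        }
    return info


# Hand-built payload: L3 stats were requested, but no driver is counting yet.
sample = nla_put(
    IFLA_OFFLOAD_XSTATS_L3_STATS,
    nla_put(IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST, b"\x01")
    + nla_put(IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED, b"\x00"),
)
print(parse_hw_s_info(sample))
```

Because the suite attribute itself is the index, a second statistics suite
would simply show up as another key in the returned dictionary.
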
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e7788fd
    • Petr Machata's avatar
      net: dev: Add hardware stats support · 9309f97a
      Petr Machata authored
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
      these statistics to the offloaded netdevice in question. The API was shaped
      by the following considerations:
      
      - Collection of HW statistics is not free: there may be a finite number of
        counters, and the act of counting may have a performance impact. It is
        therefore necessary to allow toggling whether HW counting should be done
        for any particular SW netdevice.
      
      - As the drivers are loaded and removed, a particular device may get
        offloaded and unoffloaded again. At the same time, the statistics values
        need to stay monotonic (modulo the eventual 64-bit wraparound),
        increasing only to reflect traffic measured in the device.
      
        To that end, the netdevice keeps around a lazily-allocated copy of struct
        rtnl_link_stats64. Device drivers then contribute to the values kept
        therein at various points. Even as the driver goes away, the struct stays
        around to maintain the statistics values.
      
      - Different HW devices may be able to count different things. The
        motivation behind this patch in particular is exposure of HW counters on
        Nvidia Spectrum switches, where the only practical approach to counting
        traffic on offloaded soft netdevices currently is to use router interface
        counters, and count L3 traffic. Correspondingly that is the statistics
        suite added in this patch.
      
        Other devices may be able to measure different kinds of traffic, and for
        that reason, the APIs are built to allow uniform access to different
        statistics suites.
      
      - Because soft netdevices and offloading drivers are only loosely bound, a
        netdevice uses a notifier chain to communicate with the drivers. Several
        new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
        to the offloading drivers.
      
      - Devices can have various conditions for when a particular counter is
        available. As the device is configured and reconfigured, the device
        offload may become or cease being suitable for counter binding. A
        netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
        ping offloading drivers and determine whether anyone currently implements
        a given statistics suite. This information can then be propagated to user
        space.
      
        When the driver decides to unoffload a netdevice, it can use a
        newly-added function, netdev_offload_xstats_report_delta(), to record
        outstanding collected statistics, before destroying the HW counter.
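
The monotonicity requirement in the list above can be illustrated with a toy
model. This is only a sketch under simplifying assumptions: the classes
SoftNetdev, Driver and L3Stats are hypothetical, the stats struct is reduced
to two fields, and report_delta() is loosely modeled on the role the commit
describes for netdev_offload_xstats_report_delta(), not on its real signature.

```python
from dataclasses import dataclass


@dataclass
class L3Stats:
    """Tiny stand-in for the kernel's stats struct (two fields only)."""
    rx_packets: int = 0
    tx_packets: int = 0

    def add(self, other):
        self.rx_packets += other.rx_packets
        self.tx_packets += other.tx_packets


class SoftNetdev:
    """Hypothetical soft netdevice, e.g. a VLAN on top of a switch port."""

    def __init__(self):
        self.saved = None  # allocated lazily, survives driver removal

    def enable_l3_stats(self):
        if self.saved is None:
            self.saved = L3Stats()

    def report_delta(self, delta):
        # The driver contributes counts to the copy kept by the netdevice.
        self.saved.add(delta)

    def get_stats(self, driver=None):
        total = L3Stats(self.saved.rx_packets, self.saved.tx_packets)
        if driver is not None and driver.hw is not None:
            total.add(driver.hw)  # live counter not yet folded in
        return total


class Driver:
    """Hypothetical offloading driver owning one HW counter."""

    def __init__(self):
        self.hw = None

    def offload(self):
        self.hw = L3Stats()  # a freshly bound HW counter starts at zero

    def unoffload(self, dev):
        dev.report_delta(self.hw)  # record outstanding counts first
        self.hw = None             # then destroy the HW counter


dev, drv = SoftNetdev(), Driver()
dev.enable_l3_stats()
drv.offload()
drv.hw.rx_packets += 10
drv.unoffload(dev)        # 10 packets are preserved in dev.saved
drv.offload()
drv.hw.rx_packets += 5
print(dev.get_stats(drv).rx_packets)  # 15
```

The value reported to the user never goes backwards across the
unoffload/offload cycle, even though the HW counter restarted from zero.
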
      
      This patch adds a helper, call_netdevice_notifiers_info_robust(), for
      dispatching a notifier with the possibility of unwind when one of the
      consumers bails. Given the wish to eventually get rid of the global
      notifier block altogether, this helper only invokes the per-netns notifier
      block.
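
The "dispatch with unwind" idea can be sketched as follows. This is an
illustration of the pattern only, assuming a plain list as the notifier chain;
call_notifiers_robust() and make_notifier() are hypothetical names, and the
"ENABLE"/"DISABLE" events are placeholders rather than real NETDEV_* values.

```python
import errno


def call_notifiers_robust(chain, event_up, event_down, info):
    """Call each notifier with event_up. On the first failure, send
    event_down to the notifiers that already succeeded and return the error."""
    for idx, notifier in enumerate(chain):
        err = notifier(event_up, info)
        if err:
            for done in chain[:idx]:
                done(event_down, info)
            return err
    return 0


def make_notifier(name, log, fail=False):
    """Build a notifier that records calls and optionally rejects ENABLE."""
    def notifier(event, info):
        log.append((name, event))
        if fail and event == "ENABLE":
            return -errno.EOPNOTSUPP
        return 0
    return notifier
```

With three consumers where the third one bails, the first two see ENABLE
followed by DISABLE, and the caller gets the third consumer's error code.
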
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9309f97a
    • Petr Machata's avatar
      net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns · 216e6906
      Petr Machata authored
      Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW
      access, and can fail for more reasons than just netlink message size
      exhaustion. Therefore do not always return -EMSGSIZE on the failure path,
      but respect the error code provided by the callee. Set the error explicitly
      where it is reasonable to assume -EMSGSIZE as the failure reason.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      216e6906
    • Petr Machata's avatar
      net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill() · 05415bcc
      Petr Machata authored
      Later patches add handlers for more HW-backed statistics. An extack will be
      useful when communicating HW / driver errors to the client. Add the
      arguments as appropriate.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05415bcc
    • Petr Machata's avatar
      net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests · 46efc97b
      Petr Machata authored
      The filter_mask field of RTM_GETSTATS header determines which top-level
      attributes should be included in the netlink response. This saves
      processing time by only including the bits that the user cares about
      instead of always dumping everything. This is doubly important for
      HW-backed statistics that would typically require a trip to the device to
      fetch the stats.
      
      So far there was only one HW-backed stat suite per attribute. However,
      IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
      the following patches. It would therefore be advantageous to be able to
      filter within that nest, and select just one or the other HW-backed
      statistics suite.
      
      Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
      The scheme is as follows:
      
          - RTM_GETSTATS
              - struct if_stats_msg
              - attr nest IFLA_STATS_GET_FILTERS
                  - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
                      - u32 filter_mask
      
      This scheme reuses the existing enumerators by nesting them in a dedicated
      context attribute. This is covered by policies as usual, therefore a
      gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
      nest has filtering enabled, because for the SW counters the issue does not
      seem to be that important.
      
      rtnl_offload_xstats_get_size() and _fill() are extended to observe the
      requested filters.
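
A request following this scheme could be assembled roughly like this. It is a
sketch, not iproute2's implementation: the numeric constants are assumed from
the UAPI headers (rtnetlink.h, netlink.h, if_link.h), the per-nest mask is
assumed to use the same bit convention as IFLA_STATS_FILTER_BIT(), and
build_getstats() is a hypothetical helper. The message is only built here,
not sent.

```python
import struct

# Values assumed from the Linux UAPI headers; verify before use.
RTM_GETSTATS = 94
NLM_F_REQUEST = 0x1
NLA_F_NESTED = 0x8000
IFLA_STATS_LINK_OFFLOAD_XSTATS = 4
IFLA_STATS_GET_FILTERS = 1
IFLA_OFFLOAD_XSTATS_L3_STATS = 3


def filter_bit(attr):
    """Mirror of the IFLA_STATS_FILTER_BIT() macro: bit (attr - 1)."""
    return 1 << (attr - 1)


def nla(attr_type, payload):
    """Encode one netlink attribute, padded to 4 bytes."""
    length = 4 + len(payload)
    pad = -length % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * pad


def build_getstats(ifindex, seq=1):
    # struct if_stats_msg: family, pad1, pad2, ifindex, filter_mask.
    # The top-level mask selects the IFLA_STATS_LINK_OFFLOAD_XSTATS nest.
    top_mask = filter_bit(IFLA_STATS_LINK_OFFLOAD_XSTATS)
    body = struct.pack("=BBHII", 0, 0, 0, ifindex, top_mask)
    # The nested mask narrows the reply to the L3 suite inside that nest.
    nested_mask = struct.pack("=I", filter_bit(IFLA_OFFLOAD_XSTATS_L3_STATS))
    body += nla(IFLA_STATS_GET_FILTERS | NLA_F_NESTED,
                nla(IFLA_STATS_LINK_OFFLOAD_XSTATS, nested_mask))
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(body), RTM_GETSTATS,
                         NLM_F_REQUEST, seq, 0)
    return header + body


msg = build_getstats(ifindex=3)
print(len(msg), msg.hex())
```

On Linux the resulting bytes could be written to an AF_NETLINK socket opened
with NETLINK_ROUTE; whether the kernel accepts them depends on it carrying
this patch.
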
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46efc97b
    • Petr Machata's avatar
      net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed · f6e0fb81
      Petr Machata authored
      The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
      attributes carry various special hardware statistics. The code that handles
      this nest was written with the idea that all these statistics would be
      exposed by the device driver of a physical netdevice.
      
      In the following patches, a new attribute is added to the abovementioned
      nest, which however can be defined for some soft netdevices. The NDO-based
      approach to querying these does not work, because it is not the soft
      netdevice driver that exposes these statistics, but an offloading NIC
      driver that does so.
      
      The current code does not scale well to this usage. Simply rewrite it back
      to the pattern seen in other fill-like and get_size-like functions
      elsewhere.
      
      Extract to helpers the code that is concerned with handling specifically
      NDO-backed statistics so that it can be easily reused should more such
      statistics be added.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6e0fb81
    • Petr Machata's avatar
      net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_* · 6b524a1d
      Petr Machata authored
      The currently used names rtnl_get_offload_stats() and
      rtnl_get_offload_stats_size() do not clearly show the namespace. The former
      function additionally seems to have been named this way in accordance with
      the NDO name, as opposed to the naming used in the rtnetlink.c file (and
      indeed elsewhere in the netlink handling code). As more and
      differently-flavored attributes are introduced, a common clear prefix is
      needed for all related functions.
      
      Rename the functions to follow the rtnl_offload_xstats_* naming scheme.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b524a1d
    • Manish Chopra's avatar
      qed: validate and restrict untrusted VFs vlan promisc mode · cbcc44db
      Manish Chopra authored
      Today, when VFs are put in promiscuous mode, they can ask the PF to
      configure the device so that they receive traffic from all VLANs,
      regardless of which VLAN the PF has configured for them (via ip link).
      The PF allows this request whether or not the VF is trusted.
      
      From a security point of view, when a VLAN is configured for a VF through
      the PF (via ip link), honour such requests from the VF only when it is
      configured as trusted; otherwise restrict the VF's VLAN promiscuous mode
      configuration.
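
The rule stated above boils down to a small predicate. The sketch below is
not the qed driver's code; VfInfo, pf_vlan, trusted and
allow_vlan_promisc() are hypothetical names chosen for illustration, and the
"no PF-assigned VLAN" case is assumed to keep its previous behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VfInfo:
    """Hypothetical per-VF state as seen by the PF."""
    pf_vlan: Optional[int]  # VLAN forced by the PF via ip link, or None
    trusted: bool           # set via the VF trust setting


def allow_vlan_promisc(vf: VfInfo) -> bool:
    """Decide whether a VF's request to receive all VLANs is honoured."""
    if vf.pf_vlan is None:
        # No PF-configured VLAN to bypass (assumed unchanged behaviour).
        return True
    # A PF-configured VLAN may only be bypassed by a trusted VF.
    return vf.trusted


print(allow_vlan_promisc(VfInfo(pf_vlan=100, trusted=False)))  # False
print(allow_vlan_promisc(VfInfo(pf_vlan=100, trusted=True)))   # True
```
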
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbcc44db
    • Manish Chopra's avatar
      qed: display VF trust config · 4e6e6bec
      Manish Chopra authored
      The driver supports SR-IOV VF trust configuration, but does not
      report it when queried via the ip link utility.
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e6e6bec
    • David S. Miller's avatar
      Merge branch 'stmmac-SA8155p-ADP' · d52b4536
      David S. Miller authored
      Bhupesh Sharma says:
      
      ====================
      net: stmmac: Enable support for Qualcomm SA8155p-ADP board
      
      Changes since v1:
      -----------------
      - v1 can be seen here: https://lore.kernel.org/netdev/20220126221725.710167-1-bhupesh.sharma@linaro.org/t/
      - Fixed review comments from Bjorn - broke the v1 series into two
        separate series - one each for 'net' tree and 'arm clock/dts' tree
        - so as to ease review of the same from the respective maintainers.
      - This series is intended for the 'net' tree.
      
      The SA8155p-ADP board supports on-board ethernet (Gigabit Interface),
      with support for both RGMII and RMII buses.
      
      This patchset adds the support for the same.
      
      Note that this patchset is based on an earlier sent patchset
      for adding PDC controller support on SM8150 (see [1]).
      
      [1]. https://lore.kernel.org/linux-arm-msm/20220226184028.111566-1-bhupesh.sharma@linaro.org/T/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d52b4536
    • Bjorn Andersson's avatar
      net: stmmac: dwmac-qcom-ethqos: Adjust rgmii loopback_en per platform · a7bf6d7c
      Bjorn Andersson authored
      Not all platforms should have RGMII_CONFIG_LOOPBACK_EN set; where it is
      set wrongly, the result is about 50% packet loss on incoming messages. So
      make it possible to configure this per compatible and enable it for QCS404.
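
A per-compatible setting of this kind is typically a flag in the driver data
selected by the compatible string. The sketch below only illustrates that
shape and is not the driver's code: EthqosData and apply_loopback() are
hypothetical, the bit position of RGMII_CONFIG_LOOPBACK_EN is an assumption,
and so is the choice to leave the flag off for the SM8150 entry (the commit
text only says it is enabled for QCS404).

```python
from dataclasses import dataclass

RGMII_CONFIG_LOOPBACK_EN = 1 << 2  # assumed bit position


@dataclass(frozen=True)
class EthqosData:
    """Hypothetical per-platform driver data."""
    rgmii_config_loopback_en: bool


# Driver data keyed by compatible string.
COMPAT_DATA = {
    "qcom,qcs404-ethqos": EthqosData(rgmii_config_loopback_en=True),
    "qcom,sm8150-ethqos": EthqosData(rgmii_config_loopback_en=False),  # assumption
}


def apply_loopback(compatible: str, reg: int) -> int:
    """Return the register value with the loopback bit set per platform."""
    data = COMPAT_DATA[compatible]
    if data.rgmii_config_loopback_en:
        return reg | RGMII_CONFIG_LOOPBACK_EN
    return reg & ~RGMII_CONFIG_LOOPBACK_EN


print(hex(apply_loopback("qcom,qcs404-ethqos", 0x0)))   # 0x4
print(hex(apply_loopback("qcom,sm8150-ethqos", 0xFF)))  # 0xfb
```
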
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7bf6d7c
    • Vinod Koul's avatar
      net: stmmac: Add support for SM8150 · d90b3120
      Vinod Koul authored
      This adds the compatible string, POR config and driver data for the
      ethernet controller found in the SM8150 SoC.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      [bhsharma: Massage the commit log and other cosmetic changes]
      Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d90b3120
    • David S. Miller's avatar
      Merge branch 'page_pool-stats' · a8ff736d
      David S. Miller authored
      Joe Damato says:
      
      ====================
      page_pool: Add stats counters
      
      Greetings:
      
      Welcome to v9.
      
      This revision adds a commit which updates the page_pool documentation to
      describe the stats API, structures, and fields.
      
      Additionally, this revision contains a minor cosmetic change suggested by
      Saeed in page_pool_recycle_in_ring in commit 2: "page_pool: Add recycle
      stats", which removes an unnecessary #ifdef.
      
      There are no functional changes in this revision.
      
      Benchmark output from the v7 cover [1] is pasted below, as it is still
      relevant since no functional changes have been made in this revision:
      
      Benchmarks have been re-run. As always, results between runs are highly
      variable; you'll find results showing that stats disabled are both faster
      and slower than stats enabled in back to back benchmark runs.
      
      Raw benchmark output with stats off [2] and stats on [3] are available for
      examination.
      
      Test system:
      	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
      	- 2 NUMA zones, with 18 cores per zone and 2 threads per core
      
      bench_page_pool_simple results, loops=200000000
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      for_loop			0	0.335		0	0.336
      atomic_inc 			14	6.106		13	6.022
      lock				30	13.365		32	13.968
      
      no-softirq-page_pool01		75	32.884		74	32.308
      no-softirq-page_pool02		79	34.696		74	32.302
      no-softirq-page_pool03		110	48.005		105	46.073
      
      tasklet_page_pool01_fast_path	14	6.156		14	6.211
      tasklet_page_pool02_ptr_ring	41	18.028		39	17.391
      tasklet_page_pool03_slow	107	46.646		105	46.123
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	3973	1731.596	4015	1750.015
      page_pool_cross_cpu CPU(1)	3976	1733.217	4022	1752.864
      page_pool_cross_cpu CPU(2)	3973	1731.615	4016	1750.433
      page_pool_cross_cpu CPU(3)	3976	1733.218	4021	1752.806
      page_pool_cross_cpu CPU(4)	994	433.305		1005	438.217
      
      page_pool_cross_cpu average	3378	-		3415	-
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	6969	3037.488	6909	3011.463
      page_pool_cross_cpu CPU(1)	6974	3039.469	6913	3012.961
      page_pool_cross_cpu CPU(2)	6969	3037.575	6910	3011.585
      page_pool_cross_cpu CPU(3)	6974	3039.415	6913	3012.961
      page_pool_cross_cpu CPU(4)	6969	3037.288	6909	3011.368
      page_pool_cross_cpu CPU(5)	6972	3038.732	6913	3012.920
      page_pool_cross_cpu CPU(6)	6969	3037.350	6909	3011.386
      page_pool_cross_cpu CPU(7)	6973	3039.356	6913	3012.921
      page_pool_cross_cpu CPU(8)	871	379.934		864	376.620
      
      page_pool_cross_cpu average	6293	-		6239	-
      
      Thanks.
      
      [1]: https://lore.kernel.org/all/1645810914-35485-1-git-send-email-jdamato@fastly.com/
      [2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
      [3]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled
      
      v8 -> v9:
      	- Add documentation about the page_pool_get_stats API, stats
      	  structures, and fields to Documentation/networking/page_pool.rst.
      	- Remove unnecessary #ifdef in page_pool_recycle_in_ring.
      
      v7 -> v8:
      	- Rename mlx5 ethtool stats so that users have a better idea of
      	  their meaning.
      
      v6 -> v7:
      	- stats split out into two structs: a single per-page-pool struct
      	  for allocation path stats, and a per-cpu pointer for recycle
      	  path stats.
      	- page_pool_get_stats updated to use a wrapper struct to gather
      	  stats for allocation and recycle stats with a single argument.
      	- placement of structs adjusted
      	- mlx5 driver modified to use page_pool_get_stats API
      
      v5 -> v6:
      	- Per cpu page_pool_stats struct pointer is now marked as
      	  ____cacheline_aligned_in_smp. Placement of the field in the
      	  struct is unchanged; it is the last field.
      
      v4 -> v5:
      	- Fixed the description of the kernel option in Kconfig.
      	- Squashed commits 1-10 from v4 into a single commit for easier
      	  review.
      	- Changed the comment style of the comment for
      	  the this_cpu_inc_alloc_stat macro.
      	- Changed the return type of page_pool_get_stats from struct
      	  page_pool_stat * to bool.
      
      v3 -> v4:
      	- Restructured stats to be per-cpu per-pool.
      	- Global stats and proc file were removed.
      	- Exposed an API (page_pool_get_stats) for batching the pool stats.
      
      v2 -> v3:
      	- patch 8/10 ("Add stat tracking cache refill") fixed placement of
      	  counter increment.
      	- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
      		- fix unused label warning from kernel test robot,
      		- fixed page_pool_seq_show to only display the refill stat
      		  once,
      		- added a remove_proc_entry for page_pool_stat to
      		  dev_proc_net_exit.
      
      v1 -> v2:
      	- A new kernel config option has been added, which defaults to N,
      	  preventing this code from being compiled in by default
      	- The stats structure has been converted to a per-cpu structure
      	- The stats are now exported via proc (/proc/net/page_pool_stat)
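
The layout that the changelog converges on (v4 -> v5 and v6 -> v7 above) can
be pictured with a toy model: one allocation-stats struct per pool, one
recycle-stats struct per CPU, and a getter that fills a wrapper struct and
returns a bool. This is a sketch only; the field names are an illustrative
subset and the Python classes are hypothetical stand-ins for the C structs.

```python
from dataclasses import dataclass, field


@dataclass
class AllocStats:
    fast: int = 0  # illustrative field names, not the full kernel set
    slow: int = 0


@dataclass
class RecycleStats:
    cached: int = 0
    ring: int = 0


@dataclass
class PagePoolStats:
    """Wrapper handed to the getter, holding both kinds of stats."""
    alloc: AllocStats = field(default_factory=AllocStats)
    recycle: RecycleStats = field(default_factory=RecycleStats)


class PagePool:
    def __init__(self, ncpus):
        self.alloc_stats = AllocStats()  # single per-pool struct
        self.recycle_stats = [RecycleStats() for _ in range(ncpus)]  # per-cpu


def page_pool_get_stats(pool, stats):
    """Fill stats from pool; return False if no destination was given."""
    if stats is None:
        return False
    stats.alloc = AllocStats(pool.alloc_stats.fast, pool.alloc_stats.slow)
    for per_cpu in pool.recycle_stats:  # sum recycle stats over all CPUs
        stats.recycle.cached += per_cpu.cached
        stats.recycle.ring += per_cpu.ring
    return True


pool = PagePool(ncpus=2)
pool.alloc_stats.fast = 7
pool.recycle_stats[0].cached = 3
pool.recycle_stats[1].cached = 4
out = PagePoolStats()
print(page_pool_get_stats(pool, out), out.alloc.fast, out.recycle.cached)  # True 7 7
```

A driver would then copy the fields of the wrapper into whatever reporting
interface it uses, such as its ethtool stats.
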
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8ff736d