1. 23 Jan, 2023 12 commits
    • net: mscc: ocelot: add MAC Merge layer support for VSC9959 · 6505b680
      Vladimir Oltean authored
      Felix (VSC9959) has a DEV_GMII:MM_CONFIG block composed of 2 registers
      (ENABLE_CONFIG and VERIF_CONFIG). Because the MAC Merge statistics and
      pMAC statistics are already in the Ocelot switch lib even if just Felix
      supports them, I'm adding support for the whole MAC Merge layer in the
      common Ocelot library too.
      
      There is an interrupt (shared with the PTP interrupt) which signals
      changes to the MM verification state. This is done because the
      preemptible traffic classes should be committed to hardware only once
      the verification procedure has declared the link partner capable of
      receiving preemptible frames.
      
      We implement ethtool getters and setters for the MAC Merge layer state.
      The "TX enabled" and "verify status" are taken from the IRQ handler,
      using a mutex to ensure serialized access.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959 · ab3f97a9
      Vladimir Oltean authored
      The Felix VSC9959 switch supports frame preemption and has a MAC Merge
      layer. In addition to the structured stats that exist for the eMAC,
      export the counters associated with its pMAC (pause, RMON, MAC, PHY,
      control) plus the high-level MAC Merge layer stats. The unstructured
      ethtool counters, as well as the rtnl_link_stats64 were left to report
      only the eMAC counters.
      
      Because statistics processing is quite self-contained in ocelot_stats.c
      now, I've opted for introducing an ocelot->mm_supported bool, based on
      which the common switch lib does everything, rather than pushing the
      TSN-specific code in felix_vsc9959.c, as happens for other TSN stuff.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: hide access to ocelot_stats_layout behind a helper · 497eea9f
      Vladimir Oltean authored
      Some hardware instances of the ocelot driver support the MAC Merge
      layer, which gives access to an extra preemptible MAC. This has
      implications upon the statistics. There will be a stats layout when MM
      isn't supported, and a different one when it is.
      The ocelot_stats_layout() helper will return the correct one.
      In preparation for that, refactor the existing code to use this helper.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: allow ocelot_stat_layout elements with no name · 1a733bbd
      Vladimir Oltean authored
      We will add support for pMAC counters and MAC merge layer counters,
      which are only reported via the structured stats, and the current
      ocelot_get_strings() stands in our way, because it expects that the
      statistics should be placed in the data array at the same index as found
      in the ocelot_stats_layout array.
      
      That is not true. Statistics which don't have a name should not be
      exported to the unstructured ethtool -S, so we need to have different
      indices into the ocelot_stats_layout array (i) and into the data array
      (data itself).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
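
      A minimal sketch of the two-index walk described in this commit, written
      as user-space C rather than the driver's actual code. The
      stat_layout_entry struct, the layout_get_strings() name and the
      GSTRING_LEN constant are assumptions made for illustration; only the idea
      (skip unnamed entries, advance the output buffer independently of the
      layout index) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GSTRING_LEN 32 /* assumed slot width, same value as ETH_GSTRING_LEN */

/* Hypothetical layout entry: a NULL name marks a counter that is reported
 * only through the structured stats, never through "ethtool -S". */
struct stat_layout_entry {
	unsigned int reg;
	const char *name;
};

/* Copy the names of all named counters into consecutive GSTRING_LEN-sized
 * slots of 'data'. The index 'i' walks the layout array, while 'data'
 * advances only when a string is actually emitted.
 * Returns the number of strings written. */
static size_t layout_get_strings(const struct stat_layout_entry *layout,
				 size_t num_entries, char *data)
{
	size_t i, written = 0;

	assert(layout != NULL && data != NULL);

	for (i = 0; i < num_entries; i++) {
		if (!layout[i].name)
			continue; /* unnamed: not exported as a string */

		memset(data, 0, GSTRING_LEN);
		strncpy(data, layout[i].name, GSTRING_LEN - 1);
		data += GSTRING_LEN;
		written++;
	}

	return written;
}
```

      With a layout of { "rx_octets", NULL, "tx_octets" }, the second string
      lands in slot 1 of the data array even though it sits at index 2 of the
      layout.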
    • net: dsa: add plumbing for changing and getting MAC merge layer state · 5f6c2d49
      Vladimir Oltean authored
      The DSA core is in charge of the ethtool_ops of the net devices
      associated with switch ports, so in case a hardware driver supports the
      MAC merge layer, DSA must pass the callbacks through to the driver.
      Add support for precisely that.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: add helpers for MM fragment size translation · dd1c4164
      Vladimir Oltean authored
      We deliberately make the Linux UAPI pass the minimum fragment size in
      octets, even though IEEE 802.3 defines it as discrete values, and
      addFragSize is just the multiplier. This is because there is nothing
      impossible in operating with an in-between value for the fragment size
      of non-final preempted fragments, and there may even appear hardware
      which supports the in-between sizes.
      
      For the hardware which just understands the addFragSize multiplier,
      create two helpers which translate back and forth the values passed in
      octets.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
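
      The translation this commit describes can be sketched as follows. This
      is a user-space illustration, not the kernel helpers themselves: the
      function names, the -1 error return and the exact-match policy are
      assumptions. The arithmetic assumes the IEEE 802.3 relation in which the
      minimum non-final fragment size is 64 * (1 + addFragSize) - 4 octets,
      with addFragSize taken to be a 2-bit multiplier (0 to 3).

```c
#include <assert.h>
#include <stdint.h>

#define MIN_FRAME_LEN     60u /* minimum frame length without FCS (ETH_ZLEN) */
#define FCS_LEN            4u /* frame check sequence length */
#define MAX_ADD_FRAG_SIZE  3u /* assumed: addFragSize is a 2-bit value */

/* Multiplier -> minimum fragment size in octets (FCS excluded). */
static uint32_t frag_size_add_to_min(uint32_t add)
{
	assert(add <= MAX_ADD_FRAG_SIZE);
	return (MIN_FRAME_LEN + FCS_LEN) * (1 + add) - FCS_LEN;
}

/* Octets -> multiplier. Returns 0 and stores the multiplier in *add when
 * min_octets is one of the discrete values, -1 for an in-between value
 * that multiplier-only hardware cannot represent. */
static int frag_size_min_to_add(uint32_t min_octets, uint32_t *add)
{
	uint32_t candidate;

	for (candidate = 0; candidate <= MAX_ADD_FRAG_SIZE; candidate++) {
		if (frag_size_add_to_min(candidate) == min_octets) {
			*add = candidate;
			return 0;
		}
	}

	return -1;
}
```

      Under these assumptions the representable sizes are 60, 124, 188 and 252
      octets. Rejecting in-between values is one possible policy; a driver
      could instead round to the nearest supported size.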
    • net: ethtool: add helpers for aggregate statistics · 449c5459
      Vladimir Oltean authored
      When a pMAC exists but the driver is unable to atomically query the
      aggregate eMAC+pMAC statistics, the user should be given back at least
      the sum of eMAC and pMAC counters queried separately.
      
      This is a generic problem, so add helpers in ethtool to do this
      operation, if the driver doesn't have a better way to report aggregate
      stats. Do this in a way that does not require changes to these functions
      when new stats are added (basically treat the structures as an array of
      u64 values, except for the first element which is the stats source).
      
      In include/linux/ethtool.h, there is already a section where helper
      function prototypes should be placed. The trouble is, this section is
      too early, before the definitions of struct ethtool_eth_mac_stats et al.
      Move that section to the end and append these new helpers to it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
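
      The "array of u64 values after the stats source" idea from this commit
      can be sketched as below. The struct, the enum, the function names and
      the STAT_NOT_SET sentinel are all hypothetical stand-ins, and treating
      an unreported counter as "take the other MAC's value" is an assumption,
      not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAT_NOT_SET UINT64_MAX /* assumed "counter not reported" sentinel */

enum stats_src { STATS_SRC_AGGREGATE, STATS_SRC_EMAC, STATS_SRC_PMAC };

/* Hypothetical stats structure: the source selector comes first and is
 * followed only by 64-bit counters. */
struct mac_stats {
	enum stats_src src;
	uint64_t frames_ok;
	uint64_t fcs_errors;
	uint64_t octets_ok;
};

/* Sum every u64 found between byte offset 'first' and 'size'. Adding a new
 * counter to the structure requires no change here. */
static void aggregate_u64(void *aggr, const void *emac, const void *pmac,
			  size_t size, size_t first)
{
	uint64_t *out = (uint64_t *)((char *)aggr + first);
	const uint64_t *e = (const uint64_t *)((const char *)emac + first);
	const uint64_t *p = (const uint64_t *)((const char *)pmac + first);
	size_t i, n;

	assert(size >= first && (size - first) % sizeof(uint64_t) == 0);
	n = (size - first) / sizeof(uint64_t);

	for (i = 0; i < n; i++) {
		if (e[i] == STAT_NOT_SET)
			out[i] = p[i];
		else if (p[i] == STAT_NOT_SET)
			out[i] = e[i];
		else
			out[i] = e[i] + p[i];
	}
}

static void aggregate_mac_stats(struct mac_stats *aggr,
				const struct mac_stats *emac,
				const struct mac_stats *pmac)
{
	aggr->src = STATS_SRC_AGGREGATE;
	aggregate_u64(aggr, emac, pmac, sizeof(*aggr),
		      offsetof(struct mac_stats, frames_ok));
}
```

      A driver that cannot read the aggregate counters atomically would fill
      one structure per MAC and pass both to aggregate_mac_stats().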
    • docs: ethtool: document ETHTOOL_A_STATS_SRC and ETHTOOL_A_PAUSE_STATS_SRC · c319df10
      Vladimir Oltean authored
      Two new netlink attributes were added to PAUSE_GET and STATS_GET and
      their replies. Document them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC) · 04692c90
      Vladimir Oltean authored
      IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an
      Express MAC and a Preemptible MAC. Both MACs are hidden to higher and
      lower layers and visible as a single MAC (packet classification to eMAC
      or pMAC on TX is done based on priority; classification on RX is done
      based on SFD).
      
      For devices which support a MAC Merge sublayer, it is desirable to
      retrieve individual packet counters from the eMAC and the pMAC, as well
      as aggregate statistics (their sum).
      
      Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the
      policy of ETHTOOL_MSG_STATS_GET, and an ETHTOOL_A_PAUSE_STATS_SRC
      which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when
      ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these
      take values from enum ethtool_mac_stats_src, defaulting to "aggregate"
      in the absence of the attribute.
      
      Existing drivers do not need to pay attention to this enum which was
      added to all driver-facing structures, just the ones which report the
      MAC merge layer as supported.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: ethtool-netlink: document interface for MAC Merge layer · 37000004
      Vladimir Oltean authored
      Show details about the structures passed back and forth related to MAC
      Merge layer configuration, state and statistics. The rendered htmldocs
      will be much more verbose due to the kerneldoc references.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: add support for MAC Merge layer · 2b30f829
      Vladimir Oltean authored
      The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
      specifications (the other being Frame Preemption; IEEE 802.1Q-2018
      clause 6.7.2), which work together to minimize latency caused by frame
      interference at TX. The overall goal of TSN is for normal traffic and
      traffic with a bounded deadline to be able to cohabitate on the same L2
      network and not bother each other too much.
      
      The standards achieve this (partly) by introducing the concept of
      preemptible traffic, i.e. Ethernet frames that have a custom value for
      the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
      and reassembled at L2 on a link-local basis. The non-preemptible frames
      are called express traffic, they are transmitted using a normal SFD, and
      they can preempt preemptible frames, therefore having lower latency,
      which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
      frames around 9K). Preemption is not recursive, i.e. a P frame cannot
      preempt another P frame. Preemption also does not depend upon priority,
      or otherwise said, an E frame with prio 0 will still preempt a P frame
      with prio 7.
      
      In terms of implementation, the standards talk about the presence of an
      express MAC (eMAC) which handles express traffic, and a preemptible MAC
      (pMAC) which handles preemptible traffic, and these MACs are multiplexed
      on the same MII by a MAC merge layer.
      
      To support frame preemption, the definition of the SFD was generalized
      to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
      Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
      value different from the standard SFD will treat P frames as error
      frames. To prevent that from happening, a negotiation process is
      defined.
      
      On RX, packets are dispatched to the eMAC or pMAC after being filtered
      by their SMD. On TX, the eMAC/pMAC classification decision is taken by
      the 802.1Q spec, based on packet priority (each of the 8 user priority
      values may have an admin-status of preemptible or express).
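
      The TX classification and the preemption rule described so far can be
      sketched as follows. This is an illustration only: the enum, the
      function names and the bitmask representation of the per-priority
      admin-status are assumptions, not part of the UAPI added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mac_kind { MAC_EXPRESS, MAC_PREEMPTIBLE };

/* TX side: bit N of 'preemptible_prios' set means user priority N has an
 * admin-status of preemptible, so its frames go to the pMAC. */
static enum mac_kind tx_classify(uint8_t preemptible_prios, unsigned int prio)
{
	assert(prio < 8);
	return (preemptible_prios & (1u << prio)) ? MAC_PREEMPTIBLE
						   : MAC_EXPRESS;
}

/* Only an express frame may interrupt, and only a preemptible frame may be
 * interrupted. Priority plays no role, and P never preempts P. */
static bool may_preempt(enum mac_kind pending, enum mac_kind in_flight)
{
	return pending == MAC_EXPRESS && in_flight == MAC_PREEMPTIBLE;
}
```

      With only priority 7 marked preemptible (mask 0x80), a priority 0 frame
      is classified as express and may preempt an in-flight priority 7 frame,
      matching the example given earlier in this message.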
      
      The MAC Merge layer and the Frame Preemption parameters have some degree
      of independence in terms of how software stacks are supposed to deal
      with them. The activation of the MM layer is supposed to be controlled
      by an LLDP daemon (after it has been communicated that the link partner
      also supports it), after which a (hardware-based or not) verification
      handshake takes place, before actually enabling the feature. So the
      process is intended to be relatively plug-and-play, whereas FP settings
      are supposed to be coordinated across a network using something
      approximating NETCONF.
      
      The support contained here is exclusively for the 802.3 (MAC Merge)
      portions and not for the 802.1Q (Frame Preemption) parts. This API is
      sufficient for an LLDP daemon to do its job. The FP adminStatus variable
      from 802.1Q is outside the scope of an LLDP daemon.
      
      I have taken a few creative licenses and augmented the Linux kernel UAPI
      compared to the standard managed objects recommended by IEEE 802.3.
      These are:
      
      - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
        Processing state diagram, a MAC Merge layer is always supposed to be
        able to receive P frames. However, this implies keeping the pMAC
        powered on, which will consume needless power in applications where FP
        will never be used. If LLDP is used, the reception of an Additional
        Ethernet Capabilities TLV from the link partner is sufficient
        indication that the pMAC should be enabled. So my proposal is that in
        Linux, we keep the pMAC turned off by default and that user space
        turns it on when needed.
      
      - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
        aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
        the boolean netlink attributes offered, so this is also positive here.
        Other than the meaning being reversed, they correspond to the same
        thing.
      
      - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for an LLDP
        daemon to maximize the verifyTime variable (delay between SMD-V
        transmissions), to maximize its chances that the LP replies. IEEE says
        that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
        stupidly keeps this variable in a 7 bit register, so the maximum
        supported value is 127 ms. I could have chosen to hardcode this in the
        LLDP daemon to a lower value, but why not let the kernel expose its
        supported range directly.
      
      - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
        aMACMergeAddFragSize, and expresses the "additional" fragment size
        (on top of ETH_ZLEN), whereas this expresses the absolute value of the
        fragment size.
      
      - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
        object mandated by the standard, but user space clearly needs to know
        what is the minimum supported fragment size of our local receiver,
        since LLDP must advertise a value no lower than that.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
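
      Two of the deviations listed in this commit reduce to simple mappings,
      sketched here from the point of view of a hypothetical LLDP daemon. The
      function names are invented for illustration; the 1 to 128 ms range and
      the 127 ms device limit are the figures quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VERIFY_TIME_MIN_MS   1u /* lower bound quoted from IEEE 802.3 */
#define VERIFY_TIME_MAX_MS 128u /* upper bound quoted from IEEE 802.3 */

/* The netlink attribute uses positive logic, the inverse of the IEEE
 * managed object aMACMergeVerifyDisableTx. */
static bool verify_enabled_from_disable_tx(bool verify_disable_tx)
{
	return !verify_disable_tx;
}

/* Pick the largest verify time that both the standard and the device allow.
 * 'max_supported_ms' stands for the value the kernel would report through
 * ETHTOOL_A_MM_MAX_VERIFY_TIME. */
static uint32_t choose_verify_time_ms(uint32_t max_supported_ms)
{
	assert(max_supported_ms >= VERIFY_TIME_MIN_MS);
	return max_supported_ms < VERIFY_TIME_MAX_MS ? max_supported_ms
						      : VERIFY_TIME_MAX_MS;
}
```

      A device limited to a 7-bit register would report 127 ms and the daemon
      would use that value instead of a hardcoded, lower one.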
    • net/sock: Introduce trace_sk_data_ready() · 40e0b090
      Peilin Ye authored
      As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
      callback implementations.  For example:
      
      <...>
        iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
        iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
      <...>
      Suggested-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Jan, 2023 17 commits
  3. 20 Jan, 2023 11 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b3c588cd
      Jakub Kicinski authored
      drivers/net/ipa/ipa_interrupt.c
      drivers/net/ipa/ipa_interrupt.h
        9ec9b2a3 ("net: ipa: disable ipa interrupt during suspend")
        8e461e1f ("net: ipa: introduce ipa_interrupt_enable()")
        d50ed355 ("net: ipa: enable IPA interrupt handlers separate from registration")
      https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
      https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5deaa985
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, bluetooth, bpf and netfilter.
      
        Current release - regressions:
      
         - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
           addrconf", fix nsna_ping mode of team
      
         - wifi: mt76: fix bugs in Rx queue handling and DMA mapping
      
         - eth: mlx5:
            - add missing mutex_unlock in error reporter
            - protect global IPsec ASO with a lock
      
        Current release - new code bugs:
      
         - rxrpc: fix wrong error return in rxrpc_connect_call()
      
        Previous releases - regressions:
      
         - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
      
         - wifi:
            - mac80211: fix crashes on Rx due to incorrect initialization of
              rx->link and rx->link_sta
            - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
              aggregation handling, crashes
            - brcmfmac: fix regression for Broadcom PCIe wifi devices
            - rndis_wlan: prevent buffer overflow in rndis_query_oid
      
         - netfilter: conntrack: handle tcp challenge acks during connection
           reuse
      
         - sched: avoid grafting on htb_destroy_class_offload when destroying
      
         - virtio-net: correctly enable callback during start_xmit, fix stalls
      
         - tcp: avoid the lookup process failing to get sk in ehash table
      
         - ipa: disable ipa interrupt during suspend
      
         - eth: stmmac: enable all safety features by default
      
        Previous releases - always broken:
      
         - bpf:
            - fix pointer-leak due to insufficient speculative store bypass
              mitigation (Spectre v4)
            - skip task with pid=1 in send_signal_common() to avoid a splat
            - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
              PERF_BPF_EVENT_PROG_UNLOAD events
            - fix potential deadlock in htab_lock_bucket from same bucket
              index but different map_locked index
      
         - bluetooth:
            - fix a buffer overflow in mgmt_mesh_add()
            - hci_qca: fix driver shutdown on closed serdev
            - ISO: fix possible circular locking dependency
            - CIS: hci_event: fix invalid wait context
      
         - wifi: brcmfmac: fixes for survey dump handling
      
         - mptcp: explicitly specify sock family at subflow creation time
      
         - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
           header bits
      
         - tcp: fix rate_app_limited to default to 1
      
         - l2tp: close all race conditions in l2tp_tunnel_register()
      
         - eth: mlx5: fixes for QoS config and eswitch configuration
      
         - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
      
         - eth: stmmac: fix invalid call to mdiobus_get_phy()
      
        Misc:
      
         - ethtool: add netlink attr in rss get reply only if the value is not
           empty"
      
      * tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
        Revert "Merge branch 'octeontx2-af-CPT'"
        tcp: fix rate_app_limited to default to 1
        bnxt: Do not read past the end of test names
        net: stmmac: enable all safety features by default
        octeontx2-af: add mbox to return CPT_AF_FLT_INT info
        octeontx2-af: update cpt lf alloc mailbox
        octeontx2-af: restore rxc conf after teardown sequence
        octeontx2-af: optimize cpt pf identification
        octeontx2-af: modify FLR sequence for CPT
        octeontx2-af: add mbox for CPT LF reset
        octeontx2-af: recover CPT engine when it gets fault
        net: dsa: microchip: ksz9477: port map correction in ALU table entry register
        selftests/net: toeplitz: fix race on tpacket_v3 block close
        net/ulp: use consistent error code when blocking ULP
        octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
        tcp: avoid the lookup process failing to get sk in ehash table
        Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
        MAINTAINERS: add networking entries for Willem
        net: sched: gred: prevent races when adding offloads to stats
        l2tp: prevent lockdep issue in l2tp_tunnel_register()
        ...
    • Revert "Merge branch 'octeontx2-af-CPT'" · 45a919bb
      Jakub Kicinski authored
      This reverts commit b4fbf0b2, reversing
      changes made to 6c977c5c.
      
      This seems like net-next material.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'octeontx2-af-miscellaneous-changes-for-cpt' · 7a590bd6
      Jakub Kicinski authored
      Srujana Challa says:
      
      ====================
      octeontx2-af: Miscellaneous changes for CPT
      
      This patchset consists of miscellaneous changes for CPT.
      - Adds a new mailbox to reset the requested CPT LF.
      - Modifies the FLR sequence as suggested by the HW team.
      - Adds support to recover CPT engines when they fault.
      - Updates CPT inbound inline IPsec configuration mailbox,
        as per new generation of the OcteonTX2 chips.
      - Adds a new mailbox to return CPT FLT Interrupt info.
      ====================
      
      Link: https://lore.kernel.org/r/20230118120354.1017961-1-schalla@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: add mbox to return CPT_AF_FLT_INT info · b814cc90
      Srujana Challa authored
      CPT HW triggers the CPT AF FLT interrupt when CPT engines
      hit uncorrectable errors; AF is the one which receives
      the interrupt and recovers the engines.
      This patch adds a mailbox for CPT VFs to request info about
      faulted and recovered CPT engines.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: update cpt lf alloc mailbox · d1e1de10
      Srujana Challa authored
      The CN10K CPT coprocessor contains a context processor
      to accelerate updates to the IPsec security association
      contexts. The context processor contains a context cache.
      This patch updates CPT LF ALLOC mailbox to config ctx_ilen
      requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
      initial context fetch.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: restore rxc conf after teardown sequence · e2784acb
      Nithin Dabilpuram authored
      CN10K CPT coprocessor includes a component named RXC which
      is responsible for reassembly of inner IP packets. RXC has
      the feature to evict oldest entries based on age/threshold.
      The age/threshold is being set to minimum values to evict
      all entries at the time of teardown.
      This patch adds code to restore timeout and threshold config
      after teardown sequence is complete as it is global config.
      Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: optimize cpt pf identification · 41b166e5
      Srujana Challa authored
      Optimize CPT PF identification in mbox handling for faster
      mbox response by doing it at AF driver probe instead of
      every mbox message.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: modify FLR sequence for CPT · 5c22fce6
      Srujana Challa authored
      On the OcteonTX2 platform, CPT instruction enqueue is only
      possible via LMTST operations.
      The existing FLR sequence mentioned in HRM requires
      a dummy LMTST to CPT but LMTST can't be submitted from
      AF driver. So, HW team provided a new sequence to avoid
      dummy LMTST. This patch adds code for the same.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: add mbox for CPT LF reset · b7e41527
      Srujana Challa authored
      On OcteonTX2 SoC, the admin function (AF) is the only one with all
      privileges to configure HW and alloc resources. PFs and their VFs
      have to request AF via mailbox for all their needs.
      This patch adds a new mailbox for CPT VFs to request a CPT LF
      reset.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: recover CPT engine when it gets fault · e625dad8
      Srujana Challa authored
      When a CPT engine has uncorrectable errors, it will get halted and
      must be disabled and re-enabled. This patch adds code for the same.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>