    net: ethtool: add support for MAC Merge layer · 2b30f829
    Vladimir Oltean authored
    The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
    specifications (the other being Frame Preemption; IEEE 802.1Q-2018
    clause 6.7.2), which work together to minimize latency caused by frame
    interference at TX. The overall goal of TSN is for normal traffic and
    traffic with a bounded deadline to be able to cohabitate on the same L2
    network and not bother each other too much.
    
    The standards achieve this (partly) by introducing the concept of
    preemptible traffic, i.e. Ethernet frames that have a custom value for
    the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
    and reassembled at L2 on a link-local basis. The non-preemptible frames
    are called express traffic. They are transmitted using a normal SFD and
    can preempt preemptible frames, which gives them lower latency. This
    can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
    frames around 9K). Preemption is not recursive, i.e. a P frame cannot
    preempt another P frame. Preemption also does not depend upon priority:
    an E frame with prio 0 will still preempt a P frame with prio 7.
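
The preemption rule above can be written down as a small predicate. This
is only an illustrative sketch: the type and function names are
hypothetical and are not part of the kernel or of this patch.

```c
#include <stdbool.h>

/* Hypothetical model of how the MAC Merge layer sees a frame. */
enum mm_frame_class {
	MM_FRAME_EXPRESS,     /* "E" frame, transmitted with the normal SFD */
	MM_FRAME_PREEMPTIBLE, /* "P" frame, transmitted with a custom SFD value */
};

struct mm_frame {
	enum mm_frame_class cls;
	unsigned int prio; /* 0..7, deliberately ignored by the rule below */
};

/*
 * Return true if "incoming" may interrupt "in_flight".
 * Only an E frame can preempt, and only a P frame can be preempted,
 * so P-over-P (recursive) preemption is impossible by construction.
 */
static bool mm_can_preempt(const struct mm_frame *incoming,
			   const struct mm_frame *in_flight)
{
	return incoming->cls == MM_FRAME_EXPRESS &&
	       in_flight->cls == MM_FRAME_PREEMPTIBLE;
}
```

Note that the prio field never enters the decision, which is the point:
the E/P class alone determines who preempts whom.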
    
    In terms of implementation, the standards talk about the presence of an
    express MAC (eMAC) which handles express traffic, and a preemptible MAC
    (pMAC) which handles preemptible traffic, and these MACs are multiplexed
    on the same MII by a MAC merge layer.
    
    To support frame preemption, the definition of the SFD was generalized
    to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
    Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
    value different from the standard SFD will treat P frames as error
    frames. To prevent that from happening, a negotiation process is
    defined.
    
    On RX, packets are dispatched to the eMAC or pMAC after being filtered
    by their SMD. On TX, the eMAC/pMAC classification decision is taken by
    the 802.1Q spec, based on packet priority (each of the 8 user priority
    values may have an admin-status of preemptible or express).
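
A minimal sketch of the two dispatch decisions described above. The
names are hypothetical, the per-priority admin-status is assumed to be
stored as an 8-bit mask, and the RX side is simplified: it only
distinguishes the express SMD (which has the same value as the standard
SFD, 0xD5) from everything else, and does not model verification
packets.

```c
#include <assert.h>
#include <stdint.h>

enum mm_mac { MM_EMAC, MM_PMAC };

/* The express SMD is identical to the standard SFD octet. */
#define MM_SMD_EXPRESS 0xD5u

/*
 * RX: filter on the delimiter. Simplification: every delimiter other
 * than the express one is handed to the preemptible path.
 */
static enum mm_mac mm_rx_dispatch(uint8_t smd)
{
	return smd == MM_SMD_EXPRESS ? MM_EMAC : MM_PMAC;
}

/*
 * TX: bit N of preemptible_prios set means user priority N has an
 * admin-status of preemptible; a clear bit means express.
 */
static enum mm_mac mm_tx_classify(uint8_t preemptible_prios, unsigned int prio)
{
	assert(prio < 8);
	return (preemptible_prios & (1u << prio)) ? MM_PMAC : MM_EMAC;
}
```

With a mask of 0x0f, for example, priorities 0-3 would be sent through
the pMAC and priorities 4-7 through the eMAC.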
    
    The MAC Merge layer and the Frame Preemption parameters have some degree
    of independence in terms of how software stacks are supposed to deal
    with them. The activation of the MM layer is supposed to be controlled
    by an LLDP daemon (after it has been communicated that the link partner
    also supports it), after which a (hardware-based or not) verification
    handshake takes place, before actually enabling the feature. So the
    process is intended to be relatively plug-and-play. FP settings, by
    contrast, are supposed to be coordinated across a network using
    something approximating NETCONF.
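
The ordering of the MM activation steps can be sketched as a predicate
over a small link state. Everything here is hypothetical naming for
illustration, and it assumes that TX may also proceed when verification
has been administratively disabled, which the text above does not spell
out.

```c
#include <stdbool.h>

/* Hypothetical verification states, for illustration only. */
enum mm_verify_state {
	MM_VERIFY_INITIAL,
	MM_VERIFY_VERIFYING,
	MM_VERIFY_SUCCEEDED,
	MM_VERIFY_FAILED,
	MM_VERIFY_DISABLED, /* assumption: verification switched off by the admin */
};

struct mm_link {
	bool lp_supports_mm; /* learned from the link partner via LLDP */
	bool tx_enabled;     /* requested by the LLDP daemon */
	enum mm_verify_state verify;
};

/*
 * Preemptible TX only becomes active once the link partner is known to
 * support MM, user space has requested TX, and the verification
 * handshake has either succeeded or been disabled.
 */
static bool mm_tx_active(const struct mm_link *link)
{
	if (!link->lp_supports_mm || !link->tx_enabled)
		return false;

	return link->verify == MM_VERIFY_SUCCEEDED ||
	       link->verify == MM_VERIFY_DISABLED;
}
```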
    
    The support contained here is exclusively for the 802.3 (MAC Merge)
    portions and not for the 802.1Q (Frame Preemption) parts. This API is
    sufficient for an LLDP daemon to do its job. The FP adminStatus variable
    from 802.1Q is outside the scope of an LLDP daemon.
    
    I have taken a few liberties and augmented the Linux kernel UAPI
    compared to the standard managed objects recommended by IEEE 802.3.
    These are:
    
    - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
      Processing state diagram, a MAC Merge layer is always supposed to be
      able to receive P frames. However, this implies keeping the pMAC
      powered on, which will consume needless power in applications where FP
      will never be used. If LLDP is used, the reception of an Additional
      Ethernet Capabilities TLV from the link partner is sufficient
      indication that the pMAC should be enabled. So my proposal is that in
      Linux, we keep the pMAC turned off by default and that user space
      turns it on when needed.
    
    - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
      aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
      the boolean netlink attributes offered, so this is also positive here.
      Other than the meaning being reversed, they correspond to the same
      thing.
    
    - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for an LLDP
      daemon to maximize the verifyTime variable (delay between SMD-V
      transmissions), to maximize its chances that the LP replies. IEEE says
      that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
      stupidly keeps this variable in a 7-bit register, so the maximum
      supported value is 127 ms. I could have chosen to hardcode this in the
      LLDP daemon to a lower value, but why not let the kernel expose its
      supported range directly?
    
    - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
      aMACMergeAddFragSize, and expresses the "additional" fragment size
      (on top of ETH_ZLEN), whereas this expresses the absolute value of the
      fragment size.
    
    - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to be a managed
      object mandated by the standard, but user space clearly needs to know
      the minimum fragment size supported by our local receiver, since LLDP
      must advertise a value no lower than that.
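
To make the relationship between the IEEE managed objects and the
netlink attributes above more concrete, here is a hedged sketch of the
conversions involved. The helper names are hypothetical and are not
functions added by this patch. The fragment size arithmetic assumes that
aMACMergeAddFragSize takes values 0..3 and counts 64-octet steps, so
that the minimum non-final fragment size is 64 * (1 + addFragSize) - 4
octets.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ETH_ZLEN      60 /* minimum frame length without FCS (ETH_ZLEN) */
#define EX_ETH_FCS_LEN    4
#define EX_MAX_ADD_FRAG   3 /* assumption: addFragSize is a 2-bit value */
#define EX_IEEE_MAX_VERIFY_TIME_MS 128

/* "Additional" fragment size (IEEE) to absolute minimum size in octets. */
static uint32_t mm_add_frag_to_min(uint32_t add)
{
	return (EX_ETH_ZLEN + EX_ETH_FCS_LEN) * (1 + add) - EX_ETH_FCS_LEN;
}

/*
 * Absolute minimum size to the smallest addFragSize that satisfies it,
 * or -1 if the request exceeds what addFragSize can express.
 */
static int mm_min_to_add_frag(uint32_t min_size)
{
	uint32_t add;

	for (add = 0; add <= EX_MAX_ADD_FRAG; add++)
		if (mm_add_frag_to_min(add) >= min_size)
			return (int)add;

	return -1;
}

/* aMACMergeVerifyDisableTx uses negative logic; the attribute does not. */
static bool mm_verify_enabled(bool verify_disable_tx)
{
	return !verify_disable_tx;
}

/* Pick the largest verifyTime allowed by both IEEE and the hardware. */
static uint32_t mm_pick_verify_time(uint32_t max_supported_ms)
{
	return max_supported_ms < EX_IEEE_MAX_VERIFY_TIME_MS ?
	       max_supported_ms : EX_IEEE_MAX_VERIFY_TIME_MS;
}
```

Under these assumptions, addFragSize values 0..3 map to minimum fragment
sizes of 60, 124, 188 and 252 octets, and a device reporting a maximum
verify time of 127 ms (such as the ENETC case above) ends up with
verifyTime = 127.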

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>