    net/sched: taprio: give higher priority to higher TCs in software dequeue mode · 2f530df7
    Vladimir Oltean authored
    The current taprio software implementation is haunted by the shadow of
    the igb/igc hardware model. It iterates over child qdiscs in increasing
    order of TXQ index, thereby giving the highest xmit priority to TXQ 0
    and the lowest to TXQ N. According to discussions with Vinicius, that is
    the default (perhaps even unchangeable) prioritization scheme used by
    the NICs that taprio was first written for (igb, igc), and we have a
    case of two bugs canceling out, resulting in a functional setup on
    igb/igc, but a less sane one on other NICs.
    
    To the best of my understanding, taprio should prioritize based on the
    traffic class, so it should really dequeue starting with the highest
    traffic class and going down from there. We get to the TXQ using the
    tc_to_txq[] netdev property.
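
A minimal user-space sketch of the TC-first order described above, not the
kernel code itself. The struct and function names (sketch_tc_txq,
sketch_pick_txq_by_tc) and the qlen[] backlog array are hypothetical; only
the idea of a per-TC (offset, count) TXQ range is taken from tc_to_txq[].

```c
#include <assert.h>

#define SKETCH_MAX_TC 16

/* Hypothetical stand-in for one tc_to_txq[] entry: a contiguous TXQ range. */
struct sketch_tc_txq {
	unsigned int offset;	/* first TXQ belonging to this TC */
	unsigned int count;	/* number of TXQs in this TC */
};

/*
 * Walk traffic classes from the highest down to TC 0 and return the first
 * TXQ with a non-empty backlog, or -1 if every queue is empty.
 * qlen[txq] is an assumed per-TXQ backlog counter.
 */
int sketch_pick_txq_by_tc(const struct sketch_tc_txq *tc_to_txq,
			  int num_tc, const unsigned int *qlen)
{
	assert(num_tc >= 0 && num_tc <= SKETCH_MAX_TC);

	for (int tc = num_tc - 1; tc >= 0; tc--) {
		unsigned int first = tc_to_txq[tc].offset;
		unsigned int last = first + tc_to_txq[tc].count;

		for (unsigned int txq = first; txq < last; txq++) {
			if (qlen[txq] > 0)
				return (int)txq;
		}
	}

	return -1;
}
```

This sketch always scans a TC's TXQs from the first one, so it says nothing
yet about fairness between TXQs of the same TC.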
    
    TXQs within the same TC have the same (strict) priority, so we should
    pick from them as fairly as we can. We can achieve that by implementing
    something very similar to q->curband from multiq_dequeue().
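
One way to picture the round-robin within a TC is a per-TC cursor that
remembers which TXQ to try first on the next dequeue. The sketch below is
only loosely modelled on the q->curband idea; the struct sketch_tc_rr, its
cur field and the qlen[] backlog array are assumptions made for
illustration.

```c
#include <assert.h>

/* Hypothetical per-TC state: TXQ range plus a round-robin cursor. */
struct sketch_tc_rr {
	unsigned int offset;	/* first TXQ of this TC */
	unsigned int count;	/* number of TXQs in this TC */
	unsigned int cur;	/* TXQ to try first, in [offset, offset + count) */
};

/*
 * Try each TXQ of the TC once, starting at the cursor. The cursor is
 * advanced (with wraparound) past every TXQ that was tried, so the next
 * call starts at the TXQ after the one that was just served.
 * Returns the chosen TXQ, or -1 if all TXQs of this TC are empty.
 */
int sketch_rr_pick(struct sketch_tc_rr *tc, const unsigned int *qlen)
{
	assert(tc->count > 0);
	assert(tc->cur >= tc->offset && tc->cur < tc->offset + tc->count);

	for (unsigned int i = 0; i < tc->count; i++) {
		unsigned int txq = tc->cur;

		tc->cur++;
		if (tc->cur == tc->offset + tc->count)
			tc->cur = tc->offset;

		if (qlen[txq] > 0)
			return (int)txq;
	}

	return -1;
}
```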
    
    Since igb/igc really do give TXQ 0 higher hardware priority than
    TXQ 1, etc., we need to preserve the behavior for them as well. We really
    have no choice, because in txtime-assist mode, taprio is essentially a
    software scheduler towards offloaded child tc-etf qdiscs, so the TXQ
    selection really does matter (not all igb TXQs support ETF/SO_TXTIME,
    says Kurt Kanzenbach).
    
    To preserve the behavior, we need a capability bit so that taprio can
    determine if it's running on igb/igc, or on something else. Because igb
    doesn't offload taprio at all, we can't piggyback on the
    qdisc_offload_query_caps() call from taprio_enable_offload(), but
    instead we need a separate call which is also made for software
    scheduling.
    
    Introduce two static keys to minimize the performance penalty on systems
    which only have igb/igc NICs, and on systems which only have other NICs.
    For mixed systems, taprio will have to dynamically check whether to
    dequeue using one prioritization algorithm or using the other.
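
The decision described above could be sketched roughly as follows. This is a
user-space illustration only: plain counters stand in for the two static
keys, and every name here (sketch_caps, txq_order_prio, sketch_register,
sketch_dequeue_mode) is hypothetical rather than taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical capability reported by the driver for one device. */
struct sketch_caps {
	bool txq_order_prio;	/* true for igb/igc-like TXQ-index priority */
};

enum sketch_mode {
	SKETCH_BY_TXQ,	/* legacy order: TXQ 0 first */
	SKETCH_BY_TC,	/* new order: highest traffic class first */
};

/* Counters standing in for the two static keys. */
static unsigned int sketch_nr_txq_order;
static unsigned int sketch_nr_tc_order;

/* Record one more device of either kind. */
void sketch_register(const struct sketch_caps *caps)
{
	if (caps->txq_order_prio)
		sketch_nr_txq_order++;
	else
		sketch_nr_tc_order++;
}

/*
 * If only one kind of device exists, the answer is known without looking
 * at the device. Only mixed systems pay for the per-device check.
 */
enum sketch_mode sketch_dequeue_mode(const struct sketch_caps *caps)
{
	if (sketch_nr_txq_order == 0)
		return SKETCH_BY_TC;
	if (sketch_nr_tc_order == 0)
		return SKETCH_BY_TXQ;

	return caps->txq_order_prio ? SKETCH_BY_TXQ : SKETCH_BY_TC;
}
```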

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
pkt_sched.h 6.5 KB