• Yunsheng Lin's avatar
    net: sched: fix tx action rescheduling issue during deactivation · 102b55ee
    Yunsheng Lin authored
    Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
    qdisc before calling __qdisc_run(), which ultimately clear the
    STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
    is set before clearing STATE_MISSED, there may be rescheduling
    of net_tx_action() at the end of qdisc_run_end(), see below:
    
    CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
              .                   .                     .
              .            set STATE_MISSED             .
              .           __netif_schedule()            .
              .                   .           set STATE_DEACTIVATED
              .                   .                qdisc_reset()
              .                   .                     .
              .<---------------   .              synchronize_net()
    clear __QDISC_STATE_SCHED  |  .                     .
              .                |  .                     .
              .                |  .            some_qdisc_is_busy()
              .                |  .               return *false*
              .                |  .                     .
      test STATE_DEACTIVATED   |  .                     .
    __qdisc_run() *not* called |  .                     .
              .                |  .                     .
       test STATE_MISS         |  .                     .
     __netif_schedule()--------|  .                     .
              .                   .                     .
              .                   .                     .
    
    __qdisc_run() is not called by net_tx_atcion() in CPU0 because
    CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
    STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
    is called at the end of qdisc_run_end(), causing tx action
    rescheduling problem.
    
    qdisc_run() called by net_tx_action() runs in the softirq context,
    which should has the same semantic as the qdisc_run() called by
    __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
    synchronize_net() between STATE_DEACTIVATED flag being set and
    qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
    bail out for the deactived lockless qdisc in net_tx_action(), and
    qdisc_reset() will reset all skb not dequeued yet.
    
    So add the rcu_read_lock() explicitly to protect the qdisc_run()
    and do the STATE_DEACTIVATED checking in net_tx_action() before
    calling qdisc_run_begin(). Another option is to do the checking in
    the qdisc_run_end(), but it will add unnecessary overhead for
    non-tx_action case, because __dev_queue_xmit() will not see qdisc
    with STATE_DEACTIVATED after synchronize_net(), the qdisc with
    STATE_DEACTIVATED can only be seen by net_tx_action() because of
    __netif_schedule().
    
    The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
    between net_tx_action() and qdisc_reset(), see:
    commit d518d2ed ("net/sched: fix race between deactivation
    and dequeue for NOLOCK qdisc"). As the bailout added above for
    deactived lockless qdisc in net_tx_action() provides better
    protection for the race without calling qdisc_run() at all, so
    remove the STATE_DEACTIVATED checking in qdisc_run().
    
    After qdisc_reset(), there is no skb in qdisc to be dequeued, so
    clear the STATE_MISSED in dev_reset_queue() too.
    
    Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
    Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
    Signed-off-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
    V8: Clearing STATE_MISSED before calling __netif_schedule() has
        avoid the endless rescheduling problem, but there may still
        be a unnecessary rescheduling, so adjust the commit log.
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    102b55ee
sch_generic.c 34.1 KB