    gianfar: Fix device reset races (oops) for Tx · 0851133b
    Claudiu Manoil authored
    The device reset procedure, stop_gfar()/startup_gfar(), has
    concurrency issues.
    "Kernel access of bad area" oopses show up during Tx timeout
    device reset or other reset cases (like changing MTU) that
    happen while the interface still has traffic. The oopses
    happen in start_xmit and clean_tx_ring when accessing tx_queue->
    tx_skbuff which is NULL. The race comes from de-allocating the
    tx_skbuff while transmission and napi processing are still
    active. Though the Tx queues get temporarily stopped when Tx
    timeout occurs, they get re-enabled as a result of Tx congestion
    handling inside the napi context (see clean_tx_ring()). Not
    disabling the napi during reset is also a bug, because
    clean_tx_ring() will try to access tx_skbuff while it is being
    de-alloc'ed and re-alloc'ed.
    
    To fix this, stop_gfar() needs to disable napi processing
    after stopping the Tx queues. However, in order to prevent
    clean_tx_ring() from re-enabling the Tx queue before the napi
    gets disabled, the device state DOWN has been introduced.
    It prevents the Tx congestion management from re-enabling the
    de-congested Tx queue while the device is brought down.
    An additional locking state, RESETTING, has been introduced
    to prevent simultaneous resets or to prevent configuring the
    device while it is resetting.
    The bogus 'rxlock's (for each Rx queue) have been removed since
    their purpose is not justified: they neither prevent nor are
    suited to prevent device reset/reconfig races (such as this one).
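The interplay of the two states described above can be sketched as a small user-space model. This is not the driver's code: `struct dev_model`, the `DEV_DOWN`/`DEV_RESETTING` flags, and every function name below are hypothetical stand-ins. The model is single-threaded and only illustrates the ordering: the DOWN flag is set first, then the Tx queue is stopped, then napi is disabled. A second reset attempt is simply rejected here, which is a simplification.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits, modelled on the DOWN and RESETTING states. */
enum { DEV_DOWN = 1u << 0, DEV_RESETTING = 1u << 1 };

struct dev_model {
    atomic_uint state;       /* bitmask of DEV_* flags */
    bool tx_queue_stopped;   /* stands in for the netdev Tx queue state */
    bool napi_enabled;       /* stands in for napi being enabled */
    int  resets_done;
};

static void dev_init(struct dev_model *d)
{
    atomic_init(&d->state, DEV_DOWN);   /* device starts out down */
    d->tx_queue_stopped = true;
    d->napi_enabled = false;
    d->resets_done = 0;
}

/* Tx completion path: wake the queue only if the device is not going down. */
static bool tx_clean_try_wake(struct dev_model *d)
{
    if (atomic_load(&d->state) & DEV_DOWN)
        return false;                   /* leave the queue stopped */
    d->tx_queue_stopped = false;
    return true;
}

static void dev_stop(struct dev_model *d)
{
    atomic_fetch_or(&d->state, DEV_DOWN);   /* 1. block queue wake-ups */
    d->tx_queue_stopped = true;             /* 2. stop the Tx queue */
    d->napi_enabled = false;                /* 3. disable napi */
    /* Only now would it be safe to free the Tx/Rx buffers. */
}

static void dev_start(struct dev_model *d)
{
    /* Buffers would be re-allocated here, before anything is re-enabled. */
    d->napi_enabled = true;
    atomic_fetch_and(&d->state, ~(unsigned)DEV_DOWN);
    d->tx_queue_stopped = false;
}

/* Returns false if another reset already holds the RESETTING flag. */
static bool dev_reset(struct dev_model *d)
{
    if (atomic_fetch_or(&d->state, DEV_RESETTING) & DEV_RESETTING)
        return false;
    dev_stop(d);
    dev_start(d);
    d->resets_done++;
    atomic_fetch_and(&d->state, ~(unsigned)DEV_RESETTING);
    return true;
}
```

In this model, a call to `tx_clean_try_wake()` that lands between `dev_stop()` and `dev_start()` leaves the queue stopped, which is the property the DOWN state is meant to provide.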
    Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
gianfar.c 86.8 KB