    mac80211: fix a race between restart and CSA flows · f3ffb6c3
    Emmanuel Grumbach authored
    We hit a problem with iwlwifi that was caused by a bug in
    mac80211. A bug in iwlwifi caused the firmware to crash in
    certain cases during a channel switch. Because of that bug,
    drv_pre_channel_switch would fail and trigger the restart
    flow.
    At that point, the hw restart worker, which runs on the
    system's workqueue, and the csa_connection_drop_work worker,
    which runs on mac80211's workqueue, could run concurrently.
    This is obviously problematic since the restart work wants to
    reconfigure the connection, while the csa_connection_drop_work
    worker does the exact opposite: it tries to disconnect.
    
    Fix this by cancelling the csa_connection_drop_work worker
    in the restart worker.
    
    Note that this may look racy. We could have:
    
    driver   iface_work   CSA_work   restart_work
    +++++++++++++++++++++++++++++++++++++++++++++
                  |
     <--drv_cs ---|
    <FW CRASH!>
    -CS FAILED-->
                  |                       |
                  |                 cancel_work(CSA)
               schedule                   |
               CSA work                   |
                             |            |
                            Race between those 2
    
    But this is not possible because we flush mac80211's
    workqueue in the restart worker before we cancel the CSA
    worker. That would be bulletproof if we could guarantee that
    we schedule the CSA worker only from the iface_work, which
    runs on mac80211's workqueue (and not on the system's
    workqueue). Unfortunately, we do have an instance in which
    we schedule the CSA work outside the context of that
    workqueue (ieee80211_chswitch_done).
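
The ordering argument above can be checked with a small user-space model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical and not part of mac80211 or the kernel workqueue API, the queue is single-threaded, and `toy_flush` is assumed to run only the items that were queued when the flush started. It does not model the ieee80211_chswitch_done case, where the CSA work is scheduled from outside the workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

struct toy_ctx;

/* Hypothetical stand-in for struct work_struct. */
struct toy_work {
    void (*fn)(struct toy_ctx *ctx);
    bool pending;
};

/* Hypothetical stand-in for the state involved in the race. */
struct toy_ctx {
    struct toy_work *queue[TOY_QUEUE_LEN]; /* FIFO modelling mac80211's workqueue */
    size_t head, tail;
    struct toy_work iface_work;
    struct toy_work csa_drop_work;
    bool cs_failed;  /* the channel switch failed after the firmware crash */
    bool connected;
};

static void toy_queue_work(struct toy_ctx *ctx, struct toy_work *w)
{
    if (w->pending)
        return; /* already queued, nothing to do */
    assert(ctx->tail < TOY_QUEUE_LEN);
    w->pending = true;
    ctx->queue[ctx->tail++] = w;
}

/* Run the item at the head of the queue, skipping cancelled (NULL) slots. */
static void toy_run_one(struct toy_ctx *ctx)
{
    struct toy_work *w = ctx->queue[ctx->head++];

    if (!w)
        return;
    w->pending = false;
    w->fn(ctx);
}

/* Run only the items that were queued when the flush started. */
static void toy_flush(struct toy_ctx *ctx)
{
    size_t end = ctx->tail;

    while (ctx->head < end)
        toy_run_one(ctx);
}

/* Run everything, including items queued while running. */
static void toy_run_all(struct toy_ctx *ctx)
{
    while (ctx->head < ctx->tail)
        toy_run_one(ctx);
}

/* Remove a work item from the queue if it is still waiting there. */
static void toy_cancel(struct toy_ctx *ctx, struct toy_work *w)
{
    for (size_t i = ctx->head; i < ctx->tail; i++)
        if (ctx->queue[i] == w)
            ctx->queue[i] = NULL;
    w->pending = false;
}

/* The iface work reacts to the failed channel switch by scheduling the drop. */
static void toy_iface_fn(struct toy_ctx *ctx)
{
    if (ctx->cs_failed)
        toy_queue_work(ctx, &ctx->csa_drop_work);
}

/* The CSA drop work disconnects. */
static void toy_csa_drop_fn(struct toy_ctx *ctx)
{
    ctx->connected = false;
}

/* Returns whether the connection survives the restart. */
static bool toy_simulate(bool flush_before_cancel)
{
    struct toy_ctx ctx = {
        .iface_work = { .fn = toy_iface_fn },
        .csa_drop_work = { .fn = toy_csa_drop_fn },
        .cs_failed = true,
        .connected = true,
    };

    toy_queue_work(&ctx, &ctx.iface_work);

    /* Model of the restart worker. */
    if (flush_before_cancel) {
        toy_flush(&ctx);
        toy_cancel(&ctx, &ctx.csa_drop_work);
    } else {
        toy_cancel(&ctx, &ctx.csa_drop_work);
        toy_flush(&ctx);
    }
    ctx.connected = true; /* the restart reconfigures the connection */

    toy_run_all(&ctx); /* whatever is still queued runs afterwards */
    return ctx.connected;
}
```

In this model, `toy_simulate(true)` keeps the connection because the drop work queued by the iface work is already on the queue when the cancel runs. `toy_simulate(false)` loses it, because the cancel happens before the iface work has had a chance to queue the drop work.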
    
    Note also that we should probably cancel other workers
    like beacon_connection_loss_work and possibly others
    for different types of interfaces. At the very least,
    IBSS should suffer from the exact same problem. For now,
    do the minimum to fix the bug that was actually
    experienced and reproduced.

    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>