• nikolay@redhat.com's avatar
    bonding: fix miimon and arp_interval delayed work race conditions · fbb0c41b
    nikolay@redhat.com authored
    First I would give three observations which will be used later.
    Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
     This usage is wrong because the pending bit is cleared just before the
     work's fn is executed and if the function re-arms itself we might end up
     with the work still running. It's safe to call cancel_delayed_work_sync()
     even if the work is not queued at all.
    Observation 2: Use of INIT_DELAYED_WORK()
     Work needs to be initialized only once prior to (de/en)queueing.
    Observation 3: IFF_UP is set only after ndo_open is called
    
    Related race conditions:
    1. Race between bonding_store_miimon() and bonding_store_arp_interval()
     Because of Obs.1 we can end up having both works enqueued.
    2. Multiple races with INIT_DELAYED_WORK()
     Since the works are not protected by anything between INIT_DELAYED_WORK()
     and calls to (en/de)queue it is possible for races between the following
     functions:
     (races are also possible between the calls to INIT_DELAYED_WORK()
      and workqueue code)
     bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
    			  bond_open(), enqueued functions
     bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
    				bond_open(), enqueued functions
    3. By Obs.1 we need to change bond_cancel_all()
    
    Bugs 1 and 2 are fixed by moving all work initializations in bond_open
    which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
    are cancelled in bond_close(), is guaranteed not to have any work
    enqueued.
    Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
    they can't race with bond_close and bond_open. The opposing work is
    cancelled only if the IFF_UP flag is set and it is cancelled
    unconditionally. The opposing work is already cancelled if the interface
    is down so no need to cancel it again. This way we don't need new
    synchronizations for the bonding workqueue. These bugs (and fixes) are
    tied together and belong in the same patch.
    Note: I have left 1 line intentionally over 80 characters (84) because I
          didn't like how it looks broken down. If you'd prefer it otherwise,
          then simply break it.
    
     v2: Make description text < 75 columns
    Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
    Signed-off-by: default avatarJay Vosburgh <fubar@us.ibm.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    fbb0c41b
bond_main.c 132 KB