    md/raid5: ensure sync and DISCARD don't happen at the same time. · f8dfcffd
    NeilBrown authored
    A number of problems can occur due to races between
    resync/recovery and discard.
    
    - If sync_request() calls handle_stripe() while a discard is
      happening on the stripe, it might call handle_stripe_clean_event
      before all of the individual discard requests have completed
      (so some devices are still locked, but not all).
      Since commit ca64cae9
         md/raid5: Make sure we clear R5_Discard when discard is finished.
      this will cause R5_Discard to be cleared for the parity device,
      so handle_stripe_clean_event() will not be called when the other
      devices do become unlocked, so their ->written will not be cleared.
      This ultimately leads to a WARN_ON in init_stripe and a lock-up.
    
    - If handle_stripe_clean_event() does clear R5_UPTODATE at an awkward
      time for resync, it can lead to s->uptodate being less than disks
      in handle_parity_checks5(), which triggers a BUG (because it is
      one).
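
    The second race can be pictured with a toy model. This is only a
    sketch: TOY_UPTODATE and both helper functions are hypothetical
    stand-ins, not the kernel's R5_UPTODATE handling or the code in
    handle_parity_checks5(). It assumes the parity check is only valid
    when every device in the stripe is up to date, which is the
    condition the commit message describes as violated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device flag bit, standing in for R5_UPTODATE. */
enum { TOY_UPTODATE = 1 << 0 };

/* Count the devices whose data is currently marked up to date. */
static int toy_count_uptodate(const unsigned *dev_flags, int disks)
{
    assert(disks > 0);
    int n = 0;
    for (int i = 0; i < disks; i++)
        if (dev_flags[i] & TOY_UPTODATE)
            n++;
    return n;
}

/*
 * Assumed precondition for a parity check: all devices are up to date.
 * If a discard clean-up clears the flag on any device in the middle of
 * a resync, this returns false.
 */
static bool toy_parity_check_allowed(const unsigned *dev_flags, int disks)
{
    return toy_count_uptodate(dev_flags, disks) == disks;
}
```

    In this model, clearing TOY_UPTODATE on a single device while a
    resync is in flight is enough to make the check invalid, which is
    why the two actions must not overlap on one stripe.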
    
    So:
     - keep R5_Discard on the parity device until all other devices have
       completed their discard request
     - make sure we don't try to have a 'discard' and a 'sync' action at
       the same time.
       This involves a new stripe flag so we know when a 'discard' is
       happening, and the use of R5_Overlap on the parity disk so that,
       when a discard is wanted while a sync is active, we know to wake
       up the discard at the appropriate time.
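
    The two rules above can be sketched as a small state machine. This
    is a simplified model under stated assumptions, not the patch
    itself: struct toy_stripe, the ST_* and DEV_* bits and every toy_*
    function are hypothetical names, and real locking, wait queues and
    I/O completion are left out. DEV_OVERLAP on the parity device plays
    the role the message gives to R5_Overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not kernel code. */
enum { DEV_LOCKED = 1 << 0, DEV_DISCARD = 1 << 1, DEV_OVERLAP = 1 << 2 };
enum { ST_SYNCING = 1 << 0, ST_DISCARD = 1 << 1 };

#define TOY_MAX_DEVS 8

struct toy_stripe {
    int ndevs;                        /* data devices plus one parity device */
    int pd_idx;                       /* index of the parity device */
    unsigned state;                   /* stripe-wide ST_* bits */
    unsigned dev_flags[TOY_MAX_DEVS]; /* per-device DEV_* bits */
};

/* Start a discard, or record a waiter on the parity device if a sync is active. */
static bool toy_start_discard(struct toy_stripe *sh)
{
    assert(sh->ndevs > 0 && sh->ndevs <= TOY_MAX_DEVS);
    if (sh->state & ST_SYNCING) {
        sh->dev_flags[sh->pd_idx] |= DEV_OVERLAP;
        return false;
    }
    sh->state |= ST_DISCARD;
    for (int i = 0; i < sh->ndevs; i++)
        sh->dev_flags[i] |= DEV_LOCKED | DEV_DISCARD;
    return true;
}

/* A sync may not begin while a discard is in progress on the stripe. */
static bool toy_start_sync(struct toy_stripe *sh)
{
    if (sh->state & ST_DISCARD)
        return false;
    sh->state |= ST_SYNCING;
    return true;
}

/* End the sync; returns true if a waiting discard should be woken. */
static bool toy_end_sync(struct toy_stripe *sh)
{
    bool wake = (sh->dev_flags[sh->pd_idx] & DEV_OVERLAP) != 0;
    sh->state &= ~ST_SYNCING;
    sh->dev_flags[sh->pd_idx] &= ~DEV_OVERLAP;
    return wake;
}

/* One device reports that its discard request has completed. */
static void toy_complete_dev(struct toy_stripe *sh, int i)
{
    sh->dev_flags[i] &= ~DEV_LOCKED;
}

/* Clean up only once every device is unlocked; until then keep the markers. */
static bool toy_clean_event(struct toy_stripe *sh)
{
    if (!(sh->dev_flags[sh->pd_idx] & DEV_DISCARD))
        return false;
    for (int i = 0; i < sh->ndevs; i++)
        if (sh->dev_flags[i] & DEV_LOCKED)
            return false;
    for (int i = 0; i < sh->ndevs; i++)
        sh->dev_flags[i] &= ~DEV_DISCARD;
    sh->state &= ~ST_DISCARD;
    return true;
}
```

    In this model a discard requested during a sync is refused and
    remembered, toy_end_sync() reports that it should be retried, and
    toy_clean_event() does nothing if the parity device finishes before
    the data devices.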
    
    Discard support for RAID5 was added in 3.7, so this is suitable for
    any -stable kernel since 3.7.
    
    Cc: stable@vger.kernel.org (v3.7+)
    Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
    Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
    Signed-off-by: NeilBrown <neilb@suse.de>