    [PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted recovery · 930d332a
    Jun'ichi Nomura authored
    dm-mirror has a potential data corruption problem: while the on-disk log
    shows that all disk contents are in sync, the actual contents of the
    disks are not synchronized.  This problem occurs if the initial recovery
    (syncing) is interrupted and resumed.
    
    The attached patch fixes this problem.
    
    Background:
    
    rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
    (in-sync), which results in the corresponding bit of clean_bits being set.
    
    This is harmful if an on-disk log is used and the map is removed/suspended
    before the initial sync is completed.  clean_bits is written out to the
    on-disk log at map removal and, upon resume, is read back and copied to
    sync_bits.  Since the recovery process refers to sync_bits to find a
    region to be recovered, a region whose state was changed from RH_NOSYNC
    to RH_CLEAN is no longer selected for recovery.
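    
    The suspend/resume round trip described above can be sketched with a toy
    model.  This is not the dm-log code: struct toy_log, its disk_bits field
    and every toy_* function are hypothetical names introduced only for
    illustration, and the model assumes one bit per region and that resume
    reloads clean_bits from the on-disk copy as well.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_REGIONS 8u

/* Hypothetical, simplified stand-in for the dirty log: one bit per region. */
struct toy_log {
    uint32_t clean_bits; /* regions the log currently considers clean */
    uint32_t sync_bits;  /* regions recovery considers already in sync */
    uint32_t disk_bits;  /* assumed on-disk copy of the log */
};

/* Effect of a region being moved to the clean state. */
void toy_mark_clean(struct toy_log *log, unsigned int region)
{
    assert(region < TOY_NR_REGIONS);
    log->clean_bits |= UINT32_C(1) << region;
}

/* Map removal/suspend: clean_bits is what gets written to disk. */
void toy_suspend(struct toy_log *log)
{
    log->disk_bits = log->clean_bits;
}

/* Resume: the on-disk bits are read back and copied to sync_bits. */
void toy_resume(struct toy_log *log)
{
    log->clean_bits = log->disk_bits;
    log->sync_bits = log->disk_bits;
}

/* Recovery picks the lowest region whose sync bit is clear, or -1. */
int toy_next_recovery_region(const struct toy_log *log)
{
    for (unsigned int r = 0; r < TOY_NR_REGIONS; r++)
        if (!(log->sync_bits & (UINT32_C(1) << r)))
            return (int)r;
    return -1;
}
```

    In this model, if region 0 is marked clean before it has ever been
    recovered and the map is then suspended and resumed,
    toy_next_recovery_region() starts at region 1 and region 0 is skipped.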
    
    If you haven't applied dm-raid1-read-balancing.patch, proposed on dm-devel
    some time ago, the contents of the mirrored disk are just silently
    corrupted.  If you have, balanced reads may get bogus data from
    out-of-sync disks.
    
    The patch keeps the RH_NOSYNC state unchanged.  The region will be changed
    to RH_RECOVERING when recovery starts and will be reclaimed when the
    recovery completes, so the region hash entry is not leaked.
    
    Description:
    
    Keep RH_NOSYNC state unchanged when I/O on the region completes.
    
    rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
    (in-sync), which results in the corresponding bit of clean_bits being set.
    
    This is harmful if an on-disk log is used and the map is removed/suspended
    before the initial sync is completed.  clean_bits is written out to the
    on-disk log at map removal and, upon resume, is read back and copied to
    sync_bits.  Since the recovery process refers to sync_bits to find a
    region to be recovered, a region whose state was changed from RH_NOSYNC
    to RH_CLEAN is no longer selected for recovery.
    
    If you haven't applied dm-raid1-read-balancing.patch, proposed on dm-devel
    some time ago, the contents of the mirrored disk are just silently
    corrupted.  If you have, balanced reads may get bogus data from
    out-of-sync disks.
    
    The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts
    on the region and will be reclaimed when the recovery completes, so the
    region hash entry is not leaked.
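    
    As a rough illustration of the state handling described above, here is a
    minimal sketch of what happens when the last pending I/O on a region
    completes.  It is a simplified model, not the dm-raid1.c code: the toy_*
    names, the struct layout and the "patched" flag are assumptions made for
    the example, and locking, the region lists and the log update are left
    out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four region states named in this message. */
enum toy_state { TOY_CLEAN, TOY_DIRTY, TOY_NOSYNC, TOY_RECOVERING };

struct toy_region {
    enum toy_state state;
    int pending; /* writes still in flight on this region */
};

/*
 * Model of I/O completion on a region.  Returns true if the region was
 * moved to TOY_CLEAN (which is what would set its bit in clean_bits).
 * "patched" selects the behaviour described for the fix.
 */
bool toy_rh_dec(struct toy_region *reg, bool patched)
{
    assert(reg->pending > 0);

    if (--reg->pending > 0)
        return false; /* other writes are still outstanding */

    if (reg->state == TOY_RECOVERING)
        return false; /* the recovery path owns this region */

    if (patched && reg->state == TOY_NOSYNC)
        return false; /* stay out-of-sync until recovery handles it */

    reg->state = TOY_CLEAN;
    return true;
}

/* Recovery start: an out-of-sync region becomes TOY_RECOVERING. */
void toy_start_recovery(struct toy_region *reg)
{
    assert(reg->state == TOY_NOSYNC);
    reg->state = TOY_RECOVERING;
}
```

    With patched set to false, a TOY_NOSYNC region whose last write
    completes is reported clean; with patched set to true it stays
    TOY_NOSYNC until toy_start_recovery() picks it up.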
    
    Alasdair said:
    
      I've analysed the relevant part of the state machine and I believe that
      the patch is correct.
    
      (Further work on this code is still needed - this patch has the
      side-effect of holding onto memory unnecessarily for long periods of time
      under certain workloads - but better that than corrupting data.)
    
    Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
    Acked-by: Alasdair G Kergon <agk@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>