    dm clone: Flush destination device before committing metadata · 8b3fd1f5
    Nikos Tsironis authored
    dm-clone maintains an on-disk bitmap which records which regions are
    valid in the destination device, i.e., which regions have already been
    hydrated, or have been written to directly, via user I/O.
    
    Setting a bit in the on-disk bitmap means the corresponding region is
    valid in the destination device and we redirect all I/O regarding it to
    the destination device.
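
As a minimal user-space sketch of this lookup (not the dm-clone code itself): the helper names, the `unsigned long` bitmap layout, and the `region_sectors` parameter are all hypothetical. Serving reads of a region whose bit is clear from the source device is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

enum io_target { IO_SOURCE, IO_DEST };

/* Hypothetical helper: is the bit for this region set in the bitmap? */
bool region_is_valid(const unsigned long *bitmap, uint64_t region)
{
    return (bitmap[region / BITS_PER_WORD] >> (region % BITS_PER_WORD)) & 1UL;
}

/* Hypothetical helper: record that a region is valid in the destination. */
void mark_region_valid(unsigned long *bitmap, uint64_t region)
{
    bitmap[region / BITS_PER_WORD] |= 1UL << (region % BITS_PER_WORD);
}

/*
 * Decide where a read of `sector` goes. A set bit redirects the I/O to
 * the destination device; a clear bit is assumed to mean the source
 * device still holds the only valid copy.
 */
enum io_target route_read(const unsigned long *bitmap, uint64_t sector,
                          uint64_t region_sectors)
{
    assert(region_sectors > 0);
    return region_is_valid(bitmap, sector / region_sectors) ? IO_DEST
                                                            : IO_SOURCE;
}
```

The point of the sketch is that the bitmap alone decides the routing: once a bit is set and committed, nothing else is consulted before trusting the destination device.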
    
    Suppose the destination device has a volatile write-back cache and the
    following sequence of events occurs:
    
    1. A region gets hydrated, either through the background hydration or
       because it was written to directly, via user I/O.
    
    2. The commit timeout expires and we commit the metadata, marking that
       region as valid in the destination device.
    
    3. The system crashes and the destination device's cache has not been
       flushed, meaning the region's data are lost.
    
    The next time we read that region we read it from the destination
    device, since the metadata have been successfully committed, but the
    data are lost due to the crash, so we read garbage instead of the old
    data.
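
The sequence above can be replayed with a toy model. This is only a sketch: the `toy_*` names, the single 8-byte region, and modelling the cache as a separate buffer that is discarded on a crash are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define REGION_SIZE 8

/* Toy destination device with a volatile write-back cache. */
struct toy_dest {
    char media[REGION_SIZE];  /* what is actually on stable storage */
    char cache[REGION_SIZE];  /* volatile cache, lost on a crash */
    bool cache_dirty;
};

struct toy_clone {
    char source[REGION_SIZE];   /* source device, holds the old data */
    struct toy_dest dest;
    bool region_valid_on_disk;  /* the committed on-disk bitmap bit */
};

void toy_init(struct toy_clone *c)
{
    memcpy(c->source, "OLDDATA", REGION_SIZE);  /* 7 chars + NUL */
    memset(c->dest.media, '?', REGION_SIZE);    /* stands in for garbage */
    memset(c->dest.cache, 0, REGION_SIZE);
    c->dest.cache_dirty = false;
    c->region_valid_on_disk = false;
}

/* Step 1: hydration lands in the volatile cache only. */
void toy_hydrate(struct toy_clone *c)
{
    memcpy(c->dest.cache, c->source, REGION_SIZE);
    c->dest.cache_dirty = true;
}

/* Step 2: the metadata commit marks the region valid. */
void toy_commit_metadata(struct toy_clone *c)
{
    c->region_valid_on_disk = true;
}

/* Step 3: a crash discards whatever was never flushed. */
void toy_crash(struct toy_clone *c)
{
    memset(c->dest.cache, 0, REGION_SIZE);
    c->dest.cache_dirty = false;
}

/* After recovery, the committed bit alone decides where we read from. */
const char *toy_read(const struct toy_clone *c)
{
    return c->region_valid_on_disk ? c->dest.media : c->source;
}

/* Replays steps 1-3; returns true if the old data are still readable. */
bool toy_crash_after_commit_reads_old_data(void)
{
    struct toy_clone c;

    toy_init(&c);
    toy_hydrate(&c);
    assert(c.dest.cache_dirty);  /* data not yet on stable storage */
    toy_commit_metadata(&c);
    toy_crash(&c);
    return memcmp(toy_read(&c), c.source, REGION_SIZE) == 0;
}
```

In this model `toy_crash_after_commit_reads_old_data()` returns false: the bit survives the crash, the cached data do not, and the read is served from the never-written destination media.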
    
    This has several implications:
    
    1. In case of background hydration or of writes with size smaller than
       the region size (which means we first copy the whole region and then
       issue the smaller write), we corrupt data that the user never
       touched.
    
    2. In case of writes with size equal to the device's logical block size,
       we fail to provide atomic sector writes. When the system recovers the
       user will read garbage from the sector instead of the old data or the
       new data.
    
    3. In case of writes without the FUA flag set, after the system
       recovers, the written sectors will contain garbage instead of a
       random mix of sectors containing either old data or new data. Thus,
       we again fail to provide atomic sector writes.
    
    4. Even when the user flushes the dm-clone device, because we first
       commit the metadata and then pass down the flush, the same risk for
       corruption exists (if the system crashes after the metadata have been
       committed but before the flush is passed down).
    
    The only case which is unaffected is that of writes with size equal to
    the region size and with the FUA flag set. But, because FUA writes
    trigger metadata commits, this case can trigger the corruption
    indirectly.
    
    To solve this and avoid the potential data corruption we flush the
    destination device **before** committing the metadata.
    
    This ensures that any freshly hydrated regions, for which we commit the
    metadata, are properly written to non-volatile storage and won't be lost
    in case of a crash.
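
The ordering can be sketched in user-space C as follows. This is not the actual patch: `struct commit_ops`, `commit_with_flush()`, and the tracing stubs are hypothetical names, and skipping the metadata commit when the flush fails is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations; the names are not dm-clone's. */
struct commit_ops {
    int (*flush_dest)(void *ctx);      /* flush the destination's cache */
    int (*commit_metadata)(void *ctx); /* persist the on-disk bitmap */
};

int commit_with_flush(const struct commit_ops *ops, void *ctx)
{
    int r;

    assert(ops && ops->flush_dest && ops->commit_metadata);

    /* 1. Make the freshly hydrated data durable first. */
    r = ops->flush_dest(ctx);
    if (r)
        return r; /* assumption: never mark regions valid after a failed flush */

    /* 2. Only now record those regions as valid. */
    return ops->commit_metadata(ctx);
}

/* Tracing stubs that record the order in which the steps ran. */
struct trace {
    char log[4];      /* 'F' = flush, 'C' = commit */
    size_t n;
    int flush_result; /* value the flush stub returns */
};

int trace_flush(void *ctx)
{
    struct trace *t = ctx;

    t->log[t->n++] = 'F';
    return t->flush_result;
}

int trace_commit(void *ctx)
{
    struct trace *t = ctx;

    t->log[t->n++] = 'C';
    return 0;
}
```

With the stubs wired into a `struct commit_ops`, a successful run records the flush before the commit, and a failing flush leaves the commit unrecorded, so a set bit always refers to data that already reached non-volatile storage.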
    
    Fixes: 7431b783 ("dm: add clone target")
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-clone-target.c 54.7 KB