• Ilya Dryomov's avatar
    dm cache metadata: set dirty on all cache blocks after a crash · 5b1fe7be
    Ilya Dryomov authored
    Quoting Documentation/device-mapper/cache.txt:
    
      The 'dirty' state for a cache block changes far too frequently for us
      to keep updating it on the fly.  So we treat it as a hint.  In normal
      operation it will be written when the dm device is suspended.  If the
      system crashes all cache blocks will be assumed dirty when restarted.
    
    This got broken in commit f177940a ("dm cache metadata: switch to
    using the new cursor api for loading metadata") in 4.9, which removed
    the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
    flag) when loading cache blocks.  This results in data corruption on an
    unclean shutdown with dirty cache blocks on the fast device.  After the
    crash those blocks are considered clean and may get evicted from the
    cache at any time.  This can be demonstrated by doing a lot of reads
    to trigger individual evictions, but uncache is more predictable:
    
      ### Disable auto-activation in lvm.conf to be able to do uncache in
      ### time (i.e. see uncache doing flushing) when the fix is applied.
    
      # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
      # vgcreate vg_cache /dev/vdb /dev/vdc
      # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
      # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
      # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
      # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
      # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
      # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
      # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
      0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      # dmsetup status vg_cache-lv_slowdev
      0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
                                                                ^^^^
                                    7065 * 64k = 441M yet to be written to the slow device
      # echo b >/proc/sysrq-trigger
    
      # vgchange -ay vg_cache
      # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
      0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      # lvconvert --uncache vg_cache/lv_slowdev
      Flushing 0 blocks for cache vg_cache/lv_slowdev.
      Logical volume "lv_cachedev" successfully removed
      Logical volume vg_cache/lv_slowdev is not cached.
      # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
      0fe00000:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
      0fe00010:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    
    This is the case with both v1 and v2 cache pool metatata formats.
    
    After applying this patch:
    
      # vgchange -ay vg_cache
      # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
      0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      # lvconvert --uncache vg_cache/lv_slowdev
      Flushing 3724 blocks for cache vg_cache/lv_slowdev.
      ...
      Flushing 71 blocks for cache vg_cache/lv_slowdev.
      Logical volume "lv_cachedev" successfully removed
      Logical volume vg_cache/lv_slowdev is not cached.
      # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
      0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
    
    Cc: stable@vger.kernel.org
    Fixes: f177940a ("dm cache metadata: switch to using the new cursor api for loading metadata")
    Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
    Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    5b1fe7be
dm-cache-metadata.c 41.9 KB