    Btrfs: fix hole punching when using the no-holes feature · 2959a32a
    Filipe Manana authored
    When we are using the no-holes feature, if we punch a hole into a file
    range that already contains a hole which overlaps the range we are passing
    to fallocate(), we end up removing the extent map that represents the
    existing hole without adding a new one. This happens because with the
    no-holes feature we do not have explicit extent items to represent holes
    and therefore the call to __btrfs_drop_extents(), made from
    btrfs_punch_hole(), returns an end offset to the variable drop_end that
    is smaller than the end of the range passed to fallocate(), while it
    drops all existing extent maps in that range.
    
    Normally having a missing extent map is not a problem; for example, for
    a readpages() operation we just end up building the extent map by
    looking at the fs/subvol tree for a matching extent item (or a lack of
    one for implicit holes). However, for an fsync that uses the fast path,
    which needs to look at the list of modified extent maps, this means
    the fsync will not record in the log tree the complete hole that existed
    before the fallocate() call, resulting in a file with content/layout
    that does not match what we had either before or after the hole punch
    operation.
    
    The following test case for fstests reproduces the issue. It fails without
    this change because we get a file with a different digest after the fsync
    log replay and also with a different extent/hole layout.
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
         _cleanup_flakey
         rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/punch
      . ./common/dmflakey
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs generic
      _supported_os Linux
      _require_scratch
      _require_xfs_io_command "fpunch"
      _require_xfs_io_command "fiemap"
      _require_dm_target flakey
      _require_metadata_journaling $SCRATCH_DEV
    
      # This test was motivated by an issue found in btrfs when the btrfs
      # no-holes feature is enabled (introduced in kernel 3.14). So enable
      # the feature if the fs being tested is btrfs.
      if [ $FSTYP == "btrfs" ]; then
          _require_btrfs_fs_feature "no_holes"
          _require_btrfs_mkfs_feature "no-holes"
          MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
      fi
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      # Create our test file with some data and then fsync it.
      # We do the fsync only to make sure the last fsync we do in this test
      # triggers the fast code path of btrfs' fsync implementation, a
      # condition necessary to trigger the bug btrfs had.
      $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 128K" \
                      -c "fsync"                  \
                      $SCRATCH_MNT/foobar | _filter_xfs_io
    
      # Now punch a hole against the range [96K, 128K[.
      $XFS_IO_PROG -c "fpunch 96K 32K" $SCRATCH_MNT/foobar
    
      # Punch another hole against a range that overlaps the previous range
      # and ends beyond eof.
      $XFS_IO_PROG -c "fpunch 64K 128K" $SCRATCH_MNT/foobar
    
      # Punch another hole against a range that overlaps the first range
      # ([96K, 128K[) and ends at eof.
      $XFS_IO_PROG -c "fpunch 32K 96K" $SCRATCH_MNT/foobar
    
      # Fsync our file. We want to verify that, after a power failure and
      # mounting the filesystem again, the file content reflects all the hole
      # punch operations.
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
    
      echo "File digest before power failure:"
      md5sum $SCRATCH_MNT/foobar | _filter_scratch
    
      echo "Fiemap before power failure:"
      $XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap
    
      # Silently drop all writes and umount to simulate a crash/power failure.
      _load_flakey_table $FLAKEY_DROP_WRITES
      _unmount_flakey
    
      # Allow writes again, mount to trigger log replay and validate file
      # contents.
      _load_flakey_table $FLAKEY_ALLOW_WRITES
      _mount_flakey
    
      echo "File digest after log replay:"
      # Must match the same digest we got before the power failure.
      md5sum $SCRATCH_MNT/foobar | _filter_scratch
    
      echo "Fiemap after log replay:"
      # Must match the same extent listing we got before the power failure.
      $XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap
    
      _unmount_flakey
    
      status=0
      exit
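    
    As a cross-check of what the digest and fiemap output should reflect,
    the layout expected after the three punches can be worked out by hand.
    The sketch below is not part of the test; the punch() helper and its
    variables are hypothetical names used only for this illustration. It
    takes the union of the three ranges as a simple min/max, which is valid
    here only because the ranges overlap one another, and it clamps each
    range to the file size because punching a hole does not extend a file.

```shell
#!/bin/sh
# Offsets and lengths are in KiB. The file is 128K long after the pwrite.
file_size=128

# Lowest punched offset and highest punched end seen so far.
hole_start=$file_size
hole_end=0

# punch OFFSET LENGTH: hypothetical helper that widens the tracked hole.
punch()
{
    end=$(($1 + $2))
    # Hole punching keeps the file size, so clamp the end to eof.
    if [ "$end" -gt "$file_size" ]; then
        end=$file_size
    fi
    if [ "$1" -lt "$hole_start" ]; then
        hole_start=$1
    fi
    if [ "$end" -gt "$hole_end" ]; then
        hole_end=$end
    fi
}

# The same three ranges the test passes to fpunch.
punch 96 32
punch 64 128
punch 32 96

echo "data: [0K, ${hole_start}K["
echo "hole: [${hole_start}K, ${hole_end}K["
```

    Under these assumptions the file should hold 0xaa bytes in [0K, 32K[
    and read back as zeroes in [32K, 128K[, both before the simulated power
    failure and after log replay.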
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>