• Filipe Manana's avatar
    Btrfs: fix read corruption of compressed and shared extents · 005efedf
    Filipe Manana authored
    If a file has a range pointing to a compressed extent, followed by
    another range that points to the same compressed extent and a read
    operation attempts to read both ranges (either completely or part of
    them), the pages that correspond to the second range are incorrectly
    filled with zeroes.
    
    Consider the following example:
    
      File layout
      [0 - 8K]                      [8K - 24K]
          |                             |
          |                             |
       points to extent X,         points to extent X,
       offset 4K, length of 8K     offset 0, length 16K
    
      [extent X, compressed length = 4K uncompressed length = 16K]
    
    If a readpages() call spans the 2 ranges, a single bio to read the extent
    is submitted - extent_io.c:submit_extent_page() would only create a new
    bio to cover the second range pointing to the extent if the extent it
    points to had a different logical address than the extent associated with
    the first range. This has a consequence of the compressed read end io
    handler (compression.c:end_compressed_bio_read()) finish once the extent
    is decompressed into the pages covering the first range, leaving the
    remaining pages (belonging to the second range) filled with zeroes (done
    by compression.c:btrfs_clear_biovec_end()).
    
    So fix this by submitting the current bio whenever we find a range
    pointing to a compressed extent that was preceded by a range with a
    different extent map. This is the simplest solution for this corner
    case. Making the end io callback populate both ranges (or more, if we
    have multiple pointing to the same extent) is a much more complex
    solution since each bio is tightly coupled with a single extent map and
    the extent maps associated to the ranges pointing to the shared extent
    can have different offsets and lengths.
    
    The following test case for fstests triggers the issue:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs btrfs
      _supported_os Linux
      _require_scratch
      _require_cloner
    
      rm -f $seqres.full
    
      test_clone_and_read_compressed_extent()
      {
          local mount_opts=$1
    
          _scratch_mkfs >>$seqres.full 2>&1
          _scratch_mount $mount_opts
    
          # Create a test file with a single extent that is compressed (the
          # data we write into it is highly compressible no matter which
          # compression algorithm is used, zlib or lzo).
          $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"        \
                          -c "pwrite -S 0xbb 4K 8K"        \
                          -c "pwrite -S 0xcc 12K 4K"       \
                          $SCRATCH_MNT/foo | _filter_xfs_io
    
          # Now clone our extent into an adjacent offset.
          $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
              $SCRATCH_MNT/foo $SCRATCH_MNT/foo
    
          # Same as before but for this file we clone the extent into a lower
          # file offset.
          $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K"         \
                          -c "pwrite -S 0xbb 12K 8K"        \
                          -c "pwrite -S 0xcc 20K 4K"        \
                          $SCRATCH_MNT/bar | _filter_xfs_io
    
          $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
              $SCRATCH_MNT/bar $SCRATCH_MNT/bar
    
          echo "File digests before unmounting filesystem:"
          md5sum $SCRATCH_MNT/foo | _filter_scratch
          md5sum $SCRATCH_MNT/bar | _filter_scratch
    
          # Evicting the inode or clearing the page cache before reading
          # again the file would also trigger the bug - reads were returning
          # all bytes in the range corresponding to the second reference to
          # the extent with a value of 0, but the correct data was persisted
          # (it was a bug exclusively in the read path). The issue happened
          # only if the same readpages() call targeted pages belonging to the
          # first and second ranges that point to the same compressed extent.
          _scratch_remount
    
          echo "File digests after mounting filesystem again:"
          # Must match the same digests we got before.
          md5sum $SCRATCH_MNT/foo | _filter_scratch
          md5sum $SCRATCH_MNT/bar | _filter_scratch
      }
    
      echo -e "\nTesting with zlib compression..."
      test_clone_and_read_compressed_extent "-o compress=zlib"
    
      _scratch_unmount
    
      echo -e "\nTesting with lzo compression..."
      test_clone_and_read_compressed_extent "-o compress=lzo"
    
      status=0
      exit
    
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
    Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
    005efedf
extent_io.c 144 KB