    btrfs: only subtract from len_to_oe_boundary when it is tracking an extent · 09c3717c
    Chris Mason authored
    bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
    as we submit bios for writes.  Every time we add a page to the bio, we
    decrement those bytes from len_to_oe_boundary, and then we submit the
    bio if we happen to hit zero.
    
    Most of the time, len_to_oe_boundary gets set to U32_MAX.
    submit_extent_page() adds pages into our bio, and the size of the bio
    ends up limited by:
    
    - Are we contiguous on disk?
    - Does bio_add_page() allow us to stuff more in?
    - Is len_to_oe_boundary > 0?
    
    The len_to_oe_boundary math starts with U32_MAX, which isn't page or
    sector aligned, and subtracts from it until it hits zero.  In the
    non-zoned case, the last IO we submit before we hit zero is going to be
    unaligned, triggering BUGs.
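
    The arithmetic can be checked with a small standalone sketch.  It is
    not kernel code: last_chunk_before_zero() and the SKETCH_* names are
    hypothetical, the page size is assumed to be 4096 bytes, and each step
    is assumed to be clamped to whatever is left in the counter:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_U32_MAX   UINT32_MAX  /* stands in for the kernel's U32_MAX */
#define SKETCH_PAGE_SIZE 4096u       /* assumption: 4K pages */

/*
 * Hypothetical helper: subtract up to `chunk` bytes at a time from
 * `budget`, clamping the final step to what is left, and return the
 * size of the last step taken before the budget reaches zero.
 */
static uint32_t last_chunk_before_zero(uint32_t budget, uint32_t chunk)
{
	uint32_t last = 0;

	assert(chunk > 0);
	while (budget > 0) {
		last = budget < chunk ? budget : chunk;
		budget -= last;
	}
	return last;
}
```

    Under these assumptions, starting from SKETCH_U32_MAX with
    SKETCH_PAGE_SIZE steps leaves a final step of 4095 bytes, one byte
    short of a page, while an aligned starting value such as 8192 ends on
    a full 4096-byte step.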
    
    This is hard to trigger because bio_add_page() isn't going to make a bio
    of U32_MAX size unless you give it a perfect set of pages and fully
    contiguous extents on disk.  We can hit it pretty reliably while making
    large swapfiles during provisioning because the machine is freshly
    booted, mostly idle, and the disk is freshly formatted.  It's also
    possible to trigger with reads when read_ahead_kb is set to 4GB.
    
    The code has been cleaned up and shifted around a few times, but this flaw
    has been lurking since the counter was added.  I think the commit
    24e6c808 ("btrfs: simplify main loop in submit_extent_page") ended
    up exposing the bug.
    
    The fix used here is to skip doing math on len_to_oe_boundary unless
    we've changed it from the default U32_MAX value.  bio_add_page() is the
    real limit we want, and there's no reason to do extra math when the
    block layer is doing it for us.
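
    A minimal userspace sketch of that rule, not the actual patch: struct
    sketch_bio_ctrl, sketch_account_len() and SKETCH_DEFAULT_BOUNDARY are
    hypothetical stand-ins for the kernel structures and U32_MAX, and the
    caller is assumed to have already clamped len to the remaining
    boundary when one is being tracked:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stands in for the default U32_MAX value ("no boundary tracked"). */
#define SKETCH_DEFAULT_BOUNDARY UINT32_MAX

/* Hypothetical, cut-down stand-in for the bio control structure. */
struct sketch_bio_ctrl {
	uint32_t len_to_oe_boundary;
};

/*
 * Account for `len` bytes just added to the bio.  The counter is only
 * decremented when it is tracking a real boundary; the default value is
 * left untouched so it can never count down to an unaligned remainder.
 * Returns true when the boundary has been reached and the bio should be
 * submitted.  Assumes len <= len_to_oe_boundary when tracking.
 */
static bool sketch_account_len(struct sketch_bio_ctrl *ctrl, uint32_t len)
{
	if (ctrl->len_to_oe_boundary != SKETCH_DEFAULT_BOUNDARY)
		ctrl->len_to_oe_boundary -= len;

	return ctrl->len_to_oe_boundary == 0;
}
```

    With the default value the helper never asks for a submit, which
    leaves bio_add_page() as the only size limit; with a tracked
    boundary of 8192 bytes it reports the boundary after two 4096-byte
    additions.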
    
    Sample reproducer, note you'll need to change the path to the bdi and
    device:
    
      SUBVOL=/btrfs/swapvol
      SWAPFILE=$SUBVOL/swapfile
      SZMB=8192
    
      mkfs.btrfs -f /dev/vdb
      mount /dev/vdb /btrfs
    
      btrfs subvol create $SUBVOL
      chattr +C $SUBVOL
      dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
      sync
    
      echo 4 > /proc/sys/vm/drop_caches
    
      echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb
    
      while true; do
    	  echo 1 > /proc/sys/vm/drop_caches
    	  echo 1 > /proc/sys/vm/drop_caches
    	  dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
      done
    
    Fixes: 24e6c808 ("btrfs: simplify main loop in submit_extent_page")
    CC: stable@vger.kernel.org # 6.4+
    Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    Signed-off-by: David Sterba <dsterba@suse.com>