    btrfs: shrink delalloc pages instead of full inodes · e076ab2a
    Josef Bacik authored
    Commit 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in
    shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
    some infrastructure we have in place to flush inodes that we use for
    device replace and snapshot.  However, this introduced a pretty serious
    performance regression.  To reproduce, the user untarred the source
    tarball of Firefox (360MiB xz compressed / 1.5GiB uncompressed) and saw
    it take anywhere from 5 to 20 times as long to untar on 5.10 compared
    to 5.9.  This was observed on fast devices (SSD and better), not on HDD.
    
    The root cause is that before this change we would generally use the
    normal writeback path to reclaim delalloc space, and for that we would
    tell it the number of pages we wanted to flush.  The referenced commit
    changed this to flush that many inodes instead, and since flushing a
    whole inode writes out all of its dirty data rather than just the
    requested number of pages, this drastically increased the amount of
    space we were flushing in certain cases, which severely affected
    performance.
    
    Unfortunately we cannot simply revert that commit because of 3d45f221
    ("btrfs: fix deadlock when cloning inline extent and low on free
    metadata space"), which requires the ability to skip flushing inodes
    that are being cloned in certain scenarios.  That means we need to keep
    using our flushing infrastructure or risk re-introducing the deadlock.
    
    Instead, fix this problem by going back to providing
    btrfs_start_delalloc_roots() with a number of pages to flush, and then
    setting up a writeback_control and using sync_inode() to handle the
    flushing for us (a rough sketch of this approach is included below).
    This gives us the same behavior we had prior to the regression, while
    still allowing us to avoid the deadlock that was fixed by Filipe.  I
    redid the user's original test and got the following results on one of
    our test machines (256GiB of RAM, 56 cores, 2TiB Intel NVMe drive):
    
      5.9		0m54.258s
      5.10		1m26.212s
      5.10+patch	0m38.800s
    
    5.10+patch is significantly faster than plain 5.9 because of my patch
    series "Change data reservations to use the ticketing infra", which
    contained the patch that introduced the regression but also generally
    improved the overall ENOSPC flushing mechanisms.
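
    For illustration, here is a minimal sketch of the per-inode page
    flushing described above.  flush_delalloc_inode_pages() is a
    hypothetical helper used only for this example rather than the exact
    code in this patch; struct writeback_control and sync_inode() are the
    stock VFS interfaces from <linux/writeback.h> and <linux/fs.h>.

      /*
       * Hypothetical helper: write back at most @nr_pages dirty pages of a
       * single inode via the normal writeback path, instead of flushing the
       * whole inode.  WB_SYNC_NONE keeps this best-effort, like background
       * writeback.
       */
      static int flush_delalloc_inode_pages(struct inode *inode, long nr_pages)
      {
              struct writeback_control wbc = {
                      .sync_mode   = WB_SYNC_NONE,
                      .nr_to_write = nr_pages,
                      .range_start = 0,
                      .range_end   = LLONG_MAX,
              };

              /* sync_inode() honors wbc.nr_to_write when writing dirty pages. */
              return sync_inode(inode, &wbc);
      }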
    
    Additional testing on a consumer-grade SSD (8GiB RAM, 8 CPUs) confirms
    the results:
    
      5.10.5            4m00s
      5.10.5+patch      1m08s
      5.11-rc2          5m14s
      5.11-rc2+patch    1m30s
    Reported-by: René Rebe <rene@exactcode.de>
    Fixes: 38d715f4 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
    CC: stable@vger.kernel.org # 5.10
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Tested-by: David Sterba <dsterba@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    [ add my test results ]
    Signed-off-by: David Sterba <dsterba@suse.com>