    btrfs: qgroup: fix data leak caused by race between writeback and truncate · fa91e4aa
    Qu Wenruo authored
    [BUG]
    When running tests like generic/013 on a test device with btrfs quota
    enabled, qgroup reserved data space is frequently leaked, which is
    detected at unmount time:
    
      BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
      ------------[ cut here ]------------
      WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
      RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
      Call Trace:
       btrfs_put_super+0x15/0x17 [btrfs]
       generic_shutdown_super+0x72/0x110
       kill_anon_super+0x18/0x30
       btrfs_kill_super+0x17/0x30 [btrfs]
       deactivate_locked_super+0x3b/0xa0
       deactivate_super+0x40/0x50
       cleanup_mnt+0x135/0x190
       __cleanup_mnt+0x12/0x20
       task_work_run+0x64/0xb0
       __prepare_exit_to_usermode+0x1bc/0x1c0
       __syscall_return_slowpath+0x47/0x230
       do_syscall_64+0x64/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ---[ end trace caf08beafeca2392 ]---
      BTRFS error (device dm-3): qgroup reserved space leaked
    
    [CAUSE]
    In the failing case, the offending operations are:
    2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
    2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
    
    The following sequence of events could happen after the writev():
    	CPU1 (writeback)		|		CPU2 (truncate)
    -----------------------------------------------------------------
    btrfs_writepages()			|
    |- extent_write_cache_pages()		|
       |- Got page for 1003520		|
       |  1003520 is Dirty, no writeback	|
       |  So clear_page_dirty_for_io()		|
       |  gets called for it		|
       |- Now page 1003520 is Clean.	|
       |					| btrfs_setattr()
       |					| |- btrfs_setsize()
       |					|    |- truncate_setsize()
       |					|       New i_size is 18388
       |- __extent_writepage()		|
       |  |- page_offset() > i_size		|
          |- btrfs_invalidatepage()		|
    	 |- Page is clean, so no qgroup |
    	    callback executed
    
    This means the qgroup reserved data space is not released in
    btrfs_invalidatepage(), because the page is already Clean by then.
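
The interleaving above can be replayed in a small userspace toy model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the struct is not a kernel structure, and the "release only if Dirty" check is a simplified stand-in for the old behaviour described here, not the actual kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy state for a single page; NOT a kernel structure. */
struct toy_page {
	uint64_t offset;	/* page-aligned file offset */
	bool dirty;		/* models the page Dirty flag */
	bool qgroup_reserved;	/* models the QGROUP_RESERVED bit of io_tree */
};

/* Round a byte position down to the start of its page. */
uint64_t toy_page_start(uint64_t pos)
{
	return pos & ~((uint64_t)TOY_PAGE_SIZE - 1);
}

/* Buffered write: reserve qgroup space once, then mark the page Dirty. */
void toy_write(struct toy_page *p, uint64_t *rsv)
{
	if (!p->qgroup_reserved) {
		p->qgroup_reserved = true;
		*rsv += TOY_PAGE_SIZE;
	}
	p->dirty = true;
}

/* Simplified old behaviour: release the reservation only if Dirty. */
void toy_invalidate_old(struct toy_page *p, uint64_t *rsv)
{
	if (p->dirty && p->qgroup_reserved) {
		p->qgroup_reserved = false;
		*rsv -= TOY_PAGE_SIZE;
	}
	p->dirty = false;
}

/* Replay the race; returns the bytes still reserved afterwards. */
uint64_t toy_replay_race(void)
{
	uint64_t rsv = 0;
	uint64_t i_size;
	struct toy_page page = { .offset = toy_page_start(1006997) };

	toy_write(&page, &rsv);		/* writev() dirties page 1003520 */
	page.dirty = false;		/* CPU1: clear_page_dirty_for_io() */
	i_size = 18388;			/* CPU2: truncate_setsize() */
	if (page.offset > i_size)	/* CPU1: __extent_writepage() */
		toy_invalidate_old(&page, &rsv);
	return rsv;			/* non-zero means leaked reservation */
}
```

In this model `toy_page_start(1006997)` is 1003520 and `toy_replay_race()` returns 4096, the same amount as the "rsv 4096" reported in the warning.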
    
    [FIX]
    Instead of checking the dirty bit of a page, call
    btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
    
    Since qgroup reservations are bound solely to the QGROUP_RESERVED bit
    of io_tree, and not to page status, this cannot cause a double free.
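
The reason an unconditional call is safe can be sketched with a toy free routine keyed only on a reserved bit. This is an illustrative model, not the real btrfs_qgroup_free_data(): `toy_range` and `toy_qgroup_free()` are hypothetical names, and the single boolean stands in for the QGROUP_RESERVED bit of io_tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy state for one byte range; NOT a kernel structure. */
struct toy_range {
	bool dirty;		/* page status, deliberately ignored below */
	bool qgroup_reserved;	/* models the QGROUP_RESERVED bit */
};

/*
 * Release the reservation for a range. Only the reserved bit decides
 * whether anything is released, so repeated calls are harmless.
 * Returns the number of bytes actually released.
 */
uint64_t toy_qgroup_free(struct toy_range *r, uint64_t len, uint64_t *rsv)
{
	if (!r->qgroup_reserved)
		return 0;	/* nothing reserved: no double free */
	r->qgroup_reserved = false;
	*rsv -= len;
	return len;
}
```

Calling `toy_qgroup_free()` on a Clean range that still has the bit set releases the full length, and any later call on the same range releases nothing, so the accounted reservation never goes below zero in this model.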
    
    Fixes: 0b34c261 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>