1. 25 May, 2020 40 commits
    • Anand Jain's avatar
      btrfs: unexport btrfs_compress_set_level() · adbab642
      Anand Jain authored
      btrfs_compress_set_level() can be static function in the file
      compression.c.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      adbab642
    • David Sterba's avatar
      btrfs: simplify iget helpers · 0202e83f
      David Sterba authored
      The inode lookup starting at btrfs_iget takes the full location key,
      while only the objectid is used to match the inode, because the lookup
      happens inside the given root thus the inode number is unique.
      The entire location key is properly set up in btrfs_init_locked_inode.
      
      Simplify the helpers and pass only inode number, renaming it to 'ino'
      instead of 'objectid'. This allows to remove temporary variables key,
      saving some stack space.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0202e83f
    • David Sterba's avatar
      btrfs: open code read_fs_root · a820feb5
      David Sterba authored
      After the update to btrfs_get_fs_root, read_fs_root has become trivial
      wrapper that can be open coded.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a820feb5
    • David Sterba's avatar
      btrfs: simplify root lookup by id · 56e9357a
      David Sterba authored
      The main function to lookup a root by its id btrfs_get_fs_root takes the
      whole key, while only using the objectid. The value of offset is preset
      to (u64)-1 but not actually used until btrfs_find_root that does the
      actual search.
      
      Switch btrfs_get_fs_root to use only objectid and remove all local
      variables that existed just for the lookup. The actual key for search is
      set up in btrfs_get_fs_root, reusing another key variable.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      56e9357a
    • Qu Wenruo's avatar
      btrfs: reloc: clear DEAD_RELOC_TREE bit for orphan roots to prevent runaway balance · 1dae7e0e
      Qu Wenruo authored
      [BUG]
      There are several reported runaway balance, that balance is flooding the
      log with "found X extents" where the X never changes.
      
      [CAUSE]
      Commit d2311e69 ("btrfs: relocation: Delay reloc tree deletion after
      merge_reloc_roots") introduced BTRFS_ROOT_DEAD_RELOC_TREE bit to
      indicate that one subvolume has finished its tree blocks swap with its
      reloc tree.
      
      However if balance is canceled or hits ENOSPC halfway, we didn't clear
      the BTRFS_ROOT_DEAD_RELOC_TREE bit, leaving that bit hanging forever
      until unmount.
      
      Any subvolume root with that bit, would cause backref cache to skip this
      tree block, as it has finished its tree block swap.  This would cause
      all tree blocks of that root be ignored by balance, leading to runaway
      balance.
      
      [FIX]
      Fix the problem by also clearing the BTRFS_ROOT_DEAD_RELOC_TREE bit for
      the original subvolume of orphan reloc root.
      
      Add an umount check for the stale bit still set.
      
      Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      1dae7e0e
    • Qu Wenruo's avatar
      btrfs: reloc: fix reloc root leak and NULL pointer dereference · 51415b6c
      Qu Wenruo authored
      [BUG]
      When balance is canceled, there is a pretty high chance that unmounting
      the fs can lead to lead the NULL pointer dereference:
      
        BTRFS warning (device dm-3): page private not zero on page 223158272
        ...
        BTRFS warning (device dm-3): page private not zero on page 223162368
        BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1
        BUG: kernel NULL pointer dereference, address: 0000000000000168
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 2 PID: 5793 Comm: umount Tainted: G           O      5.7.0-rc5-custom+ #53
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:__lock_acquire+0x5dc/0x24c0
        Call Trace:
         lock_acquire+0xab/0x390
         _raw_spin_lock+0x39/0x80
         btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs]
         release_extent_buffer+0xb2/0x170 [btrfs]
         free_extent_buffer+0x66/0xb0 [btrfs]
         btrfs_put_root+0x8e/0x130 [btrfs]
         btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs]
         btrfs_free_fs_info+0xe5/0x120 [btrfs]
         btrfs_kill_super+0x1f/0x30 [btrfs]
         deactivate_locked_super+0x3b/0x80
         deactivate_super+0x3e/0x50
         cleanup_mnt+0x109/0x160
         __cleanup_mnt+0x12/0x20
         task_work_run+0x67/0xa0
         exit_to_usermode_loop+0xc5/0xd0
         syscall_return_slowpath+0x205/0x360
         do_syscall_64+0x6e/0xb0
         entry_SYSCALL_64_after_hwframe+0x49/0xb3
        RIP: 0033:0x7fd028ef740b
      
      [CAUSE]
      When balance is canceled, all reloc roots are marked as orphan, and
      orphan reloc roots are going to be cleaned up.
      
      However for orphan reloc roots and merged reloc roots, their lifespan
      are quite different:
      
      	Merged reloc roots	|	Orphan reloc roots by cancel
      --------------------------------------------------------------------
      create_reloc_root()		| create_reloc_root()
      |- refs == 1			| |- refs == 1
      				|
      btrfs_grab_root(reloc_root);	| btrfs_grab_root(reloc_root);
      |- refs == 2			| |- refs == 2
      				|
      root->reloc_root = reloc_root;	| root->reloc_root = reloc_root;
      		>>> No difference so far <<<
      				|
      prepare_to_merge()		| prepare_to_merge()
      |- btrfs_set_root_refs(item, 1);| |- if (!err) (err == -EINTR)
      				|
      merge_reloc_roots()		| merge_reloc_roots()
      |- merge_reloc_root()		| |- Doing nothing to put reloc root
         |- insert_dirty_subvol()	| |- refs == 2
            |- __del_reloc_root()	|
               |- btrfs_put_root()	|
                  |- refs == 1	|
      		>>> Now orphan reloc roots still have refs 2 <<<
      				|
      clean_dirty_subvols()		| clean_dirty_subvols()
      |- btrfs_drop_snapshot()	| |- btrfS_drop_snapshot()
         |- reloc_root get freed	|    |- reloc_root still has refs 2
      				|	related ebs get freed, but
      				|	reloc_root still recorded in
      				|	allocated_roots
      btrfs_check_leaked_roots()	| btrfs_check_leaked_roots()
      |- No leaked roots		| |- Leaked reloc_roots detected
      				| |- btrfs_put_root()
      				|    |- free_extent_buffer(root->node);
      				|       |- eb already freed, caused NULL
      				|	   pointer dereference
      
      [FIX]
      The fix is to clear fs_root->reloc_root and put it at
      merge_reloc_roots() time, so that we won't leak reloc roots.
      
      Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
      CC: stable@vger.kernel.org # 5.1+
      Tested-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      51415b6c
    • Robbie Ko's avatar
      btrfs: reduce lock contention when creating snapshot · c11fbb6e
      Robbie Ko authored
      When creating a snapshot, ordered extents need to be flushed and this
      can take a long time.
      
      In create_snapshot there are two locks held when this happens:
      
        1. Destination directory inode lock
        2. Global subvolume semaphore
      
      This will unnecessarily block other operations like subvolume destroy,
      create, or setflag until the snapshot is created.
      
      We can fix that by moving the flush outside the locked section as this
      does not depend on the aforementioned locks.  The code factors out the
      snapshot related work from create_snapshot to btrfs_mksnapshot.
      
      __btrfs_ioctl_snap_create
        btrfs_mksubvol
          create_subvol
        btrfs_mksnapshot
          <flush>
          btrfs_mksubvol
            create_snapshot
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarRobbie Ko <robbieko@synology.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c11fbb6e
    • Qu Wenruo's avatar
      btrfs: don't set SHAREABLE flag for data reloc tree · aeb935a4
      Qu Wenruo authored
      SHAREABLE flag is set for subvolumes because users can create snapshot
      for subvolumes, thus sharing tree blocks of them.
      
      But data reloc tree is not exposed to user space, as it's only an
      internal tree for data relocation, thus it doesn't need the full path
      replacement handling at all.
      
      This patch will make data reloc tree a non-shareable tree, and add
      btrfs_fs_info::data_reloc_root for data reloc tree, so relocation code
      can grab it from fs_info directly.
      
      This would slightly improve tree relocation, as now data reloc tree
      can go through regular COW routine to get relocated, without bothering
      the complex tree reloc tree routine.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      aeb935a4
    • Qu Wenruo's avatar
      btrfs: inode: cleanup the log-tree exceptions in btrfs_truncate_inode_items() · 82028e0a
      Qu Wenruo authored
      There are a lot of root owner checks in btrfs_truncate_inode_items()
      like:
      
      	if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) ||
      	    root == fs_info->tree_root)
      
      But considering that, only these trees can have INODE_ITEMs:
      
      - tree root (for v1 space cache)
      - subvolume trees
      - tree reloc trees
      - data reloc tree
      - log trees
      
      And since subvolume/tree reloc/data reloc trees all have SHAREABLE bit,
      and we're checking tree root manually, so above check is just excluding
      log trees.
      
      This patch will replace two of such checks to a simpler one:
      
      	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
      
      This would merge btrfs_drop_extent_cache() and lock_extent_bits() call
      into the same if branch.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      82028e0a
    • Qu Wenruo's avatar
      btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE · 92a7cc42
      Qu Wenruo authored
      The name BTRFS_ROOT_REF_COWS is not very clear about the meaning.
      
      In fact, that bit can only be set to those trees:
      
      - Subvolume roots
      - Data reloc root
      - Reloc roots for above roots
      
      All other trees won't get this bit set.  So just by the result, it is
      obvious that, roots with this bit set can have tree blocks shared with
      other trees.  Either shared by snapshots, or by reloc roots (an special
      snapshot created by relocation).
      
      This patch will rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE to
      make it easier to understand, and update all comment mentioning
      "reference counted" to follow the rename.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      92a7cc42
    • Anand Jain's avatar
      btrfs: drop stale reference to volume_mutex · ae3e715f
      Anand Jain authored
      Commit dccdb07b ("btrfs: kill btrfs_fs_info::volume_mutex") removed
      the last use of the volume_mutex, forgetting to update the comment.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ae3e715f
    • David Sterba's avatar
      583e4a23
    • David Sterba's avatar
      btrfs: optimize split page write in btrfs_set_token_##bits · f472d3c2
      David Sterba authored
      The fallback path calls helper write_extent_buffer to do write of the
      data spanning two extent buffer pages. As the size is known, we can do
      the write directly in two steps.  This removes one function call and
      compiler can optimize memcpy as the sizes are known at compile time. The
      cached token address is set to the second page.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f472d3c2
    • David Sterba's avatar
      btrfs: optimize split page write in btrfs_set_##bits · f4ca8c51
      David Sterba authored
      The helper write_extent_buffer is called to do write of the data
      spanning two extent buffer pages. As the size is known, we can do the
      write directly in two steps.  This removes one function call and
      compiler can optimize memcpy as the sizes are known at compile time.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f4ca8c51
    • David Sterba's avatar
      btrfs: optimize split page read in btrfs_get_token_##bits · ba8a9a05
      David Sterba authored
      The fallback path calls helper read_extent_buffer to do read of the data
      spanning two extent buffer pages. As the size is known, we can do the
      read directly in two steps.  This removes one function call and compiler
      can optimize memcpy as the sizes are known at compile time. The cached
      token address is set to the second page.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ba8a9a05
    • David Sterba's avatar
      btrfs: optimize split page read in btrfs_get_##bits · 84da071f
      David Sterba authored
      The helper read_extent_buffer is called to do read of the data spanning
      two extent buffer pages. As the size is known, we can do the read
      directly in two steps.  This removes one function call and compiler can
      optimize memcpy as the sizes are known at compile time.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      84da071f
    • David Sterba's avatar
      btrfs: drop unnecessary offset_in_page in extent buffer helpers · c60ac0ff
      David Sterba authored
      Helpers that iterate over extent buffer pages set up several variables,
      one of them is finding out offset of the extent buffer start within a
      page. Right now we have extent buffers aligned to page sizes so this is
      effectively storing zero. This makes the code harder the follow and can
      be simplified.
      
      The same change is done in all the helpers:
      
      * remove: size_t start_offset = offset_in_page(eb->start);
      * simplify code using start_offset
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c60ac0ff
    • David Sterba's avatar
      btrfs: constify extent_buffer in the API functions · 2b48966a
      David Sterba authored
      There are many helpers around extent buffers, found in extent_io.h and
      ctree.h. Most of them can be converted to take constified eb as there
      are no changes to the extent buffer structure itself but rather the
      pages.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      2b48966a
    • David Sterba's avatar
      btrfs: remove unused map_private_extent_buffer · db3756c8
      David Sterba authored
      All uses of map_private_extent_buffer have been replaced by more
      effective way. The set/get helpers have their own bounds checker.
      The function name was confusing since the non-private helper was removed
      in a6591715 ("Btrfs: stop using highmem for extent_buffers") many
      years ago.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      db3756c8
    • David Sterba's avatar
      btrfs: speed up and simplify generic_bin_search · 5cd17f34
      David Sterba authored
      The bin search jumps over the extent buffer item keys, comparing
      directly the bytes if the key is in one page, or storing it in a
      temporary buffer in case it spans two pages.
      
      The mapping start and length are obtained from map_private_extent_buffer,
      which is heavy weight compared to what we need. We know the key size and
      can find out the eb page in a simple way.  For keys spanning two pages
      the fallback read_extent_buffer is used.
      
      The temporary variables are reduced and moved to the scope of use.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5cd17f34
    • David Sterba's avatar
      btrfs: speed up btrfs_set_token_##bits helpers · ce7afe87
      David Sterba authored
      The set/get token helpers either use the cached address in the token or
      unconditionally call map_private_extent_buffer to get the address of
      page containing the requested offset plus the mapping start and length.
      Depending on the return value, the fast path uses unaligned put to write
      data within a page, or fall back to write_extent_buffer that can handle
      writes spanning more pages.
      
      This is all wasteful. We know the number of bytes to write, 1/2/4/8 and
      can find out the page. Then simply check if it's contained or the
      fallback is needed. The token address is updated to the page, or the on
      the next index, expecting that the next write will use that.
      
      This saves one function call to map_private_extent_buffer and several
      unnecessary temporary variables.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ce7afe87
    • David Sterba's avatar
      btrfs: speed up btrfs_set_##bits helpers · 029e4a42
      David Sterba authored
      The helpers unconditionally call map_private_extent_buffer to get the
      address of page containing the requested offset plus the mapping start
      and length. Depending on the return value, the fast path uses unaligned
      put to write data within a page, or fall back to write_extent_buffer
      that can handle writes spanning more pages.
      
      This is all wasteful. We know the number of bytes to write, 1/2/4/8 and
      can find out the page. Then simply check if it's contained or the
      fallback is needed.
      
      This saves one function call to map_private_extent_buffer and several
      unnecessary temporary variables.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      029e4a42
    • David Sterba's avatar
      btrfs: speed up btrfs_get_token_##bits helpers · 8f9da810
      David Sterba authored
      The set/get token helpers either use the cached address in the token or
      unconditionally call map_private_extent_buffer to get the address of
      page containing the requested offset plus the mapping start and length.
      Depending on the return value, the fast path uses unaligned read to get
      data within a page, or fall back to read_extent_buffer that can handle
      reads spanning more pages.
      
      This is all wasteful. We know the number of bytes to read, 1/2/4/8 and
      can find out the page. Then simply check if it's contained or the
      fallback is needed. The token address is updated to the page, or the on
      the next index, expecting that the next read will use that.
      
      This saves one function call to map_private_extent_buffer and several
      unnecessary temporary variables.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8f9da810
    • David Sterba's avatar
      btrfs: speed up btrfs_get_##bits helpers · 1441ed9b
      David Sterba authored
      The helpers unconditionally call map_private_extent_buffer to get the
      address of page containing the requested offset plus the mapping start
      and length. Depending on the return value, the fast path uses unaligned
      read to get data within a page, or fall back to read_extent_buffer that
      can handle reads spanning more pages.
      
      This is all wasteful. We know the number of bytes to read, 1/2/4/8 and
      can find out the page. Then simply check if it's contained or the
      fallback is needed.
      
      This saves one function call to map_private_extent_buffer and several
      unnecessary temporary variables.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      1441ed9b
    • David Sterba's avatar
      btrfs: add separate bounds checker for set/get helpers · 5e394689
      David Sterba authored
      The bounds checking is now done in map_private_extent_buffer but that
      will be removed in following patches and some sanity checks should still
      be done.
      
      There are two separate checks to see the kind of out of bounds access:
      partial (start offset is in the buffer) or complete (both start and end
      are out).
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5e394689
    • David Sterba's avatar
      btrfs: preset set/get token with first page and drop condition · 870b388d
      David Sterba authored
      All the set/get helpers first check if the token contains a cached
      address. After first use the address is always valid, but the extra
      check is done for each call.
      
      The token initialization can optimistically set it to the first extent
      buffer page, that we know always exists. Then the condition in all
      btrfs_token_*/btrfs_set_token_* can be simplified by removing the
      address check from the condition, but for development the assertion
      still makes sure it's valid.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      870b388d
    • David Sterba's avatar
      btrfs: don't use set/get token in leaf_space_used · a31356b9
      David Sterba authored
      The token is supposed to cache the last page used by the set/get
      helpers. In leaf_space_used the first and last items are accessed, it's
      not likely they'd be on the same page so there's some overhead caused
      updating the token address but not using it.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a31356b9
    • David Sterba's avatar
      btrfs: don't use set/get token for single assignment in overwrite_item · 60d48e2e
      David Sterba authored
      The set/get token is supposed to cache the last page that was accessed
      so it speeds up subsequential access to the eb. It does not make sense
      to use that for just one change, which is the case of inode size in
      overwrite_item.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      60d48e2e
    • David Sterba's avatar
      btrfs: drop eb parameter from set/get token helpers · cc4c13d5
      David Sterba authored
      Now that all set/get helpers use the eb from the token, we don't need to
      pass it to many btrfs_token_*/btrfs_set_token_* helpers, saving some
      stack space.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      cc4c13d5
    • David Sterba's avatar
      btrfs: use the token::eb for all set/get helpers · 4dae666a
      David Sterba authored
      The token stores a copy of the extent buffer pointer but does not make
      any use of it besides sanity checks. We can use it and drop the eb
      parameter from several functions, this patch only switches the use
      inside the set/get helpers.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      4dae666a
    • Tiezhu Yang's avatar
      btrfs: remove duplicated include in block-group.c · f2998ebd
      Tiezhu Yang authored
      disk-io.h is included more than once in block-group.c, remove it.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarTiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f2998ebd
    • Qu Wenruo's avatar
      btrfs: block-group: rename write_one_cache_group() · 3be4d8ef
      Qu Wenruo authored
      The name of this function contains the word "cache", which is left from
      the times where btrfs_block_group was called btrfs_block_group_cache.
      
      Now this "cache" doesn't match anything, and we have better namings for
      functions like read/insert/remove_block_group_item().
      
      Rename it to update_block_group_item().
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      3be4d8ef
    • Qu Wenruo's avatar
      btrfs: block-group: refactor how we insert a block group item · 97f4728a
      Qu Wenruo authored
      Currently the block group item insert is pretty straight forward, fill
      the block group item structure and insert it into extent tree.
      
      However the incoming skinny block group feature is going to change this,
      so this patch will refactor insertion into a new function,
      insert_block_group_item(), to make the incoming feature easier to add.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      97f4728a
    • Qu Wenruo's avatar
      btrfs: block-group: refactor how we delete one block group item · 7357623a
      Qu Wenruo authored
      When deleting a block group item, it's pretty straight forward, just
      delete the item pointed by the key.  However it will not be that
      straight-forward for incoming skinny block group item.
      
      So refactor the block group item deletion into a new function,
      remove_block_group_item(), also to make the already lengthy
      btrfs_remove_block_group() a little shorter.
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7357623a
    • Qu Wenruo's avatar
      btrfs: block-group: refactor how we read one block group item · 9afc6649
      Qu Wenruo authored
      Structure btrfs_block_group has the following members which are
      currently read from on-disk block group item and key:
      
      - length - from item key
      - used
      - flags - from block group item
      
      However for incoming skinny block group tree, we are going to read those
      members from different sources.
      
      This patch will refactor such read by:
      
      - Don't initialize btrfs_block_group::length at allocation
        Caller should initialize them manually.
        Also to avoid possible (well, only two callers) missing
        initialization, add extra ASSERT() in btrfs_add_block_group_cache().
      
      - Refactor length/used/flags initialization into one function
        The new function, fill_one_block_group() will handle the
        initialization of such members.
      
      - Use btrfs_block_group::length to replace key::offset
        Since skinny block group item would have a different meaning for its
        key offset.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9afc6649
    • Qu Wenruo's avatar
      btrfs: block-group: don't set the wrong READA flag for btrfs_read_block_groups() · 83fe9e12
      Qu Wenruo authored
      Regular block group items in extent tree are scattered inside the huge
      tree, thus forward readahead makes no sense.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      83fe9e12
    • Marcos Paulo de Souza's avatar
      btrfs: send: emit file capabilities after chown · 89efda52
      Marcos Paulo de Souza authored
      Whenever a chown is executed, all capabilities of the file being touched
      are lost.  When doing incremental send with a file with capabilities,
      there is a situation where the capability can be lost on the receiving
      side. The sequence of actions bellow shows the problem:
      
        $ mount /dev/sda fs1
        $ mount /dev/sdb fs2
      
        $ touch fs1/foo.bar
        $ setcap cap_sys_nice+ep fs1/foo.bar
        $ btrfs subvolume snapshot -r fs1 fs1/snap_init
        $ btrfs send fs1/snap_init | btrfs receive fs2
      
        $ chgrp adm fs1/foo.bar
        $ setcap cap_sys_nice+ep fs1/foo.bar
      
        $ btrfs subvolume snapshot -r fs1 fs1/snap_complete
        $ btrfs subvolume snapshot -r fs1 fs1/snap_incremental
      
        $ btrfs send fs1/snap_complete | btrfs receive fs2
        $ btrfs send -p fs1/snap_init fs1/snap_incremental | btrfs receive fs2
      
      At this point, only a chown was emitted by "btrfs send" since only the
      group was changed. This makes the cap_sys_nice capability to be dropped
      from fs2/snap_incremental/foo.bar
      
      To fix that, only emit capabilities after chown is emitted. The current
      code first checks for xattrs that are new/changed, emits them, and later
      emit the chown. Now, __process_new_xattr skips capabilities, letting
      only finish_inode_if_needed to emit them, if they exist, for the inode
      being processed.
      
      This behavior was being worked around in "btrfs receive" side by caching
      the capability and only applying it after chown. Now, xattrs are only
      emmited _after_ chown, making that workaround not needed anymore.
      
      Link: https://github.com/kdave/btrfs-progs/issues/202
      CC: stable@vger.kernel.org # 4.4+
      Suggested-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarMarcos Paulo de Souza <mpdesouza@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      89efda52
    • Filipe Manana's avatar
      btrfs: scrub, only lookup for csums if we are dealing with a data extent · 89490303
      Filipe Manana authored
      When scrubbing a stripe, whenever we find an extent we lookup for its
      checksums in the checksum tree. However we do it even for metadata extents
      which don't have checksum items stored in the checksum tree, that is
      only for data extents.
      
      So make the lookup for checksums only if we are processing with a data
      extent.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      89490303
    • Filipe Manana's avatar
      btrfs: move the block group freeze/unfreeze helpers into block-group.c · 684b752b
      Filipe Manana authored
      The helpers btrfs_freeze_block_group() and btrfs_unfreeze_block_group()
      used to be named btrfs_get_block_group_trimming() and
      btrfs_put_block_group_trimming() respectively.
      
      At the time they were added to free-space-cache.c, by commit e33e17ee
      ("btrfs: add missing discards when unpinning extents with -o discard")
      because all the trimming related functions were in free-space-cache.c.
      
      Now that the helpers were renamed and are used in scrub context as well,
      move them to block-group.c, a much more logical location for them.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      684b752b
    • Filipe Manana's avatar
      btrfs: rename member 'trimming' of block group to a more generic name · 6b7304af
      Filipe Manana authored
      Back in 2014, commit 04216820 ("Btrfs: fix race between fs trimming
      and block group remove/allocation"), I added the 'trimming' member to the
      block group structure. Its purpose was to prevent races between trimming
      and block group deletion/allocation by pinning the block group in a way
      that prevents its logical address and device extents from being reused
      while trimming is in progress for a block group, so that if another task
      deletes the block group and then another task allocates a new block group
      that gets the same logical address and device extents while the trimming
      task is still in progress.
      
      After the previous fix for scrub (patch "btrfs: fix a race between scrub
      and block group removal/allocation"), scrub now also has the same needs that
      trimming has, so the member name 'trimming' no longer makes sense.
      Since there is already a 'pinned' member in the block group that refers
      to space reservations (pinned bytes), rename the member to 'frozen',
      add a comment on top of it to describe its general purpose and rename
      the helpers to increment and decrement the counter as well, to match
      the new member name.
      
      The next patch in the series will move the helpers into a more suitable
      file (from free-space-cache.c to block-group.c).
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      6b7304af