An error occurred fetching the project authors.
  1. 08 Dec, 2020 6 commits
  2. 26 Oct, 2020 1 commit
    • Josef Bacik's avatar
      btrfs: add a helper to read the tree_root commit root for backref lookup · 49d11bea
      Josef Bacik authored
      I got the following lockdep splat with tree locks converted to rwsem
      patches on btrfs/104:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.9.0+ #102 Not tainted
        ------------------------------------------------------
        btrfs-cleaner/903 is trying to acquire lock:
        ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
      
        but task is already holding lock:
        ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
      	 down_read+0x40/0x130
      	 caching_thread+0x53/0x5a0
      	 btrfs_work_helper+0xfa/0x520
      	 process_one_work+0x238/0x540
      	 worker_thread+0x55/0x3c0
      	 kthread+0x13a/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x7e/0x7b0
      	 btrfs_cache_block_group+0x1e0/0x510
      	 find_free_extent+0xb6e/0x12f0
      	 btrfs_reserve_extent+0xb3/0x1b0
      	 btrfs_alloc_tree_block+0xb1/0x330
      	 alloc_tree_block_no_bg_flush+0x4f/0x60
      	 __btrfs_cow_block+0x11d/0x580
      	 btrfs_cow_block+0x10c/0x220
      	 commit_cowonly_roots+0x47/0x2e0
      	 btrfs_commit_transaction+0x595/0xbd0
      	 sync_filesystem+0x74/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0x14/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x36/0xa0
      	 cleanup_mnt+0x12d/0x190
      	 task_work_run+0x5c/0xa0
      	 exit_to_user_mode_prepare+0x1df/0x200
      	 syscall_exit_to_user_mode+0x54/0x280
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (&space_info->groups_sem){++++}-{3:3}:
      	 down_read+0x40/0x130
      	 find_free_extent+0x2ed/0x12f0
      	 btrfs_reserve_extent+0xb3/0x1b0
      	 btrfs_alloc_tree_block+0xb1/0x330
      	 alloc_tree_block_no_bg_flush+0x4f/0x60
      	 __btrfs_cow_block+0x11d/0x580
      	 btrfs_cow_block+0x10c/0x220
      	 commit_cowonly_roots+0x47/0x2e0
      	 btrfs_commit_transaction+0x595/0xbd0
      	 sync_filesystem+0x74/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0x14/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x36/0xa0
      	 cleanup_mnt+0x12d/0x190
      	 task_work_run+0x5c/0xa0
      	 exit_to_user_mode_prepare+0x1df/0x200
      	 syscall_exit_to_user_mode+0x54/0x280
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (btrfs-root-00){++++}-{3:3}:
      	 __lock_acquire+0x1167/0x2150
      	 lock_acquire+0xb9/0x3d0
      	 down_read_nested+0x43/0x130
      	 __btrfs_tree_read_lock+0x32/0x170
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x614/0x9d0
      	 btrfs_find_root+0x35/0x1b0
      	 btrfs_read_tree_root+0x61/0x120
      	 btrfs_get_root_ref+0x14b/0x600
      	 find_parent_nodes+0x3e6/0x1b30
      	 btrfs_find_all_roots_safe+0xb4/0x130
      	 btrfs_find_all_roots+0x60/0x80
      	 btrfs_qgroup_trace_extent_post+0x27/0x40
      	 btrfs_add_delayed_data_ref+0x3fd/0x460
      	 btrfs_free_extent+0x42/0x100
      	 __btrfs_mod_ref+0x1d7/0x2f0
      	 walk_up_proc+0x11c/0x400
      	 walk_up_tree+0xf0/0x180
      	 btrfs_drop_snapshot+0x1c7/0x780
      	 btrfs_clean_one_deleted_snapshot+0xfb/0x110
      	 cleaner_kthread+0xd4/0x140
      	 kthread+0x13a/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Chain exists of:
          btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&fs_info->commit_root_sem);
      				 lock(&caching_ctl->mutex);
      				 lock(&fs_info->commit_root_sem);
          lock(btrfs-root-00);
      
         *** DEADLOCK ***
      
        3 locks held by btrfs-cleaner/903:
         #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
         #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
         #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
      
        stack backtrace:
        CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x8b/0xb0
         check_noncircular+0xcf/0xf0
         __lock_acquire+0x1167/0x2150
         ? __bfs+0x42/0x210
         lock_acquire+0xb9/0x3d0
         ? __btrfs_tree_read_lock+0x32/0x170
         down_read_nested+0x43/0x130
         ? __btrfs_tree_read_lock+0x32/0x170
         __btrfs_tree_read_lock+0x32/0x170
         __btrfs_read_lock_root_node+0x3a/0x50
         btrfs_search_slot+0x614/0x9d0
         ? find_held_lock+0x2b/0x80
         btrfs_find_root+0x35/0x1b0
         ? do_raw_spin_unlock+0x4b/0xa0
         btrfs_read_tree_root+0x61/0x120
         btrfs_get_root_ref+0x14b/0x600
         find_parent_nodes+0x3e6/0x1b30
         btrfs_find_all_roots_safe+0xb4/0x130
         btrfs_find_all_roots+0x60/0x80
         btrfs_qgroup_trace_extent_post+0x27/0x40
         btrfs_add_delayed_data_ref+0x3fd/0x460
         btrfs_free_extent+0x42/0x100
         __btrfs_mod_ref+0x1d7/0x2f0
         walk_up_proc+0x11c/0x400
         walk_up_tree+0xf0/0x180
         btrfs_drop_snapshot+0x1c7/0x780
         ? btrfs_clean_one_deleted_snapshot+0x73/0x110
         btrfs_clean_one_deleted_snapshot+0xfb/0x110
         cleaner_kthread+0xd4/0x140
         ? btrfs_alloc_root+0x50/0x50
         kthread+0x13a/0x150
         ? kthread_create_worker_on_cpu+0x40/0x40
         ret_from_fork+0x1f/0x30
        BTRFS info (device sdb): disk space caching is enabled
        BTRFS info (device sdb): has skinny extents
      
      This happens because qgroups does a backref lookup when we create a
      delayed ref.  From here it may have to look up a root from an indirect
      ref, which does a normal lookup on the tree_root, which takes the read
      lock on the tree_root nodes.
      
      To fix this we need to add a variant for looking up roots that searches
      the commit root of the tree_root.  Then when we do the backref search
      using the commit root we are sure to not take any locks on the tree_root
      nodes.  This gets rid of the lockdep splat when running btrfs/104.
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      49d11bea
  3. 07 Oct, 2020 3 commits
  4. 27 Jul, 2020 1 commit
    • Qu Wenruo's avatar
      btrfs: preallocate anon block device at first phase of snapshot creation · 2dfb1e43
      Qu Wenruo authored
      [BUG]
      When the anonymous block device pool is exhausted, subvolume/snapshot
      creation fails with EMFILE (Too many files open). This has been reported
      by a user. The allocation happens in the second phase during transaction
      commit where it's only way out is to abort the transaction
      
        BTRFS: Transaction aborted (error -24)
        WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
        RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
        Call Trace:
         create_pending_snapshots+0x82/0xa0 [btrfs]
         btrfs_commit_transaction+0x275/0x8c0 [btrfs]
         btrfs_mksubvol+0x4b9/0x500 [btrfs]
         btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
         btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
         btrfs_ioctl+0x11a4/0x2da0 [btrfs]
         do_vfs_ioctl+0xa9/0x640
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x1a/0x20
         do_syscall_64+0x5a/0x110
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ---[ end trace 33f2f83f3d5250e9 ]---
        BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
        BTRFS info (device sda1): forced readonly
        BTRFS warning (device sda1): Skipping commit of aborted transaction.
        BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
      
      [CAUSE]
      When the global anonymous block device pool is exhausted, the following
      call chain will fail, and lead to transaction abort:
      
       btrfs_ioctl_snap_create_v2()
       |- btrfs_ioctl_snap_create_transid()
          |- btrfs_mksubvol()
             |- btrfs_commit_transaction()
                |- create_pending_snapshot()
                   |- btrfs_get_fs_root()
                      |- btrfs_init_fs_root()
                         |- get_anon_bdev()
      
      [FIX]
      Although we can't enlarge the anonymous block device pool, at least we
      can preallocate anon_dev for subvolume/snapshot in the first phase,
      outside of transaction context and exactly at the moment the user calls
      the creation ioctl.
      Reported-by: default avatarGreed Rong <greedrong@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      2dfb1e43
  5. 25 May, 2020 2 commits
    • David Sterba's avatar
      btrfs: simplify root lookup by id · 56e9357a
      David Sterba authored
      The main function to lookup a root by its id btrfs_get_fs_root takes the
      whole key, while only using the objectid. The value of offset is preset
      to (u64)-1 but not actually used until btrfs_find_root that does the
      actual search.
      
      Switch btrfs_get_fs_root to use only objectid and remove all local
      variables that existed just for the lookup. The actual key for search is
      set up in btrfs_get_fs_root, reusing another key variable.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      56e9357a
    • Omar Sandoval's avatar
      btrfs: get rid of endio_repair_workers · 5c047a69
      Omar Sandoval authored
      This was originally added in commit 8b110e39 ("Btrfs: implement
      repair function when direct read fails") to avoid a deadlock. In that
      commit, the direct I/O read endio executes on the endio_workers
      workqueue, submits a repair bio, and waits for it to complete. The
      repair bio endio must execute on a different workqueue, otherwise it
      could block on the endio_workers workqueue becoming available, which
      won't happen because the original endio is blocked on the repair bio.
      
      As of the previous commit, the original endio doesn't wait for the
      repair bio, so this separate workqueue is unnecessary.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5c047a69
  6. 23 Mar, 2020 11 commits
  7. 20 Jan, 2020 1 commit
  8. 18 Nov, 2019 1 commit
  9. 09 Sep, 2019 1 commit
  10. 01 Jul, 2019 1 commit
  11. 29 Apr, 2019 5 commits
    • David Sterba's avatar
      btrfs: get fs_info from trans in btrfs_create_tree · 9b7a2440
      David Sterba authored
      We can read fs_info from the transaction and can drop it from the
      parameters.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9b7a2440
    • David Sterba's avatar
      btrfs: get fs_info from eb in btrfs_verify_level_key · e064d5e9
      David Sterba authored
      We can read fs_info from extent buffer and can drop it from the
      parameters.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e064d5e9
    • David Sterba's avatar
      btrfs: get fs_info from eb in clean_tree_block · 6a884d7d
      David Sterba authored
      We can read fs_info from extent buffer and can drop it from the
      parameters.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      6a884d7d
    • David Sterba's avatar
      btrfs: move tree block wait and write helpers to tree-log · 247462a5
      David Sterba authored
      The wrapper names better describe what's happening so they're not
      deleted though they're trivial, but at least moved closer to their place
      of use.
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      247462a5
    • Qu Wenruo's avatar
      btrfs: Check the first key and level for cached extent buffer · 448de471
      Qu Wenruo authored
      [BUG]
      When reading a file from a fuzzed image, kernel can panic like:
      
        BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
        assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3500!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
        Call Trace:
         btrfs_lookup_csum+0x52/0x150 [btrfs]
         __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
         btrfs_submit_bio_hook+0x103/0x170 [btrfs]
         submit_one_bio+0x59/0x80 [btrfs]
         extent_read_full_page+0x58/0x80 [btrfs]
         generic_file_read_iter+0x2f6/0x9d0
         __vfs_read+0x14d/0x1a0
         vfs_read+0x8d/0x140
         ksys_read+0x52/0xc0
         do_syscall_64+0x60/0x210
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      The fuzzed image has a corrupted leaf whose first key doesn't match its
      parent:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
        	...
                key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
      
        leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
        leaf 29761536 flags 0x1(WRITTEN) backref revision 1
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
                item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
                        range start 8798638964736 end 8798641262592 length 2297856
      
      When reading the above tree block, we have extent_buffer->refs = 2 in
      the context:
      
      - initial one from __alloc_extent_buffer()
        alloc_extent_buffer()
        |- __alloc_extent_buffer()
           |- atomic_set(&eb->refs, 1)
      
      - one being added to fs_info->buffer_radix
        alloc_extent_buffer()
        |- check_buffer_tree_ref()
           |- atomic_inc(&eb->refs)
      
      So if even we call free_extent_buffer() in read_tree_block or other
      similar situation, we only decrease the refs by 1, it doesn't reach 0
      and won't be freed right now.
      
      The staled eb and its corrupted content will still be kept cached.
      
      Furthermore, we have several extra cases where we either don't do first
      key check or the check is not proper for all callers:
      
      - scrub
        We just don't have first key in this context.
      
      - shared tree block
        One tree block can be shared by several snapshot/subvolume trees.
        In that case, the first key check for one subvolume doesn't apply to
        another.
      
      So for the above reasons, a corrupted extent buffer can sneak into the
      buffer cache.
      
      [FIX]
      Call verify_level_key in read_block_for_search to do another
      verification. For that purpose the function is exported.
      
      Due to above reasons, although we can free corrupted extent buffer from
      cache, we still need the check in read_block_for_search(), for scrub and
      shared tree blocks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769Reported-by: default avatarYoon Jungyeon <jungyeon@gatech.edu>
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      448de471
  12. 17 Dec, 2018 1 commit
  13. 06 Aug, 2018 1 commit
  14. 12 Apr, 2018 1 commit
  15. 31 Mar, 2018 1 commit
    • Qu Wenruo's avatar
      btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo authored
      We have several reports about node pointer points to incorrect child
      tree blocks, which could have even wrong owner and level but still with
      valid generation and checksum.
      
      Although btrfs check could handle it and print error message like:
      leaf parent key incorrect 60670574592
      
      Kernel doesn't have enough check on this type of corruption correctly.
      At least add such check to read_tree_block() and btrfs_read_buffer(),
      where we need two new parameters @level and @first_key to verify the
      child tree block.
      
      The new @level check is mandatory and all call sites are already
      modified to extract expected level from its call chain.
      
      While @first_key is optional, the following call sites are skipping such
      check:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip @first_key check.
      2) Direct backref
         Only parent bytenr and level is known and we need to resolve the key
         all by ourselves, skip @first_key check.
      
      Another note of this verification is, it needs extra info from nodeptr
      or ROOT_ITEM, so it can't fit into current tree-checker framework, which
      is limited to node/leaf boundary.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      581c1760
  16. 30 Mar, 2018 1 commit
  17. 26 Mar, 2018 2 commits