1. 26 Aug, 2020 9 commits
    • Josef Bacik's avatar
      btrfs: sysfs: use NOFS for device creation · 76c38196
      Josef Bacik authored
      [ Upstream commit a47bd78d ]
      
      Dave hit this splat during testing btrfs/078:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc6-default+ #1191 Not tainted
        ------------------------------------------------------
        kswapd0/75 is trying to acquire lock:
        ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
      
        but task is already holding lock:
        ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (fs_reclaim){+.+.}-{0:0}:
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 fs_reclaim_acquire.part.0+0x25/0x30
      	 __kmalloc_track_caller+0x49/0x330
      	 kstrdup+0x2e/0x60
      	 __kernfs_new_node.constprop.0+0x44/0x250
      	 kernfs_new_node+0x25/0x50
      	 kernfs_create_link+0x34/0xa0
      	 sysfs_do_create_link_sd+0x5e/0xd0
      	 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs]
      	 btrfs_init_new_device+0x44c/0x12b0 [btrfs]
      	 btrfs_ioctl+0xc3c/0x25c0 [btrfs]
      	 ksys_ioctl+0x68/0xa0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0xe0
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 __mutex_lock+0xa0/0xaf0
      	 btrfs_chunk_alloc+0x137/0x3e0 [btrfs]
      	 find_free_extent+0xb44/0xfb0 [btrfs]
      	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
      	 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
      	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
      	 __btrfs_cow_block+0x143/0x7a0 [btrfs]
      	 btrfs_cow_block+0x15f/0x310 [btrfs]
      	 push_leaf_right+0x150/0x240 [btrfs]
      	 split_leaf+0x3cd/0x6d0 [btrfs]
      	 btrfs_search_slot+0xd14/0xf70 [btrfs]
      	 btrfs_insert_empty_items+0x64/0xc0 [btrfs]
      	 __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs]
      	 btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs]
      	 btrfs_work_helper+0x2f9/0x650 [btrfs]
      	 process_one_work+0x22c/0x600
      	 worker_thread+0x50/0x3b0
      	 kthread+0x137/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
      	 check_prev_add+0x98/0xa20
      	 validate_chain+0xa8c/0x2a00
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 __mutex_lock+0xa0/0xaf0
      	 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
      	 btrfs_evict_inode+0x3bf/0x560 [btrfs]
      	 evict+0xd6/0x1c0
      	 dispose_list+0x48/0x70
      	 prune_icache_sb+0x54/0x80
      	 super_cache_scan+0x121/0x1a0
      	 do_shrink_slab+0x175/0x420
      	 shrink_slab+0xb1/0x2e0
      	 shrink_node+0x192/0x600
      	 balance_pgdat+0x31f/0x750
      	 kswapd+0x206/0x510
      	 kthread+0x137/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Chain exists of:
          &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(fs_reclaim);
      				 lock(&fs_info->chunk_mutex);
      				 lock(fs_reclaim);
          lock(&delayed_node->mutex);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/75:
         #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
         #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0
         #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50
      
        stack backtrace:
        CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ #1191
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x16f/0x190
         check_prev_add+0x98/0xa20
         validate_chain+0xa8c/0x2a00
         __lock_acquire+0x56f/0xaa0
         lock_acquire+0xa3/0x440
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         __mutex_lock+0xa0/0xaf0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         ? __lock_acquire+0x56f/0xaa0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         ? lock_acquire+0xa3/0x440
         ? btrfs_evict_inode+0x138/0x560 [btrfs]
         ? btrfs_evict_inode+0x2fe/0x560 [btrfs]
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         btrfs_evict_inode+0x3bf/0x560 [btrfs]
         evict+0xd6/0x1c0
         dispose_list+0x48/0x70
         prune_icache_sb+0x54/0x80
         super_cache_scan+0x121/0x1a0
         do_shrink_slab+0x175/0x420
         shrink_slab+0xb1/0x2e0
         shrink_node+0x192/0x600
         balance_pgdat+0x31f/0x750
         kswapd+0x206/0x510
         ? _raw_spin_unlock_irqrestore+0x3e/0x50
         ? finish_wait+0x90/0x90
         ? balance_pgdat+0x750/0x750
         kthread+0x137/0x150
         ? kthread_stop+0x2a0/0x2a0
         ret_from_fork+0x1f/0x30
      
      This is because we're holding the chunk_mutex while adding this device
      and adding its sysfs entries.  We actually hold different locks in
      different places when calling this function, the dev_replace semaphore
      for instance in dev replace, so instead of moving this call around
      simply wrap it's operations in NOFS.
      
      CC: stable@vger.kernel.org # 4.14+
      Reported-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      76c38196
    • Qu Wenruo's avatar
      btrfs: inode: fix NULL pointer dereference if inode doesn't need compression · 35c15768
      Qu Wenruo authored
      [ Upstream commit 1e6e238c ]
      
      [BUG]
      There is a bug report of NULL pointer dereference caused in
      compress_file_extent():
      
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
        NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
        LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
        Call Trace:
        [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
        [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
        [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
        [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
        [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
        [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
        [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
        ---[ end trace f16954aa20d822f6 ]---
      
      [CAUSE]
      For the following execution route of compress_file_range(), it's
      possible to hit NULL pointer dereference:
      
       compress_file_extent()
       |- pages = NULL;
       |- start = async_chunk->start = 0;
       |- end = async_chunk = 4095;
       |- nr_pages = 1;
       |- inode_need_compress() == false; <<< Possible, see later explanation
       |  Now, we have nr_pages = 1, pages = NULL
       |- cont:
       |- 		ret = cow_file_range_inline();
       |- 		if (ret <= 0) {
       |-		for (i = 0; i < nr_pages; i++) {
       |-			WARN_ON(pages[i]->mapping);	<<< Crash
      
      To enter above call execution branch, we need the following race:
      
          Thread 1 (chattr)     |            Thread 2 (writeback)
      --------------------------+------------------------------
                                | btrfs_run_delalloc_range
                                | |- inode_need_compress = true
                                | |- cow_file_range_async()
      btrfs_ioctl_set_flag()    |
      |- binode_flags |=        |
         BTRFS_INODE_NOCOMPRESS |
                                | compress_file_range()
                                | |- inode_need_compress = false
                                | |- nr_page = 1 while pages = NULL
                                | |  Then hit the crash
      
      [FIX]
      This patch will fix it by checking @pages before doing accessing it.
      This patch is only designed as a hot fix and easy to backport.
      
      More elegant fix may make btrfs only check inode_need_compress() once to
      avoid such race, but that would be another story.
      Reported-by: default avatarLuciano Chavez <chavez@us.ibm.com>
      Fixes: 4d3a800e ("btrfs: merge nr_pages input and output parameter in compress_pages")
      CC: stable@vger.kernel.org # 4.14.x: cecc8d90: btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      35c15768
    • Nikolay Borisov's avatar
      btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range · 2a3d84f1
      Nikolay Borisov authored
      [ Upstream commit cecc8d90 ]
      
      This label is only executed if compress_file_range fails to create an
      inline extent. So move its code in the semantically related inline
      extent handling branch. No functional changes.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a3d84f1
    • Josef Bacik's avatar
      btrfs: don't show full path of bind mounts in subvol= · f50a0aba
      Josef Bacik authored
      [ Upstream commit 3ef3959b ]
      
      Chris Murphy reported a problem where rpm ostree will bind mount a bunch
      of things for whatever voodoo it's doing.  But when it does this
      /proc/mounts shows something like
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0
      
      Despite subvolid=256 being subvol=/foo.  This is because we're just
      spitting out the dentry of the mount point, which in the case of bind
      mounts is the source path for the mountpoint.  Instead we should spit
      out the path to the actual subvol.  Fix this by looking up the name for
      the subvolid we have mounted.  With this fix the same test looks like
      this
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
      Reported-by: default avatarChris Murphy <chris@colorremedies.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f50a0aba
    • Marcos Paulo de Souza's avatar
      btrfs: export helpers for subvolume name/id resolution · dd39b6f6
      Marcos Paulo de Souza authored
      [ Upstream commit c0c907a4 ]
      
      The functions will be used outside of export.c and super.c to allow
      resolving subvolume name from a given id, eg. for subvolume deletion by
      id ioctl.
      Signed-off-by: default avatarMarcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      [ split from the next patch ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dd39b6f6
    • Hugh Dickins's avatar
      khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() · 2ef7ebb1
      Hugh Dickins authored
      [ Upstream commit f3f99d63 ]
      
      syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
      __khugepaged_enter(): yes, when one thread is about to dump core, has set
      core_state, and is waiting for others, another might do something calling
      __khugepaged_enter(), which now crashes because I lumped the core_state
      test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
      think it's best to lump them together, so just in this exceptional case,
      check mm->mm_users directly instead of khugepaged_test_exit().
      
      Fixes: bbe98f9c ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarYang Shi <shy828301@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvilsSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2ef7ebb1
    • Hugh Dickins's avatar
      khugepaged: khugepaged_test_exit() check mmget_still_valid() · 17c08ee0
      Hugh Dickins authored
      [ Upstream commit bbe98f9c ]
      
      Move collapse_huge_page()'s mmget_still_valid() check into
      khugepaged_test_exit() itself.  collapse_huge_page() is used for anon THP
      only, and earned its mmget_still_valid() check because it inserts a huge
      pmd entry in place of the page table's pmd entry; whereas
      collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
      merely clears the page table's pmd entry.  But core dumping without mmap
      lock must have been as open to mistaking a racily cleared pmd entry for a
      page table at physical page 0, as exit_mmap() was.  And we certainly have
      no interest in mapping as a THP once dumping core.
      
      Fixes: 59ea6d06 ("coredump: fix race condition between collapse_huge_page() and core dumping")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvilsSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      17c08ee0
    • Masami Hiramatsu's avatar
      perf probe: Fix memory leakage when the probe point is not found · 6cb22ed4
      Masami Hiramatsu authored
      [ Upstream commit 12d572e7 ]
      
      Fix the memory leakage in debuginfo__find_trace_events() when the probe
      point is not found in the debuginfo. If there is no probe point found in
      the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.
      
      Thus the caller of debuginfo__find_probes() must check the tf.ntevs and
      release the allocated memory for the array of struct probe_trace_event.
      
      The current code releases the memory only if the debuginfo__find_probes()
      hits an error but not checks tf.ntevs. In the result, the memory allocated
      on *tevs are not released if tf.ntevs == 0.
      
      This fixes the memory leakage by checking tf.ntevs == 0 in addition to
      ret < 0.
      
      Fixes: ff741783 ("perf probe: Introduce debuginfo to encapsulate dwarf information")
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6cb22ed4
    • Chris Wilson's avatar
      drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset() · b93a3871
      Chris Wilson authored
      [ Upstream commit 119c53d2 ]
      
      drm_gem_dumb_map_offset() now exists and does everything
      vgem_gem_dump_map does and *ought* to do.
      
      In particular, vgem_gem_dumb_map() was trying to reject mmapping an
      imported dmabuf by checking the existence of obj->filp. Unfortunately,
      we always allocated an obj->filp, even if unused for an imported dmabuf.
      Instead, the drm_gem_dumb_map_offset(), since commit 90378e58
      ("drm/gem: drm_gem_dumb_map_offset(): reject dma-buf"), uses the
      obj->import_attach to reject such invalid mmaps.
      
      This prevents vgem from allowing userspace mmapping the dumb handle and
      attempting to incorrectly fault in remote pages belonging to another
      device, where there may not even be a struct page.
      
      v2: Use the default drm_gem_dumb_map_offset() callback
      
      Fixes: af33a919 ("drm/vgem: Enable dmabuf import interfaces")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: <stable@vger.kernel.org> # v4.13+
      Link: https://patchwork.freedesktop.org/patch/msgid/20200708154911.21236-1-chris@chris-wilson.co.ukSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      b93a3871
  2. 21 Aug, 2020 31 commits