1. 23 Sep, 2022 3 commits
  2. 22 Sep, 2022 1 commit
    • mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context. · e45cc288
      Maurizio Lombardi authored
      Commit 5a836bf6 ("mm: slub: move flush_cpu_slab() invocations
      __free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
      invocations to the global workqueue to avoid a problem related to
      deactivate_slab()/__free_slab() being called from an IRQ context
      on PREEMPT_RT kernels.
      
      When flush_all_cpus_locked() is called from a task context, a
      workqueue with the WQ_MEM_RECLAIM bit set may end up flushing the
      global workqueue, which causes a dependency issue:
      
       workqueue: WQ_MEM_RECLAIM nvme-delete-wq:nvme_delete_ctrl_work [nvme_core]
         is flushing !WQ_MEM_RECLAIM events:flush_cpu_slab
       WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637
         check_flush_dependency+0x10a/0x120
       Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
       RIP: 0010:check_flush_dependency+0x10a/0x120
       Call Trace:
       __flush_work.isra.0+0xbf/0x220
       ? __queue_work+0x1dc/0x420
       flush_all_cpus_locked+0xfb/0x120
       __kmem_cache_shutdown+0x2b/0x320
       kmem_cache_destroy+0x49/0x100
       bioset_exit+0x143/0x190
       blk_release_queue+0xb9/0x100
       kobject_cleanup+0x37/0x130
       nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc]
       nvme_free_ctrl+0x1ac/0x2b0 [nvme_core]
      
      Fix this bug by creating a workqueue for the flush operation with
      the WQ_MEM_RECLAIM bit set.
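      
      A minimal sketch of that approach (the workqueue name and the init
      placement are illustrative, not necessarily the exact upstream change):
      
       /* dedicated reclaim-safe workqueue for the per-CPU flush work */
       static struct workqueue_struct *flushwq;
      
       void __init kmem_cache_init_late(void)
       {
               flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
               WARN_ON(!flushwq);
       }
      
       /* in flush_all_cpus_locked(): queue the flush work on flushwq instead
        * of the system workqueue, so that WQ_MEM_RECLAIM callers may wait on
        * it without tripping check_flush_dependency() */
               queue_work_on(cpu, flushwq, &sfw->work);  /* was: schedule_work_on() */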
      
      Fixes: 5a836bf6 ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  3. 19 Sep, 2022 1 commit
    • mm/slab_common: fix possible double free of kmem_cache · d71608a8
      Feng Tang authored
      When running a slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
      kunit test case causes a use-after-free error:
      
        BUG: KASAN: use-after-free in kobject_del+0x14/0x30
        Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
      
        CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G    B            N 6.0.0-rc5-next-20220916 #17
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x48
         print_address_description.constprop.0+0x87/0x2a5
         print_report+0x103/0x1ed
         kasan_report+0xb7/0x140
         kobject_del+0x14/0x30
         kmem_cache_destroy+0x130/0x170
         test_exit+0x1a/0x30
         kunit_try_run_case+0xad/0xc0
         kunit_generic_run_threadfn_adapter+0x26/0x50
         kthread+0x17b/0x1b0
         </TASK>
      
      The cause is inside kmem_cache_destroy():
      
      kmem_cache_destroy
          acquire lock/mutex
          shutdown_cache
              schedule_work(kmem_cache_release) (if RCU flag set)
          release lock/mutex
          kmem_cache_release (if RCU flag not set)
      
      Under certain timing, the scheduled work can run before the next
      check of the RCU flag; that check then sees a wrong value and leads
      to a double kmem_cache_release().
      
      Fix it by caching the RCU flag inside the protected area, just like 'refcnt'.
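      
      A minimal sketch of the idea (simplified; the error handling of
      shutdown_cache() is elided):
      
       void kmem_cache_destroy(struct kmem_cache *s)
       {
               int refcnt;
               bool rcu_set;
      
               cpus_read_lock();
               mutex_lock(&slab_mutex);
      
               /* cache the flag while the kmem_cache cannot change under us */
               rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
      
               refcnt = --s->refcount;
               if (!refcnt)
                       shutdown_cache(s); /* schedules kmem_cache_release() if rcu_set */
      
               mutex_unlock(&slab_mutex);
               cpus_read_unlock();
      
               /* release directly only when no delayed work was scheduled */
               if (!refcnt && !rcu_set)
                       kmem_cache_release(s);
       }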
      
      Fixes: 0495e337 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock")
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  4. 16 Sep, 2022 6 commits
    • slub: Make PREEMPT_RT support less convoluted · 1f04b07d
      Thomas Gleixner authored
      The slub code already has a few helpers depending on PREEMPT_RT. Add a few
      more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.
      
      No functional change.
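      
      As an illustration of the pattern (the helper and its call site below
      are a sketch, not an exhaustive list of what the patch adds): the
      CONFIG_PREEMPT_RT decision is made once, behind a helper, and the call
      sites stay free of conditionals.
      
       #ifndef CONFIG_PREEMPT_RT
       #define USE_LOCKLESS_FAST_PATH()   (true)
       #else
       #define USE_LOCKLESS_FAST_PATH()   (false)
       #endif
      
       /* a call site then reads naturally, with no scattered #ifdefs */
       if (USE_LOCKLESS_FAST_PATH())
               lockdep_assert_irqs_disabled();
       else
               lockdep_assert_preemption_disabled();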
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-mm@kvack.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    • mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock() · 5875e598
      Vlastimil Babka authored
      The PREEMPT_RT-specific disabling of irqs in __cmpxchg_double_slab()
      (through slab_[un]lock()) is unnecessary: bit_spin_lock() disables
      preemption, and that is sufficient on PREEMPT_RT, where no
      allocation/free operation is performed in hardirq context and so
      cannot interrupt the current operation.
      
      That means we no longer need the slab_[un]lock() wrappers, so delete
      them and rename the current __slab_[un]lock() to slab_[un]lock().
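      
      A rough sketch of the resulting wrappers (simplified; the kernel
      versions also carry VM_BUG_ON checks for tail pages):
      
       static __always_inline void slab_lock(struct slab *slab)
       {
               bit_spin_lock(PG_locked, &slab_page(slab)->flags);
       }
      
       static __always_inline void slab_unlock(struct slab *slab)
       {
               __bit_spin_unlock(PG_locked, &slab_page(slab)->flags);
       }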
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    • mm/slub: convert object_map_lock to non-raw spinlock · 4ef3f5a3
      Vlastimil Babka authored
      The only remaining user of object_map_lock is list_slab_objects().
      Obtaining the lock there used to happen under slab_lock(), which
      implied disabling irqs on PREEMPT_RT, hence the raw_spinlock. With the
      slab_lock() usage removed, we can convert it to a normal spinlock.
      
      Also remove the get_map()/put_map() wrappers as list_slab_objects()
      became their only remaining user.
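      
      A condensed sketch of the result (diagnostic printing trimmed; helper
      names follow mm/slub.c):
      
       static DEFINE_SPINLOCK(object_map_lock);  /* was DEFINE_RAW_SPINLOCK */
      
       static void list_slab_objects(struct kmem_cache *s, struct slab *slab,
                                     const char *text)
       {
               void *addr = slab_address(slab);
               void *p;
      
               slab_err(s, slab, text, s->name);
      
               spin_lock(&object_map_lock);
               __fill_map(object_map, s, slab);
               for_each_object(p, s, addr, slab->objects)
                       if (!test_bit(__obj_to_index(s, addr, p), object_map))
                               print_tracking(s, p);
               spin_unlock(&object_map_lock);
       }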
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    • mm/slub: remove slab_lock() usage for debug operations · 41bec7c3
      Vlastimil Babka authored
      All alloc and free operations on debug caches are now serialized by
      n->list_lock, so we can remove slab_lock() usage in validate_slab()
      and list_slab_objects() as those also happen under n->list_lock.
      
      Note that the usage in list_slab_objects() can happen even on
      non-debug caches, but only during cache shutdown, so there should no
      longer be any parallel freeing activity, except for buggy slab users;
      in that case slab_lock() would not help against the common
      cmpxchg-based fast paths (in non-debug caches) anyway.
      
      Also adjust documentation comments accordingly.
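      
      For context, a condensed sketch of the validation walk that no longer
      needs slab_lock(): it already runs with n->list_lock held, which now
      also serializes all alloc/free on debug caches (body trimmed; names
      follow mm/slub.c):
      
       static int validate_slab_node(struct kmem_cache *s,
                                     struct kmem_cache_node *n,
                                     unsigned long *obj_map)
       {
               struct slab *slab;
               unsigned long flags;
      
               spin_lock_irqsave(&n->list_lock, flags);
               list_for_each_entry(slab, &n->partial, slab_list)
                       validate_slab(s, slab, obj_map);  /* no slab_lock() */
               list_for_each_entry(slab, &n->full, slab_list)
                       validate_slab(s, slab, obj_map);
               spin_unlock_irqrestore(&n->list_lock, flags);
               return 0;
       }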
      Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
    • mm/slub: restrict sysfs validation to debug caches and make it safe · c7323a5a
      Vlastimil Babka authored
      Rongwei Wang reports [1] that cache validation triggered by writing to
      /sys/kernel/slab/<cache>/validate is racy against normal cache
      operations (e.g. freeing) in a way that can cause false positive
      inconsistency reports for caches with debugging enabled. The problem is
      that debugging actions that mark object free or active and actual
      freelist operations are not atomic, and the validation can see an
      inconsistent state.
      
      Whether or not a cache has debugging enabled, additional races
      involving n->nr_slabs are possible and result in false reports of
      wrong slab counts.
      
      This patch attempts to solve these issues while not adding overhead to
      normal (especially fastpath) operations for caches that do not have
      debugging enabled. Such overhead would not be justified to make possible
      userspace-triggered validation safe. Instead, disable the validation for
      caches that don't have debugging enabled and make their sysfs validate
      handler return -EINVAL.
      
      For caches that do have debugging enabled, we can instead extend the
      existing approach of not using percpu freelists to force all alloc/free
      operations to the slow paths, where the debugging flags are checked and
      acted upon. There we can adjust the debug-specific paths to increase
      n->list_lock coverage against concurrent validation as necessary.
      
      The processing on free in free_debug_processing() already happens under
      n->list_lock so we can extend it to actually do the freeing as well and
      thus make it atomic against concurrent validation. As observed by
      Hyeonggon Yoo, we do not really need to take slab_lock() anymore here
      because all paths we could race with are protected by n->list_lock under
      the new scheme, so drop its usage here.
      
      The processing on alloc in alloc_debug_processing() currently doesn't
      take any locks, but we have to first allocate the object from a slab on
      the partial list (as debugging caches have no percpu slabs) and thus
      take the n->list_lock anyway. Add a function alloc_single_from_partial()
      that grabs just the allocated object instead of the whole freelist, and
      does the debug processing. The n->list_lock coverage again makes it
      atomic against validation and it is also ultimately more efficient than
      the current grabbing of freelist immediately followed by slab
      deactivation.
      
      To prevent races on n->nr_slabs updates, make sure that for caches with
      debugging enabled, inc_slabs_node() or dec_slabs_node() is called under
      n->list_lock. When allocating a new slab for a debug cache, handle the
      allocation by a new function alloc_single_from_new_slab() instead of the
      current forced deactivation path.
      
      Neither of these changes affects the fast paths at all. The changes in
      the slow paths are negligible for non-debug caches.
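      
      For illustration, the sysfs side of the change looks roughly like this
      (condensed): validation is simply refused for caches without debugging
      enabled.
      
       static ssize_t validate_store(struct kmem_cache *s,
                                     const char *buf, size_t length)
       {
               int ret = -EINVAL;
      
               /* only debug caches route all alloc/free through the
                * n->list_lock-protected slow paths, so only they can be
                * validated safely */
               if (buf[0] == '1' && kmem_cache_debug(s)) {
                       ret = validate_slab_cache(s);
                       if (ret >= 0)
                               ret = length;
               }
               return ret;
       }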
      
      [1] https://lore.kernel.org/all/20220529081535.69275-1-rongwei.wang@linux.alibaba.com/
      Reported-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    • kasan: call kasan_malloc() from __kmalloc_*track_caller() · 5373b8a0
      Peter Collingbourne authored
      We were failing to call kasan_malloc() from __kmalloc_*track_caller(),
      which caused us to sometimes miss KASAN error reports for allocations
      made using e.g. devm_kcalloc(), as the KASAN poison was not being
      initialized. Fix it.
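      
      A condensed sketch of the fix (the helper actually invoked is
      kasan_kmalloc(); unrelated parts of the function are elided):
      
       void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
       {
               struct kmem_cache *s = kmalloc_slab(size, gfpflags);
               void *ret;
               ...
               ret = slab_alloc(s, NULL, gfpflags, caller, size);
               ...
               /* previously missing: set up KASAN poison/metadata for the
                * returned object, as the regular kmalloc() path already does */
               ret = kasan_kmalloc(s, ret, size, gfpflags);
      
               return ret;
       }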
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Cc: <stable@vger.kernel.org> # 5.15
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  5. 08 Sep, 2022 1 commit
    • mm/slub: fix to return errno if kmalloc() fails · 7e9c323c
      Chao Yu authored
      In create_unique_id(), kmalloc(..., GFP_KERNEL) can fail due to
      out-of-memory. If it fails, return the errno correctly rather than
      triggering a panic via BUG_ON():
      
      kernel BUG at mm/slub.c:5893!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      
      Call trace:
       sysfs_slab_add+0x258/0x260 mm/slub.c:5973
       __kmem_cache_create+0x60/0x118 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335
       kmem_cache_create+0x1c/0x28 mm/slab_common.c:390
       f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline]
       f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808
       f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149
       mount_bdev+0x1b8/0x210 fs/super.c:1400
       f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512
       legacy_get_tree+0x30/0x74 fs/fs_context.c:610
       vfs_get_tree+0x40/0x140 fs/super.c:1530
       do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040
       path_mount+0x358/0x914 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568
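      
      A condensed sketch of the fix (the ID-building loop is elided):
      create_unique_id() returns an ERR_PTR() on allocation failure and
      sysfs_slab_add() propagates it instead of hitting BUG_ON().
      
       static char *create_unique_id(struct kmem_cache *s)
       {
               char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);
               char *p = name;
      
               if (!name)
                       return ERR_PTR(-ENOMEM);  /* was: BUG_ON(!name) */
               ...
               return name;
       }
      
       /* in sysfs_slab_add(): */
               name = create_unique_id(s);
               if (IS_ERR(name))
                       return PTR_ERR(name);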
      
      Cc: <stable@kernel.org>
      Fixes: 81819f0f ("SLUB core")
      Reported-by: syzbot+81684812ea68216e08c5@syzkaller.appspotmail.com
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  6. 01 Sep, 2022 7 commits
  7. 25 Aug, 2022 1 commit
  8. 24 Aug, 2022 11 commits
  9. 23 Aug, 2022 2 commits
  10. 22 Aug, 2022 1 commit
  11. 21 Aug, 2022 6 commits