  1. 11 Sep, 2023 1 commit
    • mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy() · 46a9ea66
      Rafael Aquini authored
      After the commit referenced in Fixes:, if a module that created a slab cache
      does not release all of its allocated objects before destroying the cache (at
      rmmod time), we can end up freeing the kmem_cache object without removing it
      from the slab_caches list, thus corrupting the list: kmem_cache_destroy()
      ignores the return value of shutdown_cache(), which in turn never unlinks the
      kmem_cache object from the slab_caches list when __kmem_cache_shutdown() fails
      to release all of the cache's slabs.
      
      This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y: after
      such an ill-fated release, the system will immediately trip on list_add or
      list_del assertions similar to the one shown below as soon as another
      kmem_cache gets created or destroyed:
      
        [ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268)
        [ 1041.219165] ------------[ cut here ]------------
        [ 1041.221517] kernel BUG at lib/list_debug.c:62!
        [ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI
        [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G    B   W  OE      6.5.0 #15
        [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023
        [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0
      
      Another quick way to trigger this issue, on a kernel with CONFIG_SLUB=y, is
      to set slub_debug to poison the released objects and then run
      cat /proc/slabinfo after removing the module that leaks slab objects, in
      which case the kernel panics:
      
        [   50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI
        [   50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G    B   W  OE      6.5.0 #15
        [   50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023
        [   50.972663] RIP: 0010:get_slabinfo+0x42/0xf0
      
      This patch fixes this issue by properly checking shutdown_cache()'s
      return value before taking the kmem_cache_release() branch.
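      
      A simplified sketch of the resulting control flow (an illustration based on
      the description above, not the verbatim kernel code; locking, refcounting and
      the RCU case are elided):
      
        void kmem_cache_destroy(struct kmem_cache *s)
        {
                int err;
        
                /* ... take slab_mutex, drop the cache's refcount, etc. ... */
        
                /*
                 * shutdown_cache() only unlinks s from slab_caches when
                 * __kmem_cache_shutdown() managed to free every slab.
                 */
                err = shutdown_cache(s);
        
                /* ... release slab_mutex ... */
        
                /*
                 * Only free the kmem_cache object if it was actually removed
                 * from slab_caches; otherwise it must stay alive because it is
                 * still linked on the list.
                 */
                if (!err)
                        kmem_cache_release(s);
        }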
      
      Fixes: 0495e337 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock")
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  2. 29 Aug, 2023 1 commit
    • Merge branch 'slab/for-6.6/random_kmalloc' into slab/for-next · 3d053e80
      Vlastimil Babka authored
      Merge the new hardening feature by GONG, Ruiqi that makes heap spraying
      harder. It creates multiple (16) copies of the kmalloc caches, reducing the
      chance that an attacker-controllable allocation site lands in the same slab
      as, e.g., an allocation site with a use-after-free vulnerability. The copy is
      selected based on the allocation site address, combined with a per-boot
      random seed.
      
      In line with the SLAB deprecation, this is a SLUB-only feature, incompatible
      with SLUB_TINY due to the memory overhead of the extra cache copies.
  3. 18 Jul, 2023 1 commit
    • Randomized slab caches for kmalloc() · 3c615294
      GONG, Ruiqi authored
      When exploiting memory vulnerabilities, "heap spraying" is a common technique
      targeting those related to dynamic memory allocation (i.e. the "heap"), and
      it plays an important role in a successful exploitation. Basically, it means
      overwriting the memory area of a vulnerable object by triggering allocations
      in other subsystems or modules, thereby obtaining a reference to the targeted
      memory location. It is usable against various types of vulnerability,
      including use-after-free (UAF), heap out-of-bounds write, etc.
      
      There are (at least) two reasons why the heap can be sprayed: 1) generic slab
      caches are shared among different subsystems and modules, and 2) dedicated
      slab caches can be merged with the generic ones. Currently neither factor can
      be mitigated at low cost: the first is a widely used memory allocation
      mechanism, and disabling slab merging completely via `slub_nomerge` would be
      overkill.
      
      To efficiently prevent heap spraying, we propose the following approach:
      create multiple copies of the generic slab caches that will never be merged,
      and pick a random one of them at allocation time. The random selection is
      based on the address of the code that calls `kmalloc()`, which means it is
      static at runtime (rather than determined anew on each allocation, which
      could be bypassed by repeatedly spraying in brute force). In other words, the
      randomness of the cache selection is with respect to the code address rather
      than time: allocations in different code paths will most likely pick
      different caches, while kmalloc() at any given call site uses the same cache
      copy every time it executes. This way, the vulnerable object and memory
      allocated in other subsystems and modules will (most probably) live on
      different slab caches, which prevents the object from being sprayed.
      
      Meanwhile, the static random selection is further hardened with a per-boot
      random seed, which prevents an attacker from finding, by analyzing the open
      source code, a usable kmalloc() call site that happens to pick the same cache
      as the vulnerable subsystem/module. In other words, with the per-boot seed,
      the random selection is fixed for the lifetime of a given boot, but differs
      across reboots.
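      
      Conceptually, the selection works roughly like the sketch below (a simplified
      illustration, not the verbatim kernel code; pick_kmalloc_copy() is a
      hypothetical helper standing in for the real selection logic):
      
        #include <linux/hash.h>
        
        /* Per-boot seed, initialized once during slab setup. */
        static unsigned long random_kmalloc_seed;
        
        /*
         * Map an allocation site (the kmalloc() caller's return address) to one
         * of 16 kmalloc cache copies.  The result is fixed for a given call site
         * within one boot, but changes across boots because of the seed.
         */
        static unsigned int pick_kmalloc_copy(unsigned long caller)
        {
                return hash_64(caller ^ random_kmalloc_seed, 4); /* 2^4 = 16 copies */
        }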
      
      The performance overhead was tested on a 40-core x86 server by comparing the
      results of `perf bench all` between kernels with and without this patch,
      based on the latest linux-next kernel; the difference is minor. A subset of
      the benchmarks is listed below:
      
                      sched/  sched/  syscall/       mem/       mem/
                   messaging    pipe     basic     memcpy     memset
                       (sec)   (sec)     (sec)   (GB/sec)   (GB/sec)
      
      control1         0.019   5.459     0.733  15.258789  51.398026
      control2         0.019   5.439     0.730  16.009221  48.828125
      control3         0.019   5.282     0.735  16.009221  48.828125
      control_avg      0.019   5.393     0.733  15.759077  49.684759
      
      experiment1      0.019   5.374     0.741  15.500992  46.502976
      experiment2      0.019   5.440     0.746  16.276042  51.398026
      experiment3      0.019   5.242     0.752  15.258789  51.398026
      experiment_avg   0.019   5.352     0.746  15.678608  49.766343
      
      The memory overhead was measured by executing `free` after boot on a QEMU VM
      with 1GB of total memory; as expected, it is positively correlated with the
      number of cache copies:
      
                 control  4 copies  8 copies  16 copies
      
      total       969.8M    968.2M    968.2M     968.2M
      used         20.0M     21.9M     24.1M      26.7M
      free        936.9M    933.6M    931.4M     928.6M
      available   932.2M    928.8M    926.6M     923.9M
      
      Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
  4. 14 Jul, 2023 2 commits
    • mm/slub: remove freelist_dereference() · 1662b6c2
      Vlastimil Babka authored
      freelist_dereference() is a one-liner used only from get_freepointer().
      Remove it and have get_freepointer() call freelist_ptr_decode() directly,
      which makes the code easier to follow.
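      
      After the change, get_freepointer() ends up looking roughly like this (a
      simplified sketch based on the description above, with types and signatures
      approximated rather than copied from the resulting code):
      
        static inline void *get_freepointer(struct kmem_cache *s, void *object)
        {
                unsigned long ptr_addr;
        
                object = kasan_reset_tag(object);
                ptr_addr = (unsigned long)object + s->offset;
                /* Decode the (possibly hardened) free pointer stored in the object. */
                return freelist_ptr_decode(s, *(freeptr_t *)ptr_addr, ptr_addr);
        }
      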
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kees Cook <keescook@chromium.org>
    • mm/slub: remove redundant kasan_reset_tag() from freelist_ptr calculations · b06952cd
      Vlastimil Babka authored
      Commit d36a63a9 ("kasan, slub: fix more conflicts with
      CONFIG_SLAB_FREELIST_HARDENED") introduced kasan_reset_tag() into the
      freelist_ptr() encoding/decoding when CONFIG_SLAB_FREELIST_HARDENED is
      enabled, to resolve issues where passing tagged and untagged pointers
      inconsistently would lead to incorrect calculations.
      
      Later, commit aa1ef4d7 ("kasan, mm: reset tags when accessing
      metadata") made sure all pointers have tags reset regardless of
      CONFIG_SLAB_FREELIST_HARDENED, because there was no other way to access
      the freepointer metadata safely with hw tag-based KASAN.
      
      Therefore the kasan_reset_tag() usage in freelist_ptr_encode()/decode() is
      now redundant, as all callers use kasan_reset_tag() unconditionally when
      constructing ptr_addr. Remove the redundant calls, simplify the code, and
      drop the now-obsolete comments.
      
      Also, in freelist_ptr_encode(), introduce an 'encoded' variable to keep the
      lines shorter and make the function mirror its _decode() counterpart.
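      
      With those two changes, the encode side ends up roughly as follows (a
      simplified sketch, not necessarily the exact resulting code; the hardened
      variant XORs the pointer with a per-cache random value and the byte-swapped
      storage address):
      
        static inline freeptr_t freelist_ptr_encode(const struct kmem_cache *s,
                                                    void *ptr, unsigned long ptr_addr)
        {
                unsigned long encoded;
        
        #ifdef CONFIG_SLAB_FREELIST_HARDENED
                encoded = (unsigned long)ptr ^ s->random ^ swab(ptr_addr);
        #else
                encoded = (unsigned long)ptr;
        #endif
                return (freeptr_t){ .v = encoded };
        }
      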
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Acked-by: Kees Cook <keescook@chromium.org>
  5. 11 Jul, 2023 1 commit
  6. 09 Jul, 2023 10 commits
  7. 08 Jul, 2023 24 commits