• Waiman Long's avatar
    mm: memcg/slab: properly set up gfp flags for objcg pointer array · 41eb5df1
    Waiman Long authored
    Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.
    
    Since the merging of the new slab memory controller in v5.9, the page
    structure stores a pointer to objcg pointer array for slab pages.  When
    the slab has no used objects, it can be freed in free_slab() which will
    call kfree() to free the objcg pointer array in
    memcg_alloc_page_obj_cgroups().  If it happens that the objcg pointer
    array is the last used object in its slab, that slab may then be freed
    which may caused kfree() to be called again.
    
    With the right workload, the slab cache may be set up in a way that allows
    the recursive kfree() calling loop to nest deep enough to cause a kernel
    stack overflow and panic the system.  In fact, we have a reproducer that
    can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
    and kmalloc-rcl-128 slabs with the following kfree() loop recursively
    called 74 times:
    
      [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740]
    [<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741]
    [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742]
    [<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I
    also found an issue on the allocation side.  If the objcg pointer array
    happen to come from the same slab or a circular dependency linkage is
    formed with multiple slabs, those affected slabs can never be freed again.
    
    This patch series addresses these two issues by introducing a new set of
    kmalloc-cg-<n> caches split from kmalloc-<n> caches.  The new set will
    only contain non-reclaimable and non-dma objects that are accounted in
    memory cgroups whereas the old set are now for unaccounted objects only.
    By making this split, all the objcg pointer arrays will come from the
    kmalloc-<n> caches, but those caches will never hold any objcg pointer
    array.  As a result, deeply nested kfree() call and the unfreeable slab
    problems are now gone.
    
    This patch (of 4):
    
    Since the merging of the new slab memory controller in v5.9, the page
    structure may store a pointer to obj_cgroup pointer array for slab pages.
    Currently, only the __GFP_ACCOUNT bit is masked off.  However, the array
    is not readily reclaimable and doesn't need to come from the DMA buffer.
    So those GFP bits should be masked off as well.
    
    Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
    that it is consistently applied no matter where it is called.
    
    Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
    Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
    Fixes: 286e04b8 ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
    Signed-off-by: default avatarWaiman Long <longman@redhat.com>
    Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
    Acked-by: default avatarRoman Gushchin <guro@fb.com>
    Reviewed-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    41eb5df1
memcontrol.c 192 KB