1. 19 Nov, 2019 1 commit
  2. 11 Nov, 2019 1 commit
  3. 07 Nov, 2019 4 commits
  4. 06 Nov, 2019 3 commits
  5. 05 Nov, 2019 2 commits
  6. 04 Nov, 2019 7 commits
  7. 31 Oct, 2019 4 commits
  8. 24 Oct, 2019 1 commit
  9. 22 Oct, 2019 1 commit
  10. 21 Oct, 2019 1 commit
  11. 17 Oct, 2019 1 commit
  12. 14 Oct, 2019 14 commits
    • Merge branch 'akpm' (patches from Andrew) · 5bc52f64
      Linus Torvalds authored
      Merge more fixes from Andrew Morton:
       "The usual shower of hotfixes and some followups to the recently merged
        page_owner enhancements"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
        mm/slab.c: fix kernel-doc warning for __ksize()
        xarray.h: fix kernel-doc warning
        bitmap.h: fix kernel-doc warning and typo
        fs/fs-writeback.c: fix kernel-doc warning
        fs/libfs.c: fix kernel-doc warning
        fs/direct-io.c: fix kernel-doc warning
        mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
        mm, hugetlb: allow hugepage allocations to reclaim as needed
        lib/test_meminit: add a kmem_cache_alloc_bulk() test
        mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
        lib/generic-radix-tree.c: add kmemleak annotations
        mm/slub: fix a deadlock in show_slab_objects()
        mm, page_owner: rename flag indicating that page is allocated
        mm, page_owner: decouple freeing stack trace from debug_pagealloc
        mm, page_owner: fix off-by-one error in __set_page_owner_handle()
    • mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once · 3d7fed4a
      Jane Chu authored
      Mmap /dev/dax more than once, then read the poison location using an
      address from one of the mappings.  The other mappings, which do not
      have the page mapped in, cause SIGKILL to be delivered to the process.
      SIGKILL takes precedence over SIGBUS, so the user process loses the
      opportunity to handle the UE (uncorrectable error).

      Although one may add MAP_POPULATE to mmap(2) to work around the issue,
      MAP_POPULATE makes mapping 128GB of pmem several orders of magnitude
      slower, so it isn't always an option.
      
      Details -
      
        ndctl inject-error --block=10 --count=1 namespace6.0
      
        ./read_poison -x dax6.0 -o 5120 -m 2
        mmaped address 0x7f5bb6600000
        mmaped address 0x7f3cf3600000
        doing local read at address 0x7f3cf3601400
        Killed
      
      Console messages in instrumented kernel -
      
        mce: Uncorrected hardware memory error in user-access at edbe201400
        Memory failure: tk->addr = 7f5bb6601000
        Memory failure: address edbe201: call dev_pagemap_mapping_shift
        dev_pagemap_mapping_shift: page edbe201: no PUD
        Memory failure: tk->size_shift == 0
        Memory failure: Unable to find user space address edbe201 in read_poison
        Memory failure: tk->addr = 7f3cf3601000
        Memory failure: address edbe201: call dev_pagemap_mapping_shift
        Memory failure: tk->size_shift = 21
        Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
          => to deliver SIGKILL
        Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
          => to deliver SIGBUS
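
The "opportunity to handle the UE" mentioned above depends on the process receiving SIGBUS rather than SIGKILL. A minimal userspace sketch of such a handler follows, assuming Linux and glibc; the names on_sigbus, install_sigbus_handler and is_memory_failure are hypothetical and are not part of the read_poison tool, whose source is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Details of the last SIGBUS seen by the handler (hypothetical names). */
static volatile sig_atomic_t last_bus_code;
static void *volatile last_bus_addr;

/* Nonzero if si_code reports a hardware memory error rather than
 * an ordinary bus error such as BUS_ADRERR. */
static int is_memory_failure(int si_code)
{
    return si_code == BUS_MCEERR_AR || si_code == BUS_MCEERR_AO;
}

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_bus_code = info->si_code;
    last_bus_addr = info->si_addr;
    /* For a real poison read, returning here would re-execute the
     * faulting load; an application would normally siglongjmp() to
     * its recovery path instead. */
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;   /* deliver siginfo_t with si_code/si_addr */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

No such handler can run when the kernel sends SIGKILL, which is why the extra SIGKILL from the unmapped second mapping defeats application-level recovery.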
      
      Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
      Signed-off-by: Jane Chu <jane.chu@oracle.com>
      Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab.c: fix kernel-doc warning for __ksize() · 87bf4f71
      Randy Dunlap authored
      Fix kernel-doc warning in mm/slab.c:
      
        mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'
      
      Also add Return: documentation section for this function.
      
      Link: http://lkml.kernel.org/r/68c9fd7d-f09e-d376-e292-c7b2bdf1774d@infradead.org
      Fixes: 10d1f8cb ("mm/slab: refactor common ksize KASAN logic into slab_common.c")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xarray.h: fix kernel-doc warning · 13bea898
      Randy Dunlap authored
      Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
      
        include/linux/xarray.h:232: WARNING: Unexpected indentation.
      
      Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
      Fixes: a3e4d3f9 ("XArray: Redesign xa_alloc API")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitmap.h: fix kernel-doc warning and typo · 2a7e582f
      Randy Dunlap authored
      Fix kernel-doc warning in <linux/bitmap.h>:
      
        include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
      
      Also fix small typo (bitnaps).
      
      Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
      Fixes: b9fa6442 ("cpumask: Implement cpumask_or_equal()")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/fs-writeback.c: fix kernel-doc warning · b46ec1da
      Randy Dunlap authored
      Fix kernel-doc warning in fs/fs-writeback.c:
      
        fs/fs-writeback.c:913: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'
      
      Link: http://lkml.kernel.org/r/756645ac-0ce8-d47e-d30a-04d9e4923a4f@infradead.org
      Fixes: d62241c7 ("writeback, memcg: Implement cgroup_writeback_by_id()")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/libfs.c: fix kernel-doc warning · 8e88bfba
      Randy Dunlap authored
      Fix kernel-doc warning in fs/libfs.c:
      
        fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end'
      
      Link: http://lkml.kernel.org/r/5fc9d70b-e377-0ec9-066a-970d49579041@infradead.org
      Fixes: ad2a722f ("libfs: Open code simple_commit_write into only user")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Boaz Harrosh <boazh@netapp.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/direct-io.c: fix kernel-doc warning · c70d868f
      Randy Dunlap authored
      Fix kernel-doc warning in fs/direct-io.c:
      
        fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'
      
      Also, don't mark this function as having kernel-doc notation since it is
      not exported.
      
      Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org
      Fixes: 6d544bb4 ("dio: centralize completion in dio_complete()")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Zach Brown <zab@zabbo.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() · a2e9a5af
      Vlastimil Babka authored
      Florian and Dave reported [1] a NULL pointer dereference in
      __reset_isolation_pfn().  While the exact cause is unclear, staring at
      the code revealed two bugs, which might be related.
      
      One bug is that if a zone starts in the middle of a pageblock,
      block_page might correspond to a different pfn than block_pfn, and then
      the pfn_valid_within() checks will check different pfns than those
      accessed via struct page.  This might result in accessing an
      uninitialized page in CONFIG_HOLES_IN_ZONE configs.

      The other bug is that end_page refers to the first page of the next
      pageblock and not the last page of the current pageblock.  The online
      and valid check is then wrong and with sections, the
      while (page < end_page) loop might wander off actual struct page arrays.
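
The two boundary conditions described above can be sketched as plain pfn arithmetic. This is a userspace model under stated assumptions, not the kernel code: PAGEBLOCK_ORDER is fixed at 9 for illustration, and struct zone_range, first_pfn_in_block() and last_pfn_in_block() are hypothetical names.

```c
#include <assert.h>

/* Assumed for illustration; the real pageblock order depends on the config. */
#define PAGEBLOCK_ORDER    9UL
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

/* Hypothetical stand-in for the zone span: [start_pfn, end_pfn). */
struct zone_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

static unsigned long pageblock_start(unsigned long pfn)
{
    return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* First pfn of pfn's pageblock that actually lies inside the zone.
 * Covers the case where the zone starts in the middle of a pageblock. */
static unsigned long first_pfn_in_block(const struct zone_range *z,
                                        unsigned long pfn)
{
    unsigned long start = pageblock_start(pfn);

    assert(pfn >= z->start_pfn && pfn < z->end_pfn);
    return start > z->start_pfn ? start : z->start_pfn;
}

/* Last pfn (inclusive) of pfn's pageblock inside the zone, i.e. the last
 * page of the current block rather than the first page of the next one. */
static unsigned long last_pfn_in_block(const struct zone_range *z,
                                       unsigned long pfn)
{
    unsigned long last = pageblock_start(pfn) + PAGEBLOCK_NR_PAGES - 1;

    assert(pfn >= z->start_pfn && pfn < z->end_pfn);
    return last < z->end_pfn - 1 ? last : z->end_pfn - 1;
}
```

With an inclusive last pfn, a scan loop would use a "less than or equal" bound, so it never touches the first page of the following pageblock.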
      
      [1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/
      
      Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
      Fixes: 6b0868c8 ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Florian Weimer <fw@deneb.enyo.de>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hugetlb: allow hugepage allocations to reclaim as needed · 3f36d866
      David Rientjes authored
      Commit b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when
      compaction may not succeed") has changed the allocator to bail out
      early to prevent potentially excessive memory reclaim.
      __GFP_RETRY_MAYFAIL is designed to retry the allocation, reclaim and
      compaction loop as long as there is a reasonable chance to make forward
      progress.  Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at the
      INIT_COMPACT_PRIORITY compaction attempt gives this feedback.
      
      The most obvious affected subsystem is hugetlbfs which allocates huge
      pages based on an admin request (or via admin configured overcommit).  I
      have done a simple test which tries to allocate half of the memory for
      hugetlb pages while the memory is full of a clean page cache.  This is
      not an unusual situation because we try to cache as much of the memory
      as possible and sysctl/sysfs interface to allocate huge pages is there
      for flexibility to allocate hugetlb pages at any time.
      
      System has 1GB of RAM and we are requesting 512MB worth of hugetlb pages
      after the memory is prefilled by a clean page cache:
      
        root@test1:~# cat hugetlb_test.sh
      
        set -x
        echo 0 > /proc/sys/vm/nr_hugepages
        echo 3 > /proc/sys/vm/drop_caches
        echo 1 > /proc/sys/vm/compact_memory
        dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
        TS=$(date +%s)
        echo 256 > /proc/sys/vm/nr_hugepages
        cat /proc/sys/vm/nr_hugepages
      
      The results for 2 consecutive runs on clean 5.3
      
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
        + date +%s
        + TS=1569905284
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        256
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
        + date +%s
        + TS=1569905311
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        256
      
      Now with b39d0ee2 applied
      
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
        + date +%s
        + TS=1569905516
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        11
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
        + date +%s
        + TS=1569905541
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        12
      
      The success rate went down by a factor of more than 20!
      
      Hugetlb allocation requests might fail, and it is reasonable to expect
      them to fail under extremely fragmented memory or when the memory is
      under heavy pressure, but the above situation is not that case.

      Fix the regression by reverting back to the previous behavior for
      __GFP_RETRY_MAYFAIL requests and disable the bail out heuristic for
      those requests.
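
The intent of the fix can be modelled as a small predicate. This is a sketch of the decision as the changelog describes it, not the allocator's actual code: the enum values and the flag bit below are stand-ins (the kernel's real COMPACT_* and __GFP_RETRY_MAYFAIL definitions differ), and should_bail_out_early() is a hypothetical name. Any further conditions the real slow path applies are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only. */
enum compact_result_model {
    MODEL_COMPACT_SKIPPED,
    MODEL_COMPACT_DEFERRED,
    MODEL_COMPACT_OTHER
};

#define MODEL_GFP_RETRY_MAYFAIL (1u << 0)

/* True if the allocation should give up after the first compaction
 * attempt instead of entering the reclaim/compaction retry loop. */
static bool should_bail_out_early(unsigned int gfp_mask,
                                  enum compact_result_model result)
{
    /* The heuristic only applies when compaction was skipped or deferred. */
    if (result != MODEL_COMPACT_SKIPPED && result != MODEL_COMPACT_DEFERRED)
        return false;

    /* Callers that asked for retries (such as hugetlb pool growth)
     * are exempt, restoring the earlier behavior for them. */
    return !(gfp_mask & MODEL_GFP_RETRY_MAYFAIL);
}
```

In this model a request without the retry flag still takes the early exit, so the original goal of avoiding expensive reclaim is kept for everything else.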
      
      Mike said:
      
      : hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
      : boot where this may not be as much of an issue.  However, I am aware of at
      : least three use cases where allocations are made after the system has been
      : up and running for quite some time:
      :
      : - DB reconfiguration.  If sysctl/sysfs fails to get required number of
      :   huge pages, system is rebooted to perform allocation after boot.
      :
      : - VM provisioning.  If unable to get required number of huge pages,
      :   fall back to base pages.
      :
      : - An application that does not preallocate pool, but rather allocates
      :   pages at fault time for optimal NUMA locality.
      :
      : In all cases, I would expect b39d0ee2 to cause regressions and
      : noticeable behavior changes.
      :
      : My quick/limited testing in
      : https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com
      : was insufficient.  It was also mentioned that if something like
      : b39d0ee2 went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
      : requests as in this patch.
      
      [mhocko@suse.com: reworded changelog]
      Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org
      Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/test_meminit: add a kmem_cache_alloc_bulk() test · 03a9349a
      Alexander Potapenko authored
      Make sure allocations from kmem_cache_alloc_bulk() and
      kmem_cache_free_bulk() are properly initialized.
      
      Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Thibaut Sautereau <thibaut@sautereau.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations · 0f181f9f
      Alexander Potapenko authored
      slab_alloc_node() already zeroed out the freelist pointer if
      init_on_free was on.  Thibaut Sautereau noticed that the same needs to
      be done for kmem_cache_alloc_bulk(), which performs the allocations
      separately.
      
      kmem_cache_alloc_bulk() is currently used in two places in the kernel,
      so this change is unlikely to have a major performance impact.
      
      SLAB doesn't require a similar change, as auto-initialization makes the
      allocator store the freelist pointers off-slab.
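
Why the freelist pointer needs wiping can be seen in a toy allocator. The sketch below is a userspace model, not SLUB: it assumes the free pointer lives at offset 0 of each free object (the real offset is per-cache), and struct toy_cache, toy_cache_init() and toy_alloc_bulk() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 32
#define NR_OBJS  4

/* Toy cache: free objects are chained through a pointer stored in
 * their first bytes, so a freed object is not all zeroes. */
struct toy_cache {
    unsigned char slab[NR_OBJS][OBJ_SIZE];
    void *freelist;
    int init_on_free;
};

static void toy_cache_init(struct toy_cache *c, int init_on_free)
{
    memset(c->slab, 0, sizeof(c->slab));
    c->init_on_free = init_on_free;
    c->freelist = NULL;
    /* Push objects in reverse so slab[0] ends up at the list head. */
    for (int i = NR_OBJS - 1; i >= 0; i--) {
        void *next = c->freelist;

        memcpy(c->slab[i], &next, sizeof(next));
        c->freelist = c->slab[i];
    }
}

/* Pop up to n objects; returns how many were handed out. */
static size_t toy_alloc_bulk(struct toy_cache *c, size_t n, void **out)
{
    size_t i;

    for (i = 0; i < n && c->freelist != NULL; i++) {
        void *obj = c->freelist;
        void *next;

        memcpy(&next, obj, sizeof(next));
        c->freelist = next;
        /* With init_on_free, the rest of the object was zeroed when it
         * was freed, so only the pointer slot still holds stale data. */
        if (c->init_on_free)
            memset(obj, 0, sizeof(void *));
        out[i] = obj;
    }
    return i;
}
```

Without the wipe, the first word of each returned object would still expose the address of the next free object.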
      
      Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
      Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
      Reported-by: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/generic-radix-tree.c: add kmemleak annotations · 3c52b0af
      Eric Biggers authored
      Kmemleak is falsely reporting a leak of the slab allocation in
      sctp_stream_init_ext():
      
        BUG: memory leak
        unreferenced object 0xffff8881114f5d80 (size 96):
         comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s)
         hex dump (first 32 bytes):
           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
         backtrace:
           [<00000000ce7a1326>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
           [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline]
           [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline]
           [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
           [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline]
           [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline]
           [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0  net/sctp/stream.c:157
           [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00  net/sctp/socket.c:1882
           [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102
           [...]
      
      But it's freed later.  Kmemleak misses the allocation because its
      pointer is stored in the generic radix tree sctp_stream::out, and the
      generic radix tree uses raw pages which aren't tracked by kmemleak.
      
      Fix this by adding the kmemleak hooks to the generic radix tree code.
      
      Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub: fix a deadlock in show_slab_objects() · e4f8e513
      Qian Cai authored
      A long time ago we fixed a similar deadlock in show_slab_objects() [1].
      However, apparently due to commits like 01fb58bc ("slab: remove
      synchronous synchronize_sched() from memcg cache deactivation path")
      and 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back:
      just reading files in /sys/kernel/slab generates the lockdep splat
      below.

      Since the "mem_hotplug_lock" here is only to obtain a stable online node
      mask while racing with NUMA node hotplug, in the worst case, the results
      may be miscalculated while doing NUMA node hotplug, but they shall be
      corrected by later reads of the same files.
      
        WARNING: possible circular locking dependency detected
        ------------------------------------------------------
        cat/5224 is trying to acquire lock:
        ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
        show_slab_objects+0x94/0x3a8
      
        but task is already holding lock:
        b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (kn->count#45){++++}:
               lock_acquire+0x31c/0x360
               __kernfs_remove+0x290/0x490
               kernfs_remove+0x30/0x44
               sysfs_remove_dir+0x70/0x88
               kobject_del+0x50/0xb0
               sysfs_slab_unlink+0x2c/0x38
               shutdown_cache+0xa0/0xf0
               kmemcg_cache_shutdown_fn+0x1c/0x34
               kmemcg_workfn+0x44/0x64
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #1 (slab_mutex){+.+.}:
               lock_acquire+0x31c/0x360
               __mutex_lock_common+0x16c/0xf78
               mutex_lock_nested+0x40/0x50
               memcg_create_kmem_cache+0x38/0x16c
               memcg_kmem_cache_create_func+0x3c/0x70
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #0 (mem_hotplug_lock.rw_sem){++++}:
               validate_chain+0xd10/0x2bcc
               __lock_acquire+0x7f4/0xb8c
               lock_acquire+0x31c/0x360
               get_online_mems+0x54/0x150
               show_slab_objects+0x94/0x3a8
               total_objects_show+0x28/0x34
               slab_attr_show+0x38/0x54
               sysfs_kf_seq_show+0x198/0x2d4
               kernfs_seq_show+0xa4/0xcc
               seq_read+0x30c/0x8a8
               kernfs_fop_read+0xa8/0x314
               __vfs_read+0x88/0x20c
               vfs_read+0xd8/0x10c
               ksys_read+0xb0/0x120
               __arm64_sys_read+0x54/0x88
               el0_svc_handler+0x170/0x240
               el0_svc+0x8/0xc
      
        other info that might help us debug this:
      
        Chain exists of:
          mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(kn->count#45);
                                       lock(slab_mutex);
                                       lock(kn->count#45);
          lock(mem_hotplug_lock.rw_sem);
      
         *** DEADLOCK ***
      
        3 locks held by cat/5224:
         #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
         #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
         #2: b8ff009693eee398 (kn->count#45){++++}, at:
        kernfs_seq_start+0x44/0xf0
      
        stack backtrace:
        Call trace:
         dump_backtrace+0x0/0x248
         show_stack+0x20/0x2c
         dump_stack+0xd0/0x140
         print_circular_bug+0x368/0x380
         check_noncircular+0x248/0x250
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc
      
      I think it is important to mention that this doesn't expose the
      show_slab_objects to use-after-free.  There is only a single path that
      might really race here and that is the slab hotplug notifier callback
      __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
      doesn't really destroy kmem_cache_node data structures.
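
The circular dependency in the splat above can be reproduced with a tiny lock-order graph. This is only an illustration of the idea behind the report, not lockdep's implementation; struct lock_graph, reachable() and add_dependency() are hypothetical names, and the three lock identifiers simply mirror the locks named in the splat.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_id { MEM_HOTPLUG_LOCK, SLAB_MUTEX, KN_COUNT, NR_LOCKS };

/* dep[a][b] is true when some code path acquires b while holding a. */
struct lock_graph {
    bool dep[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: can "to" be reached from "from" via recorded edges? */
static bool reachable(const struct lock_graph *g, enum lock_id from,
                      enum lock_id to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++) {
        if (g->dep[from][next] && !seen[next] &&
            reachable(g, (enum lock_id)next, to, seen))
            return true;
    }
    return false;
}

/* Record "wanted is taken while held is held". Returns false, without
 * recording the edge, if that would close a cycle in the lock order. */
static bool add_dependency(struct lock_graph *g, enum lock_id held,
                           enum lock_id wanted)
{
    bool seen[NR_LOCKS] = { false };

    if (reachable(g, wanted, held, seen))
        return false;
    g->dep[held][wanted] = true;
    return true;
}
```

Feeding it the existing chain (mem_hotplug_lock, then slab_mutex, then kn->count) and then the sysfs read path (kn->count, then mem_hotplug_lock) makes the last call fail, which corresponds to the reported cycle. As the description suggests, not taking mem_hotplug_lock in show_slab_objects() means that final edge is never created.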
      
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html
      
      [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
      Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
      Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
      Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>