1. 14 Oct, 2019 10 commits
    • mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() · a2e9a5af
      Vlastimil Babka authored
      Florian and Dave reported [1] a NULL pointer dereference in
      __reset_isolation_pfn().  While the exact cause is unclear, staring at
      the code revealed two bugs, which might be related.
      
      One bug is that if a zone starts in the middle of a pageblock, block_page
      might correspond to a different pfn than block_pfn, and then the
      pfn_valid_within() checks will check different pfns than those accessed
      via struct page.  This might result in accessing an uninitialized page in
      CONFIG_HOLES_IN_ZONE configs.
      
      The other bug is that end_page refers to the first page of the next
      pageblock and not the last page of the current pageblock.  The online and
      valid check is then wrong and, with sections, the while (page < end_page)
      loop might wander off the actual struct page arrays.
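
      As an illustration of the two boundary problems described above, the
      following is a minimal userspace sketch of the pfn arithmetic, not the
      kernel code.  The struct zone_span type, the helper names and the
      512-page pageblock size are assumptions made for this example only.

```c
/* Hypothetical pageblock size; the kernel derives it from the configuration. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical stand-in for the pfn range covered by a zone. */
struct zone_span {
    unsigned long start_pfn;   /* first pfn in the zone */
    unsigned long end_pfn;     /* one past the last pfn in the zone */
};

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start(unsigned long pfn)
{
    return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * First pfn of the block that may be inspected: the pageblock start,
 * clamped so that it never falls before the start of the zone.
 */
unsigned long block_first_pfn(const struct zone_span *z, unsigned long pfn)
{
    unsigned long block_pfn = pageblock_start(pfn);

    return block_pfn < z->start_pfn ? z->start_pfn : block_pfn;
}

/*
 * Last pfn (inclusive) of the block: the last page of the current
 * pageblock rather than the first page of the next one, clamped so
 * that it never runs past the end of the zone.
 */
unsigned long block_last_pfn(const struct zone_span *z, unsigned long pfn)
{
    unsigned long last = pageblock_start(pfn) + PAGEBLOCK_NR_PAGES - 1;

    return last >= z->end_pfn ? z->end_pfn - 1 : last;
}
```

      With a zone spanning pfns 700 to 1999, the block containing pfn 800 is
      inspected from pfn 700 (not 512) up to and including pfn 1023.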
      
      [1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/
      
      Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
      Fixes: 6b0868c8 ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Florian Weimer <fw@deneb.enyo.de>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hugetlb: allow hugepage allocations to reclaim as needed · 3f36d866
      David Rientjes authored
      Commit b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when
      compaction may not succeed") has changed the allocator to bail out
      early in order to prevent potentially excessive memory reclaim.
      __GFP_RETRY_MAYFAIL is designed to retry the allocation, reclaim and
      compaction loop as long as there is a reasonable chance to make forward
      progress.  Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at the
      INIT_COMPACT_PRIORITY compaction attempt gives this feedback.
      
      The most obviously affected subsystem is hugetlbfs, which allocates huge
      pages based on an admin request (or via admin-configured overcommit).  I
      have done a simple test which tries to allocate half of the memory for
      hugetlb pages while the memory is full of clean page cache.  This is
      not an unusual situation because we try to cache as much of the memory
      as possible, and the sysctl/sysfs interface to allocate huge pages is
      there to allow allocating hugetlb pages at any time.
      
      The system has 1GB of RAM and we are requesting 512MB worth of hugetlb
      pages (256 pages, assuming 2MB huge pages) after the memory is prefilled
      with clean page cache:
      
        root@test1:~# cat hugetlb_test.sh
      
        set -x
        echo 0 > /proc/sys/vm/nr_hugepages
        echo 3 > /proc/sys/vm/drop_caches
        echo 1 > /proc/sys/vm/compact_memory
        dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
        TS=$(date +%s)
        echo 256 > /proc/sys/vm/nr_hugepages
        cat /proc/sys/vm/nr_hugepages
      
      The results for 2 consecutive runs on clean 5.3:
      
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
        + date +%s
        + TS=1569905284
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        256
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
        + date +%s
        + TS=1569905311
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        256
      
      Now with b39d0ee2 applied:
      
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
        + date +%s
        + TS=1569905516
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        11
        root@test1:~# sh hugetlb_test.sh
        + echo 0
        + echo 3
        + echo 1
        + dd if=/mnt/data/file-1G of=/dev/null bs=4096
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
        + date +%s
        + TS=1569905541
        + echo 256
        + cat /proc/sys/vm/nr_hugepages
        12
      
      The success rate went down by a factor of 20!

      Hugetlb allocation requests might fail, and it is reasonable to expect
      them to do so under extremely fragmented memory or when the memory is
      under heavy pressure, but the above situation is not such a case.

      Fix the regression by reverting to the previous behavior for
      __GFP_RETRY_MAYFAIL requests and disabling the bail-out heuristic for
      those requests.
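
      The decision described above can be modelled roughly as follows.  This
      is a simplified sketch of the changelog's logic only, not the code in
      the page allocator; the MODEL_* names and the flag value are
      hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bit for this model; the kernel's __GFP_RETRY_MAYFAIL
 * has a different value.
 */
#define MODEL_GFP_RETRY_MAYFAIL 0x1u

/* Feedback from the first compaction attempt (names follow the changelog). */
enum model_compact_result {
    MODEL_COMPACT_SKIPPED,
    MODEL_COMPACT_DEFERRED,
    MODEL_COMPACT_OTHER     /* any result that does not trigger the bail out */
};

/*
 * Returns true if the allocation should give up right after the first
 * compaction attempt instead of entering the reclaim/compaction loop.
 */
bool model_bail_out_early(unsigned int gfp_mask,
                          enum model_compact_result result)
{
    /* Requests that asked to keep retrying are exempt from the heuristic. */
    if (gfp_mask & MODEL_GFP_RETRY_MAYFAIL)
        return false;

    return result == MODEL_COMPACT_SKIPPED ||
           result == MODEL_COMPACT_DEFERRED;
}
```

      In this model a hugetlb-style request carrying the retry flag keeps
      going after COMPACT_SKIPPED, while a request without it gives up.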
      
      Mike said:
      
      : hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
      : boot where this may not be as much of an issue.  However, I am aware of at
      : least three use cases where allocations are made after the system has been
      : up and running for quite some time:
      :
      : - DB reconfiguration.  If sysctl/sysfs fails to get the required number
      :   of huge pages, the system is rebooted to perform the allocation after
      :   boot.
      :
      : - VM provisioning.  If unable to get the required number of huge pages,
      :   fall back to base pages.
      :
      : - An application that does not preallocate a pool, but rather allocates
      :   pages at fault time for optimal NUMA locality.
      :
      : In all cases, I would expect b39d0ee2 to cause regressions and
      : noticeable behavior changes.
      :
      : My quick/limited testing in
      : https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com
      : was insufficient.  It was also mentioned that if something like
      : b39d0ee2 went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
      : requests as in this patch.
      
      [mhocko@suse.com: reworded changelog]
      Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org
      Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/test_meminit: add a kmem_cache_alloc_bulk() test · 03a9349a
      Alexander Potapenko authored
      Make sure allocations from kmem_cache_alloc_bulk() and
      kmem_cache_free_bulk() are properly initialized.
      
      Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Thibaut Sautereau <thibaut@sautereau.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations · 0f181f9f
      Alexander Potapenko authored
      slab_alloc_node() already zeroed out the freelist pointer if
      init_on_free was on.  Thibaut Sautereau noticed that the same needs to
      be done for kmem_cache_alloc_bulk(), which performs the allocations
      separately.
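
      To show why the pointer has to be wiped, here is a toy userspace
      allocator that, like SLUB, keeps the next-free pointer inside each free
      object.  It is only a sketch: struct toy_cache, the toy_* helpers and
      the sizes are invented for this example and do not mirror mm/slub.c.

```c
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 32     /* hypothetical object size */
#define NR_OBJS  4      /* hypothetical number of objects in the slab */

struct toy_cache {
    unsigned char slab[NR_OBJS][OBJ_SIZE];
    void *freelist;     /* first free object, or NULL */
};

/*
 * Free an object: zero it (as init_on_free would) and then link it into
 * the freelist by storing the next-free pointer inside the object.
 */
void toy_free(struct toy_cache *c, void *obj)
{
    memset(obj, 0, OBJ_SIZE);
    memcpy(obj, &c->freelist, sizeof(void *));
    c->freelist = obj;
}

/*
 * Pop one object and wipe the embedded freelist pointer, so that the
 * caller really gets zeroed memory.
 */
void *toy_alloc(struct toy_cache *c)
{
    void *obj = c->freelist;

    if (!obj)
        return NULL;
    memcpy(&c->freelist, obj, sizeof(void *));
    memset(obj, 0, sizeof(void *));
    return obj;
}

/* Bulk allocation must apply the same wipe to every object it returns. */
size_t toy_alloc_bulk(struct toy_cache *c, size_t n, void **out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = toy_alloc(c);
        if (!out[i])
            break;
    }
    return i;
}

/* Returns 1 if every byte of the object is zero, 0 otherwise. */
int toy_is_zeroed(const void *obj)
{
    const unsigned char *p = obj;
    size_t i;

    for (i = 0; i < OBJ_SIZE; i++)
        if (p[i])
            return 0;
    return 1;
}
```

      Without the memset() in toy_alloc(), every object except the last one
      on the list would be handed out with a stale pointer in its first bytes.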
      
      kmem_cache_alloc_bulk() is currently used in two places in the kernel,
      so this change is unlikely to have a major performance impact.
      
      SLAB doesn't require a similar change, as auto-initialization makes the
      allocator store the freelist pointers off-slab.
      
      Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
      Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
      Reported-by: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/generic-radix-tree.c: add kmemleak annotations · 3c52b0af
      Eric Biggers authored
      Kmemleak is falsely reporting a leak of the slab allocation in
      sctp_stream_init_ext():
      
        BUG: memory leak
        unreferenced object 0xffff8881114f5d80 (size 96):
         comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s)
         hex dump (first 32 bytes):
           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
         backtrace:
           [<00000000ce7a1326>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
           [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline]
           [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline]
           [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
           [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline]
           [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline]
           [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0  net/sctp/stream.c:157
           [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00  net/sctp/socket.c:1882
           [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102
           [...]
      
      But it's freed later.  Kmemleak misses the allocation because its
      pointer is stored in the generic radix tree sctp_stream::out, and the
      generic radix tree uses raw pages which aren't tracked by kmemleak.
      
      Fix this by adding the kmemleak hooks to the generic radix tree code.
      
      Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub: fix a deadlock in show_slab_objects() · e4f8e513
      Qian Cai authored
      A long time ago we fixed a similar deadlock in show_slab_objects() [1].
      However, apparently due to commits like 01fb58bc ("slab: remove
      synchronous synchronize_sched() from memcg cache deactivation path")
      and 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back:
      just reading files in /sys/kernel/slab generates the lockdep splat
      below.
      
      Since the "mem_hotplug_lock" here is only taken to obtain a stable online
      node mask while racing with NUMA node hotplug, in the worst case the
      results may be miscalculated while doing NUMA node hotplug, but they
      will be corrected by later reads of the same files.
      
        WARNING: possible circular locking dependency detected
        ------------------------------------------------------
        cat/5224 is trying to acquire lock:
        ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
        show_slab_objects+0x94/0x3a8
      
        but task is already holding lock:
        b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (kn->count#45){++++}:
               lock_acquire+0x31c/0x360
               __kernfs_remove+0x290/0x490
               kernfs_remove+0x30/0x44
               sysfs_remove_dir+0x70/0x88
               kobject_del+0x50/0xb0
               sysfs_slab_unlink+0x2c/0x38
               shutdown_cache+0xa0/0xf0
               kmemcg_cache_shutdown_fn+0x1c/0x34
               kmemcg_workfn+0x44/0x64
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #1 (slab_mutex){+.+.}:
               lock_acquire+0x31c/0x360
               __mutex_lock_common+0x16c/0xf78
               mutex_lock_nested+0x40/0x50
               memcg_create_kmem_cache+0x38/0x16c
               memcg_kmem_cache_create_func+0x3c/0x70
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #0 (mem_hotplug_lock.rw_sem){++++}:
               validate_chain+0xd10/0x2bcc
               __lock_acquire+0x7f4/0xb8c
               lock_acquire+0x31c/0x360
               get_online_mems+0x54/0x150
               show_slab_objects+0x94/0x3a8
               total_objects_show+0x28/0x34
               slab_attr_show+0x38/0x54
               sysfs_kf_seq_show+0x198/0x2d4
               kernfs_seq_show+0xa4/0xcc
               seq_read+0x30c/0x8a8
               kernfs_fop_read+0xa8/0x314
               __vfs_read+0x88/0x20c
               vfs_read+0xd8/0x10c
               ksys_read+0xb0/0x120
               __arm64_sys_read+0x54/0x88
               el0_svc_handler+0x170/0x240
               el0_svc+0x8/0xc
      
        other info that might help us debug this:
      
        Chain exists of:
          mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(kn->count#45);
                                       lock(slab_mutex);
                                       lock(kn->count#45);
          lock(mem_hotplug_lock.rw_sem);
      
         *** DEADLOCK ***
      
        3 locks held by cat/5224:
         #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
         #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
         #2: b8ff009693eee398 (kn->count#45){++++}, at:
        kernfs_seq_start+0x44/0xf0
      
        stack backtrace:
        Call trace:
         dump_backtrace+0x0/0x248
         show_stack+0x20/0x2c
         dump_stack+0xd0/0x140
         print_circular_bug+0x368/0x380
         check_noncircular+0x248/0x250
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc
      
      I think it is important to mention that this doesn't expose
      show_slab_objects() to a use-after-free.  There is only a single path
      that might really race here, and that is the slab hotplug notifier
      callback __kmem_cache_shrink (via slab_mem_going_offline_callback), but
      that path doesn't really destroy kmem_cache_node data structures.
      
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html
      
      [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
      Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
      Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
      Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_owner: rename flag indicating that page is allocated · fdf3bf80
      Vlastimil Babka authored
      Commit 37389167 ("mm, page_owner: keep owner info when freeing the
      page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that a
      page is tracked as being allocated.  Kirill suggested naming it
      PAGE_EXT_OWNER_ALLOCATED to make it clearer, as "active is somewhat
      loaded term for a page".
      
      Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Walter Wu <walter-zh.wu@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_owner: decouple freeing stack trace from debug_pagealloc · 0fe9a448
      Vlastimil Babka authored
      Commit 8974558f ("mm, page_owner, debug_pagealloc: save and dump
      freeing stack trace") enhanced page_owner to also store freeing stack
      trace, when debug_pagealloc is also enabled.  KASAN would also like to
      do this [1] to improve error reports to debug e.g. UAF issues.
      
      Kirill has suggested that it should also be possible to enable saving
      of the freeing stack trace separately from KASAN or debug_pagealloc,
      i.e. with an extra boot option.  Qian argued that we have enough options
      already, and avoiding the extra overhead is not worth the complications
      in the case of a debugging option.  Kirill noted that the extra stack
      handle in struct page_owner requires 0.1% of memory.
      
      This patch therefore enables free stack saving whenever page_owner is
      enabled, regardless of whether debug_pagealloc or KASAN is also enabled.
      KASAN kernels booted with page_owner=on will thus benefit from the
      improved error reports.
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967
      
      [vbabka@suse.cz: v3]
        Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz
      Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Qian Cai <cai@lca.pw>
      Suggested-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Walter Wu <walter-zh.wu@mediatek.com>
      Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Suggested-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_owner: fix off-by-one error in __set_page_owner_handle() · 5556cfe8
      Vlastimil Babka authored
      Patch series "followups to debug_pagealloc improvements through
      page_owner", v3.
      
      These are followups to [1] which made it to Linus meanwhile.  Patches 1
      and 3 are based on Kirill's review, patch 2 on KASAN request [2].  It
      would be nice if all of this made it to 5.4 with [1] already there (or
      at least Patch 1).
      
      This patch (of 3):
      
      As noted by Kirill, commit 7e2f2a0c ("mm, page_owner: record page
      owner for each subpage") has introduced an off-by-one error in
      __set_page_owner_handle() when looking up page_ext for subpages.  As a
      result, the head page page_owner info is set twice, while for the last
      tail page, it's not set at all.
      
      Fix this and also make the code more efficient by advancing the page_ext
      pointer we already have, instead of calling lookup_page_ext() for each
      subpage.  Since the full size of struct page_ext is not known at compile
      time, we can't use a simple page_ext++ statement, so introduce a
      page_ext_next() inline function for that.
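
      The pointer-advancing idea can be sketched in userspace as follows.
      This is an illustration under assumptions, not the patch itself:
      struct toy_page_ext, toy_page_ext_size and the toy_* helpers are
      invented stand-ins for the real page_ext machinery.

```c
#include <stddef.h>

/* Fixed header; in this sketch, per-feature data follows it in memory. */
struct toy_page_ext {
    unsigned long flags;
};

/* Size of one entry, only known at run time (header plus enabled extras). */
static size_t toy_page_ext_size = sizeof(struct toy_page_ext);

/* extra must be a multiple of sizeof(unsigned long) to keep entries aligned. */
void toy_set_extra_size(size_t extra)
{
    toy_page_ext_size = sizeof(struct toy_page_ext) + extra;
}

/* Step to the following entry by the run-time size, not by sizeof(*curr). */
struct toy_page_ext *toy_page_ext_next(struct toy_page_ext *curr)
{
    return (struct toy_page_ext *)((char *)curr + toy_page_ext_size);
}

/*
 * Set a flag on the entries of all 1 << order subpages.  The head entry is
 * handled first and the pointer is advanced only after each entry has been
 * written, so neither the head is visited twice nor the last tail skipped.
 */
void toy_set_owner(struct toy_page_ext *head, unsigned int order,
                   unsigned long flag)
{
    struct toy_page_ext *ext = head;
    unsigned long i;

    for (i = 0; i < (1UL << order); i++) {
        ext->flags |= flag;
        ext = toy_page_ext_next(ext);
    }
}
```

      A plain ext++ would step by sizeof(struct toy_page_ext) and land in the
      middle of the extra data whenever any extras are enabled.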
      
      Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
      Fixes: 7e2f2a0c ("mm, page_owner: record page owner for each subpage")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
      Reported-by: Miles Chen <miles.chen@mediatek.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Walter Wu <walter-zh.wu@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemleak: Do not corrupt the object_list during clean-up · 2abd839a
      Catalin Marinas authored
      In case of an error (e.g. memory pool too small), kmemleak disables
      itself and cleans up the already allocated metadata objects. However, if
      this happens early before the RCU callback mechanism is available,
      put_object() skips call_rcu() and frees the object directly. This is not
      safe with the RCU list traversal in __kmemleak_do_cleanup().
      
      Change the list traversal in __kmemleak_do_cleanup() to
      list_for_each_entry_safe() and remove the rcu_read_{lock,unlock} since
      kmemleak is already disabled at this point. In addition, avoid an
      unnecessary metadata object rb-tree look-up since the function already
      has the struct kmemleak_object pointer.
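
      The idea behind the "safe" traversal is to remember the next element
      before the current one is freed.  A minimal userspace sketch follows;
      it uses a hand-rolled singly linked list and invented toy_* names, not
      the kernel's list_head API.

```c
#include <stdlib.h>

struct toy_object {
    struct toy_object *next;
    int id;
};

/*
 * Push a new object onto the list and return the new head.  If the
 * allocation fails, the list is left unchanged.
 */
struct toy_object *toy_push(struct toy_object *head, int id)
{
    struct toy_object *obj = malloc(sizeof(*obj));

    if (!obj)
        return head;
    obj->id = id;
    obj->next = head;
    return obj;
}

/*
 * Free every object on the list and return how many were freed.  The next
 * pointer is saved before free(); reading obj->next after free(obj) would
 * be a use-after-free, which is what an unsafe traversal amounts to when
 * objects are freed directly.
 */
size_t toy_cleanup(struct toy_object **head)
{
    struct toy_object *obj = *head;
    struct toy_object *tmp;
    size_t freed = 0;

    while (obj) {
        tmp = obj->next;    /* saved first, as the _safe iterator does */
        free(obj);
        freed++;
        obj = tmp;
    }
    *head = NULL;
    return freed;
}
```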
      
      Fixes: c5665868 ("mm: kmemleak: use the memory pool for early allocations")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
      Reported-by: Ted Ts'o <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Oct, 2019 16 commits
  3. 12 Oct, 2019 14 commits
    • Merge tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · da940012
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char/misc driver fixes for 5.4-rc3.
      
        Nothing huge here. Some binder driver fixes (although it is still
        being discussed if these all fix the reported issues or not, so more
        might be coming later), some mei device ids and fixes, and a google
        firmware driver bugfix that fixes a regression, as well as some other
        tiny fixes.
      
        All have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        firmware: google: increment VPD key_len properly
        w1: ds250x: Fix build error without CRC16
        virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr
        binder: Fix comment headers on binder_alloc_prepare_to_free()
        binder: prevent UAF read in print_binder_transaction_log_entry()
        misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
        mei: avoid FW version request on Ibex Peak and earlier
        mei: me: add comet point (lake) LP device ids
    • Merge tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 9cbc6348
      Linus Torvalds authored
      Pull staging/IIO driver fixes from Greg KH:
       "Here are some staging and IIO driver fixes for 5.4-rc3.
      
        The "biggest" thing here is a removal of the fbtft device and flexfb
        code as they have been abandoned by their authors and are no longer
        needed for that hardware.
      
        Other than that, the usual amount of staging driver and iio driver
        fixes for reported issues, and some speakup sysfs file documentation,
        which has been long awaited.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
        iio: Fix an undefied reference error in noa1305_probe
        iio: light: opt3001: fix mutex unlock race
        iio: adc: ad799x: fix probe error handling
        iio: light: add missing vcnl4040 of_compatible
        iio: light: fix vcnl4000 devicetree hooks
        iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
        iio: adc: axp288: Override TS pin bias current for some models
        iio: imu: adis16400: fix memory leak
        iio: imu: adis16400: release allocated memory on failure
        iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
        iio: adc: stm32-adc: move registers definitions
        iio: accel: adxl372: Perform a reset at start up
        iio: accel: adxl372: Fix push to buffers lost samples
        iio: accel: adxl372: Fix/remove limitation for FIFO samples
        iio: adc: hx711: fix bug in sampling of data
        staging: vt6655: Fix memory leak in vt6655_probe
        staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info
        Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
        staging: speakup: document sysfs attributes
        staging: rtl8188eu: fix HighestRate check in odm_ARFBRefresh_8188E()
        ...
    • Merge tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 82c87e7d
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small tty and serial driver fixes for 5.4-rc3 that
        resolve a number of reported issues and regressions.
      
        None of these are huge; full details are in the shortlog. There's also
        a MAINTAINERS update that I think you might have already taken in your
        tree, but git should handle that merge easily.
      
        All have been in linux-next with no reported issues"
      
      * tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
        tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
        serial: fix kernel-doc warning in comments
        serial: 8250_omap: Fix gpio check for auto RTS/CTS
        serial: mctrl_gpio: Check for NULL pointer
        tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
        tty: serial: Fix PORT_LINFLEXUART definition
        tty: n_hdlc: fix build on SPARC
        serial: uartps: Fix uartps_major handling
        serial: uartlite: fix exit path null pointer
        tty: serial: linflexuart: Fix magic SysRq handling
        serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
        dt-bindings: serial: sh-sci: Document r8a774b1 bindings
        serial/sifive: select SERIAL_EARLYCON
        tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
        tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
    • Merge tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 6c90bbd0
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a lot of small USB driver fixes for 5.4-rc3.
      
        syzbot has stepped up its testing of the USB driver stack, now able to
        trigger fun race conditions between disconnect and probe functions.
        Because of that we have a lot of fixes in here from Johan and others
        fixing these reported issues that have been around since almost all
        time.
      
        We also are just deleting the rio500 driver, making all of the syzbot
        bugs found in it moot as it turns out no one has been using it for
        years as there is a userspace version that is being used instead.
      
        There are also a number of other small fixes in here, all resolving
        reported issues or regressions.
      
        All have been in linux-next without any reported issues"
      
      * tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
        USB: yurex: fix NULL-derefs on disconnect
        USB: iowarrior: use pr_err()
        USB: iowarrior: drop redundant iowarrior mutex
        USB: iowarrior: drop redundant disconnect mutex
        USB: iowarrior: fix use-after-free after driver unbind
        USB: iowarrior: fix use-after-free on release
        USB: iowarrior: fix use-after-free on disconnect
        USB: chaoskey: fix use-after-free on release
        USB: adutux: fix use-after-free on release
        USB: ldusb: fix NULL-derefs on driver unbind
        USB: legousbtower: fix use-after-free on release
        usb: cdns3: Fix for incorrect DMA mask.
        usb: cdns3: fix cdns3_core_init_role()
        usb: cdns3: gadget: Fix full-speed mode
        USB: usb-skeleton: drop redundant in-urb check
        USB: usb-skeleton: fix use-after-free after driver unbind
        USB: usb-skeleton: fix NULL-deref on disconnect
        usb:cdns3: Fix for CV CH9 running with g_zero driver.
        usb: dwc3: Remove dev_err() on platform_get_irq() failure
        usb: dwc3: Switch to platform_get_irq_byname_optional()
        ...
      6c90bbd0
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 328fefad
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes: a guest-cputime accounting fix, and a cgroup bandwidth
        quota precision fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/vtime: Fix guest/system mis-accounting on task switch
        sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
      328fefad
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 465a7e29
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Mostly tooling fixes, but also a couple of updates for new Intel
        models (which are technically hw-enablement, but to users it's a fix
        to perf behavior on those new CPUs - hope this is fine), an AUX
        inheritance fix, event time-sharing fix, and a fix for lost non-perf
        NMI events on AMD systems"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
        perf/x86/cstate: Add Tiger Lake CPU support
        perf/x86/msr: Add Tiger Lake CPU support
        perf/x86/intel: Add Tiger Lake CPU support
        perf/x86/cstate: Update C-state counters for Ice Lake
        perf/x86/msr: Add new CPU model numbers for Ice Lake
        perf/x86/cstate: Add Comet Lake CPU support
        perf/x86/msr: Add Comet Lake CPU support
        perf/x86/intel: Add Comet Lake CPU support
        perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
        perf/core: Fix corner case in perf_rotate_context()
        perf/core: Rework memory accounting in perf_mmap()
        perf/core: Fix inheritance of aux_output groups
        perf annotate: Don't return -1 for error when doing BPF disassembly
        perf annotate: Return appropriate error code for allocation failures
        perf annotate: Fix arch specific ->init() failure errors
        perf annotate: Propagate the symbol__annotate() error return
        perf annotate: Fix the signedness of failure returns
        perf annotate: Propagate perf_env__arch() error
        perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
        perf tools: Propagate get_cpuid() error
        ...
      465a7e29
    • Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9b4e40c8
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc EFI fixes all across the map: CPER error report fixes, fixes to
        TPM event log parsing, fix for a kexec hang, a Sparse fix and other
        fixes"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
        efi/x86: Do not clean dummy variable in kexec path
        efi: Make unexported efi_rci2_sysfs_init() static
        efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
        efi/tpm: Don't traverse an event log with no events
        efi/tpm: Don't access event->count when it isn't mapped
        efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
        efi/cper: Fix endianness of PCIe class code
      9b4e40c8
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fcb45a28
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
        guest support fix when built under Clang, and new CPU model number
        definitions"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add Comet Lake to the Intel CPU models header
        lib/string: Make memzero_explicit() inline instead of external
        x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
        x86/asm: Fix MWAITX C-state hint value
      fcb45a28
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e9ec3588
      Linus Torvalds authored
      Pull x86 license tag fixlets from Ingo Molnar:
       "Fix a couple of SPDX tags in x86 headers to follow the canonical
        pattern"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Use the correct SPDX License Identifier in headers
      e9ec3588
    • Merge tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 48acba98
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
      
       - Fix several bugs in the breakpoint trap handler
      
       - Drop an unnecessary loop around calls to preempt_schedule_irq()
      
      * tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: entry: Remove unneeded need_resched() loop
        riscv: Correct the handling of unexpected ebreak in do_trap_break()
        riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
        riscv: avoid kernel hangs when trapped in BUG()
      48acba98
    • Merge tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 63f9bff5
      Linus Torvalds authored
      Pull MIPS fixes from Paul Burton:
      
       - Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
         compiler may choose not to inline __xchg() & __cmpxchg().
      
       - A build fix for Loongson configurations with GCC 9.x.
      
       - Expose some extra HWCAP bits to indicate support for various
         instruction set extensions to userland.
      
       - Fix bad stack access in firmware handling code for old SNI
         RM200/300/400 machines.
      
      * tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: Disable Loongson MMI instructions for kernel build
        MIPS: elf_hwcap: Export userspace ASEs
        MIPS: fw: sni: Fix out of bounds init of o32 stack
        MIPS: include: Mark __xchg as __always_inline
        MIPS: include: Mark __cmpxchg as __always_inline
      63f9bff5
    • Merge tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · db60a5a0
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix a kernel crash in spufs_create_root() on Cell machines, since the
        new mount API went in.
      
        Fix a regression in our KVM code caused by our recent PCR changes.
      
        Avoid a warning message about a failing hypervisor API on systems that
        don't have that API.
      
        A couple of minor build fixes.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Desnes A. Nunes do
        Rosario, Emmanuel Nicolet, Jordan Niethe, Laurent Dufour, Stephen
        Rothwell"
      
      * tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        spufs: fix a crash in spufs_create_root()
        powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
        selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
        powerpc/pseries: Remove confusing warning message.
        powerpc/64s/radix: Fix build failure with RADIX_MMU=n
      db60a5a0
    • Merge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 680b5b3c
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - correct panic handling when running as a Xen guest
      
       - clean up the Xen grant driver to stop printing a pointer that is
         always NULL
      
       - remove a soon-to-be-wrong call to of_dma_configure()
      
      * tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: Stop abusing DT of_dma_configure API
        xen/grant-table: remove unnecessary printing
        x86/xen: Return from panic notifier
      680b5b3c
    • Merge tag 's390-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · f154988a
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix virtio-ccw DMA regression
      
       - Fix compiler warnings in uaccess
      
      * tag 's390-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/uaccess: avoid (false positive) compiler warnings
        s390/cio: fix virtio-ccw DMA without PV
      f154988a