1. 15 Jun, 2019 23 commits
    • mm/slab.c: fix an infinite loop in leaks_show() · ff75e0ec
      Qian Cai authored
      [ Upstream commit 745e1014 ]
      
      "cat /proc/slab_allocators" could hang forever on SMP machines with
      kmemleak or object debugging enabled due to other CPUs running do_drain()
      will keep making kmemleak_object or debug_objects_cache dirty and unable
      to escape the first loop in leaks_show(),
      
      do {
      	set_store_user_clean(cachep);
      	drain_cpu_caches(cachep);
      	...
      
      } while (!is_store_user_clean(cachep));
      
      For example,
      
      do_drain
        slabs_destroy
          slab_destroy
            kmem_cache_free
              __cache_free
                ___cache_free
                  kmemleak_free_recursive
                    delete_object_full
                      __delete_object
                        put_object
                          free_object_rcu
                            kmem_cache_free
                              cache_free_debugcheck --> dirty kmemleak_object
      
      One approach is to check cachep->name and skip both kmemleak_object and
      debug_objects_cache in leaks_show().  The other is to set store_user_clean
      after drain_cpu_caches(); this leaves a small window between
      drain_cpu_caches() and set_store_user_clean() where the per-CPU caches
      could become dirty again, so slightly stale information may be reported,
      but it also speeds things up significantly, which sounds like a good
      compromise.  For example,
      
       # cat /proc/slab_allocators
       0m42.778s # 1st approach
       0m0.737s  # 2nd approach
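
      A rough sketch of the second approach applied to the loop shown above
      (illustrative only, not the verbatim patch):

      do {
      	drain_cpu_caches(cachep);
      	/*
      	 * Reset the flag only after the drain: frees issued by other CPUs
      	 * while draining may dirty kmemleak_object or debug_objects_cache
      	 * again, which is tolerated here in exchange for termination.
      	 */
      	set_store_user_clean(cachep);
      	...

      } while (!is_store_user_clean(cachep));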
      
      [akpm@linux-foundation.org: tweak comment]
      Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
      Fixes: d31676df ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ff75e0ec
    • mm/cma_debug.c: fix the break condition in cma_maxchunk_get() · ecadffd7
      Yue Hu authored
      [ Upstream commit f0fd5050 ]
      
      If find_next_zero_bit() does not find a zero bit, it returns the size
      parameter passed in, so the start bit should be compared against
      bitmap_maxno rather than cma->count.  Getting maxchunk currently works
      only because order_per_bit happens to be zero; the loop will get stuck
      once order_per_bit is set to a non-zero value.
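
      A minimal sketch of the corrected loop (assuming the cma_bitmap_maxno()
      helper from mm/cma.h; illustrative only, not the verbatim patch):

      	unsigned long start, end = 0, maxchunk = 0;
      	unsigned long bitmap_maxno = cma_bitmap_maxno(cma);

      	for (;;) {
      		start = find_next_zero_bit(cma->bitmap, bitmap_maxno, end);
      		if (start >= bitmap_maxno)	/* compare against bits, not pages */
      			break;
      		end = find_next_bit(cma->bitmap, bitmap_maxno, start);
      		maxchunk = max(end - start, maxchunk);
      	}
      	*val = (u64)maxchunk << cma->order_per_bit;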
      
      Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com
      Signed-off-by: Yue Hu <huyue2@yulong.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ecadffd7
    • mm: page_mkclean vs MADV_DONTNEED race · 397b073f
      Aneesh Kumar K.V authored
      [ Upstream commit 024eee0e ]
      
      MADV_DONTNEED is handled with mmap_sem taken in read mode.  We call
      page_mkclean without holding mmap_sem.
      
      MADV_DONTNEED implies that pages in the region are unmapped and subsequent
      access to the pages in that range is handled as a new page fault.  This
      implies that if we don't have parallel access to the region when
      MADV_DONTNEED is run, we expect that range to be unallocated.

      W.r.t. page_mkclean() we need to make sure that we don't break the
      MADV_DONTNEED semantics.  MADV_DONTNEED checks for pmd_none without holding
      the pmd_lock, which means the pmd is skipped if we temporarily mark it
      none.  Avoid doing that while marking the page clean.
      
      Keep the sequence the same for dax too, even though we don't support
      MADV_DONTNEED for dax mappings.
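
      A hedged sketch of the idea for the huge-pmd case (helper usage assumed
      for illustration, not the verbatim patch): invalidate the pmd in place
      instead of clearing it, so a concurrent pmd_none() check never observes
      a temporarily empty pmd.

      	pmd_t entry;

      	/* keeps the pmd present-but-invalid rather than none */
      	entry = pmdp_invalidate(vma, address, pmd);
      	entry = pmd_wrprotect(entry);
      	entry = pmd_mkclean(entry);
      	set_pmd_at(vma->vm_mm, address, pmd, entry);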
      
      The bug was noticed by code review and I didn't observe any failures w.r.t
      test run.  This is similar to
      
      commit 58ceeb6b
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:26 2017 -0700
      
          thp: fix MADV_DONTNEED vs. MADV_FREE race
      
      commit ced10803
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:20 2017 -0700
      
          thp: fix MADV_DONTNEED vs. numa balancing race
      
      Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      397b073f
    • mm/cma.c: fix the bitmap status to show failed allocation reason · 9e3c2b37
      Yue Hu authored
      [ Upstream commit 2b59e01a ]
      
      Currently one bit in the cma bitmap represents a number of pages rather
      than one page, and cma->count means the cma size in pages.  So to find
      available pages via find_next_zero_bit()/find_next_bit() we should use
      the cma size in bits rather than in pages.  The current free pages number
      is only correct because order_per_bit happens to be zero; once
      order_per_bit is changed, the bitmap status will be incorrect.
      
      The size input to cma_debug_show_areas() is therefore not correct, and it
      misreports the available pages at some positions when debugging the
      failure issue.
      
      This is an example with order_per_bit = 1
      
      Before this change:
      [    4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages
      
      After this change:
      [    4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages
      
      Obviously the bitmap status before is incorrect.
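
      A simplified sketch of the scan with the size expressed in bits (assuming
      the cma_bitmap_maxno() helper; illustrative only, not the verbatim patch):

      	unsigned long nbits = cma_bitmap_maxno(cma);
      	unsigned long start = 0, next_zero_bit, next_set_bit, nr_zero;

      	for (;;) {
      		next_zero_bit = find_next_zero_bit(cma->bitmap, nbits, start);
      		if (next_zero_bit >= nbits)
      			break;
      		next_set_bit = find_next_bit(cma->bitmap, nbits, next_zero_bit);
      		nr_zero = next_set_bit - next_zero_bit;
      		/* convert free bits back into free pages for the report */
      		pr_cont("%lu@%lu ", nr_zero << cma->order_per_bit, next_zero_bit);
      		start = next_zero_bit + nr_zero;
      	}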
      
      Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
      Signed-off-by: Yue Hu <huyue2@yulong.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9e3c2b37
    • mm/memory_hotplug.c: fix the wrong usage of N_HIGH_MEMORY · e39f78af
      Baoquan He authored
      [ Upstream commit d3ba3ae1 ]
      
      In node_states_check_changes_online(), N_HIGH_MEMORY is used as a direct
      substitute for ZONE_HIGHMEM.  This is not right: N_HIGH_MEMORY marks the
      memory state of a node, whereas what is being checked here is a zone
      index, which should be compared with ZONE_HIGHMEM accordingly.
      
      Replace it with ZONE_HIGHMEM.
      
      This is a code cleanup - no known runtime effects.
      
      Link: http://lkml.kernel.org/r/20190320080732.14933-1-bhe@redhat.com
      Fixes: 8efe33f4 ("mm/memory_hotplug.c: simplify node_states_check_changes_online")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e39f78af
    • mm/compaction.c: fix an undefined behaviour · f3918ea4
      Qian Cai authored
      [ Upstream commit dd7ef7bd ]
      
      In a low-memory situation, cc->fast_search_fail can keep increasing when
      fast_isolate_freepages() is unable to find an available page to isolate.
      As a result, it can trigger the error below, so compare against the
      maximum number of bits that can be shifted first.
      
      UBSAN: Undefined behaviour in mm/compaction.c:1160:30
      shift exponent 64 is too large for 64-bit type 'unsigned long'
      CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
      W    L    5.0.0+ #17
      Call trace:
       dump_backtrace+0x0/0x450
       show_stack+0x20/0x2c
       dump_stack+0xc8/0x14c
       __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
       compaction_alloc+0x2344/0x2484
       unmap_and_move+0xdc/0x1dbc
       migrate_pages+0x274/0x1310
       compact_zone+0x26ec/0x43bc
       kcompactd+0x15b8/0x1a24
       kthread+0x374/0x390
       ret_from_fork+0x10/0x18
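
      One way to avoid the overflow, in the spirit of the fix described above
      (a sketch only, not the verbatim patch):

      	static unsigned int freelist_scan_limit(struct compact_control *cc)
      	{
      		unsigned short shift = BITS_PER_LONG - 1;

      		/* never shift by the full width of unsigned long or more */
      		return (COMPACT_CLUSTER_MAX >> min(shift, cc->fast_search_fail)) + 1;
      	}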
      
      [akpm@linux-foundation.org: code cleanup]
      Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw
      Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f3918ea4
    • initramfs: free initrd memory if opening /initrd.image fails · a52e81f3
      Christoph Hellwig authored
      [ Upstream commit 54c7a891 ]
      
      Patch series "initramfs tidyups".
      
      I've spent some time chasing down behavior in initramfs and found
      plenty of opportunity to improve the code.  A first stab on that is
      contained in this series.
      
      This patch (of 7):
      
      We free the initrd memory for all successful or error cases except for the
      case where opening /initrd.image fails, which looks like an oversight.
      
      Steven said:
      
      : This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
      : - specifically it means that the initrd is freed (previously it was
      : ignored and never freed).  But that seems like reasonable behaviour and
      : the previous behaviour looks like another oversight.
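
      A rough sketch of the intended control flow (simplified; not the actual
      initramfs code):

      	fd = ksys_open("/initrd.image", O_WRONLY | O_CREAT, 0700);
      	if (fd >= 0) {
      		ksys_write(fd, (char *)initrd_start, initrd_end - initrd_start);
      		ksys_close(fd);
      	}
      	/* previously skipped when the open failed */
      	free_initrd();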
      
      Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a52e81f3
    • mm/cma.c: fix crash on CMA allocation if bitmap allocation fails · 7df45315
      Yue Hu authored
      [ Upstream commit 1df3a339 ]
      
      Commit f022d8cb ("mm: cma: Don't crash on allocation if CMA area can't be
      activated") fixes the crash when activation fails by setting cma->count
      to 0; apply the same logic when the bitmap allocation fails.
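
      A minimal sketch of that error path in cma_activate_area() (illustrative
      only, not the verbatim patch):

      	cma->bitmap = kzalloc(BITS_TO_LONGS(cma_bitmap_maxno(cma)) *
      			      sizeof(long), GFP_KERNEL);
      	if (!cma->bitmap) {
      		/* keep cma_alloc() from ever touching a NULL bitmap */
      		cma->count = 0;
      		return -ENOMEM;
      	}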
      
      Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com
      Signed-off-by: Yue Hu <huyue2@yulong.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7df45315
    • mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE · a72a64c9
      Linxu Fang authored
      [ Upstream commit 299c83dc ]
      
      Commit 342332e6 ("mm/page_alloc.c: introduce kernelcore=mirror option")
      and later patches rewrote the calculation of node spanned pages.

      Commit e506b996 ("mem-hotplug: fix node spanned pages when we have a
      movable node") followed up, but the current code still has a problem:
      when we have a node with only ZONE_MOVABLE and the node id is not zero,
      the size of the node's spanned pages is added twice.

      That's because the node has an empty normal zone, and zone_start_pfn or
      zone_end_pfn is not between arch_zone_lowest_possible_pfn and
      arch_zone_highest_possible_pfn, so we need to use clamp to constrain the
      range, just like commit 96e907d1 ("bootmem: Reimplement
      __absent_pages_in_range() using for_each_mem_pfn_range()").
      
      e.g.
      Zone ranges:
        DMA      [mem 0x0000000000001000-0x0000000000ffffff]
        DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
        Normal   [mem 0x0000000100000000-0x000000023fffffff]
      Movable zone start for each node
        Node 0: 0x0000000100000000
        Node 1: 0x0000000140000000
      Early memory node ranges
        node   0: [mem 0x0000000000001000-0x000000000009efff]
        node   0: [mem 0x0000000000100000-0x00000000bffdffff]
        node   0: [mem 0x0000000100000000-0x000000013fffffff]
        node   1: [mem 0x0000000140000000-0x000000023fffffff]
      
      node 0 DMA	spanned:0xfff   present:0xf9e   absent:0x61
      node 0 DMA32	spanned:0xff000 present:0xbefe0	absent:0x40020
      node 0 Normal	spanned:0	present:0	absent:0
      node 0 Movable	spanned:0x40000 present:0x40000 absent:0
      On node 0 totalpages(node_present_pages): 1048446
      node_spanned_pages:1310719
      node 1 DMA	spanned:0	    present:0		absent:0
      node 1 DMA32	spanned:0	    present:0		absent:0
      node 1 Normal	spanned:0x100000    present:0x100000	absent:0
      node 1 Movable	spanned:0x100000    present:0x100000	absent:0
      On node 1 totalpages(node_present_pages): 2097152
      node_spanned_pages:2097152
      Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata,
      4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K
      cma-reserved)
      
      It shows that the memory of node 1 is currently counted twice.
      After this patch, the problem is fixed.
      
      node 0 DMA	spanned:0xfff   present:0xf9e   absent:0x61
      node 0 DMA32	spanned:0xff000 present:0xbefe0	absent:0x40020
      node 0 Normal	spanned:0	present:0	absent:0
      node 0 Movable	spanned:0x40000 present:0x40000 absent:0
      On node 0 totalpages(node_present_pages): 1048446
      node_spanned_pages:1310719
      node 1 DMA	spanned:0	    present:0		absent:0
      node 1 DMA32	spanned:0	    present:0		absent:0
      node 1 Normal	spanned:0	    present:0		absent:0
      node 1 Movable	spanned:0x100000    present:0x100000	absent:0
      On node 1 totalpages(node_present_pages): 1048576
      node_spanned_pages:1048576
      memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata,
      4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K
      cma-reserved)
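
      A hedged sketch of the clamping idea in zone_spanned_pages_in_node()
      (variable names assumed for illustration; not the verbatim patch):

      	/* constrain the zone range to the pfns this node actually spans */
      	*zone_start_pfn = clamp(*zone_start_pfn, node_start_pfn, node_end_pfn);
      	*zone_end_pfn = clamp(*zone_end_pfn, node_start_pfn, node_end_pfn);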
      
      Link: http://lkml.kernel.org/r/1554178276-10372-1-git-send-email-fanglinxu@huawei.com
      Signed-off-by: Linxu Fang <fanglinxu@huawei.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a72a64c9
    • mm/memory_hotplug: release memory resource after arch_remove_memory() · 8fc09bd2
      David Hildenbrand authored
      [ Upstream commit d9eb1417 ]
      
      Patch series "mm/memory_hotplug: Better error handling when removing
      memory", v1.
      
      Error handling when removing memory is somewhat messed up right now.  Some
      errors result in warnings, others are completely ignored.  Memory unplug
      code can essentially not deal with errors properly as of now.
      remove_memory() will never fail.
      
      We have basically two choices:
      1. Allow arch_remove_memory() and friends to fail, propagating errors via
         remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
         pieces added/removed separately).
      2. Don't allow the functions to fail, handling errors in a nicer way.
      
      It seems like most errors that can theoretically happen are really corner
      cases and mostly theoretical (e.g.  "section not valid").  However e.g.
      aborting removal of sections while all callers simply continue in case of
      errors is not nice.
      
      If we can guarantee that removal of memory always works (and WARN/skip in
      case of theoretical errors so we can figure out what is going on), we can
      go ahead and implement better error handling when adding memory.
      
      E.g. via add_memory():
      
      arch_add_memory()
      ret = do_stuff()
      if (ret) {
      	arch_remove_memory();
      	goto error;
      }
      
      Handling here that arch_remove_memory() might fail is basically
      impossible.  So I suggest, let's avoid reporting errors while removing
      memory, warning on theoretical errors instead and continuing instead of
      aborting.
      
      This patch (of 4):
      
      __add_pages() doesn't add the memory resource, so __remove_pages()
      shouldn't remove it.  Let's factor it out.  Especially as it is a special
      case for memory used as system memory, added via add_memory() and friends.
      
      We now remove the resource after removing the sections instead of doing it
      the other way around.  I don't think this change is problematic.
      
      add_memory()
      	register memory resource
      	arch_add_memory()
      
      remove_memory
      	arch_remove_memory()
      	release memory resource
      
      While at it, explain why we ignore errors and that this can only happen
      when we remove memory at a different granularity than it was added with.
      
      [david@redhat.com: fix printk warning]
        Link: http://lkml.kernel.org/r/20190417120204.6997-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20190409100148.24703-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8fc09bd2
    • hugetlbfs: on restore reserve error path retain subpool reservation · eb27e2ac
      Mike Kravetz authored
      [ Upstream commit 0919e1b6 ]
      
      When a huge page is allocated, PagePrivate() is set if the allocation
      consumed a reservation.  When freeing a huge page, PagePrivate is checked.
      If set, it indicates the reservation should be restored.  PagePrivate
      being set at free huge page time mostly happens on error paths.
      
      When huge page reservations are created, a check is made to determine if
      the mapping is associated with an explicitly mounted filesystem.  If so,
      pages are also reserved within the filesystem.  The default action when
      freeing a huge page is to decrement the usage count in any associated
      explicitly mounted filesystem.  However, if the reservation is to be
      restored, the reservation/use count within the filesystem should not be
      decremented.  Otherwise, a subsequent page allocation and free for the same
      mapping location will cause the filesystem usage to go 'negative'.
      
      Filesystem                         Size  Used Avail Use% Mounted on
      nodev                              4.0G -4.0M  4.1G    - /opt/hugepool
      
      To fix, when freeing a huge page do not adjust filesystem usage if
      PagePrivate() is set to indicate the reservation should be restored.
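
      A hedged sketch of the check in the huge page free path (field and helper
      names assumed for illustration; not the verbatim patch):

      	bool restore_reserve = PagePrivate(page);
      	...
      	if (restore_reserve)
      		h->resv_huge_pages++;			/* give the reservation back */
      	else
      		hugepage_subpool_put_pages(spool, 1);	/* normal subpool accounting */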
      
      I did not cc stable as the problem has been around since reserves were
      added to hugetlbfs and nobody has noticed.
      
      Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      eb27e2ac
    • mm/hmm: select mmu notifier when selecting HMM · 1d453434
      Jérôme Glisse authored
      [ Upstream commit 734fb899 ]
      
      To avoid random config build issues, select the mmu notifier when HMM is
      selected.  In any case, when HMM gets selected it will be by users that
      also want the mmu notifier.
      
      Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d453434
    • ARM: prevent tracing IPI_CPU_BACKTRACE · dd39a7e5
      Arnd Bergmann authored
      [ Upstream commit be167862 ]
      
      Patch series "compiler: allow all arches to enable
      CONFIG_OPTIMIZE_INLINING", v3.
      
      This patch (of 11):
      
      When function tracing for IPIs is enabled, we get a warning for an
      overflow of the ipi_types array with the IPI_CPU_BACKTRACE type as
      triggered by raise_nmi():
      
        arch/arm/kernel/smp.c: In function 'raise_nmi':
        arch/arm/kernel/smp.c:489:2: error: array subscript is above array bounds [-Werror=array-bounds]
          trace_ipi_raise(target, ipi_types[ipinr]);
      
      This is a correct warning as we actually overflow the array here.
      
      This patch changes raise_nmi() to call __smp_cross_call() instead of
      smp_cross_call(), to avoid calling into ftrace.  For clarification, I'm
      also adding two new code comments describing how this one is special.
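
      A minimal sketch of the change (illustrative only, not the verbatim
      patch):

      	static void raise_nmi(cpumask_t *mask)
      	{
      		/*
      		 * Bypass the tracing wrapper: IPI_CPU_BACKTRACE lies outside
      		 * the ipi_types[] array used by trace_ipi_raise().
      		 */
      		__smp_cross_call(mask, IPI_CPU_BACKTRACE);
      	}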
      
      The warning appears to have shown up after commit e7273ff4 ("ARM:
      8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI"), which changed the
      number assignment from '15' to '8', but as far as I can tell the issue has
      existed since the IPI tracepoints were first introduced.  If we decide to
      backport this patch to stable kernels, we probably need to backport
      e7273ff4 as well.
      
      [yamada.masahiro@socionext.com: rebase on v5.1-rc1]
      Link: http://lkml.kernel.org/r/20190423034959.13525-2-yamada.masahiro@socionext.com
      Fixes: e7273ff4 ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI")
      Fixes: 365ec7b1 ("ARM: add IPI tracepoints") # v3.17
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dd39a7e5
    • mm/mprotect.c: fix compilation warning because of unused 'mm' variable · 966a46f8
      Mike Rapoport authored
      [ Upstream commit 94393c78 ]
      
      Since 0cbe3e26 ("mm: update ptep_modify_prot_start/commit to take
      vm_area_struct as arg") the only place that uses the local 'mm' variable
      in change_pte_range() is the call to set_pte_at().
      
      Many architectures define set_pte_at() as a macro that does not use the 'mm'
      parameter, which generates the following compilation warning:
      
       CC      mm/mprotect.o
      mm/mprotect.c: In function 'change_pte_range':
      mm/mprotect.c:42:20: warning: unused variable 'mm' [-Wunused-variable]
        struct mm_struct *mm = vma->vm_mm;
                          ^~
      
      Fix it by passing vma->vm_mm to set_pte_at() and dropping the local 'mm'
      variable in change_pte_range().
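
      A one-line sketch of the resulting call (illustrative only):

      	/* use the mm hanging off the vma instead of a local copy */
      	set_pte_at(vma->vm_mm, addr, pte, ptent);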
      
      [liu.song.a23@gmail.com: fix missed conversions]
        Link: http://lkml.kernel.org/r/CAPhsuW6wcQgYLHNdBdw6m0YiR4RWsS4XzfpSKU7wBLLeOCTbpw@mail.gmail.com
      Link: http://lkml.kernel.org/r/1557305432-4940-1-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      966a46f8
    • drm/pl111: Initialize clock spinlock early · fa70cd41
      Guenter Roeck authored
      [ Upstream commit 3e01ae26 ]
      
      The following warning is seen on systems with a broken clock divider.
      
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 0 PID: 1 Comm: swapper Not tainted 5.1.0-09698-g1fb3b526 #1
      Hardware name: ARM Integrator/CP (Device Tree)
      [<c0011be8>] (unwind_backtrace) from [<c000ebb8>] (show_stack+0x10/0x18)
      [<c000ebb8>] (show_stack) from [<c07d3fd0>] (dump_stack+0x18/0x24)
      [<c07d3fd0>] (dump_stack) from [<c0060d48>] (register_lock_class+0x674/0x6f8)
      [<c0060d48>] (register_lock_class) from [<c005de2c>]
      	(__lock_acquire+0x68/0x2128)
      [<c005de2c>] (__lock_acquire) from [<c0060408>] (lock_acquire+0x110/0x21c)
      [<c0060408>] (lock_acquire) from [<c07f755c>] (_raw_spin_lock+0x34/0x48)
      [<c07f755c>] (_raw_spin_lock) from [<c0536c8c>]
      	(pl111_display_enable+0xf8/0x5fc)
      [<c0536c8c>] (pl111_display_enable) from [<c0502f54>]
      	(drm_atomic_helper_commit_modeset_enables+0x1ec/0x244)
      
      Since commit eedd6033 ("drm/pl111: Support variants with broken clock
      divider"), the spinlock is not initialized if the clock divider is broken.
      Initialize it earlier to fix the problem.
      
      Fixes: eedd6033 ("drm/pl111: Support variants with broken clock divider")
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/1557758781-23586-1-git-send-email-linux@roeck-us.net
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fa70cd41
    • drm/msm: correct attempted NULL pointer dereference in debugfs · 5b655c4c
      Brian Masney authored
      [ Upstream commit 90f94660 ]
      
      msm_gem_describe() would attempt to dereference a NULL pointer via the
      address space pointer when no IOMMU is present. Correct this by adding
      the appropriate check.
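
      A rough sketch of the guard (field names are assumptions for illustration,
      not the driver's exact code):

      	/* skip address-space details when no IOMMU is present */
      	if (vma->aspace)
      		seq_printf(m, " [%s]", vma->aspace->name);
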
      Signed-off-by: Brian Masney <masneyb@onstation.org>
      Fixes: 575f0485 ("drm/msm: Clean up and enhance the output of the 'gem' debugfs node")
      Signed-off-by: Sean Paul <seanpaul@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190513234105.7531-2-masneyb@onstation.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5b655c4c
    • ipc: prevent lockup on alloc_msg and free_msg · 52dd41c1
      Li Rongqing authored
      [ Upstream commit d6a2946a ]
      
      The msgctl10 test of LTP triggers the following lockup.  When CONFIG_KASAN
      is enabled on large-memory SMP systems, initializing the pages can take a
      long time if msgctl10 requests a huge block of memory, and this stalls the
      RCU scheduler, so release the CPU actively.

      After adding schedule() to free_msg(), free_msg() can no longer be called
      while holding a spinlock, so add the messages to a temporary list and free
      them outside the spinlock.
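
      A hedged sketch of the free-outside-the-lock pattern (list and field names
      are placeholders for illustration, not the actual ipc code):

      	LIST_HEAD(tmp);

      	spin_lock(&info->lock);
      	list_splice_init(&info->msg_list, &tmp);	/* detach under the lock */
      	spin_unlock(&info->lock);

      	list_for_each_entry_safe(msg, next, &tmp, m_list) {
      		list_del(&msg->m_list);
      		free_msg(msg);		/* may now cond_resched()/schedule() */
      	}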
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
        rcu:     (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
        msgctl10        R  running task    21608 32505   2794 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:__is_insn_slot_addr+0xfb/0x250
        Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
        RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
        RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
        RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
        R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
        R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
         kernel_text_address+0xc1/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         create_object+0x380/0x650
         __kmalloc+0x14c/0x2b0
         load_msg+0x38/0x1a0
         do_msgsnd+0x19e/0xcf0
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
        rcu:     (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
        msgctl10        R  running task    21608 32170  32155 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:lock_acquire+0x4d/0x340
        Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
        RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
        RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
        R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
         is_bpf_text_address+0x32/0xe0
         kernel_text_address+0xec/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         save_stack+0x32/0xb0
         __kasan_slab_free+0x130/0x180
         kfree+0xfa/0x2d0
         free_msg+0x24/0x50
         do_msgrcv+0x508/0xe60
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Davidlohr said:
       "So after releasing the lock, the msg rbtree/list is empty and new
        calls will not see those in the newly populated tmp_msg list, and
        therefore they cannot access the delayed msg freeing pointers, which
        is good. Also the fact that the node_cache is now freed before the
        actual messages seems to be harmless as this is wanted for
        msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
        info->lock the thing is freed anyway so it should not change things"
      
      Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      52dd41c1
    • sysctl: return -EINVAL if val violates minmax · 33bc00a5
      Christian Brauner authored
      [ Upstream commit e260ad01 ]
      
      Currently, when userspace gives us a value that overflows, e.g. for
      file-max and other sysctls handled by __do_proc_doulongvec_minmax(), we
      simply ignore the new value and leave the current value untouched.

      This can be problematic as it gives the illusion that the limit has
      indeed been bumped when in fact it failed.  This commit makes sure to
      return -EINVAL when an overflow is detected.  Please note that this is a
      userspace-facing change.
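
      A minimal sketch of the intent (simplified; not the exact
      __do_proc_doulongvec_minmax() code):

      	/* reject out-of-range values instead of silently ignoring them */
      	if ((min && val < *min) || (max && val > *max)) {
      		err = -EINVAL;
      		break;
      	}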
      
      Link: http://lkml.kernel.org/r/20190210203943.8227-4-christian@brauner.io
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Waiman Long <longman@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      33bc00a5
    • fs/fat/file.c: issue flush after the writeback of FAT · cafeee0e
      Hou Tao authored
      [ Upstream commit bd8309de ]
      
      fsync() needs to make sure the data & meta-data of a file are persistent
      after fsync() returns, even when a power failure occurs later.  In the
      case of fat-fs, the FAT belongs to the meta-data of the file, so we need
      to issue the flush after the writeback of the FAT rather than before it.

      Also bail out early when any stage of fsync fails.
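
      A hedged sketch of the resulting ordering in fat_file_fsync() (illustrative
      only, not the verbatim patch):

      	struct inode *inode = file_inode(filp);
      	int err;

      	err = __generic_file_fsync(filp, start, end, datasync);
      	if (err)
      		return err;
      	/* write back the FAT itself before flushing the device cache */
      	err = sync_mapping_buffers(MSDOS_SB(inode->i_sb)->fat_inode->i_mapping);
      	if (err)
      		return err;
      	return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);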
      
      Link: http://lkml.kernel.org/r/20190409030158.136316-1-houtao1@huawei.com
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cafeee0e
    • rapidio: fix a NULL pointer dereference when create_workqueue() fails · e87dffa3
      Kangjie Lu authored
      [ Upstream commit 23015b22 ]
      
      In case create_workqueue() fails, the fix releases the previously acquired
      resources and returns -ENOMEM to avoid a NULL pointer dereference.
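
      A generic sketch of the pattern (structure and names are hypothetical, not
      the rapidio driver's exact code):

      	priv->pw_workq = create_workqueue("rio_pw");
      	if (!priv->pw_workq) {
      		/* undo any earlier setup before bailing out */
      		kfree(priv);
      		return -ENOMEM;
      	}
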
      Signed-off-by: Kangjie Lu <kjlu@umn.edu>
      Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e87dffa3
    • media: rockchip/vpu: Add missing dont_use_autosuspend() calls · 520a9d95
      Jonas Karlman authored
      [ Upstream commit 5c5b90f5 ]
      
      Those calls are needed to restore a clean PM state when the probe fails
      or when the driver is unloaded such that future ->probe() calls can
      initialize runtime PM again.
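
      A hedged sketch of the cleanup pairing (device pointer name assumed; not
      the driver's exact code):

      	/* undo pm_runtime_use_autosuspend() on probe failure and in remove() */
      	pm_runtime_dont_use_autosuspend(&pdev->dev);
      	pm_runtime_disable(&pdev->dev);
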
      Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      520a9d95
    • media: rockchip/vpu: Fix/re-order probe-error/remove path · 8f9d40d5
      Jonas Karlman authored
      [ Upstream commit fc8670d1 ]
      
      media_device_cleanup() and v4l2_m2m_unregister_media_controller() were
      missing in the probe error path.
      While at it, re-order calls in the remove path to unregister/cleanup
      things in the reverse order they were initialized/registered.
      Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8f9d40d5
    • Revert "drm: allow render capable master with DRM_AUTH ioctls" · c2d2804b
      Dave Airlie authored
      [ Upstream commit dbb92471 ]
      
      This reverts commit 8059add0.
      
      This commit, while seemingly a good idea, breaks a radv check for a node
      being master, because something now succeeds where it failed before.

      Apply the Linus rule: revert early and try again, we don't break
      userspace.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c2d2804b
  2. 11 Jun, 2019 17 commits