1. 04 Jun, 2020 39 commits
    • Baoquan He's avatar
      mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it · 86aaf255
      Baoquan He authored
      Patch series "improvements about lowmem_reserve and /proc/zoneinfo", v2.
      
      This patch (of 3):
      
      When people write to /proc/sys/vm/lowmem_reserve_ratio to change
      sysctl_lowmem_reserve_ratio[], setup_per_zone_lowmem_reserve() is called
      to recalculate all ->lowmem_reserve[] for each zone of all nodes as below:
      
      static void setup_per_zone_lowmem_reserve(void)
      {
      ...
      	for_each_online_pgdat(pgdat) {
      		for (j = 0; j < MAX_NR_ZONES; j++) {
      			...
      			while (idx) {
      				...
      				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
      					sysctl_lowmem_reserve_ratio[idx] = 0;
      					lower_zone->lowmem_reserve[j] = 0;
                                      } else {
      				...
      			}
      		}
      	}
      }
      
      Meanwhile, here, sysctl_lowmem_reserve_ratio[idx] will be tuned if its
      value is smaller than '1'.  As we know, sysctl_lowmem_reserve_ratio[] is
      set for zone without regarding to which node it belongs to.  That means
      the tuning will be done on all nodes, even though it has been done in the
      first node.
      
      And the tuning will be done too even when init_per_zone_wmark_min() calls
      setup_per_zone_lowmem_reserve(), where actually nobody tries to change
      sysctl_lowmem_reserve_ratio[].
      
      So now move the tuning into lowmem_reserve_ratio_sysctl_handler(), to make
      code logic more reasonable.
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Link: http://lkml.kernel.org/r/20200402140113.3696-1-bhe@redhat.com
      Link: http://lkml.kernel.org/r/20200402140113.3696-2-bhe@redhat.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      86aaf255
    • Baoquan He's avatar
      mm/page_alloc.c: remove unused free_bootmem_with_active_regions · 4ca7be24
      Baoquan He authored
      Since commit 397dc00e ("mips: sgi-ip27: switch from DISCONTIGMEM
      to SPARSEMEM"), the last caller of free_bootmem_with_active_regions() was
      gone.  Now no user calls it any more.
      
      Let's remove it.
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200402143455.5145-1-bhe@redhat.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4ca7be24
    • Roman Gushchin's avatar
      mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations · 16867664
      Roman Gushchin authored
      Currently a cma area is barely used by the page allocator because it's
      used only as a fallback from movable, however kswapd tries hard to make
      sure that the fallback path isn't used.
      
      This results in a system evicting memory and pushing data into swap, while
      lots of CMA memory is still available.  This happens despite the fact that
      alloc_contig_range is perfectly capable of moving any movable allocations
      out of the way of an allocation.
      
      To effectively use the cma area let's alter the rules: if the zone has
      more free cma pages than the half of total free pages in the zone, use cma
      pageblocks first and fallback to movable blocks in the case of failure.
      
      [guro@fb.com: ifdef the cma-specific code]
        Link: http://lkml.kernel.org/r/20200311225832.GA178154@carbon.DHCP.thefacebook.comCo-developed-by: default avatarRik van Riel <riel@surriel.com>
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Signed-off-by: default avatarRik van Riel <riel@surriel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Link: http://lkml.kernel.org/r/20200306150102.3e77354b@imladris.surriel.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16867664
    • Wei Yang's avatar
      mm/page_alloc.c: extract check_[new|free]_page_bad() common part to page_bad_reason() · 58b7f119
      Wei Yang authored
      We share similar code in check_[new|free]_page_bad() to get the page's bad
      reason.
      
      Let's extract it and reduce code duplication.
      Signed-off-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200411220357.9636-6-richard.weiyang@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58b7f119
    • Wei Yang's avatar
      mm/page_alloc.c: rename free_pages_check() to check_free_page() · 534fe5e3
      Wei Yang authored
      free_pages_check() is the counterpart of check_new_page().  Rename it to
      use the same naming convention.
      Signed-off-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200411220357.9636-5-richard.weiyang@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      534fe5e3
    • Wei Yang's avatar
      mm/page_alloc.c: rename free_pages_check_bad() to check_free_page_bad() · 0d0c48a2
      Wei Yang authored
      free_pages_check_bad() is the counterpart of check_new_page_bad().  Rename
      it to use the same naming convention.
      Signed-off-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200411220357.9636-4-richard.weiyang@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0d0c48a2
    • Wei Yang's avatar
      mm/page_alloc.c: bad_flags is not necessary for bad_page() · 82a3241a
      Wei Yang authored
      After commit 5b57b8f2 ("mm/debug.c: always print flags in
      dump_page()"), page->flags is always printed for a bad page.  It is not
      necessary to have bad_flags any more.
      Suggested-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200411220357.9636-3-richard.weiyang@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      82a3241a
    • Wei Yang's avatar
      mm/page_alloc.c: bad_[reason|flags] is not necessary when PageHWPoison · 833d8a42
      Wei Yang authored
      Patch series "mm/page_alloc.c: cleanup on check page", v3.
      
      This patchset does some cleanup related to check page.
      
      1. Remove unnecessary bad_reason assignment
      2. Remove bad_flags to bad_page()
      3. Rename function for naming convention
      4. Extract common part to check page
      
      Thanks for suggestions from David Rientjes and Anshuman Khandual.
      
      This patch (of 5):
      
      Since function returns directly, bad_[reason|flags] is not used any where.
      And move this to the first.
      
      This is a following cleanup for commit e570f56c ("mm:
      check_new_page_bad() directly returns in __PG_HWPOISON case")
      Signed-off-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Rientjes <rientjes@google.com>
      Link: http://lkml.kernel.org/r/20200411220357.9636-2-richard.weiyang@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      833d8a42
    • Mike Rapoport's avatar
      docs/vm: update memory-models documentation · 237e506c
      Mike Rapoport authored
      To reflect the updates to free_area_init() family of functions.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-22-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      237e506c
    • Mike Rapoport's avatar
      mm: simplify find_min_pfn_with_active_regions() · 8a1b25fe
      Mike Rapoport authored
      find_min_pfn_with_active_regions() calls find_min_pfn_for_node() with nid
      parameter set to MAX_NUMNODES.  This makes the find_min_pfn_for_node()
      traverse all memblock memory regions although the first PFN in the system
      can be easily found with memblock_start_of_DRAM().
      
      Use memblock_start_of_DRAM() in find_min_pfn_with_active_regions() and drop
      now unused find_min_pfn_for_node().
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-21-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a1b25fe
    • Mike Rapoport's avatar
      mm: clean up free_area_init_node() and its helpers · 854e8848
      Mike Rapoport authored
      free_area_init_node() now always uses memblock info and the zone PFN
      limits so it does not need the backwards compatibility functions to
      calculate the zone spanned and absent pages.  The removal of the compat_
      versions of zone_{abscent,spanned}_pages_in_node() in turn, makes
      zone_size and zhole_size parameters unused.
      
      The node_start_pfn is determined by get_pfn_range_for_nid(), so there is
      no need to pass it to free_area_init_node().
      
      As a result, the only required parameter to free_area_init_node() is the
      node ID, all the rest are removed along with no longer used
      compat_zone_{abscent,spanned}_pages_in_node() helpers.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-20-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      854e8848
    • Mike Rapoport's avatar
      mm: rename free_area_init_node() to free_area_init_memoryless_node() · bc9331a1
      Mike Rapoport authored
      free_area_init_node() is only used by x86 to initialize a memory-less
      nodes.  Make its name reflect this and drop all the function parameters
      except node ID as they are anyway zero.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-19-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bc9331a1
    • Mike Rapoport's avatar
      mm: free_area_init: allow defining max_zone_pfn in descending order · 51930df5
      Mike Rapoport authored
      Some architectures (e.g.  ARC) have the ZONE_HIGHMEM zone below the
      ZONE_NORMAL.  Allowing free_area_init() parse max_zone_pfn array even it
      is sorted in descending order allows using free_area_init() on such
      architectures.
      
      Add top -> down traversal of max_zone_pfn array in free_area_init() and
      use the latter in ARC node/zone initialization.
      
      [rppt@kernel.org: ARC fix]
        Link: http://lkml.kernel.org/r/20200504153901.GM14260@kernel.org
      [rppt@linux.ibm.com: arc: free_area_init(): take into account PAE40 mode]
        Link: http://lkml.kernel.org/r/20200507205900.GH683243@linux.ibm.com
      [akpm@linux-foundation.org: declare arch_has_descending_max_zone_pfns()]
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Link: http://lkml.kernel.org/r/20200412194859.12663-18-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      51930df5
    • Mike Rapoport's avatar
      mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES · acd3f5c4
      Mike Rapoport authored
      The memmap_init() function was made to iterate over memblock regions and
      as the result the early_pfn_in_nid() function became obsolete.  Since
      CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
      implementation of early_pfn_in_nid(), it is also not needed anymore.
      
      Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.
      Co-developed-by: default avatarHoan Tran <Hoan@os.amperecomputing.com>
      Signed-off-by: default avatarHoan Tran <Hoan@os.amperecomputing.com>
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      acd3f5c4
    • Baoquan He's avatar
      mm: memmap_init: iterate over memblock regions rather that check each PFN · 73a6e474
      Baoquan He authored
      When called during boot the memmap_init_zone() function checks if each PFN
      is valid and actually belongs to the node being initialized using
      early_pfn_valid() and early_pfn_in_nid().
      
      Each such check may cost up to O(log(n)) where n is the number of memory
      banks, so for large amount of memory overall time spent in early_pfn*()
      becomes substantial.
      
      Since the information is anyway present in memblock, we can iterate over
      memblock memory regions in memmap_init() and only call memmap_init_zone()
      for PFN ranges that are know to be valid and in the appropriate node.
      
      [cai@lca.pw: fix a compilation warning from Clang]
        Link: http://lkml.kernel.org/r/CF6E407F-17DC-427C-8203-21979FB882EF@lca.pw
      [bhe@redhat.com: fix the incorrect hole in fast_isolate_freepages()]
        Link: http://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
        Link: http://lkml.kernel.org/r/20200521014407.29690-1-bhe@redhat.comSigned-off-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Qian Cai <cai@lca.pw>
      Link: http://lkml.kernel.org/r/20200412194859.12663-16-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      73a6e474
    • Mike Rapoport's avatar
      xtensa: simplify detection of memory zone boundaries · da50c57b
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-15-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da50c57b
    • Mike Rapoport's avatar
      unicore32: simplify detection of memory zone boundaries · 1b02ec01
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-14-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1b02ec01
    • Mike Rapoport's avatar
      sparc32: simplify detection of memory zone boundaries · bee3b3cc
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-13-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bee3b3cc
    • Mike Rapoport's avatar
      parisc: simplify detection of memory zone boundaries · 625bf73e
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-12-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      625bf73e
    • Mike Rapoport's avatar
      m68k: mm: simplify detection of memory zone boundaries · 5d2ee1a1
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-11-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d2ee1a1
    • Mike Rapoport's avatar
      csky: simplify detection of memory zone boundaries · 8f4693f0
      Mike Rapoport authored
      The free_area_init() function only requires the definition of maximal PFN
      for each of the supported zone rater than calculation of actual zone sizes
      and the sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-10-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f4693f0
    • Mike Rapoport's avatar
      arm64: simplify detection of memory zone boundaries for UMA configs · 584cb13d
      Mike Rapoport authored
      The free_area_init() function only requires the definition of maximal PFN
      for each of the supported zone rater than calculation of actual zone sizes
      and the sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-9-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      584cb13d
    • Mike Rapoport's avatar
      arm: simplify detection of memory zone boundaries · a32c1c61
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-8-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a32c1c61
    • Mike Rapoport's avatar
      alpha: simplify detection of memory zone boundaries · 30760203
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-7-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30760203
    • Mike Rapoport's avatar
      mm: use free_area_init() instead of free_area_init_nodes() · 9691a071
      Mike Rapoport authored
      free_area_init() has effectively became a wrapper for
      free_area_init_nodes() and there is no point of keeping it.  Still
      free_area_init() name is shorter and more general as it does not imply
      necessity to initialize multiple nodes.
      
      Rename free_area_init_nodes() to free_area_init(), update the callers and
      drop old version of free_area_init().
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9691a071
    • Mike Rapoport's avatar
      mm: free_area_init: use maximal zone PFNs rather than zone sizes · fa3354e4
      Mike Rapoport authored
      Currently, architectures that use free_area_init() to initialize memory
      map and node and zone structures need to calculate zone and hole sizes.
      We can use free_area_init_nodes() instead and let it detect the zone
      boundaries while the architectures will only have to supply the possible
      limits for the zones.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-5-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fa3354e4
    • Mike Rapoport's avatar
      mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option · 3f08a302
      Mike Rapoport authored
      CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
      nodes and zones structures between the systems that have region to node
      mapping in memblock and those that don't.
      
      Currently all the NUMA architectures enable this option and for the
      non-NUMA systems we can presume that all the memory belongs to node 0 and
      therefore the compile time configuration option is not required.
      
      The remaining few architectures that use DISCONTIGMEM without NUMA are
      easily updated to use memblock_add_node() instead of memblock_add() and
      thus have proper correspondence of memblock regions to NUMA nodes.
      
      Still, free_area_init_node() must have a backward compatible version
      because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
      different.  Once all the architectures will use the new semantics, the
      entire compatibility layer can be dropped.
      
      To avoid addition of extra run time memory to store node id for
      architectures that keep memblock but have only a single node, the node id
      field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
      the corresponding accessors presume that in those cases it is always 0.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3f08a302
    • Mike Rapoport's avatar
      mm: make early_pfn_to_nid() and related defintions close to each other · 6f24fbd3
      Mike Rapoport authored
      early_pfn_to_nid() and its helper __early_pfn_to_nid() are spread around
      include/linux/mm.h, include/linux/mmzone.h and mm/page_alloc.c.
      
      Drop unused stub for __early_pfn_to_nid() and move its actual generic
      implementation close to its users.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-3-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f24fbd3
    • Mike Rapoport's avatar
      mm: memblock: replace dereferences of memblock_region.nid with API calls · d622abf7
      Mike Rapoport authored
      Patch series "mm: rework free_area_init*() funcitons".
      
      After the discussion [1] about removal of CONFIG_NODES_SPAN_OTHER_NODES
      and CONFIG_HAVE_MEMBLOCK_NODE_MAP options, I took it a bit further and
      updated the node/zone initialization.
      
      Since all architectures have memblock, it is possible to use only the
      newer version of free_area_init_node() that calculates the zone and node
      boundaries based on memblock node mapping and architectural limits on
      possible zone PFNs.
      
      The architectures that still determined zone and hole sizes can be
      switched to the generic code and the old code that took those zone and
      hole sizes can be simply removed.
      
      And, since it all started from the removal of
      CONFIG_NODES_SPAN_OTHER_NODES, the memmap_init() is now updated to iterate
      over memblocks and so it does not need to perform early_pfn_to_nid() query
      for every PFN.
      
      [1] https://lore.kernel.org/lkml/1585420282-25630-1-git-send-email-Hoan@os.amperecomputing.com
      
      This patch (of 21):
      
      There are several places in the code that directly dereference
      memblock_region.nid despite this field being defined only when
      CONFIG_HAVE_MEMBLOCK_NODE_MAP=y.
      
      Replace these with calls to memblock_get_region_nid() to improve code
      robustness and to avoid possible breakage when
      CONFIG_HAVE_MEMBLOCK_NODE_MAP will be removed.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200412194859.12663-2-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d622abf7
    • Michal Hocko's avatar
      mm: clarify __GFP_MEMALLOC usage · 574c1ae6
      Michal Hocko authored
      It seems that the existing documentation is not explicit about the
      expected usage and potential risks enough.  While it is calls out that
      users have to free memory when using this flag it is not really apparent
      that users have to careful to not deplete memory reserves and that they
      should implement some sort of throttling wrt.  freeing process.
      
      This is partly based on Neil's explanation [1].
      
      Let's also call out that a pre allocated pool allocator should be
      considered.
      
      [1] http://lkml.kernel.org/r/877dz0yxoa.fsf@notabene.neil.brown.name
      
      [akpm@linux-foundation.org: coding style fixes]
      [mhocko@kernel.org: update]
        Link: http://lkml.kernel.org/r/20200406070137.GC19426@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Link: http://lkml.kernel.org/r/20200403083543.11552-2-mhocko@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      574c1ae6
    • Daniel Axtens's avatar
      string.h: fix incompatibility between FORTIFY_SOURCE and KASAN · 47227d27
      Daniel Axtens authored
      The memcmp KASAN self-test fails on a kernel with both KASAN and
      FORTIFY_SOURCE.
      
      When FORTIFY_SOURCE is on, a number of functions are replaced with
      fortified versions, which attempt to check the sizes of the operands.
      However, these functions often directly invoke __builtin_foo() once they
      have performed the fortify check.  Using __builtins may bypass KASAN
      checks if the compiler decides to inline it's own implementation as
      sequence of instructions, rather than emit a function call that goes out
      to a KASAN-instrumented implementation.
      
      Why is only memcmp affected?
      ============================
      
      Of the string and string-like functions that kasan_test tests, only memcmp
      is replaced by an inline sequence of instructions in my testing on x86
      with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
      
      I believe this is due to compiler heuristics.  For example, if I annotate
      kmalloc calls with the alloc_size annotation (and disable some fortify
      compile-time checking!), the compiler will replace every memset except the
      one in kmalloc_uaf_memset with inline instructions.  (I have some WIP
      patches to add this annotation.)
      
      Does this affect other functions in string.h?
      =============================================
      
      Yes. Anything that uses __builtin_* rather than __real_* could be
      affected. This looks like:
      
       - strncpy
       - strcat
       - strlen
       - strlcpy maybe, under some circumstances?
       - strncat under some circumstances
       - memset
       - memcpy
       - memmove
       - memcmp (as noted)
       - memchr
       - strcpy
      
      Whether a function call is emitted always depends on the compiler.  Most
      bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows
      that this is not always the case.
      
      Isn't FORTIFY_SOURCE disabled with KASAN?
      ========================================-
      
      The string headers on all arches supporting KASAN disable fortify with
      kasan, but only when address sanitisation is _also_ disabled.  For example
      from x86:
      
       #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
       /*
        * For files that are not instrumented (e.g. mm/slub.c) we
        * should use not instrumented version of mem* functions.
        */
       #define memcpy(dst, src, len) __memcpy(dst, src, len)
       #define memmove(dst, src, len) __memmove(dst, src, len)
       #define memset(s, c, n) __memset(s, c, n)
      
       #ifndef __NO_FORTIFY
       #define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
       #endif
      
       #endif
      
      This comes from commit 6974f0c4 ("include/linux/string.h: add the
      option of fortified string.h functions"), and doesn't work when KASAN is
      enabled and the file is supposed to be sanitised - as with test_kasan.c
      
      I'm pretty sure this is not wrong, but not as expansive it should be:
      
       * we shouldn't use __builtin_memcpy etc in files where we don't have
         instrumentation - it could devolve into a function call to memcpy,
         which will be instrumented. Rather, we should use __memcpy which
         by convention is not instrumented.
      
       * we also shouldn't be using __builtin_memcpy when we have a KASAN
         instrumented file, because it could be replaced with inline asm
         that will not be instrumented.
      
      What is correct behaviour?
      ==========================
      
      Firstly, there is some overlap between fortification and KASAN: both
      provide some level of _runtime_ checking. Only fortify provides
      compile-time checking.
      
      KASAN and fortify can pick up different things at runtime:
      
       - Some fortify functions, notably the string functions, could easily be
         modified to consider sub-object sizes (e.g. members within a struct),
         and I have some WIP patches to do this. KASAN cannot detect these
         because it cannot insert poision between members of a struct.
      
       - KASAN can detect many over-reads/over-writes when the sizes of both
         operands are unknown, which fortify cannot.
      
      So there are a couple of options:
      
       1) Flip the test: disable fortify in santised files and enable it in
          unsanitised files. This at least stops us missing KASAN checking, but
          we lose the fortify checking.
      
       2) Make the fortify code always call out to real versions. Do this only
          for KASAN, for fear of losing the inlining opportunities we get from
          __builtin_*.
      
      (We can't use kasan_check_{read,write}: because the fortify functions are
      _extern inline_, you can't include _static_ inline functions without a
      compiler warning. kasan_check_{read,write} are static inline so we can't
      use them even when they would otherwise be suitable.)
      
      Take approach 2 and call out to real versions when KASAN is enabled.
      
      Use __underlying_foo to distinguish from __real_foo: __real_foo always
      refers to the kernel's implementation of foo, __underlying_foo could be
      either the kernel implementation or the __builtin_foo implementation.
      
      This is sometimes enough to make the memcmp test succeed with
      FORTIFY_SOURCE enabled. It is at least enough to get the function call
      into the module. One more fix is needed to make it reliable: see the next
      patch.
      
      Fixes: 6974f0c4 ("include/linux/string.h: add the option of fortified string.h functions")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarDavid Gow <davidgow@google.com>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20200423154503.5103-3-dja@axtens.netSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47227d27
    • Daniel Axtens's avatar
      kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE · adb72ae1
      Daniel Axtens authored
      Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4.
      
      3 KASAN self-tests fail on a kernel with both KASAN and FORTIFY_SOURCE:
      memchr, memcmp and strlen.
      
      When FORTIFY_SOURCE is on, a number of functions are replaced with
      fortified versions, which attempt to check the sizes of the operands.
      However, these functions often directly invoke __builtin_foo() once they
      have performed the fortify check.  The compiler can detect that the
      results of these functions are not used, and knows that they have no other
      side effects, and so can eliminate them as dead code.
      
      Why are only memchr, memcmp and strlen affected?
      ================================================
      
      Of string and string-like functions, kasan_test tests:
      
       * strchr  ->  not affected, no fortified version
       * strrchr ->  likewise
       * strcmp  ->  likewise
       * strncmp ->  likewise
      
       * strnlen ->  not affected, the fortify source implementation calls the
                     underlying strnlen implementation which is instrumented, not
                     a builtin
      
       * strlen  ->  affected, the fortify souce implementation calls a __builtin
                     version which the compiler can determine is dead.
      
       * memchr  ->  likewise
       * memcmp  ->  likewise
      
       * memset ->   not affected, the compiler knows that memset writes to its
      	       first argument and therefore is not dead.
      
      Why does this not affect the functions normally?
      ================================================
      
      In string.h, these functions are not marked as __pure, so the compiler
      cannot know that they do not have side effects.  If relevant functions are
      marked as __pure in string.h, we see the following warnings and the
      functions are elided:
      
      lib/test_kasan.c: In function `kasan_memchr':
      lib/test_kasan.c:606:2: warning: statement with no effect [-Wunused-value]
        memchr(ptr, '1', size + 1);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~
      lib/test_kasan.c: In function `kasan_memcmp':
      lib/test_kasan.c:622:2: warning: statement with no effect [-Wunused-value]
        memcmp(ptr, arr, size+1);
        ^~~~~~~~~~~~~~~~~~~~~~~~
      lib/test_kasan.c: In function `kasan_strings':
      lib/test_kasan.c:645:2: warning: statement with no effect [-Wunused-value]
        strchr(ptr, '1');
        ^~~~~~~~~~~~~~~~
      ...
      
      This annotation would make sense to add and could be added at any point,
      so the behaviour of test_kasan.c should change.
      
      The fix
      =======
      
      Make all the functions that are pure write their results to a global,
      which makes them live.  The strlen and memchr tests now pass.
      
      The memcmp test still fails to trigger, which is addressed in the next
      patch.
      
      [dja@axtens.net: drop patch 3]
        Link: http://lkml.kernel.org/r/20200424145521.8203-2-dja@axtens.net
      Fixes: 0c96350a ("lib/test_kasan.c: add tests for several string/memory API functions")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarDavid Gow <davidgow@google.com>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20200423154503.5103-1-dja@axtens.net
      Link: http://lkml.kernel.org/r/20200423154503.5103-2-dja@axtens.netSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      adb72ae1
    • John Hubbard's avatar
      mm/gup: might_lock_read(mmap_sem) in get_user_pages_fast() · f81cd178
      John Hubbard authored
      Instead of scattering these assertions across the drivers, do this
      assertion inside the core of get_user_pages_fast*() functions.  That also
      includes pin_user_pages_fast*() routines.
      
      Add a might_lock_read(mmap_sem) call to internal_get_user_pages_fast().
      Suggested-by: default avatarMatthew Wilcox <willy@infradead.org>
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Link: http://lkml.kernel.org/r/20200522010443.1290485-1-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f81cd178
    • John Hubbard's avatar
      drm/i915: convert get_user_pages() --> pin_user_pages() · 2170ecfa
      John Hubbard authored
      This code was using get_user_pages*(), in a "Case 2" scenario (DMA/RDMA),
      using the categorization from [1].  That means that it's time to convert
      the get_user_pages*() + put_page() calls to pin_user_pages*() +
      unpin_user_pages() calls.
      
      There is some helpful background in [2]: basically, this is a small part
      of fixing a long-standing disconnect between pinning pages, and file
      systems' use of those pages.
      
      [1] Documentation/core-api/pin_user_pages.rst
      
      [2] "Explicit pinning of user-space pages":
          https://lwn.net/Articles/807108/Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-5-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2170ecfa
    • John Hubbard's avatar
      mm/gup: introduce pin_user_pages_fast_only() · 104acc32
      John Hubbard authored
      This is the FOLL_PIN equivalent of __get_user_pages_fast(), except with a
      more descriptive name, and gup_flags instead of a boolean "write" in the
      argument list.
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-4-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      104acc32
    • John Hubbard's avatar
      mm/gup: refactor and de-duplicate gup_fast() code · 376a34ef
      John Hubbard authored
      There were two nearly identical sets of code for gup_fast() style of
      walking the page tables with interrupts disabled.  This has lead to the
      usual maintenance problems that arise from having duplicated code.
      
      There is already a core internal routine in gup.c for gup_fast(), so just
      enhance it very slightly: allow skipping the fall-back to "slow" (regular)
      get_user_pages(), via the new FOLL_FAST_ONLY flag.  Then, just call
      internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust
      the API to match pre-existing API behavior.
      
      There is a change in behavior from this refactoring: the nested form of
      interrupt disabling is used in all gup_fast() variants now.  That's
      because there is only one place that interrupt disabling for page walking
      is done, and so the safer form is required.  This should, if anything,
      eliminate possible (rare) bugs, because the non-nested form of enabling
      interrupts was fragile at best.
      
      [jhubbard@nvidia.com: fixup]
        Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.comSigned-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      376a34ef
    • John Hubbard's avatar
      mm/gup: move __get_user_pages_fast() down a few lines in gup.c · 9e1f0580
      John Hubbard authored
      Patch series "mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()", v2.
      
      In order to convert the drm/i915 driver from get_user_pages() to
      pin_user_pages(), a FOLL_PIN equivalent of __get_user_pages_fast() was
      required.  That led to refactoring __get_user_pages_fast(), with the
      following goals:
      
      1) As above: provide a pin_user_pages*() routine for drm/i915 to call,
         in place of __get_user_pages_fast(),
      
      2) Get rid of the gup.c duplicate code for walking page tables with
         interrupts disabled. This duplicate code is a minor maintenance
         problem anyway.
      
      3) Make it easy for an upcoming patch from Souptick, which aims to
         convert __get_user_pages_fast() to use a gup_flags argument, instead
         of a bool writeable arg.  Also, if this series looks good, we can
         ask Souptick to change the name as well, to whatever the consensus
         is. My initial recommendation is: get_user_pages_fast_only(), to
         match the new pin_user_pages_only().
      
      This patch (of 4):
      
      This is in order to avoid a forward declaration of
      internal_get_user_pages_fast(), in the next patch.
      
      This is code movement only--all generated code should be identical.
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200522051931.54191-1-jhubbard@nvidia.com
      Link: http://lkml.kernel.org/r/20200519002124.2025955-1-jhubbard@nvidia.com
      Link: http://lkml.kernel.org/r/20200519002124.2025955-2-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e1f0580
    • Shakeel Butt's avatar
      mm/memcg: optimize memory.numa_stat like memory.stat · dd8657b6
      Shakeel Butt authored
      Currently reading memory.numa_stat traverses the underlying memcg tree
      multiple times to accumulate the stats to present the hierarchical view of
      the memcg tree.  However the kernel already maintains the hierarchical
      view of the stats and use it in memory.stat.  Just use the same mechanism
      in memory.numa_stat as well.
      
      I ran a simple benchmark which reads root_mem_cgroup's memory.numa_stat
      file in the presense of 10000 memcgs.  The results are:
      
      Without the patch:
      $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
      
      real    0m0.700s
      user    0m0.001s
      sys     0m0.697s
      
      With the patch:
      $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
      
      real    0m0.001s
      user    0m0.001s
      sys     0m0.000s
      
      [akpm@linux-foundation.org: avoid forcing out-of-line code generation]
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Link: http://lkml.kernel.org/r/20200304022058.248270-1-shakeelb@google.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd8657b6
    • Wang Hai's avatar
      mm/slub: fix a memory leak in sysfs_slab_add() · dde3c6b7
      Wang Hai authored
      syzkaller reports for memory leak when kobject_init_and_add() returns an
      error in the function sysfs_slab_add() [1]
      
      When this happened, the function kobject_put() is not called for the
      corresponding kobject, which potentially leads to memory leak.
      
      This patch fixes the issue by calling kobject_put() even if
      kobject_init_and_add() fails.
      
      [1]
        BUG: memory leak
        unreferenced object 0xffff8880a6d4be88 (size 8):
        comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
        hex dump (first 8 bytes):
          70 69 64 5f 33 00 ff ff                          pid_3...
        backtrace:
           kstrdup+0x35/0x70 mm/util.c:60
           kstrdup_const+0x3d/0x50 mm/util.c:82
           kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
           kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
           kobject_add_varg lib/kobject.c:384 [inline]
           kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
           sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
           __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
           create_cache+0x113/0x1e0 mm/slab_common.c:407
           kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
           kmem_cache_create+0xd/0x10 mm/slab_common.c:564
           create_pid_cachep kernel/pid_namespace.c:54 [inline]
           create_pid_namespace kernel/pid_namespace.c:96 [inline]
           copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
           create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
           unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
           ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
           __do_sys_unshare kernel/fork.c:3037 [inline]
           __se_sys_unshare kernel/fork.c:3035 [inline]
           __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
           do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
      
      Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWang Hai <wanghai38@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dde3c6b7
  2. 03 Jun, 2020 1 commit