1. 01 Jul, 2021 40 commits
    • Alistair Popple's avatar
      mm/rmap: split try_to_munlock from try_to_unmap · cd62734c
      Alistair Popple authored
      The behaviour of try_to_unmap_one() is difficult to follow because it
      performs different operations based on a fairly large set of flags used in
      different combinations.
      
      TTU_MUNLOCK is one such flag.  However it is exclusively used by
      try_to_munlock() which specifies no other flags.  Therefore rather than
      overload try_to_unmap_one() with unrelated behaviour split this out into
      it's own function and remove the flag.
      
      Link: https://lkml.kernel.org/r/20210616105937.23201-4-apopple@nvidia.comSigned-off-by: default avatarAlistair Popple <apopple@nvidia.com>
      Reviewed-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cd62734c
    • Alistair Popple's avatar
      mm/swapops: rework swap entry manipulation code · 4dd845b5
      Alistair Popple authored
      Both migration and device private pages use special swap entries that are
      manipluated by a range of inline functions.  The arguments to these are
      somewhat inconsistent so rework them to remove flag type arguments and to
      make the arguments similar for both read and write entry creation.
      
      Link: https://lkml.kernel.org/r/20210616105937.23201-3-apopple@nvidia.comSigned-off-by: default avatarAlistair Popple <apopple@nvidia.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJason Gunthorpe <jgg@nvidia.com>
      Reviewed-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4dd845b5
    • Alistair Popple's avatar
      mm: remove special swap entry functions · af5cdaf8
      Alistair Popple authored
      Patch series "Add support for SVM atomics in Nouveau", v11.
      
      Introduction
      ============
      
      Some devices have features such as atomic PTE bits that can be used to
      implement atomic access to system memory.  To support atomic operations to
      a shared virtual memory page such a device needs access to that page which
      is exclusive of the CPU.  This series introduces a mechanism to
      temporarily unmap pages granting exclusive access to a device.
      
      These changes are required to support OpenCL atomic operations in Nouveau
      to shared virtual memory (SVM) regions allocated with the
      CL_MEM_SVM_ATOMICS clSVMAlloc flag.  A more complete description of the
      OpenCL SVM feature is available at
      https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/
      OpenCL_API.html#_shared_virtual_memory .
      
      Implementation
      ==============
      
      Exclusive device access is implemented by adding a new swap entry type
      (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry.  The main
      difference is that on fault the original entry is immediately restored by
      the fault handler instead of waiting.
      
      Restoring the entry triggers calls to MMU notifers which allows a device
      driver to revoke the atomic access permission from the GPU prior to the
      CPU finalising the entry.
      
      Patches
      =======
      
      Patches 1 & 2 refactor existing migration and device private entry
      functions.
      
      Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
      functionality into separate functions - try_to_migrate_one() and
      try_to_munlock_one().
      
      Patch 5 renames some existing code but does not introduce functionality.
      
      Patch 6 is a small clean-up to swap entry handling in copy_pte_range().
      
      Patch 7 contains the bulk of the implementation for device exclusive
      memory.
      
      Patch 8 contains some additions to the HMM selftests to ensure everything
      works as expected.
      
      Patch 9 is a cleanup for the Nouveau SVM implementation.
      
      Patch 10 contains the implementation of atomic access for the Nouveau
      driver.
      
      Testing
      =======
      
      This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
      which checks that GPU atomic accesses to system memory are atomic.
      Without this series the test fails as there is no way of write-protecting
      the page mapping which results in the device clobbering CPU writes.  For
      reference the test is available at
      https://ozlabs.org/~apopple/opencl_svm_atomics/
      
      Further testing has been performed by adding support for testing exclusive
      access to the hmm-tests kselftests.
      
      This patch (of 10):
      
      Remove multiple similar inline functions for dealing with different types
      of special swap entries.
      
      Both migration and device private swap entries use the swap offset to
      store a pfn.  Instead of multiple inline functions to obtain a struct page
      for each swap entry type use a common function pfn_swap_entry_to_page().
      Also open-code the various entry_to_pfn() functions as this results is
      shorter code that is easier to understand.
      
      Link: https://lkml.kernel.org/r/20210616105937.23201-1-apopple@nvidia.com
      Link: https://lkml.kernel.org/r/20210616105937.23201-2-apopple@nvidia.comSigned-off-by: default avatarAlistair Popple <apopple@nvidia.com>
      Reviewed-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      af5cdaf8
    • Marco Elver's avatar
      kfence: unconditionally use unbound work queue · ff06e45d
      Marco Elver authored
      Unconditionally use unbound work queue, and not just if wq_power_efficient
      is true.  Because if the system is idle, KFENCE may wait, and by being run
      on the unbound work queue, we permit the scheduler to make better
      scheduling decisions and not require pinning KFENCE to the same CPU upon
      waking up.
      
      Link: https://lkml.kernel.org/r/20210521111630.472579-1-elver@google.com
      Fixes: 36f0b35d ("kfence: use power-efficient work queue to run delayed work")
      Signed-off-by: default avatarMarco Elver <elver@google.com>
      Reported-by: default avatarHillf Danton <hdanton@sina.com>
      Reviewed-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ff06e45d
    • Anshuman Khandual's avatar
      mm/thp: define default pmd_pgtable() · 1c2f7d14
      Anshuman Khandual authored
      Currently most platforms define pmd_pgtable() as pmd_page() duplicating
      the same code all over.  Instead just define a default value i.e
      pmd_page() for pmd_pgtable() and let platforms override when required via
      <asm/pgtable.h>.  All the existing platform that override pmd_pgtable()
      have been moved into their respective <asm/pgtable.h> header in order to
      precede before the new generic definition.  This makes it much cleaner
      with reduced code.
      
      Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c2f7d14
    • Mel Gorman's avatar
      mm/swap: make NODE_DATA an inline function on CONFIG_FLATMEM · 351de44f
      Mel Gorman authored
      make W=1 generates the following warning in mm/workingset.c for allnoconfig
      
        mm/workingset.c: In function `unpack_shadow':
        mm/workingset.c:201:15: warning: variable `nid' set but not used [-Wunused-but-set-variable]
          int memcgid, nid;
                       ^~~
      
      On FLATMEM, NODE_DATA returns a global pglist_data without dereferencing
      nid.  Make the helper an inline function to suppress the warning, add type
      checking and to apply any side-effects in the parameter list.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-15-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      351de44f
    • Mel Gorman's avatar
      mm/page_alloc: move prototype for find_suitable_fallback · ffd8f251
      Mel Gorman authored
      make W=1 generates the following warning in mmap_lock.c for allnoconfig
      
        mm/page_alloc.c:2670:5: warning: no previous prototype for `find_suitable_fallback' [-Wmissing-prototypes]
         int find_suitable_fallback(struct free_area *area, unsigned int order,
             ^~~~~~~~~~~~~~~~~~~~~~
      
      find_suitable_fallback is only shared outside of page_alloc.c for
      CONFIG_COMPACTION but to suppress the warning, move the protype outside of
      CONFIG_COMPACTION.  It is not worth the effort at this time to find a
      clever way of allowing compaction.c to share the code or avoid the use
      entirely as the function is called on relatively slow paths.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-14-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ffd8f251
    • Mel Gorman's avatar
      mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations · d01079f3
      Mel Gorman authored
      make W=1 generates the following warning in mmap_lock.c for allnoconfig
      
        mm/mmap_lock.c:213:6: warning: no previous prototype for `__mmap_lock_do_trace_start_locking' [-Wmissing-prototypes]
         void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/mmap_lock.c:219:6: warning: no previous prototype for `__mmap_lock_do_trace_acquire_returned' [-Wmissing-prototypes]
         void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/mmap_lock.c:226:6: warning: no previous prototype for `__mmap_lock_do_trace_released' [-Wmissing-prototypes]
         void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
      
      On !CONFIG_TRACING configurations, the code is dead so put it behind an
      #ifdef.
      
      [cuibixuan@huawei.com: fix warning when CONFIG_TRACING is not defined]
        Link: https://lkml.kernel.org/r/20210531033426.74031-1-cuibixuan@huawei.com
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-13-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarBixuan Cui <cuibixuan@huawei.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d01079f3
    • Mel Gorman's avatar
      mm/swap: make swap_address_space an inline function · 2bb6a033
      Mel Gorman authored
      make W=1 generates the following warning in page_mapping() for allnoconfig
      
        mm/util.c:700:15: warning: variable `entry' set but not used [-Wunused-but-set-variable]
           swp_entry_t entry;
                       ^~~~~
      
      swap_address is a #define on !CONFIG_SWAP configurations.  Make the helper
      an inline function to suppress the warning, add type checking and to apply
      any side-effects in the parameter list.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-12-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2bb6a033
    • Mel Gorman's avatar
      mm/z3fold: add kerneldoc fields for z3fold_pool · 30522175
      Mel Gorman authored
      make W=1 generates the following warning for z3fold_pool
      
        mm/z3fold.c:171: warning: Function parameter or member 'zpool' not described in 'z3fold_pool'
        mm/z3fold.c:171: warning: Function parameter or member 'zpool_ops' not described in 'z3fold_pool'
      
      Commit 9a001fc1 ("z3fold: the 3-fold allocator for compressed pages")
      simply did not document the fields at the time.  Add rudimentary
      documentation.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-11-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30522175
    • Mel Gorman's avatar
      mm/zbud: add kerneldoc fields for zbud_pool · a29a7506
      Mel Gorman authored
      make W=1 generates the following warning for zbud_pool
      
        mm/zbud.c:105: warning: Function parameter or member 'zpool' not described in 'zbud_pool'
        mm/zbud.c:105: warning: Function parameter or member 'zpool_ops' not described in 'zbud_pool'
      
      Commit 479305fd ("zpool: remove zpool_evict()") removed the
      zpool_evict helper and added the associated zpool and operations structure
      in struct zbud_pool but did not add documentation for the fields.  Add
      rudimentary documentation.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-10-mgorman@techsingularity.net
      Fixes: 479305fd ("zpool: remove zpool_evict()")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a29a7506
    • Mel Gorman's avatar
      mm/memory_hotplug: fix kerneldoc comment for __remove_memory · 5640c9ca
      Mel Gorman authored
      make W=1 generates the following warning for __remove_memory
      
        mm/memory_hotplug.c:2044: warning: expecting prototype for remove_memory(). Prototype was for __remove_memory() instead
      
      Commit eca499ab ("mm/hotplug: make remove_memory() interface usable")
      introduced the kerneldoc comment and function but the kerneldoc name and
      function name did not match.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-9-mgorman@techsingularity.net
      Fixes: eca499ab ("mm/hotplug: make remove_memory() interface usable")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5640c9ca
    • Mel Gorman's avatar
      mm/memory_hotplug: fix kerneldoc comment for __try_online_node · ba2d2666
      Mel Gorman authored
      make W=1 generates the following warning for try_online_node
      
      mm/memory_hotplug.c:1087: warning: expecting prototype for try_online_node(). Prototype was for __try_online_node() instead
      
      Commit b9ff0360 ("mm/memory_hotplug.c: make add_memory_resource use
      __try_online_node") renamed the function but did not update the associated
      kerneldoc.  The function is static and somewhat specialised in nature so
      it's not clear it warrants being a kerneldoc by moving the comment to
      try_online_node.  Hence, leave the comment of the internal helper in place
      but leave it out of kerneldoc and correct the function name in the
      comment.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-8-mgorman@techsingularity.net
      Fixes: Commit b9ff0360 ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba2d2666
    • Mel Gorman's avatar
      mm/memcontrol.c: fix kerneldoc comment for mem_cgroup_calculate_protection · 05395718
      Mel Gorman authored
      make W=1 generates the following warning for mem_cgroup_calculate_protection
      
        mm/memcontrol.c:6468: warning: expecting prototype for mem_cgroup_protected(). Prototype was for mem_cgroup_calculate_protection() instead
      
      Commit 45c7f7e1 ("mm, memcg: decouple e{low,min} state mutations from
      protection checks") changed the function definition but not the associated
      kerneldoc comment.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-7-mgorman@techsingularity.net
      Fixes: 45c7f7e1 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarChris Down <chris@chrisdown.name>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      05395718
    • Mel Gorman's avatar
      mm/mapping_dirty_helpers: remove double Note in kerneldoc · b417941f
      Mel Gorman authored
      make W=1 generates the following warning for mm/mapping_dirty_helpers.c
      
      mm/mapping_dirty_helpers.c:325: warning: duplicate section name 'Note'
      
      The helper function is very specific to one driver -- vmwgfx.  While the
      two notes are separate, all of it needs to be taken into account when
      using the helper so make it one note.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-5-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b417941f
    • Mel Gorman's avatar
      mm/page_alloc: make should_fail_alloc_page() static · f7173090
      Mel Gorman authored
      make W=1 generates the following warning for mm/page_alloc.c
      
        mm/page_alloc.c:3651:15: warning: no previous prototype for `should_fail_alloc_page' [-Wmissing-prototypes]
         noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
                       ^~~~~~~~~~~~~~~~~~~~~~
      
      This function is deliberately split out for BPF to allow errors to be
      injected.  The function is not used anywhere else so it is local to the
      file.  Make it static which should still allow error injection to be used
      similar to how block/blk-core.c:should_fail_bio() works.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-4-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f7173090
    • Mel Gorman's avatar
      mm/vmalloc: include header for prototype of set_iounmap_nonlazy · 5da96bdd
      Mel Gorman authored
      make W=1 generates the following warning for mm/vmalloc.c
      
        mm/vmalloc.c:1599:6: warning: no previous prototype for `set_iounmap_nonlazy' [-Wmissing-prototypes]
         void set_iounmap_nonlazy(void)
              ^~~~~~~~~~~~~~~~~~~
      
      This is an arch-generic function only used by x86.  On other arches, it's
      dead code.  Include the header with the definition and make it x86-64
      specific.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-3-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5da96bdd
    • Mel Gorman's avatar
      mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages · f611fab7
      Mel Gorman authored
      Patch series "Clean W=1 build warnings for mm/".
      
      This is a janitorial only.  During development of a tool to catch build
      warnings early to avoid tripping the Intel lkp-robot, I noticed that mm/
      is not clean for W=1.  This is generally harmless but there is no harm in
      cleaning it up.  It disrupts git blame a little but on relatively obvious
      lines that are unlikely to be git blame targets.
      
      This patch (of 13):
      
      make W=1 generates the following warning for vmscan.c
      
          mm/vmscan.c:1814: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
      
      It is not a kerneldoc comment and isolate_lru_pages() is a static
      function.  While the detailed comment is nice, it does not need to be
      exposed via kernel-doc.
      
      Link: https://lkml.kernel.org/r/20210520084809.8576-1-mgorman@techsingularity.net
      Link: https://lkml.kernel.org/r/20210520084809.8576-2-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f611fab7
    • Zhen Lei's avatar
      mm: fix spelling mistakes · 041711ce
      Zhen Lei authored
      Fix some spelling mistakes in comments:
      each having differents usage ==> each has a different usage
      statments ==> statements
      adresses ==> addresses
      aggresive ==> aggressive
      datas ==> data
      posion ==> poison
      higer ==> higher
      precisly ==> precisely
      wont ==> won't
      We moves tha ==> We move the
      endianess ==> endianness
      
      Link: https://lkml.kernel.org/r/20210519065853.7723-2-thunder.leizhen@huawei.comSigned-off-by: default avatarZhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: default avatarSouptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      041711ce
    • Anshuman Khandual's avatar
      mm: define default value for FIRST_USER_ADDRESS · fac7757e
      Anshuman Khandual authored
      Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the
      same code all over.  Instead just define a generic default value (i.e 0UL)
      for FIRST_USER_ADDRESS and let the platforms override when required.  This
      makes it much cleaner with reduced code.
      
      The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
      when the given platform overrides its value via <asm/pgtable.h>.
      
      Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Guo Ren <guoren@kernel.org>			[csky]
      Acked-by: Stafford Horne <shorne@gmail.com>		[openrisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[RISC-V]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fac7757e
    • Hyeonggon Yoo's avatar
      mm: fix typos and grammar error in comments · c4ffefd1
      Hyeonggon Yoo authored
      We moves tha -> We move that in mm/swap.c
      statments -> statements in include/linux/mm.h
      
      Link: https://lkml.kernel.org/r/20210509063444.GA24745@hyeyooSigned-off-by: default avatarHyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c4ffefd1
    • Yue Hu's avatar
      zram: move backing_dev under macro CONFIG_ZRAM_WRITEBACK · dd794835
      Yue Hu authored
      backing_dev is never used when not enable CONFIG_ZRAM_WRITEBACK and it's
      introduced from writeback feature.  So it's needless also affect
      readability in that case.
      
      Link: https://lkml.kernel.org/r/20210521060544.2385-1-zbestahu@gmail.comSigned-off-by: default avatarYue Hu <huyue2@yulong.com>
      Reviewed-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd794835
    • Miaohe Lin's avatar
      mm/zsmalloc.c: improve readability for async_free_zspage() · 33848337
      Miaohe Lin authored
      The class is extracted from pool->size_class[class_idx] again before
      calling __free_zspage().  It looks like class will change after we fetch
      the class lock.  But this is misleading as class will stay unchanged.
      
      Link: https://lkml.kernel.org/r/20210624123930.1769093-4-linmiaohe@huawei.comSigned-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      33848337
    • Miaohe Lin's avatar
      mm/zsmalloc.c: remove confusing code in obj_free() · ce8475b6
      Miaohe Lin authored
      Patch series "Cleanup for zsmalloc".
      
      This series contains cleanups to remove confusing code in obj_free(),
      combine two atomic ops and improve readability for async_free_zspage().
      More details can be found in the respective changelogs.
      
      This patch (of 2):
      
      OBJ_ALLOCATED_TAG is only set for handle to indicate allocated object.
      It's irrelevant with obj.  So remove this misleading code to improve
      readability.
      
      Link: https://lkml.kernel.org/r/20210624123930.1769093-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210624123930.1769093-2-linmiaohe@huawei.comSigned-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ce8475b6
    • Miaohe Lin's avatar
      mm/zswap.c: fix two bugs in zswap_writeback_entry() · 46b76f2e
      Miaohe Lin authored
      In the ZSWAP_SWAPCACHE_FAIL and ZSWAP_SWAPCACHE_EXIST case, we forgot to
      call zpool_unmap_handle() when zpool can't sleep. And we might sleep in
      zswap_get_swap_cache_page() while zpool can't sleep. To fix all of these,
      zpool_unmap_handle() should be done before zswap_get_swap_cache_page()
      when zpool can't sleep.
      
      Link: https://lkml.kernel.org/r/20210522092242.3233191-4-linmiaohe@huawei.com
      Fixes: fc6697a8 ("mm/zswap: add the flag can_sleep_mapped")
      Signed-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tian Tao <tiantao6@hisilicon.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      46b76f2e
    • Miaohe Lin's avatar
      mm/zswap.c: avoid unnecessary copy-in at map time · ae34af1f
      Miaohe Lin authored
      The buf mapped via zpool_map_handle() is only used to store compressed
      page buffer and there is no information to extract from it. So we could
      use ZPOOL_MM_WO instead to avoid unnecessary copy-in at map time.
      
      Link: https://lkml.kernel.org/r/20210522092242.3233191-3-linmiaohe@huawei.comSigned-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tian Tao <tiantao6@hisilicon.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ae34af1f
    • Miaohe Lin's avatar
      mm/zswap.c: remove unused function zswap_debugfs_exit() · 2c1e9a2c
      Miaohe Lin authored
      Patch series "Cleanup and fixup for zswap".
      
      This series contains cleanups to remove unused function and avoid
      unnecessary copy-in at map time.  Also this fixes two bugs in the function
      zswap_writeback_entry().  More details can be found in the respective
      changelogs.
      
      This patch (of 3):
      
      zswap_debugfs_exit() is unused, remove it.
      
      Link: https://lkml.kernel.org/r/20210522092242.3233191-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210522092242.3233191-2-linmiaohe@huawei.comSigned-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Tian Tao <tiantao6@hisilicon.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c1e9a2c
    • Oscar Salvador's avatar
      mm,memory_hotplug: drop unneeded locking · 27cacaad
      Oscar Salvador authored
      Currently, memory-hotplug code takes zone's span_writelock and pgdat's
      resize_lock when resizing the node/zone's spanned pages via
      {move_pfn_range_to_zone(),remove_pfn_range_from_zone()} and when resizing
      node and zone's present pages via adjust_present_page_count().
      
      These locks are also taken during the initialization of the system at boot
      time, where it protects parallel struct page initialization, but they
      should not really be needed in memory-hotplug where all operations are a)
      synchronized on device level and b) serialized by the mem_hotplug_lock
      lock.
      
      [akpm@linux-foundation.org: remove now-unused locals]
      
      Link: https://lkml.kernel.org/r/20210531093958.15021-1-osalvador@suse.deSigned-off-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27cacaad
    • Liam Mark's avatar
      mm/memory_hotplug: rate limit page migration warnings · 786dee86
      Liam Mark authored
      When offlining memory the system can attempt to migrate a lot of pages, if
      there are problems with migration this can flood the logs.  Printing all
      the data hogs the CPU and cause some RT threads to run for a long time,
      which may have some bad consequences.
      
      Rate limit the page migration warnings in order to avoid this.
      
      Link: https://lkml.kernel.org/r/20210505140542.24935-1-georgi.djakov@linaro.orgSigned-off-by: default avatarLiam Mark <lmark@codeaurora.org>
      Signed-off-by: default avatarGeorgi Djakov <georgi.djakov@linaro.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      786dee86
    • David Hildenbrand's avatar
      selftests/vm: add test for MADV_POPULATE_(READ|WRITE) · e5bfac53
      David Hildenbrand authored
      Let's add a simple test for MADV_POPULATE_READ and MADV_POPULATE_WRITE,
      verifying some error handling, that population works, and that softdirty
      tracking works as expected.  For now, limit the test to private anonymous
      memory.
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-6-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e5bfac53
    • David Hildenbrand's avatar
      selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore · 2abdd8b8
      David Hildenbrand authored
      We missed adding two binaries to gitignore.
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-5-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2abdd8b8
    • David Hildenbrand's avatar
      MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT · 5d334317
      David Hildenbrand authored
      MEMORY MANAGEMENT seems to be a good fit.
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-4-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d334317
    • David Hildenbrand's avatar
      mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables · 4ca9b385
      David Hildenbrand authored
      I. Background: Sparse Memory Mappings
      
      When we manage sparse memory mappings dynamically in user space - also
      sometimes involving MAP_NORESERVE - we want to dynamically populate/
      discard memory inside such a sparse memory region.  Example users are
      hypervisors (especially implementing memory ballooning or similar
      technologies like virtio-mem) and memory allocators.  In addition, we want
      to fail in a nice way (instead of generating SIGBUS) if populating does
      not succeed because we are out of backend memory (which can happen easily
      with file-based mappings, especially tmpfs and hugetlbfs).
      
      While MADV_DONTNEED, MADV_REMOVE and FALLOC_FL_PUNCH_HOLE allow for
      reliably discarding memory for most mapping types, there is no generic
      approach to populate page tables and preallocate memory.
      
      Although mmap() supports MAP_POPULATE, it is not applicable to the concept
      of sparse memory mappings, where we want to populate/discard dynamically
      and avoid expensive/problematic remappings.  In addition, we never
      actually report errors during the final populate phase - it is best-effort
      only.
      
      fallocate() can be used to preallocate file-based memory and fail in a
      safe way.  However, it cannot really be used for any private mappings on
      anonymous files via memfd due to COW semantics.  In addition, fallocate()
      does not actually populate page tables, so we still always get pagefaults
      on first access - which is sometimes undesired (i.e., real-time workloads)
      and requires real prefaulting of page tables, not just a preallocation of
      backend storage.  There might be interesting use cases for sparse memory
      regions along with mlockall(MCL_ONFAULT) which fallocate() cannot satisfy
      as it does not prefault page tables.
      
      II. On preallcoation/prefaulting from user space
      
      Because we don't have a proper interface, what applications (like QEMU and
      databases) end up doing is touching (i.e., reading+writing one byte to not
      overwrite existing data) all individual pages.
      
      However, that approach
      1) Can result in wear on storage backing, because we end up reading/writing
         each page; this is especially a problem for dax/pmem.
      2) Can result in mmap_sem contention when prefaulting via multiple
         threads.
      3) Requires expensive signal handling, especially to catch SIGBUS in case
         of hugetlbfs/shmem/file-backed memory. For example, this is
         problematic in hypervisors like QEMU where SIGBUS handlers might already
         be used by other subsystems concurrently to e.g, handle hardware errors.
         "Simply" doing preallocation concurrently from other thread is not that
         easy.
      
      III. On MADV_WILLNEED
      
      Extending MADV_WILLNEED is not an option because
      1. It would change the semantics: "Expect access in the near future." and
         "might be a good idea to read some pages" vs. "Definitely populate/
         preallocate all memory and definitely fail on errors.".
      2. Existing users (like virtio-balloon in QEMU when deflating the balloon)
         don't want populate/prealloc semantics. They treat this rather as a hint
         to give a little performance boost without too much overhead - and don't
         expect that a lot of memory might get consumed or a lot of time
         might be spent.
      
      IV. MADV_POPULATE_READ and MADV_POPULATE_WRITE
      
      Let's introduce MADV_POPULATE_READ and MADV_POPULATE_WRITE, inspired by
      MAP_POPULATE, with the following semantics:
      1. MADV_POPULATE_READ can be used to prefault page tables just like
         manually reading each individual page. This will not break any COW
         mappings. The shared zero page might get mapped and no backend storage
         might get preallocated -- allocation might be deferred to
         write-fault time. Especially shared file mappings require an explicit
         fallocate() upfront to actually preallocate backend memory (blocks in
         the file system) in case the file might have holes.
      2. If MADV_POPULATE_READ succeeds, all page tables have been populated
         (prefaulted) readable once.
      3. MADV_POPULATE_WRITE can be used to preallocate backend memory and
         prefault page tables just like manually writing (or
         reading+writing) each individual page. This will break any COW
         mappings -- e.g., the shared zeropage is never populated.
      4. If MADV_POPULATE_WRITE succeeds, all page tables have been populated
         (prefaulted) writable once.
      5. MADV_POPULATE_READ and MADV_POPULATE_WRITE cannot be applied to special
         mappings marked with VM_PFNMAP and VM_IO. Also, proper access
         permissions (e.g., PROT_READ, PROT_WRITE) are required. If any such
         mapping is encountered, madvise() fails with -EINVAL.
      6. If MADV_POPULATE_READ or MADV_POPULATE_WRITE fails, some page tables
         might have been populated.
      7. MADV_POPULATE_READ and MADV_POPULATE_WRITE will return -EHWPOISON
         when encountering a HW poisoned page in the range.
      8. Similar to MAP_POPULATE, MADV_POPULATE_READ and MADV_POPULATE_WRITE
         cannot protect from the OOM (Out Of Memory) handler killing the
         process.
      
      While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e.,
      preallocate memory and prefault page tables for VMs), one issue is that
      whenever we prefault pages writable, the pages have to be marked dirty,
      because the CPU could dirty them any time.  while not a real problem for
      hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
      page will be marked dirty and has to be written back later when evicting.
      
      MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
      mapping from backend storage without marking it dirty, such that eviction
      won't have to write it back.  As discussed above, shared file mappings
      might require an explciit fallocate() upfront to achieve
      preallcoation+prepopulation.
      
      Although sparse memory mappings are the primary use case, this will also
      be useful for other preallocate/prefault use cases where MAP_POPULATE is
      not desired or the semantics of MAP_POPULATE are not sufficient: as one
      example, QEMU users can trigger preallocation/prefaulting of guest RAM
      after the mapping was created -- and don't want errors to be silently
      suppressed.
      
      Looking at the history, MADV_POPULATE was already proposed in 2013 [1],
      however, the main motivation back than was performance improvements --
      which should also still be the case.
      
      V. Single-threaded performance comparison
      
      I did a short experiment, prefaulting page tables on completely *empty
      mappings/files* and repeated the experiment 10 times.  The results
      correspond to the shortest execution time.  In general, the performance
      benefit for huge pages is negligible with small mappings.
      
      V.1: Private mappings
      
      POPULATE_READ and POPULATE_WRITE is fastest.  Note that
      Reading/POPULATE_READ will populate the shared zeropage where applicable
      -- which result in short population times.
      
      The fastest way to allocate backend storage (here: swap or huge pages) and
      prefault page tables is POPULATE_WRITE.
      
      V.2: Shared mappings
      
      fallocate() is fastest, however, doesn't prefault page tables.
      POPULATE_WRITE is faster than simple writes and read/writes.
      POPULATE_READ is faster than simple reads.
      
      Without a fd, the fastest way to allocate backend storage and prefault
      page tables is POPULATE_WRITE.  With an fd, the fastest way is usually
      FALLOCATE+POPULATE_READ or FALLOCATE+POPULATE_WRITE respectively; one
      exception are actual files: FALLOCATE+Read is slightly faster than
      FALLOCATE+POPULATE_READ.
      
      The fastest way to allocate backend storage prefault page tables is
      FALLOCATE+POPULATE_WRITE -- except when dealing with actual files; then,
      FALLOCATE+POPULATE_READ is fastest and won't directly mark all pages as
      dirty.
      
      v.3: Detailed results
      
      ==================================================
      2 MiB MAP_PRIVATE:
      **************************************************
      Anon 4 KiB     : Read                     :     0.119 ms
      Anon 4 KiB     : Write                    :     0.222 ms
      Anon 4 KiB     : Read/Write               :     0.380 ms
      Anon 4 KiB     : POPULATE_READ            :     0.060 ms
      Anon 4 KiB     : POPULATE_WRITE           :     0.158 ms
      Memfd 4 KiB    : Read                     :     0.034 ms
      Memfd 4 KiB    : Write                    :     0.310 ms
      Memfd 4 KiB    : Read/Write               :     0.362 ms
      Memfd 4 KiB    : POPULATE_READ            :     0.039 ms
      Memfd 4 KiB    : POPULATE_WRITE           :     0.229 ms
      Memfd 2 MiB    : Read                     :     0.030 ms
      Memfd 2 MiB    : Write                    :     0.030 ms
      Memfd 2 MiB    : Read/Write               :     0.030 ms
      Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
      Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
      tmpfs          : Read                     :     0.033 ms
      tmpfs          : Write                    :     0.313 ms
      tmpfs          : Read/Write               :     0.406 ms
      tmpfs          : POPULATE_READ            :     0.039 ms
      tmpfs          : POPULATE_WRITE           :     0.285 ms
      file           : Read                     :     0.033 ms
      file           : Write                    :     0.351 ms
      file           : Read/Write               :     0.408 ms
      file           : POPULATE_READ            :     0.039 ms
      file           : POPULATE_WRITE           :     0.290 ms
      hugetlbfs      : Read                     :     0.030 ms
      hugetlbfs      : Write                    :     0.030 ms
      hugetlbfs      : Read/Write               :     0.030 ms
      hugetlbfs      : POPULATE_READ            :     0.030 ms
      hugetlbfs      : POPULATE_WRITE           :     0.030 ms
      **************************************************
      4096 MiB MAP_PRIVATE:
      **************************************************
      Anon 4 KiB     : Read                     :   237.940 ms
      Anon 4 KiB     : Write                    :   708.409 ms
      Anon 4 KiB     : Read/Write               :  1054.041 ms
      Anon 4 KiB     : POPULATE_READ            :   124.310 ms
      Anon 4 KiB     : POPULATE_WRITE           :   572.582 ms
      Memfd 4 KiB    : Read                     :   136.928 ms
      Memfd 4 KiB    : Write                    :   963.898 ms
      Memfd 4 KiB    : Read/Write               :  1106.561 ms
      Memfd 4 KiB    : POPULATE_READ            :    78.450 ms
      Memfd 4 KiB    : POPULATE_WRITE           :   805.881 ms
      Memfd 2 MiB    : Read                     :   357.116 ms
      Memfd 2 MiB    : Write                    :   357.210 ms
      Memfd 2 MiB    : Read/Write               :   357.606 ms
      Memfd 2 MiB    : POPULATE_READ            :   356.094 ms
      Memfd 2 MiB    : POPULATE_WRITE           :   356.937 ms
      tmpfs          : Read                     :   137.536 ms
      tmpfs          : Write                    :   954.362 ms
      tmpfs          : Read/Write               :  1105.954 ms
      tmpfs          : POPULATE_READ            :    80.289 ms
      tmpfs          : POPULATE_WRITE           :   822.826 ms
      file           : Read                     :   137.874 ms
      file           : Write                    :   987.025 ms
      file           : Read/Write               :  1107.439 ms
      file           : POPULATE_READ            :    80.413 ms
      file           : POPULATE_WRITE           :   857.622 ms
      hugetlbfs      : Read                     :   355.607 ms
      hugetlbfs      : Write                    :   355.729 ms
      hugetlbfs      : Read/Write               :   356.127 ms
      hugetlbfs      : POPULATE_READ            :   354.585 ms
      hugetlbfs      : POPULATE_WRITE           :   355.138 ms
      **************************************************
      2 MiB MAP_SHARED:
      **************************************************
      Anon 4 KiB     : Read                     :     0.394 ms
      Anon 4 KiB     : Write                    :     0.348 ms
      Anon 4 KiB     : Read/Write               :     0.400 ms
      Anon 4 KiB     : POPULATE_READ            :     0.326 ms
      Anon 4 KiB     : POPULATE_WRITE           :     0.273 ms
      Anon 2 MiB     : Read                     :     0.030 ms
      Anon 2 MiB     : Write                    :     0.030 ms
      Anon 2 MiB     : Read/Write               :     0.030 ms
      Anon 2 MiB     : POPULATE_READ            :     0.030 ms
      Anon 2 MiB     : POPULATE_WRITE           :     0.030 ms
      Memfd 4 KiB    : Read                     :     0.412 ms
      Memfd 4 KiB    : Write                    :     0.372 ms
      Memfd 4 KiB    : Read/Write               :     0.419 ms
      Memfd 4 KiB    : POPULATE_READ            :     0.343 ms
      Memfd 4 KiB    : POPULATE_WRITE           :     0.288 ms
      Memfd 4 KiB    : FALLOCATE                :     0.137 ms
      Memfd 4 KiB    : FALLOCATE+Read           :     0.446 ms
      Memfd 4 KiB    : FALLOCATE+Write          :     0.330 ms
      Memfd 4 KiB    : FALLOCATE+Read/Write     :     0.454 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :     0.379 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :     0.268 ms
      Memfd 2 MiB    : Read                     :     0.030 ms
      Memfd 2 MiB    : Write                    :     0.030 ms
      Memfd 2 MiB    : Read/Write               :     0.030 ms
      Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
      Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
      Memfd 2 MiB    : FALLOCATE                :     0.030 ms
      Memfd 2 MiB    : FALLOCATE+Read           :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+Write          :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+Read/Write     :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :     0.030 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :     0.030 ms
      tmpfs          : Read                     :     0.416 ms
      tmpfs          : Write                    :     0.369 ms
      tmpfs          : Read/Write               :     0.425 ms
      tmpfs          : POPULATE_READ            :     0.346 ms
      tmpfs          : POPULATE_WRITE           :     0.295 ms
      tmpfs          : FALLOCATE                :     0.139 ms
      tmpfs          : FALLOCATE+Read           :     0.447 ms
      tmpfs          : FALLOCATE+Write          :     0.333 ms
      tmpfs          : FALLOCATE+Read/Write     :     0.454 ms
      tmpfs          : FALLOCATE+POPULATE_READ  :     0.380 ms
      tmpfs          : FALLOCATE+POPULATE_WRITE :     0.272 ms
      file           : Read                     :     0.191 ms
      file           : Write                    :     0.511 ms
      file           : Read/Write               :     0.524 ms
      file           : POPULATE_READ            :     0.196 ms
      file           : POPULATE_WRITE           :     0.434 ms
      file           : FALLOCATE                :     0.004 ms
      file           : FALLOCATE+Read           :     0.197 ms
      file           : FALLOCATE+Write          :     0.554 ms
      file           : FALLOCATE+Read/Write     :     0.480 ms
      file           : FALLOCATE+POPULATE_READ  :     0.201 ms
      file           : FALLOCATE+POPULATE_WRITE :     0.381 ms
      hugetlbfs      : Read                     :     0.030 ms
      hugetlbfs      : Write                    :     0.030 ms
      hugetlbfs      : Read/Write               :     0.030 ms
      hugetlbfs      : POPULATE_READ            :     0.030 ms
      hugetlbfs      : POPULATE_WRITE           :     0.030 ms
      hugetlbfs      : FALLOCATE                :     0.030 ms
      hugetlbfs      : FALLOCATE+Read           :     0.031 ms
      hugetlbfs      : FALLOCATE+Write          :     0.031 ms
      hugetlbfs      : FALLOCATE+Read/Write     :     0.030 ms
      hugetlbfs      : FALLOCATE+POPULATE_READ  :     0.030 ms
      hugetlbfs      : FALLOCATE+POPULATE_WRITE :     0.030 ms
      **************************************************
      4096 MiB MAP_SHARED:
      **************************************************
      Anon 4 KiB     : Read                     :  1053.090 ms
      Anon 4 KiB     : Write                    :   913.642 ms
      Anon 4 KiB     : Read/Write               :  1060.350 ms
      Anon 4 KiB     : POPULATE_READ            :   893.691 ms
      Anon 4 KiB     : POPULATE_WRITE           :   782.885 ms
      Anon 2 MiB     : Read                     :   358.553 ms
      Anon 2 MiB     : Write                    :   358.419 ms
      Anon 2 MiB     : Read/Write               :   357.992 ms
      Anon 2 MiB     : POPULATE_READ            :   357.533 ms
      Anon 2 MiB     : POPULATE_WRITE           :   357.808 ms
      Memfd 4 KiB    : Read                     :  1078.144 ms
      Memfd 4 KiB    : Write                    :   942.036 ms
      Memfd 4 KiB    : Read/Write               :  1100.391 ms
      Memfd 4 KiB    : POPULATE_READ            :   925.829 ms
      Memfd 4 KiB    : POPULATE_WRITE           :   804.394 ms
      Memfd 4 KiB    : FALLOCATE                :   304.632 ms
      Memfd 4 KiB    : FALLOCATE+Read           :  1163.359 ms
      Memfd 4 KiB    : FALLOCATE+Write          :   933.186 ms
      Memfd 4 KiB    : FALLOCATE+Read/Write     :  1187.304 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :  1013.660 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :   794.560 ms
      Memfd 2 MiB    : Read                     :   358.131 ms
      Memfd 2 MiB    : Write                    :   358.099 ms
      Memfd 2 MiB    : Read/Write               :   358.250 ms
      Memfd 2 MiB    : POPULATE_READ            :   357.563 ms
      Memfd 2 MiB    : POPULATE_WRITE           :   357.334 ms
      Memfd 2 MiB    : FALLOCATE                :   356.735 ms
      Memfd 2 MiB    : FALLOCATE+Read           :   358.152 ms
      Memfd 2 MiB    : FALLOCATE+Write          :   358.331 ms
      Memfd 2 MiB    : FALLOCATE+Read/Write     :   358.018 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :   357.286 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :   357.523 ms
      tmpfs          : Read                     :  1087.265 ms
      tmpfs          : Write                    :   950.840 ms
      tmpfs          : Read/Write               :  1107.567 ms
      tmpfs          : POPULATE_READ            :   922.605 ms
      tmpfs          : POPULATE_WRITE           :   810.094 ms
      tmpfs          : FALLOCATE                :   306.320 ms
      tmpfs          : FALLOCATE+Read           :  1169.796 ms
      tmpfs          : FALLOCATE+Write          :   933.730 ms
      tmpfs          : FALLOCATE+Read/Write     :  1191.610 ms
      tmpfs          : FALLOCATE+POPULATE_READ  :  1020.474 ms
      tmpfs          : FALLOCATE+POPULATE_WRITE :   798.945 ms
      file           : Read                     :   654.101 ms
      file           : Write                    :  1259.142 ms
      file           : Read/Write               :  1289.509 ms
      file           : POPULATE_READ            :   661.642 ms
      file           : POPULATE_WRITE           :  1106.816 ms
      file           : FALLOCATE                :     1.864 ms
      file           : FALLOCATE+Read           :   656.328 ms
      file           : FALLOCATE+Write          :  1153.300 ms
      file           : FALLOCATE+Read/Write     :  1180.613 ms
      file           : FALLOCATE+POPULATE_READ  :   668.347 ms
      file           : FALLOCATE+POPULATE_WRITE :   996.143 ms
      hugetlbfs      : Read                     :   357.245 ms
      hugetlbfs      : Write                    :   357.413 ms
      hugetlbfs      : Read/Write               :   357.120 ms
      hugetlbfs      : POPULATE_READ            :   356.321 ms
      hugetlbfs      : POPULATE_WRITE           :   356.693 ms
      hugetlbfs      : FALLOCATE                :   355.927 ms
      hugetlbfs      : FALLOCATE+Read           :   357.074 ms
      hugetlbfs      : FALLOCATE+Write          :   357.120 ms
      hugetlbfs      : FALLOCATE+Read/Write     :   356.983 ms
      hugetlbfs      : FALLOCATE+POPULATE_READ  :   356.413 ms
      hugetlbfs      : FALLOCATE+POPULATE_WRITE :   356.266 ms
      **************************************************
      
      [1] https://lkml.org/lkml/2013/6/27/698
      
      [akpm@linux-foundation.org: coding style fixes]
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-3-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4ca9b385
    • David Hildenbrand's avatar
      mm: make variable names for populate_vma_page_range() consistent · a78f1ccd
      David Hildenbrand authored
      Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2.
      
      Excessive details on MADV_POPULATE_(READ|WRITE) can be found in patch #2.
      
      This patch (of 5):
      
      Let's make the variable names in the function declaration match the
      variable names used in the definition.
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210419135443.12822-2-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a78f1ccd
    • Kefeng Wang's avatar
      mm: generalize ZONE_[DMA|DMA32] · 63703f37
      Kefeng Wang authored
      ZONE_[DMA|DMA32] configs have duplicate definitions on platforms that
      subscribe to them.  Instead, just make them generic options which can be
      selected on applicable platforms.
      
      Also only x86/arm64 architectures could enable both ZONE_DMA and
      ZONE_DMA32 if EXPERT, add ARCH_HAS_ZONE_DMA_SET to make dma zone
      configurable and visible on the two architectures.
      
      Link: https://lkml.kernel.org/r/20210528074557.17768-1-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[RISC-V]
      Acked-by: Michal Simek <michal.simek@xilinx.com>	[microblaze]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@armlinux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      63703f37
    • Liam Howlett's avatar
      mm/nommu: unexport do_munmap() · db1d9152
      Liam Howlett authored
      do_munmap() does not take the mmap_write_lock().  vm_munmap() should be
      used instead.
      
      Link: https://lkml.kernel.org/r/20210604194002.648037-1-Liam.Howlett@Oracle.comSigned-off-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db1d9152
    • Chen Li's avatar
      nommu: remove __GFP_HIGHMEM in vmalloc/vzalloc · 176056fd
      Chen Li authored
      mm/nommu.c:
      void *__vmalloc(unsigned long size, gfp_t gfp_mask)
      {
      	/*
      	 *  You can't specify __GFP_HIGHMEM with kmalloc() since kmalloc()
      	 * returns only a logical address.
      	 */
      	return kmalloc(size, (gfp_mask | __GFP_COMP) & ~__GFP_HIGHMEM);
      }
      
      nommu's __vmalloc just uses kmalloc internally and elimitates
      __GFP_HIGHMEM, so it makes no sense to add __GFP_HIGHMEM for nommu's
      vmalloc/vzalloc.
      
      [akpm@linux-foundation.org: coding style fixes]
      
      Link: https://lkml.kernel.org/r/875z00rnp8.wl-chenli@uniontech.comSigned-off-by: default avatarChen Li <chenli@uniontech.com>
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      176056fd
    • Matthew Wilcox (Oracle)'s avatar
      mm/thp: fix strncpy warning · 1212e00c
      Matthew Wilcox (Oracle) authored
      Using MAX_INPUT_BUF_SZ as the maximum length of the string makes fortify
      complain as it thinks the string might be longer than the buffer, and if
      it is, we will end up with a "string" that is missing a NUL terminator.
      It's trivial to show that 'tok' points to a NUL-terminated string which is
      less than MAX_INPUT_BUF_SZ in length, so we may as well just use strcpy()
      and avoid the warning.
      
      Link: https://lkml.kernel.org/r/20210615200242.1716568-4-willy@infradead.orgSigned-off-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1212e00c
    • Hugh Dickins's avatar
      mm: hwpoison_user_mappings() try_to_unmap() with TTU_SYNC · 36af6737
      Hugh Dickins authored
      TTU_SYNC prevents an unlikely race, when try_to_unmap() returns shortly
      before the page is accounted as unmapped.  It is unlikely to coincide with
      hwpoisoning, but now that we have the flag, hwpoison_user_mappings() would
      do well to use it.
      
      Link: https://lkml.kernel.org/r/329c28ed-95df-9a2c-8893-b444d8a6d340@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarNaoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      36af6737
    • Hugh Dickins's avatar
      mm/thp: remap_page() is only needed on anonymous THP · ab02c252
      Hugh Dickins authored
      THP splitting's unmap_page() only sets TTU_SPLIT_FREEZE when PageAnon, and
      migration entries are only inserted when TTU_MIGRATION (unused here) or
      TTU_SPLIT_FREEZE is set: so it's just a waste of time for remap_page() to
      search for migration entries to remove when !PageAnon.
      
      Link: https://lkml.kernel.org/r/f987bc44-f28e-688d-2424-b4722153ed8@google.com
      Fixes: baa355fd ("thp: file pages support for split_huge_page()")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ab02c252