1. 19 Jun, 2023 40 commits
    • Catalin Marinas's avatar
      drivers/base: use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN · be6a5b5e
      Catalin Marinas authored
      ARCH_DMA_MINALIGN represents the minimum (static) alignment for safe DMA
      operations while ARCH_KMALLOC_MINALIGN is the minimum kmalloc() objects
      alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-6-catalin.marinas@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: default avatarIsaac J. Manjarres <isaacmanjarres@google.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      be6a5b5e
    • Catalin Marinas's avatar
      mm/slab: limit kmalloc() minimum alignment to dma_get_cache_alignment() · 963e84b0
      Catalin Marinas authored
      Do not create kmalloc() caches which are not aligned to
      dma_get_cache_alignment().  There is no functional change since for
      current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN
      equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()).  On
      architectures without a specific ARCH_DMA_MINALIGN,
      dma_get_cache_alignment() is 1, so no change to the kmalloc() caches.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-5-catalin.marinas@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Tested-by: default avatarIsaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      963e84b0
    • Catalin Marinas's avatar
      mm/slab: simplify create_kmalloc_cache() args and make it static · 0c474d31
      Catalin Marinas authored
      In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead
      of initialising the kmalloc_caches array directly.  With this,
      create_kmalloc_cache() is now only called from new_kmalloc_cache() in the
      same file, so make it static.  In addition, the useroffset argument is
      always 0 while usersize is the same as size.  Remove them.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Tested-by: default avatarIsaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0c474d31
    • Catalin Marinas's avatar
      dma: allow dma_get_cache_alignment() to be overridden by the arch code · 8c57da28
      Catalin Marinas authored
      On arm64, ARCH_DMA_MINALIGN is larger than most cache line size
      configurations deployed.  Allow an architecture to override
      dma_get_cache_alignment() in order to return a run-time probed value (e.g.
      cache_line_size()).
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-3-catalin.marinas@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarIsaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8c57da28
    • Catalin Marinas's avatar
      mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN · 4ab5f8ec
      Catalin Marinas authored
      Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7.
      
      A series reducing the kmalloc() minimum alignment on arm64 to 8 (from
      128).  
      
      
      This patch (of 17):
      
      In preparation for supporting a kmalloc() minimum alignment smaller than
      the arch DMA alignment, decouple the two definitions.  This requires that
      either the kmalloc() caches are aligned to a (run-time) cache-line size or
      the DMA API bounces unaligned kmalloc() allocations.  Subsequent patches
      will implement both options.
      
      After this patch, ARCH_DMA_MINALIGN is expected to be used in static
      alignment annotations and defined by an architecture to be the maximum
      alignment for all supported configurations/SoCs in a single Image. 
      Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
      define its value in the arch headers.
      
      Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
      dma_get_cache_alignment() so that there is no change for architectures not
      requiring a minimum DMA alignment.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com
      Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Tested-by: default avatarIsaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4ab5f8ec
    • Peter Xu's avatar
      mm/hugetlb: fix pgtable lock on pmd sharing · 349d1670
      Peter Xu authored
      Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable
      in this case because it should only work for last level pte changes, while
      pmd sharing is always one level higher.
      
      Meanwhile, here we're locking over the spte pgtable lock which is even not
      a lock for current mm but someone else's.
      
      It seems even racy on operating on the lock, as after put_page() of the
      spte pgtable page logically the page can be released, so at least the
      spin_unlock() needs to be done after the put_page().
      
      No report I am aware, I'm not even sure whether it'll just work on taking
      the spte pmd lock, because while we're holding i_mmap read lock it probably
      means the vma interval tree is frozen, all pte allocators over this pud
      entry could always find the specific svma and spte page, so maybe they'll
      serialize on this spte page lock?  Even so, doesn't seem to be expected.
      It just seems to be an accident of cb900f41.
      
      Fix it with the proper pud lock (which is the mm's page_table_lock).
      
      Link: https://lkml.kernel.org/r/20230612160420.809818-1-peterx@redhat.com
      Fixes: cb900f41 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      349d1670
    • Sidhartha Kumar's avatar
      mm: remove set_compound_page_dtor() · b95826c9
      Sidhartha Kumar authored
      All users can use the folio equivalent so this function can be safely
      removed.
      
      Link: https://lkml.kernel.org/r/20230612163405.99345-1-sidhartha.kumar@oracle.comSigned-off-by: default avatarSidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Tarun Sahu <tsahu@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b95826c9
    • Hugh Dickins's avatar
      perf/core: allow pte_offset_map() to fail · a92cbb82
      Hugh Dickins authored
      In rare transient cases, not yet made possible, pte_offset_map() and
      pte_offet_map_lock() may not find a page table: handle appropriately.
      
      [hughd@google.com: __wp_page_copy_user(): don't call update_mmu_tlb() with NULL]
        Link: https://lkml.kernel.org/r/1a4db221-7872-3594-57ce-42369945ec8d@google.com
      Link: https://lkml.kernel.org/r/a194441b-63f3-adb6-5964-7ca3171ae7c2@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a92cbb82
    • Hugh Dickins's avatar
      mm/swap: swap_vma_readahead() do the pte_offset_map() · 4f8fcf4c
      Hugh Dickins authored
      swap_vma_readahead() has been proceeding in an unconventional way, its
      preliminary swap_ra_info() doing the pte_offset_map() and pte_unmap(),
      then relying on that pte pointer even after the pte_unmap() - in its
      CONFIG_64BIT case (I think !CONFIG_HIGHPTE was intended; whereas 32-bit
      copied ptes to stack while they were mapped, but had to limit how many).
      
      Though it would be difficult to construct a failing testcase, accessing
      page table after pte_unmap() will become bad practice, even on 64-bit: an
      rcu_read_unlock() in pte_unmap() will allow page table to be freed.
      
      Move relevant definitions from include/linux/swap.h to mm/swap_state.c,
      nothing else used them.  Delete the CONFIG_64BIT distinction and buffer,
      delete all reference to ptes from swap_ra_info(), use pte_offset_map()
      repeatedly in swap_vma_readahead(), breaking from the loop if it fails.
      
      (Will the repeated "map" and "unmap" show up as a slowdown anywhere?  If
      so, maybe modify __read_swap_cache_async() to do the pte_unmap() only when
      it does not find the page already in the swapcache.)
      
      Use ptep_get_lockless(), mainly for its READ_ONCE().  Correctly advance
      the address passed down to each call of __read__swap_cache_async().
      
      Link: https://lkml.kernel.org/r/b7c64ab3-9e44-aac0-d2b-c57de578af1c@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4f8fcf4c
    • Hugh Dickins's avatar
      mm/pgtable: delete pmd_trans_unstable() and friends · feda5c39
      Hugh Dickins authored
      Delete pmd_trans_unstable, pmd_none_or_trans_huge_or_clear_bad() and
      pmd_devmap_trans_unstable(), all now unused.
      
      With mixed feelings, delete all the comments on pmd_trans_unstable(). 
      That was very good documentation of a subtle state, and this series does
      not even eliminate that state: but rather, normalizes and extends it,
      asking pte_offset_map[_lock]() callers to anticipate failure, without
      regard for whether mmap_read_lock() or mmap_write_lock() is held.
      
      Retain pud_trans_unstable(), which has one use in __handle_mm_fault(), but
      delete its equivalent pud_none_or_trans_huge_or_dev_or_clear_bad().  While
      there, move the default arch_needs_pgtable_deposit() definition up near
      where pgtable_trans_huge_deposit() and withdraw() are declared.
      
      Link: https://lkml.kernel.org/r/5abdab3-3136-b42e-274d-9c6281bfb79@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      feda5c39
    • Hugh Dickins's avatar
      mm/memory: handle_pte_fault() use pte_offset_map_nolock() · c7ad0880
      Hugh Dickins authored
      handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which
      corresponds to vmf.pte, instead of pte_lockptr() being used later, when
      there's a chance that the pmd entry might have changed, perhaps to none,
      or to a huge pmd, with no split ptlock in its struct page.
      
      Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock() will
      handle that case by failing.  Update the "morph" comment above, looking
      forward to when shmem or file collapse to THP may not take mmap_lock for
      write (or not at all).
      
      do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but
      refresh it when refreshing vmf->pte.
      
      do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a
      two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but
      do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(),
      so adjust that to set vmf->ptl by pte_offset_map_nolock().
      
      Link: https://lkml.kernel.org/r/c1107654-3929-60ac-223e-6877cbb86065@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      c7ad0880
    • Hugh Dickins's avatar
      mm/memory: allow pte_offset_map[_lock]() to fail · 3db82b93
      Hugh Dickins authored
      copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail;
      but with a comment on some further assumptions that are being made there.
      
      zap_pte_range() and zap_pmd_range(): adjust their interaction so that a
      pte_offset_map_lock() failure in zap_pte_range() leads to a retry in
      zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad().
      
      Allow pte_offset_map_lock() to fail in many functions.  Update comment on
      calling pte_alloc() in do_anonymous_page().  Remove redundant calls to
      pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and
      pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range() and
      copy_pmd_range(), those do simplify the next level down.
      
      Link: https://lkml.kernel.org/r/bb548d50-e99a-f29e-eab1-a43bef2a1287@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3db82b93
    • Hugh Dickins's avatar
      mm/khugepaged: allow pte_offset_map[_lock]() to fail · 895f5ee4
      Hugh Dickins authored
      __collapse_huge_page_swapin(): don't drop the map after every pte, it only
      has to be dropped by do_swap_page(); give up if pte_offset_map() fails;
      trace_mm_collapse_huge_page_swapin() at the end, with result; fix comment
      on returned result; fix vmf.pgoff, though it's not used.
      
      collapse_huge_page(): use pte_offset_map_lock() on the _pmd returned from
      clearing; allow failure, but it should be impossible there. 
      hpage_collapse_scan_pmd() and collapse_pte_mapped_thp() allow for
      pte_offset_map_lock() failure.
      
      Link: https://lkml.kernel.org/r/6513e85-d798-34ec-3762-7c24ffb9329@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      895f5ee4
    • Hugh Dickins's avatar
      mm/huge_memory: split huge pmd under one pte_offset_map() · c9c1ee20
      Hugh Dickins authored
      __split_huge_zero_page_pmd() use a single pte_offset_map() to sweep the
      extent: it's already under pmd_lock(), so this is no worse for latency;
      and since it's supposed to have full control of the just-withdrawn page
      table, here choose to VM_BUG_ON if it were to fail.  And please don't
      increment haddr by PAGE_SIZE, that should remain huge aligned: declare a
      separate addr (not a bugfix, but it was deceptive).
      
      __split_huge_pmd_locked() likewise (but it had declared a separate addr);
      and change its BUG_ON(!pte_none) to VM_BUG_ON, for consistency with zero
      (those deposited page tables are sometimes victims of random corruption).
      
      Link: https://lkml.kernel.org/r/90cbed7f-90d9-b779-4a46-d2485baf9595@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      c9c1ee20
    • Hugh Dickins's avatar
      mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable() · 2378118b
      Hugh Dickins authored
      There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
      distinguish huge_zero_page from a normal THP: follow_page_pte() handles
      any instability, and here it's a good idea to replace any pmd_none(*pmd)
      by a page table a.s.a.p, in the huge_zero_page case as for a normal THP;
      and this removes an unnecessary possibility of -EBUSY failure.
      
      (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
      before?  But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
      
      Link: https://lkml.kernel.org/r/59fd15dd-4d39-5ec-2043-1d5117f7f85@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2378118b
    • Hugh Dickins's avatar
      mm/migrate_device: allow pte_offset_map_lock() to fail · 4b56069c
      Hugh Dickins authored
      migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after
      splitting huge zero pmd, and the pmd_none() handling after successfully
      splitting huge page: those are now managed inside pte_offset_map_lock(),
      and by "goto again" when it fails.
      
      But the skip after unsuccessful split_huge_page() must stay: it avoids an
      endless loop.  The skip when pmd_bad()?  Remove that: it will be treated
      as a hole rather than a skip once cleared by pte_offset_map_lock(), but
      with different timing that would be so anyway; and it's arguably best to
      leave the pmd_bad() handling centralized there.
      
      migrate_vma_insert_page(): remove comment on the old pte_offset_map() and
      old locking limitations; remove the pmd_trans_unstable() check and just
      proceed to pte_offset_map_lock(), aborting when it fails (page has been
      charged to memcg, but as in other cases, it's uncharged when freed).
      
      Link: https://lkml.kernel.org/r/1131be62-2e84-da2f-8f45-807b2cbeeec5@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4b56069c
    • Hugh Dickins's avatar
      mm/mglru: allow pte_offset_map_nolock() to fail · 52fc0483
      Hugh Dickins authored
      MGLRU's walk_pte_range() use the safer pte_offset_map_nolock(), rather
      than pte_lockptr(), to get the ptl for its trylock.  Just return false and
      move on to next extent if it fails, like when the trylock fails.  Remove
      the VM_WARN_ON_ONCE(pmd_leaf) since that will happen, rarely.
      
      Link: https://lkml.kernel.org/r/51ece73e-7398-2e4a-2384-56708c87844f@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarYu Zhao <yuzhao@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      52fc0483
    • Hugh Dickins's avatar
      mm/swapoff: allow pte_offset_map[_lock]() to fail · d850fa72
      Hugh Dickins authored
      Adjust unuse_pte() and unuse_pte_range() to allow pte_offset_map_lock()
      and pte_offset_map() failure; remove pmd_none_or_trans_huge_or_clear_bad()
      from unuse_pmd_range() now that pte_offset_map() does all that itself.
      
      Link: https://lkml.kernel.org/r/c4d831-13c3-9dfd-70c2-64514ad951fd@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d850fa72
    • Hugh Dickins's avatar
      mm/madvise: clean up force_shm_swapin_readahead() · 179d3e4f
      Hugh Dickins authored
      Some nearby MADV_WILLNEED cleanup unrelated to pte_offset_map_lock(). 
      shmem_swapin_range() is a better name than force_shm_swapin_readahead(). 
      Fix unimportant off-by-one on end_index.  Call the swp_entry_t "entry"
      rather than "swap": either is okay, but entry is the name used elsewhere
      in mm/madvise.c.  Do not assume GFP_HIGHUSER_MOVABLE: that's right for
      anon swap, but shmem should take gfp from mapping.  Pass the actual vma
      and address to read_swap_cache_async(), in case a NUMA mempolicy applies. 
      lru_add_drain() at outer level, like madvise_willneed()'s other branch.
      
      Link: https://lkml.kernel.org/r/67e18875-ffb3-ec27-346-f350e07bed87@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      179d3e4f
    • Hugh Dickins's avatar
      mm/madvise: clean up pte_offset_map_lock() scans · f3cd4ab0
      Hugh Dickins authored
      Came here to make madvise's several pte_offset_map_lock() scans advance to
      next extent on failure, and remove superfluous pmd_trans_unstable() and
      pmd_none_or_trans_huge_or_clear_bad() calls.  But also did some nearby
      cleanup.
      
      swapin_walk_pmd_entry(): don't name an address "index"; don't drop the
      lock after every pte, only when calling out to read_swap_cache_async().
      
      madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(): prefer
      "start_pte" for pointer, orig_pte usually denotes a saved pte value; leave
      lazy MMU mode before unlocking; merge the success and failure paths after
      split_folio().
      
      Link: https://lkml.kernel.org/r/cc4d9a88-9da6-362-50d9-6735c2b125c6@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f3cd4ab0
    • Hugh Dickins's avatar
      mm/mremap: retry if either pte_offset_map_*lock() fails · a5be621e
      Hugh Dickins authored
      move_ptes() return -EAGAIN if pte_offset_map_lock() of old fails, or if
      pte_offset_map_nolock() of new fails: move_page_tables() retry if so.
      
      But that does need a pmd_none() check inside, to stop endless loop when
      huge shmem is truncated (thank you to syzbot); and move_huge_pmd() must
      tolerate that a page table might have been allocated there just before (of
      course it would be more satisfying to remove the empty page table, but
      this is not a path worth optimizing).
      
      Link: https://lkml.kernel.org/r/65e5e84a-f04-947-23f2-b97d3462e1e@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a5be621e
    • Hugh Dickins's avatar
      mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge() · 670ddd8c
      Hugh Dickins authored
      change_pmd_range() had special pmd_none_or_clear_bad_unless_trans_huge(),
      required to avoid "bad" choices when setting automatic NUMA hinting under
      mmap_read_lock(); but most of that is already covered in pte_offset_map()
      now.  change_pmd_range() just wants a pmd_none() check before wasting time
      on MMU notifiers, then checks on the read-once _pmd value to work out
      what's needed for huge cases.  If change_pte_range() returns -EAGAIN to
      retry if pte_offset_map_lock() fails, nothing more special is needed.
      
      Link: https://lkml.kernel.org/r/725a42a9-91e9-c868-925-e3a5fd40bb4f@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      670ddd8c
    • Hugh Dickins's avatar
      mm/various: give up if pte_offset_map[_lock]() fails · 04dee9e8
      Hugh Dickins authored
      Following the examples of nearby code, various functions can just give up
      if pte_offset_map() or pte_offset_map_lock() fails.  And there's no need
      for a preliminary pmd_trans_unstable() or other such check, since such
      cases are now safely handled inside.
      
      Link: https://lkml.kernel.org/r/7b9bd85d-1652-cbf2-159d-f503b45e5b@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      04dee9e8
    • Hugh Dickins's avatar
      mm/debug_vm_pgtable,page_table_check: warn pte map fails · 9f2bad09
      Hugh Dickins authored
      Failures here would be surprising: pte_advanced_tests() and
      pte_clear_tests() and __page_table_check_pte_clear_range() each issue a
      warning if pte_offset_map() or pte_offset_map_lock() fails.
      
      Link: https://lkml.kernel.org/r/3ea9e4f-e5cf-d7d9-4c2-291b3c5a3636@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      9f2bad09
    • Hugh Dickins's avatar
      mm/userfaultfd: allow pte_offset_map_lock() to fail · 3622d3cd
      Hugh Dickins authored
      mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat failed
      pte_offset_map_lock() as -EAGAIN, which mfill_atomic() already returns to
      user for a similar race.
      
      Link: https://lkml.kernel.org/r/50cf3930-1bfa-4de9-a079-3da47b7ce17b@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3622d3cd
    • Hugh Dickins's avatar
      mm/userfaultfd: retry if pte_offset_map() fails · 2b683a4f
      Hugh Dickins authored
      Instead of worrying whether the pmd is stable, userfaultfd_must_wait()
      call pte_offset_map() as before, but go back to try again if that fails.
      
      Risk of endless loop?  It already broke out if pmd_none(), !pmd_present()
      or pmd_trans_huge(), and pte_offset_map() would have cleared pmd_bad():
      which leaves pmd_devmap().  Presumably pmd_devmap() is inappropriate in a
      vma subject to userfaultfd (it would have been mistreated before), but add
      a check just to avoid all possibility of endless loop there.
      
      Link: https://lkml.kernel.org/r/54423f-3dff-fd8d-614a-632727cc4cfb@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarPeter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2b683a4f
    • Hugh Dickins's avatar
      mm/hmm: retry if pte_offset_map() fails · 6ec1905f
      Hugh Dickins authored
      hmm_vma_walk_pmd() is called through mm_walk, but already has a goto again
      loop of its own, so take part in that if pte_offset_map() fails.
      
      Link: https://lkml.kernel.org/r/d6c6dd68-25d4-653b-f94b-a45c53ee04b@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      6ec1905f
    • Hugh Dickins's avatar
      mm/vmalloc: vmalloc_to_page() use pte_offset_kernel() · 0d1c81ed
      Hugh Dickins authored
      vmalloc_to_page() was using pte_offset_map() (followed by pte_unmap()),
      but it's intended for userspace page tables: prefer pte_offset_kernel().
      
      Link: https://lkml.kernel.org/r/696386a-84f8-b33c-82e5-f865ed6eb39@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarLorenzo Stoakes <lstoakes@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0d1c81ed
    • Hugh Dickins's avatar
      mm/vmwgfx: simplify pmd & pud mapping dirty helpers · e5ad581c
      Hugh Dickins authored
      wp_clean_pmd_entry() need not check pmd_trans_unstable() or pmd_none(),
      wp_clean_pud_entry() need not check pud_trans_unstable() or pud_none():
      it's just the ACTION_CONTINUE when trans_huge or devmap that's needed to
      prevent splitting, and we're hoping to remove pmd_trans_unstable().  Is
      that PUD #ifdef necessary?  Maybe some configs are missing a stub.
      
      Link: https://lkml.kernel.org/r/d3379c7-65db-26d3-1764-8e866490925f@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      e5ad581c
    • Hugh Dickins's avatar
      mm/pagewalk: walk_pte_range() allow for pte_offset_map() · be872f83
      Hugh Dickins authored
      walk_pte_range() has a no_vma option to serve walk_page_range_novma().  I
      don't know of any problem, but it looks safer to check for init_mm, and
      use pte_offset_kernel() rather than pte_offset_map() in that case:
      pte_offset_map()'s pmdval validation is intended for userspace.
      
      Allow for its pte_offset_map() or pte_offset_map_lock() to fail, and retry
      with ACTION_AGAIN if so.  Add a second check for ACTION_AGAIN in
      walk_pmd_range(), to catch it after return from walk_pte_range().
      
      Remove the pmd_trans_unstable() check after split_huge_pmd() in
      walk_pmd_range(): walk_pte_range() now handles those cases safely (and
      they must fail powerpc's is_hugepd() check).
      
      Link: https://lkml.kernel.org/r/3eba6f0-2b-fb66-6bb6-2ee8533e221@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      be872f83
    • Hugh Dickins's avatar
      mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails · 7780d040
      Hugh Dickins authored
      Simple walk_page_range() users should set ACTION_AGAIN to retry when
      pte_offset_map_lock() fails.
      
      No need to check pmd_trans_unstable(): that was precisely to avoid the
      possiblity of calling pte_offset_map() on a racily removed or inserted THP
      entry, but such cases are now safely handled inside it.  Likewise there is
      no need to check pmd_none() or pmd_bad() before calling it.
      
      Link: https://lkml.kernel.org/r/c77d9d10-3aad-e3ce-4896-99e91c7947f3@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: SeongJae Park <sj@kernel.org> for mm/damon part
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7780d040
    • Hugh Dickins's avatar
      mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr() · 2798bbe7
      Hugh Dickins authored
      map_pte() use pte_offset_map_nolock(), to make sure of the ptl belonging
      to pte, even if pmd entry is then changed racily: page_vma_mapped_walk()
      use that instead of getting pte_lockptr() later, or restart if map_pte()
      found no page table.
      
      Link: https://lkml.kernel.org/r/cba186e0-5ed7-e81b-6cd-dade4c33c248@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2798bbe7
    • Hugh Dickins's avatar
      mm/page_vma_mapped: reformat map_pte() with less indentation · 90f43b0a
      Hugh Dickins authored
      No functional change here, but adjust the format of map_pte() so that the
      following commit will be easier to read: separate out the PVMW_SYNC case
      first, and remove two levels of indentation from the ZONE_DEVICE case.
      
      Link: https://lkml.kernel.org/r/bf723f59-e3fc-6839-1cc3-c0631ee248bc@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      90f43b0a
    • Hugh Dickins's avatar
      mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk() · 45fe85e9
      Hugh Dickins authored
      Revert commit a7a69d8b ("mm/thp: another PVMW_SYNC fix in
      page_vma_mapped_walk()"): I was proud of that "Aha!" commit at the time,
      but in revisiting page_vma_mapped_walk() for pte_offset_map() failure,
      that block raised a doubt: and it now seems utterly bogus.  The prior
      map_pte() has taken ptl unconditionally when PVMW_SYNC: I must have
      forgotten that when making the change.  It did no harm, but could not have
      fixed a BUG or WARN, and is hard to reconcile with coming changes.
      
      Link: https://lkml.kernel.org/r/87475a22-e59e-2d8b-d78a-df376d314bd@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      45fe85e9
    • Hugh Dickins's avatar
      mm/filemap: allow pte_offset_map_lock() to fail · 65747aaf
      Hugh Dickins authored
      filemap_map_pages() allow pte_offset_map_lock() to fail; and remove the
      pmd_devmap_trans_unstable() check from filemap_map_pmd(), which can safely
      return to filemap_map_pages() and let pte_offset_map_lock() discover that.
      
      Link: https://lkml.kernel.org/r/54607cf4-ddb6-7ef3-043-1d2de1a9a71@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      65747aaf
    • Hugh Dickins's avatar
      mm/pgtable: allow pte_offset_map[_lock]() to fail · 0d940a9b
      Hugh Dickins authored
      Make pte_offset_map() a wrapper for __pte_offset_map() (optionally outputs
      pmdval), pte_offset_map_lock() a sparse __cond_lock wrapper for
      __pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.
      
      __pte_offset_map() do pmdval validation (including pmd_clear_bad() when
      pmd_bad()), returning NULL if pmdval is not for a page table. 
      __pte_offset_map_lock() verify pmdval unchanged after getting the lock,
      trying again if it changed.
      
      No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done to
      cover the imminent case, but we expect to generalize it later, and it
      makes a mess of where to do the pmd_bad() clearing.
      
      Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
      without actually taking the lock.  This will be preferred to open uses of
      pte_lockptr(), because (when split ptlock is in page table's struct page)
      it points to the right lock for the returned pte pointer, even if *pmd
      gets changed racily afterwards.
      
      Update corresponding Documentation.
      
      Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
      they have to wait until all architectures are balancing pte_offset_map()s
      with pte_unmap()s (as in the arch series posted earlier).  But comment
      where they will go, so that it's easy to add them for experiments.  And
      only when those are in place can transient racy failure cases be enabled. 
      Add more safety for the PAE mismatched pmd_low pmd_high case at that time.
      
      Link: https://lkml.kernel.org/r/2929bfd-9893-a374-e463-4c3127ff9b9d@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0d940a9b
    • Hugh Dickins's avatar
      mm/pgtable: kmap_local_page() instead of kmap_atomic() · 46c475bd
      Hugh Dickins authored
      pte_offset_map() was still using kmap_atomic(): update it to the preferred
      kmap_local_page() before making further changes there, in case we need
      this as a bisection point; but I doubt it can cause any trouble.
      
      Link: https://lkml.kernel.org/r/d74dc4b3-6a76-446f-8f5-52ae271fa07d@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      46c475bd
    • Hugh Dickins's avatar
      mm/migrate: remove cruft from migration_entry_wait()s · 0cb8fd4d
      Hugh Dickins authored
      migration_entry_wait_on_locked() does not need to take a mapped pte
      pointer, its callers can do the unmap first.  Annotate it with
      __releases(ptl) to reduce sparse warnings.
      
      Fold __migration_entry_wait_huge() into migration_entry_wait_huge().  Fold
      __migration_entry_wait() into migration_entry_wait(), preferring the
      tighter pte_offset_map_lock() to pte_offset_map() and pte_lockptr().
      
      Link: https://lkml.kernel.org/r/b0e2a532-cdf2-561b-e999-f3b13b8d6d3@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0cb8fd4d
    • Hugh Dickins's avatar
      mm: use pmdp_get_lockless() without surplus barrier() · 26e1a0c3
      Hugh Dickins authored
      Patch series "mm: allow pte_offset_map[_lock]() to fail", v2.
      
      What is it all about?  Some mmap_lock avoidance i.e.  latency reduction. 
      Initially just for the case of collapsing shmem or file pages to THPs; but
      likely to be relied upon later in other contexts e.g.  freeing of empty
      page tables (but that's not work I'm doing).  mmap_write_lock avoidance
      when collapsing to anon THPs?  Perhaps, but again that's not work I've
      done: a quick attempt was not as easy as the shmem/file case.
      
      I would much prefer not to have to make these small but wide-ranging
      changes for such a niche case; but failed to find another way, and have
      heard that shmem MADV_COLLAPSE's usefulness is being limited by that
      mmap_write_lock it currently requires.
      
      These changes (though of course not these exact patches) have been in
      Google's data centre kernel for three years now: we do rely upon them.
      
      What is this preparatory series about?
      
      The current mmap locking will not be enough to guard against that tricky
      transition between pmd entry pointing to page table, and empty pmd entry,
      and pmd entry pointing to huge page: pte_offset_map() will have to
      validate the pmd entry for itself, returning NULL if no page table is
      there.  What to do about that varies: sometimes nearby error handling
      indicates just to skip it; but in many cases an ACTION_AGAIN or "goto
      again" is appropriate (and if that risks an infinite loop, then there must
      have been an oops, or pfn 0 mistaken for page table, before).
      
      Given the likely extension to freeing empty page tables, I have not
      limited this set of changes to a THP config; and it has been easier, and
      sets a better example, if each site is given appropriate handling: even
      where deeper study might prove that failure could only happen if the pmd
      table were corrupted.
      
      Several of the patches are, or include, cleanup on the way; and by the
      end, pmd_trans_unstable() and suchlike are deleted: pte_offset_map() and
      pte_offset_map_lock() then handle those original races and more.  Most
      uses of pte_lockptr() are deprecated, with pte_offset_map_nolock() taking
      its place.
      
      
      This patch (of 32):
      
      Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more
      reliable result with PAE (or READ_ONCE as before without PAE); and remove
      the unnecessary extra barrier()s which got left behind in its callers.
      
      HOWEVER: Note the small print in linux/pgtable.h, where it was designed
      specifically for fast GUP, and depends on interrupts being disabled for
      its full guarantee: most callers which have been added (here and before)
      do NOT have interrupts disabled, so there is still some need for caution.
      
      Link: https://lkml.kernel.org/r/f35279a9-9ac0-de22-d245-591afbfb4dc@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarYu Zhao <yuzhao@google.com>
      Acked-by: default avatarPeter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zack Rusin <zackr@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      26e1a0c3
    • Anders Roxell's avatar
      selftests: damon: add config file · 7b1798ec
      Anders Roxell authored
      Building and running the subsuite 'damon' of kselftest, shows the
      following issues:
       selftests: damon: debugfs_attrs.sh
        /sys/kernel/debug/damon not found
      
      By creating a config file enabling DAMON fragments in the
      selftests/damon/ directory the tests pass.
      
      Link: https://lkml.kernel.org/r/20230412092854.3306197-1-anders.roxell@linaro.org
      Fixes: b348eb7a ("mm/damon: add user space selftests")
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Reviewed-by: default avatarSeongJae Park <sj@kernel.org>
      Acked-by: default avatarShuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7b1798ec