04 Sep, 2024 40 commits
    • mm: krealloc: consider spare memory for __GFP_ZERO · 1a83a716
      Danilo Krummrich authored
      As long as krealloc() is called with __GFP_ZERO consistently, starting
      with the initial memory allocation, __GFP_ZERO should be fully honored.
      
      However, if for an existing allocation krealloc() is called with a
      decreased size, it is not ensured that the spare portion of the allocation is
      zeroed.  Thus, if krealloc() is subsequently called with a larger size
      again, __GFP_ZERO can't be fully honored, since we don't know the previous
      size, but only the bucket size.
      
      Example:
      
      	buf = kzalloc(64, GFP_KERNEL);
      	memset(buf, 0xff, 64);
      
      	buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);
      
      	/* After this call the last 16 bytes are still 0xff. */
      	buf = krealloc(buf, 64, GFP_KERNEL | __GFP_ZERO);
      
      Fix this by explicitly setting spare memory to zero when shrinking an
      allocation with the __GFP_ZERO flag set or init_on_alloc enabled.
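
      A minimal sketch of the fix (hedged; the exact mm/slab_common.c change
      may differ): when the request shrinks within the same bucket, zero the
      now-unused tail so a later grow with __GFP_ZERO cannot observe stale
      bytes.  want_init_on_alloc() covers both __GFP_ZERO and init_on_alloc:

      	/* inside __do_krealloc(), when the object still fits its bucket */
      	if (ks >= new_size) {
      		if (want_init_on_alloc(flags))
      			memset((void *)p + new_size, 0, ks - new_size);
      		return (void *)p;
      	}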
      
      Link: https://lkml.kernel.org/r/20240812223707.32049-1-dakr@kernel.org
      Signed-off-by: Danilo Krummrich <dakr@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1a83a716
    • mm/rmap: use folio->_mapcount for small folios · d0b003ce
      David Hildenbrand authored
      We have some cases left where we operate on small folios and still refer
      to page->_mapcount.  Let's just use folio->_mapcount instead, which
      currently still overlays page->_mapcount, so no change.
      
      This change will make it easier to later spot any remaining users of
      page->_mapcount that target tail pages.
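
      A hedged illustration of the conversion pattern (not a literal hunk from
      the patch); for a small folio the two fields overlay each other, so the
      behavior is unchanged:

      	- atomic_set(&page->_mapcount, -1);
      	+ atomic_set(&folio->_mapcount, -1);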
      
      Link: https://lkml.kernel.org/r/20240816103246.719209-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d0b003ce
    • mm/hugetlb: use __GFP_COMP for gigantic folios · cf54f310
      Yu Zhao authored
      Use __GFP_COMP for gigantic folios to greatly reduce not only the amount
      of code but also the allocation and free time.
      
      LOC (approximately): +60, -240
      
      Allocate and free 500 1GB hugeTLB folios without HVO by:
        time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
        time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
      
             Before  After
      Alloc  ~13s    ~10s
      Free   ~15s    <1s
      
      The above magnitude generally holds for multiple x86 and arm64 CPU models.
      
      Link: https://lkml.kernel.org/r/20240814035451.773331-4-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reported-by: Frank van der Linden <fvdl@google.com>
      Acked-by: Zi Yan <ziy@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cf54f310
    • mm/cma: add cma_{alloc,free}_folio() · 463586e9
      Yu Zhao authored
      With alloc_contig_range() and free_contig_range() supporting large folios,
      CMA can allocate and free large folios too, via cma_alloc_folio() and
      cma_free_folio().
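
      A hedged usage sketch, assuming the signatures added here are
      struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
      and bool cma_free_folio(struct cma *cma, const struct folio *folio):

      	struct folio *folio;

      	folio = cma_alloc_folio(cma, order, GFP_KERNEL);
      	if (folio) {
      		/* use the large folio ... */
      		cma_free_folio(cma, folio);
      	}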
      
      [yuzhao@google.com: fix WARN in cma_alloc_folio()]
        Link: https://lkml.kernel.org/r/Zsd0PgAQmbpR8jS6@google.com
      Link: https://lkml.kernel.org/r/20240814035451.773331-3-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Zi Yan <ziy@nvidia.com>
      Cc: Frank van der Linden <fvdl@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      463586e9
    • mm/contig_alloc: support __GFP_COMP · e98337d1
      Yu Zhao authored
      Patch series "mm/hugetlb: alloc/free gigantic folios", v2.
      
      Using __GFP_COMP for gigantic folios can greatly reduce not only the
      amount of code but also the allocation and free time.
      
      Approximate LOC to mm/hugetlb.c: +60, -240
      
      Allocate and free 500 1GB hugeTLB folios without HVO by:
        time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
        time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
      
             Before  After
      Alloc  ~13s    ~10s
      Free   ~15s    <1s
      
      The above magnitude generally holds for multiple x86 and arm64 CPU
      models.
      
      Perf profile before:
        Alloc
          - 99.99% alloc_pool_huge_folio
             - __alloc_fresh_hugetlb_folio
                - 83.23% alloc_contig_pages_noprof
                   - 47.46% alloc_contig_range_noprof
                      - 20.96% isolate_freepages_range
                           16.10% split_page
                      - 14.10% start_isolate_page_range
                      - 12.02% undo_isolate_page_range
      
        Free
          - update_and_free_pages_bulk
             - 87.71% free_contig_range
                - 76.02% free_unref_page
                   - 41.30% free_unref_page_commit
                      - 32.58% free_pcppages_bulk
                         - 24.75% __free_one_page
                     13.96% _raw_spin_trylock
               12.27% __update_and_free_hugetlb_folio
      
      Perf profile after:
        Alloc
          - 99.99% alloc_pool_huge_folio
               alloc_gigantic_folio
             - alloc_contig_pages_noprof
                - 59.15% alloc_contig_range_noprof
                   - 20.72% start_isolate_page_range
                     20.64% prep_new_page
                   - 17.13% undo_isolate_page_range
      
        Free
          - update_and_free_pages_bulk
             - __folio_put
             - __free_pages_ok
                  7.46% free_tail_page_prepare
                - 1.97% free_one_page
                     1.86% __free_one_page
      
      This patch (of 3):
      
      Support __GFP_COMP in alloc_contig_range().  When the flag is set, upon
      success the function returns a large folio prepared by prep_new_page(),
      rather than a range of order-0 pages prepared by split_free_pages() (which
      is renamed from split_map_pages()).
      
      alloc_contig_range() can be used to allocate folios larger than
      MAX_PAGE_ORDER, e.g., gigantic hugeTLB folios.  So on the free path,
      free_one_page() needs to handle that by calling split_large_buddy().
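
      A hedged usage sketch via the folio_alloc_gigantic() wrapper mentioned
      in the fixup note below; __GFP_COMP must be set, and on success the
      whole contiguous range comes back as a single folio:

      	struct folio *folio;

      	folio = folio_alloc_gigantic(order, GFP_KERNEL | __GFP_COMP,
      				     nid, nodemask);
      	if (folio)
      		folio_put(folio);	/* free path splits the huge buddy */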
      
      [akpm@linux-foundation.org: fix folio_alloc_gigantic_noprof() WARN expression, per Yu Liao]
      Link: https://lkml.kernel.org/r/20240814035451.773331-1-yuzhao@google.com
      Link: https://lkml.kernel.org/r/20240814035451.773331-2-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Zi Yan <ziy@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Frank van der Linden <fvdl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e98337d1
    • mm,memcg: provide per-cgroup counters for NUMA balancing operations · f77f0c75
      Kaiyang Zhao authored
      The ability to observe the demotion and promotion decisions made by the
      kernel on a per-cgroup basis is important for monitoring and tuning
      containerized workloads on machines equipped with tiered memory.
      
      Different containers in the system may experience drastically different
      memory tiering actions that cannot be distinguished from the global
      counters alone.
      
      For example, a container running a workload with much hotter memory
      accesses will likely see more promotions and fewer demotions, potentially
      depriving a colocated container of top tier memory to such an extent that
      its performance degrades unacceptably.
      
      For another example, some containers may exhibit longer periods between
      data reuse, causing many more numa_hint_faults than numa_pages_migrated.
      In this case, tuning hot_threshold_ms may be appropriate, but the signal
      can easily be lost if only global counters are available.
      
      In the long term, we hope to introduce per-cgroup control of promotion and
      demotion actions to implement memory placement policies in tiering.
      
      This patch set adds seven counters to memory.stat in a cgroup:
      numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
      pgdemote_khugepaged, pgdemote_direct and pgpromote_success.  pgdemote_*
      and pgpromote_success are also available in memory.numa_stat.
      
      count_memcg_events_mm() is added to count multiple event occurrences at
      once, and get_mem_cgroup_from_folio() is added because we need to get a
      reference to the memcg of a folio before it's migrated to track
      numa_pages_migrated.  The accounting of PGDEMOTE_* is moved to
      shrink_inactive_list() before being changed to per-cgroup.
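
      For illustration, the new fields surface in a cgroup's memory.stat
      roughly as follows (the values are hypothetical):

      	numa_pages_migrated 12288
      	numa_pte_updates 98304
      	numa_hint_faults 20480
      	pgdemote_kswapd 4096
      	pgdemote_khugepaged 0
      	pgdemote_direct 512
      	pgpromote_success 12288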
      
      [kaiyang2@cs.cmu.edu: add documentation of the memcg counters in cgroup-v2.rst]
        Link: https://lkml.kernel.org/r/20240814235122.252309-1-kaiyang2@cs.cmu.edu
      Link: https://lkml.kernel.org/r/20240814174227.30639-1-kaiyang2@cs.cmu.edu
      Signed-off-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: Wei Xu <weixugc@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f77f0c75
    • kasan: simplify and clarify Makefile · 78788c3e
      Andrey Konovalov authored
      When KASAN support was being added to the Linux kernel, GCC did not yet
      support all of the KASAN-related compiler options.  Thus, the KASAN
      Makefile had to probe the compiler for supported options.
      
      Nowadays, the Linux kernel GCC version requirement is 5.1+, and thus we
      don't need the probing of the -fasan-shadow-offset parameter: it exists in
      all 5.1+ GCCs.
      
      Simplify the KASAN Makefile to drop CFLAGS_KASAN_MINIMAL.
      
      Also add a few more comments and unify the indentation.
      
      [andreyknvl@gmail.com: comments fixes per Miguel]
        Link: https://lkml.kernel.org/r/20240814161052.10374-1-andrey.konovalov@linux.dev
      Link: https://lkml.kernel.org/r/20240813224027.84503-1-andrey.konovalov@linux.dev
      Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Maurer <mmaurer@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      78788c3e
    • mm: shmem: support large folio swap out · 809bc865
      Baolin Wang authored
      Shmem will support large folio allocation [1] [2] to get better
      performance; however, memory reclaim still splits the precious large
      folios when trying to swap out shmem, which may lead to memory
      fragmentation and cannot take advantage of the large folio for shmem.
      
      Moreover, the swap code already supports swapping out large folios
      without splitting, hence this patch set supports large folio swap-out
      for shmem.
      
      Note that the i915_gem_shmem driver still needs folios to be split when
      swapping, thus add a new flag 'split_large_folio' to writeback_control
      to indicate splitting the large folio.
      
      [1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
      [2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
      
      [hughd@google.com: shmem_writepage() split folio at EOF before swapout]
        Link: https://lkml.kernel.org/r/aef55f8d-6040-692d-65e3-16150cce4440@google.com
      [baolin.wang@linux.alibaba.com: remove the wbc->split_large_folio per Hugh]
        Link: https://lkml.kernel.org/r/1236a002daa301b3b9ba73d6c0fab348427cf295.1724833399.git.baolin.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/d80c21abd20e1b0f5ca66b330f074060fb2f082d.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      809bc865
    • mm: shmem: split large entry if the swapin folio is not large · 12885cbe
      Baolin Wang authored
      Now the swap device can only swap in order-0 folios, even though a large
      folio may have been swapped out.  This requires us to split the large
      entry previously saved in the shmem pagecache to support the swap-in of
      small folios.
      
      [hughd@google.com: fix warnings from kmalloc_fix_flags()]
        Link: https://lkml.kernel.org/r/e2a2ba5d-864c-50aa-7579-97cba1c7dd0c@google.com
      [baolin.wang@linux.alibaba.com: drop the 'new_order' parameter]
        Link: https://lkml.kernel.org/r/39c71ccf-669b-4d9f-923c-f6b9c4ceb8df@linux.alibaba.com
      Link: https://lkml.kernel.org/r/4a0f12f27c54a62eb4d9ca1265fed3a62531a63e.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      12885cbe
    • mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache() · 872339c3
      Baolin Wang authored
      To support large folio swapin/swapout for shmem in the following patches,
      drop the folio's reference count by the number of pages contained in the
      folio when a shmem folio is deleted from the shmem pagecache after being
      added to the swap cache.
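
      A hedged sketch of the core change in shmem_delete_from_page_cache();
      a large folio holds one pagecache reference per page, so all of them
      are dropped at once:

      	long nr = folio_nr_pages(folio);

      	/* ... replace the folio with its swap entry in the mapping ... */
      	folio_put_refs(folio, nr);	/* was: folio_put(folio) */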
      
      Link: https://lkml.kernel.org/r/b371eadb27f42fc51261c51008fbb9a334985b4c.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      872339c3
    • mm: shmem: support large folio allocation for shmem_replace_folio() · 736f0e03
      Baolin Wang authored
      To support large folio swapin for shmem in the following patches, add
      large folio allocation for the new replacement folio in
      shmem_replace_folio().  Moreover, large folios occupy N consecutive
      entries in the swap cache instead of using multi-index entries like the
      page cache, therefore we should replace each of the consecutive entries
      in the swap cache instead of using shmem_replace_entry().
      
      Statistics and the folio reference count are likewise updated using the
      number of pages in the folio.
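
      A hedged sketch of the replacement loop (close to, but not necessarily
      identical to, the merged mm/shmem.c code):

      	XA_STATE(xas, &swap_mapping->i_pages, index);
      	long nr_pages = folio_nr_pages(new);

      	xas_lock_irq(&xas);
      	while (nr_pages--) {
      		xas_store(&xas, new);	/* replace one swap cache slot */
      		if (nr_pages)
      			xas_next(&xas);	/* advance to the next entry */
      	}
      	xas_unlock_irq(&xas);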
      
      [baolin.wang@linux.alibaba.com: fix the gfp flag for large folio allocation]
        Link: https://lkml.kernel.org/r/5b1e9c5a-7f61-4d97-a8d7-41767ca04c77@linux.alibaba.com
      [baolin.wang@linux.alibaba.com: fix build without CONFIG_TRANSPARENT_HUGEPAGE]
        Link: https://lkml.kernel.org/r/8c03467c-63b2-43b4-9851-222d4188725c@linux.alibaba.com
      Link: https://lkml.kernel.org/r/a41138ecc857ef13e7c5ffa0174321e9e2c9970a.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      736f0e03
    • mm: shmem: use swap_free_nr() to free shmem swap entries · 40ff2d11
      Baolin Wang authored
      As a preparation for supporting shmem large folio swapout, use
      swap_free_nr() to free the contiguous swap entries of the shmem large
      folio when the large folio was swapped in from the swap cache.  In
      addition, the index should also be rounded down to the number of pages
      when adding the swapped-in folio into the pagecache.
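
      A hedged sketch of the two adjustments on the swapin path:

      	int nr_pages = folio_nr_pages(folio);

      	index = round_down(index, nr_pages);	/* align index to the folio */
      	/* ... add the folio to the pagecache, then release its entries: */
      	swap_free_nr(swap, nr_pages);		/* was: swap_free(swap) */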
      
      Link: https://lkml.kernel.org/r/342207fa679fc88a447dac2e101ad79e6050fe79.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      40ff2d11
    • mm: filemap: use xa_get_order() to get the swap entry order · fb724159
      Baolin Wang authored
      In the following patches, shmem will support the swap-out of large
      folios, which means the shmem mappings may contain large-order swap
      entries, so use xa_get_order() to get the folio order of the shmem swap
      entry and update '*start' correctly.
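
      A hedged sketch of the idea in find_lock_entries(): a swap value entry
      can now be large, so '*start' must step past every index it covers:

      	unsigned long nr = 1;

      	if (xa_is_value(folio))
      		nr = 1 << xa_get_order(&mapping->i_pages, index);
      	*start = index + nr;	/* step past the whole entry */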
      
      [hughd@google.com: use xa_get_order() to get the swap entry order]
        Link: https://lkml.kernel.org/r/c336e6e4-da7f-b714-c0f1-12df715f2611@google.com
      Link: https://lkml.kernel.org/r/6876d55145c1cc80e79df7884aa3a62e397b101d.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fb724159
    • mm: shmem: return number of pages being freed in shmem_free_swap · 6ea0d1cc
      Daniel Gomez authored
      Both shmem_free_swap() callers expect the number of pages being freed.
      In the large folio context, this needs to support values other than 0
      (which used to mean one page freed) and -ENOENT (which used to mean no
      pages freed).  In preparation for large folio adoption, make the
      shmem_free_swap() routine return the number of pages being freed, so
      that returning 0 in this context means 0 pages freed.
      
      While we are at it, switch to free_swap_and_cache_nr() to free
      large-order swap entries (a change by Baolin Wang).
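
      A hedged sketch of the reworked routine (close to the merged mm/shmem.c
      code, but treat the details as illustrative):

      	static long shmem_free_swap(struct address_space *mapping,
      				    pgoff_t index, void *radswap)
      	{
      		int order = xa_get_order(&mapping->i_pages, index);
      		void *old;

      		old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap,
      				     NULL, 0);
      		if (old != radswap)
      			return 0;		/* 0 pages freed */
      		free_swap_and_cache_nr(radix_to_swp_entry(radswap),
      				       1 << order);

      		return 1 << order;		/* pages freed */
      	}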
      
      Link: https://lkml.kernel.org/r/9623e863c83d749d5ab407f6fdf0a8e5a3bdf052.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6ea0d1cc
    • mm: shmem: extend shmem_partial_swap_usage() to support large folio swap · 50f381ec
      Baolin Wang authored
      To support shmem large folio swapout in the following patches, use
      xa_get_order() to get the order of the swap entry when calculating the
      swap usage of shmem.
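
      A hedged sketch of the per-entry accounting change:

      	if (xas_retry(&xas, page))
      		continue;
      	if (xa_is_value(page))
      		swapped += 1 << xa_get_order(xas.xa, xas.xa_index);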
      
      Link: https://lkml.kernel.org/r/60b130b9fc3e422bb91293a172c2113c85e9233a.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      50f381ec
    • mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting · 65018076
      Baolin Wang authored
      Patch series "support large folio swap-out and swap-in for shmem", v5.
      
      Shmem will support large folio allocation [1] [2] to get better
      performance; however, memory reclaim still splits the precious large
      folios when trying to swap out shmem, which may lead to memory
      fragmentation and cannot take advantage of the large folio for shmem.
      
      Moreover, the swap code already supports swapping out large folios
      without splitting, and the large folio swap-in[3] series is queued into
      the mm-unstable branch.  Hence this patch set also supports large folio
      swap-out and swap-in for shmem.
      
      
      This patch (of 9):
      
      To support shmem large folio swap operations, add a new parameter to
      swap_shmem_alloc() that allows batch SWAP_MAP_SHMEM flag setting for shmem
      swap entries.
      
      While we are at it, use folio_nr_pages() to get the number of pages of
      the folio as a preparation.
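
      A hedged sketch, assuming the updated prototype
      void swap_shmem_alloc(swp_entry_t entry, int nr):

      	int nr_pages = folio_nr_pages(folio);

      	swap_shmem_alloc(folio->swap, nr_pages);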
      
      Link: https://lkml.kernel.org/r/cover.1723434324.git.baolin.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/99f64115d04b285e009580eb177352c57119ffd0.1723434324.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Barry Song <baohua@kernel.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Daniel Gomez <da.gomez@samsung.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      65018076
    • mm: attempt to batch free swap entries for zap_pte_range() · bea67dcc
      Barry Song authored
      Zhiguo reported that swap release could be a serious bottleneck during
      process exits[1].  With mTHP, we have the opportunity to batch free swaps.
      
      Thanks to the work of Chris and Kairui[2], I was able to achieve this
      optimization with minimal code changes by building on their efforts.
      
      If swap_count is 1, which is likely true as most anon memory is private,
      we can free all contiguous swap slots together.
      
      I ran the test program below to measure the bandwidth of munmap
      using zRAM and 64KiB mTHP:
      
       #include <sys/mman.h>
       #include <sys/time.h>
       #include <stdio.h>
       #include <string.h>
       #include <stdlib.h>

       unsigned long long tv_to_ms(struct timeval tv)
       {
              return tv.tv_sec * 1000 + tv.tv_usec / 1000;
       }

       int main(void)
       {
              struct timeval tv_b, tv_e;
       #define SIZE (1024*1024*1024)
              void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (p == MAP_FAILED) {
                      perror("fail to get memory");
                      exit(-1);
              }

              madvise(p, SIZE, MADV_HUGEPAGE);
              memset(p, 0x11, SIZE); /* write to get mem */

              madvise(p, SIZE, MADV_PAGEOUT);

              gettimeofday(&tv_b, NULL);
              munmap(p, SIZE);
              gettimeofday(&tv_e, NULL);

              printf("munmap in bandwidth: %llu bytes/ms\n",
                              SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
              return 0;
       }
      
      The result is as below (munmap bandwidth):
                      mm-unstable  mm-unstable-with-patch
         round1       21053761      63161283
         round2       21053761      63161283
         round3       21053761      63161283
         round4       20648881      67108864
         round5       20648881      67108864
      
      munmap bandwidth becomes 3X faster.
      
      [1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@vivo.com/
      [2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/
      
      [v-songbaohua@oppo.com: check all swaps belong to same swap_cgroup in swap_pte_batch()]
        Link: https://lkml.kernel.org/r/20240815215308.55233-1-21cnbao@gmail.com
      [hughd@google.com: add mem_cgroup_disabled() check]
        Link: https://lkml.kernel.org/r/33f34a88-0130-5444-9b84-93198eeb50e7@google.com
      [21cnbao@gmail.com: add missing zswap_invalidate()]
        Link: https://lkml.kernel.org/r/20240821054921.43468-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240807215859.57491-3-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bea67dcc
    • mm: rename instances of swap_info_struct to meaningful 'si' · b85508d7
      Barry Song authored
      Patch series "mm: batch free swaps for zap_pte_range()", v3.
      
      Batch free swap slots for zap_pte_range(), making munmap three times
      faster when the page table entries are filled with swap entries to
      be freed. This is likely another advantage of using mTHP.
      
      
      This patch (of 3):
      
      "p" means "pointer to something", rename it to a more meaningful
      identifier - "si".  We also have a case with the name "sis", rename it to
      "si" as well.
      
      Link: https://lkml.kernel.org/r/20240807215859.57491-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20240807215859.57491-2-21cnbao@gmail.com
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Zhiguo Jiang <justinjiang@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b85508d7
    • docs: move numa=fake description to kernel-parameters.txt · 101d6470
      Mike Rapoport (Microsoft) authored
      NUMA emulation can now be enabled on arm64 and riscv in addition to x86.
      
      Move the description of the numa=fake parameter from the x86
      documentation to admin-guide/kernel-parameters.txt.
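
      For reference, the parameter takes forms like these on the kernel
      command line (illustrative; the moved documentation is authoritative):

      	numa=fake=4	# split system memory into 4 emulated nodes
      	numa=fake=2G	# carve emulated nodes of 2G each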
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-27-rppt@kernel.org
      Suggested-by: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      101d6470
    • mm: make range-to-target_node lookup facility a part of numa_memblks · 1b5695b0
      Mike Rapoport (Microsoft) authored
      The x86 implementation of range-to-target_node lookup (i.e. 
      phys_to_target_node() and memory_add_physaddr_to_nid()) relies on
      numa_memblks.
      
      Since numa_memblks are now part of the generic code, move these functions
      from x86 to mm/numa_memblks.c and select CONFIG_NUMA_KEEP_MEMINFO when
      CONFIG_NUMA_MEMBLKS=y for dax and cxl.
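
      A hedged illustration of the two lookups now provided by
      mm/numa_memblks.c; both map a physical address to a node id:

      	int nid = phys_to_target_node(start);
      	int hotplug_nid = memory_add_physaddr_to_nid(start);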
      
      [rppt@kernel.org: fix build]
        Link: https://lkml.kernel.org/r/ZtVfSt_zloPdDqVB@kernel.org
      Link: https://lkml.kernel.org/r/20240807064110.1003856-26-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1b5695b0
    • arch_numa: switch over to numa_memblks · 76750765
      Mike Rapoport (Microsoft) authored
      Until now arch_numa was directly translating firmware NUMA information
      to memblock.
      
      Using numa_memblks as an intermediate step has a few advantages:
      * alignment with more battle tested x86 implementation
      * availability of NUMA emulation
      * maintaining node information for not yet populated memory
      
      Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t,
      replace the current functionality related to numa_add_memblk() and
      __node_distance() in arch_numa with the implementation based on
      numa_memblks, and add the functions required by numa_emulation.
      
      [rppt@kernel.org: fix section mismatch]
        Link: https://lkml.kernel.org/r/ZrO6cExVz1He_yPn@kernel.org
      [rppt@kernel.org: PFN_PHYS() translation is unnecessary here]
        Link: https://lkml.kernel.org/r/Zs2T5wkSYO9MGcab@kernel.org
      Link: https://lkml.kernel.org/r/20240807064110.1003856-25-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      76750765
    • of, numa: return -EINVAL when no numa-node-id is found · 7e488677
      Mike Rapoport (Microsoft) authored
      Currently of_numa_parse_memory_nodes() returns 0 if no "memory" node in
      the device tree contains the "numa-node-id" property.  This makes
      of_numa_init() return "success" even though no NUMA nodes were actually
      parsed and set up.
      
      arch_numa works around this by returning an error if numa_nodes_parsed
      is empty.
      
      numa_memblks, however, would WARN() in such a case, and since it will be
      used by arch_numa shortly, such a warning is not desirable.
      
      Make sure of_numa_init() returns -EINVAL when no NUMA node information was
      found in the device tree.
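
      A hedged sketch of the check (names per this description; the exact
      placement in drivers/of/of_numa.c may differ):

      	if (nodes_empty(numa_nodes_parsed))
      		return -EINVAL;	/* no "numa-node-id" found anywhere */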
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-24-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7e488677
    • mm: numa_memblks: use memblock_{start,end}_of_DRAM() when sanitizing meminfo · f7feea28
      Mike Rapoport (Microsoft) authored
      numa_cleanup_meminfo() moves blocks outside system RAM to
      numa_reserved_meminfo and it uses 0 and PFN_PHYS(max_pfn) to determine the
      memory boundaries.
      
      Replace the memory range boundaries with more portable
      memblock_start_of_DRAM() and memblock_end_of_DRAM().
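
      A hedged before/after sketch of the boundaries in
      numa_cleanup_meminfo():

      	- const u64 low = 0;
      	- const u64 high = PFN_PHYS(max_pfn);
      	+ const u64 low = memblock_start_of_DRAM();
      	+ const u64 high = memblock_end_of_DRAM();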
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-23-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f7feea28
    • mm: numa_memblks: make several functions and variables static · 317ef459
      Mike Rapoport (Microsoft) authored
      Make functions and variables that are exclusively used by numa_memblks
      static.
      
      Move numa_nodemask_from_meminfo() before its callers to avoid forward
      declaration.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-22-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      317ef459
    • mm: numa_memblks: introduce numa_memblks_init · 692d73d2
      Mike Rapoport (Microsoft) authored
      Move most of x86::numa_init() to numa_memblks so that the latter will be
      more self-contained.
      
      With this, numa_memblk data structures need not be exposed to the
      architecture-specific code.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-21-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      692d73d2
    • mm: introduce numa_emulation · b0c4e27c
      Mike Rapoport (Microsoft) authored
      Move numa_emulation code from arch/x86 to mm/numa_emulation.c
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-20-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b0c4e27c
    • mm: move numa_distance and related code from x86 to numa_memblks · 75f9d4cc
      Mike Rapoport (Microsoft) authored
      Move the code dealing with the numa_distance array from arch/x86 to
      mm/numa_memblks.c
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-19-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      75f9d4cc
    • mm: introduce numa_memblks · 87482708
      Mike Rapoport (Microsoft) authored
      Move code dealing with numa_memblks from arch/x86 to mm/ and add Kconfig
      options to let x86 select it in its Kconfig.
      
      This code will be later reused by arch_numa.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-18-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      87482708
    • x86/numa: numa_{add,remove}_cpu: make cpu parameter unsigned · 7a715285
      Mike Rapoport (Microsoft) authored
      CPU id cannot be negative.
      
      Making it unsigned also aligns with declarations in
      include/asm-generic/numa.h used by arm64 and riscv and allows sharing numa
      emulation code with these architectures.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-17-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7a715285
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: use a helper function to get MAX_DMA32_PFN · e52d5873
      Mike Rapoport (Microsoft) authored
      This is required to make the NUMA emulation code architecture-independent
      so that it can be moved to generic code in the following commits.
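
      For illustration, a helper of roughly this shape would do (the name
      numa_emu_dma_end() is an assumption based on the commit title):

      	/* sketch: hide the x86-only MAX_DMA32_PFN behind a helper */
      	u64 __init numa_emu_dma_end(void)
      	{
      		return PFN_PHYS(MAX_DMA32_PFN);
      	}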
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-16-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e52d5873
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: split __apicid_to_node update to a helper function · 55e74bcc
      Mike Rapoport (Microsoft) authored
      This is required to make the NUMA emulation code architecture-independent
      so that it can be moved to generic code in the following commits.
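
      A sketch of what such a helper could look like (the name and signature
      are assumptions; the remapping loop mirrors the existing x86 logic that
      the commit title refers to):

      	/* remap __apicid_to_node[] from physical to emulated nids */
      	void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
      						unsigned int nr_emu_nids)
      	{
      		unsigned int i, j;

      		for (i = 0; i < ARRAY_SIZE(__apicid_to_node); i++) {
      			if (__apicid_to_node[i] == NUMA_NO_NODE)
      				continue;
      			for (j = 0; j < nr_emu_nids; j++)
      				if (__apicid_to_node[i] == emu_nid_to_phys[j])
      					break;
      			/* fall back to node 0 if nothing maps back */
      			__apicid_to_node[i] = j < nr_emu_nids ? j : 0;
      		}
      	}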
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-15-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      55e74bcc
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa_emu: simplify allocation of phys_dist · e3c1299c
      Mike Rapoport (Microsoft) authored
      By the time numa_emulation() is called, all physical memory is already
      mapped in the direct map and there is no need to define limits for
      memblock allocation.
      
      Replace memblock_phys_alloc_range() with memblock_alloc().
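
      A minimal before/after sketch (variable names are illustrative):

      	/* before: the allocation had to stay below the mapped range */
      	phys = memblock_phys_alloc_range(phys_size, PAGE_SIZE, 0,
      					 PFN_PHYS(max_pfn_mapped));
      	phys_dist = __va(phys);

      	/* after: the direct map already covers all memory */
      	phys_dist = memblock_alloc(phys_size, PAGE_SIZE);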
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-14-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e3c1299c
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: move FAKE_NODE_* defines to numa_emu · e4a5e5a5
      Mike Rapoport (Microsoft) authored
      The definitions of FAKE_NODE_MIN_SIZE and FAKE_NODE_MIN_HASH_MASK are only
      used by the NUMA emulation code; make them local to
      arch/x86/mm/numa_emulation.c.
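
      For reference, the two defines being moved (values as they appear in the
      x86 header; shown for context):

      	#define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
      	#define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))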
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-13-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e4a5e5a5
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: use get_pfn_range_for_nid to verify that node spans memory · 77c1d0e7
      Mike Rapoport (Microsoft) authored
      Instead of looping over the numa_meminfo array to detect a node's start
      and end addresses, use get_pfn_range_for_nid().
      
      This is shorter and makes it easier to lift numa_memblks to generic code.
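
      A sketch of the simplified check (the wrapper name node_has_memory() is
      illustrative, not from the patch):

      	static bool __init node_has_memory(int nid)
      	{
      		unsigned long start_pfn, end_pfn;

      		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
      		return start_pfn < end_pfn;
      	}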
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-12-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      77c1d0e7
    • Mike Rapoport (Microsoft)'s avatar
      x86/numa: simplify numa_distance allocation · 9916c27d
      Mike Rapoport (Microsoft) authored
      Allocation of numa_distance uses memblock_phys_alloc_range() to limit
      the allocation to be below the last mapped page.
      
      But NUMA initialization runs after the direct map is populated, and there
      is also code in setup_arch() that adjusts the memblock limit to reflect
      how much memory is already mapped in the direct map.
      
      Simplify the allocation of numa_distance and use plain memblock_alloc().
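
      The resulting allocation, sketched (error handling is illustrative):

      	/* no physical range constraint needed anymore */
      	numa_distance = memblock_alloc(size, PAGE_SIZE);
      	if (!numa_distance)
      		return -ENOMEM;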
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-11-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9916c27d
    • Mike Rapoport (Microsoft)'s avatar
      arch, mm: pull out allocation of NODE_DATA to generic code · 3515863d
      Mike Rapoport (Microsoft) authored
      Architectures that support NUMA duplicate the code that allocates
      NODE_DATA on the node-local memory with slight variations in reporting
      of the addresses where the memory was allocated.
      
      Use the x86 version as the basis for the generic alloc_node_data()
      function and call this function in architecture-specific NUMA
      initialization.
      
      Round up the node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE,
      as x86 had done since the bootmem era when allocation granularity was
      PAGE_SIZE anyway.
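
      A condensed sketch of the generic helper (modeled on the x86 version;
      the diagnostic printks of the real function are elided):

      	void __init alloc_node_data(int nid)
      	{
      		const size_t nd_size = roundup(sizeof(pg_data_t),
      					       SMP_CACHE_BYTES);
      		u64 nd_pa;

      		/* try to allocate on the node itself first */
      		nd_pa = memblock_phys_alloc_try_nid(nd_size,
      						    SMP_CACHE_BYTES, nid);
      		if (!nd_pa)
      			panic("Cannot allocate %zu bytes for node %d data\n",
      			      nd_size, nid);

      		node_data[nid] = __va(nd_pa);
      		memset(node_data[nid], 0, sizeof(pg_data_t));
      	}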
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3515863d
    • Mike Rapoport (Microsoft)'s avatar
      mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION · ec164cf1
      Mike Rapoport (Microsoft) authored
      There are no users of HAVE_ARCH_NODEDATA_EXTENSION left, so
      arch_alloc_nodedata() and arch_refresh_nodedata() are not needed anymore.
      
      Replace the call to arch_alloc_nodedata() in free_area_init() with a new
      helper, alloc_offline_node_data(), remove arch_refresh_nodedata(), and
      clean up the associated ifdefery in include/linux/memory_hotplug.h.
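
      A sketch of the new helper (assuming it allocates from any node, since
      the target node is offline and has no memory of its own yet):

      	void __init alloc_offline_node_data(int nid)
      	{
      		pg_data_t *pgdat;

      		pgdat = memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES);
      		if (!pgdat)
      			panic("Cannot allocate %zuB for node %d.\n",
      			      sizeof(*pgdat), nid);

      		node_data[nid] = pgdat;
      	}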
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-9-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ec164cf1
    • Mike Rapoport (Microsoft)'s avatar
      arch, mm: move definition of node_data to generic code · 46bcce50
      Mike Rapoport (Microsoft) authored
      Every architecture that supports NUMA defines node_data in the same way:
      
      	struct pglist_data *node_data[MAX_NUMNODES];
      
      There is no reason to keep multiple copies of this definition and its
      forward declarations, especially when such a forward declaration is the
      only thing in include/asm/mmzone.h for many architectures.
      
      Add definition and declaration of node_data to generic code and drop
      architecture-specific versions.
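
      The generic definition then lives in one place (a sketch; the export is
      an assumption carried over from the architecture-specific versions):

      	struct pglist_data *node_data[MAX_NUMNODES];
      	EXPORT_SYMBOL(node_data);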
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      46bcce50
    • Mike Rapoport (Microsoft)'s avatar
      MIPS: loongson64: drop HAVE_ARCH_NODEDATA_EXTENSION · 3ac9999c
      Mike Rapoport (Microsoft) authored
      Commit f8f9f21c ("MIPS: Fix build error for loongson64 and sgi-ip27")
      added HAVE_ARCH_NODEDATA_EXTENSION to loongson64 to silence a compilation
      error that happened because loongson64 didn't define an array of
      pg_data_t as node_data like most other architectures do.
      
      After the rename of __node_data to node_data, arch_alloc_nodedata() and
      HAVE_ARCH_NODEDATA_EXTENSION can be dropped from loongson64.
      
      Since loongson64 was the only user of the HAVE_ARCH_NODEDATA_EXTENSION
      config option, also remove it from arch/mips/Kconfig.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-7-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3ac9999c
    • Mike Rapoport (Microsoft)'s avatar
      MIPS: loongson64: rename __node_data to node_data · e20bac65
      Mike Rapoport (Microsoft) authored
      Make the definition of node_data match other architectures.  This will
      allow pulling the declaration of node_data into the generic mm code in
      the following commit.
      
      Link: https://lkml.kernel.org/r/20240807064110.1003856-6-rppt@kernel.org
      Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e20bac65