- 18 Jul, 2022 3 commits
-
-
Alex Sierra authored
Patch series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping", v9. This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory owned by a device that can be mapped into CPU page tables like MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE. This patch series is mostly self-contained except for a few places where it needs to update other subsystems to handle the new memory type. System stability and performance are not affected according to our ongoing testing, including xfstests. How it works: The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver registers the memory with devmap as MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for this hardware page migration capability is the Frontier supercomputer project. This functionality is not AMD-specific. We expect other GPU vendors to find this functionality useful, and possibly other hardware types in the future. Our test nodes in the lab are similar to the Frontier configuration, with .5 TB of system memory plus 256 GB of device memory split across 4 GPUs, all in a single coherent address space. Page migration is expected to improve application efficiency significantly. We will report empirical results as they become available. Coherent device type pages at gup are now migrated back to system memory if they are being pinned long-term (FOLL_LONGTERM). The reason is, that long-term pinning would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). These series incorporate Alistair Popple patches to do this migration from pin_user_pages() calls. hmm_gup_test has been added to hmm-test to test different get user pages calls. This series includes handling of device-managed anonymous pages returned by vm_normal_pages. Although they behave like normal pages for purposes of mapping in CPU page tables and for COW, they do not support LRU lists, NUMA migration or THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. This patch (of 14): is_pinnable_page() and folio_is_pinnable() are renamed to is_longterm_pinnable_page() and folio_is_longterm_pinnable() respectively. These functions are used in the FOLL_LONGTERM flag context. Link: https://lkml.kernel.org/r/20220715150521.18165-1-alex.sierra@amd.com Link: https://lkml.kernel.org/r/20220715150521.18165-2-alex.sierra@amd.comSigned-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kalpana Shetty authored
Add "protected_keys" tests to "run_vmtests.sh" would help run all VM related tests from a single shell script. [kalpana.shetty@amd.com: Shuah Khan's review comments incorporated, added -x executable check] Link: https://lkml.kernel.org/r/20220617202931.357-1-kalpana.shetty@amd.com Link: https://lkml.kernel.org/r/20220610090704.296-1-kalpana.shetty@amd.com Link: https://lkml.kernel.org/r/20220531102556.388-1-kalpana.shetty@amd.comSigned-off-by: Kalpana Shetty <kalpana.shetty@amd.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
damon_lru_sort_init() returns an error when damon_select_ops() fails, without freeing 'ctx', which was allocated earlier. This commit fixes the potential memory leak by freeing 'ctx' in that error path. Link: https://lkml.kernel.org/r/20220714170458.49727-1-sj@kernel.org Fixes: 40e983cc ("mm/damon: introduce DAMON-based LRU-lists Sorting") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
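A minimal sketch of the fix pattern (function names follow the description above; the remaining DAMON setup is elided):

  static int __init damon_lru_sort_init(void)
  {
  	struct damon_ctx *ctx = damon_new_ctx();

  	if (!ctx)
  		return -ENOMEM;

  	if (damon_select_ops(ctx, DAMON_OPS_PADDR)) {
  		/* this error path previously returned without freeing ctx */
  		damon_destroy_ctx(ctx);
  		return -EINVAL;
  	}

  	/* scheme creation and monitoring start continue here */
  	return 0;
  }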
-
- 04 Jul, 2022 37 commits
-
-
Adam Sindelar authored
The test va_128TBswitch.c expects to be able to pass mmap an address hint and length that cross the address 1<<47. On x86_64, this is not possible without 5-level page tables, so the test fails. The test is already only run on 64-bit powerpc and x86_64 archs, but this patch adds an additional check on x86_64 that skips the test if PG_TABLE_LEVELS < 5. There is precedent for checking /proc/config.gz in selftests, e.g. in selftests/firmware. Running the tests produces the desired output:
  sudo make -C tools/testing/selftests TARGETS=vm run_tests
  ---------------------------
  running ./va_128TBswitch.sh
  ---------------------------
  ./va_128TBswitch.sh: PG_TABLE_LEVELS=4, must be >= 5 to run this test [SKIP]
  -------------------------------
[adam@wowsignal.io: restrict the check to x86_64] Link: https://lkml.kernel.org/r/20220628163654.337600-1-adam@wowsignal.io [adam@wowsignal.io: fix formatting issues, rename "die" to "fail"] Link: https://lkml.kernel.org/r/20220701163030.415735-1-adam@wowsignal.io Link: https://lkml.kernel.org/r/20220627163912.5581-1-adam@wowsignal.io Signed-off-by: Adam Sindelar <adam@wowsignal.io> Cc: Adam Sindelar <ats@fb.com> Cc: David Vernet <void@manifault.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). This is because release_pte_page() is not called for these pages and thus free_page_and_swap_cache can't grab the page lock. These pages won't be freed from the swap cache even if we are the only user, until the next reclaim. It shouldn't really hurt, but we could try to free these pages to save more memory for the system. Link: https://lkml.kernel.org/r/20220625092816.4856-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The return value of khugepaged_add_pte_mapped_thp() is always 0 and also ignored. Remove it to clean up the code. Link: https://lkml.kernel.org/r/20220625092816.4856-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Use helper macro __ATTR_RW to define the khugepaged attributes. Minor readability improvement. Link: https://lkml.kernel.org/r/20220625092816.4856-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
nr_none is always 0 in the non-shmem case because the page can be read from the backend store. So when nr_none != 0, it must be the is_shmem case. Also, only adjust nrpages and uncharge shmem when nr_none != 0 to save cpu cycles. Link: https://lkml.kernel.org/r/20220625092816.4856-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zach O'Keefe <zokeefe@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Fix some typos and tweak the code to meet codestyle. No functional change intended. Link: https://lkml.kernel.org/r/20220625092816.4856-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
When do_swap_page returns VM_FAULT_RETRY, we do not retry here, and thus the swap entry will remain in the page table. This will result in a later failure. So stop swapping in pages in this case to save cpu cycles. As a further optimization, mmap_lock is released when __collapse_huge_page_swapin() fails to avoid relocking mmap_lock. And "swapped_in++" is moved after error handling to make it more accurate. Link: https://lkml.kernel.org/r/20220625092816.4856-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
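A hedged sketch of the resulting control flow in __collapse_huge_page_swapin() (simplified; variable names are illustrative):

  ret = do_swap_page(&vmf);
  if (ret & VM_FAULT_RETRY) {
  	/* do_swap_page() already dropped mmap_lock; do not retake it just to fail */
  	return false;
  }
  if (ret & VM_FAULT_ERROR) {
  	mmap_read_unlock(mm);	/* release here so the caller need not relock */
  	return false;
  }
  swapped_in++;	/* count only swap-ins that actually succeeded */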
-
Miaohe Lin authored
Patch series "A few cleanup patches for khugepaged", v2. This series contains a few cleaup patches to remove unneeded return value, use helper macro, fix typos and so on. More details can be found in the respective changelogs. This patch (of 7): If we reach here, khugepaged_scan_mm_slot() has already made sure that hugepage is enabled for shmem, via its call to hugepage_vma_check(). Remove this duplicated check. Link: https://lkml.kernel.org/r/20220625092816.4856-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220625092816.4856-2-linmiaohe@huawei.comSigned-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zach O'Keefe <zokeefe@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Howells <dhowells@redhat.com> Cc: NeilBrown <neilb@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
XueBing Chen authored
There is an unexpected word 'a' in the comments that needs to be dropped. Link: https://lkml.kernel.org/r/24fbdae3.c86.1819a0f31b9.Coremail.chenxuebing@jari.cn Signed-off-by: XueBing Chen <chenxuebing@jari.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Qi Zheng authored
Commit e5251fd4 ("mm/hugetlb: introduce set_huge_swap_pte_at() helper") added set_huge_swap_pte_at() to handle swap entries on architectures that support hugepages consisting of contiguous ptes, and currently set_huge_swap_pte_at() is only overridden by arm64. set_huge_swap_pte_at() provides a sz parameter to help determine the number of entries to be updated. But in fact all hugetlb swap entries contain pfn information, so we can find the corresponding folio through the pfn recorded in the swap entry, and then folio_size() gives the size that needs to be updated. Considering that users can easily cause bugs by ignoring the difference between set_huge_swap_pte_at() and set_huge_pte_at(), let's handle swap entries in set_huge_pte_at() and remove set_huge_swap_pte_at(); then we can call set_huge_pte_at() anywhere, which simplifies our coding. Link: https://lkml.kernel.org/r/20220626145717.53572-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
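The idea can be sketched as follows (hedged pseudocode of the approach, not the exact arm64 implementation):

  /* inside set_huge_pte_at(): a hugetlb swap entry still records a pfn,
   * so the folio behind it tells us how large a mapping to write */
  if (!pte_present(pte)) {
  	swp_entry_t entry = pte_to_swp_entry(pte);
  	struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
  	unsigned long sz = folio_size(folio);	/* replaces the old sz argument */

  	/* write sz / PAGE_SIZE contiguous swap ptes here */
  }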
-
Yang Yang authored
Not all systems use swap, so estimating available memory would help prevent swapping or OOM on systems that do not use swap. We also need to reserve some page cache to prevent swapping or thrashing: if somebody is accessing pages in the page cache and too much of it were freed, most accesses would mean reading data from disk, i.e. thrashing. Link: https://lkml.kernel.org/r/20220623020833.972979-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Sergey Senozhatsky authored
Always use crypto_has_comp() so that crypto can look up the module, call usermodehelper to load the modules, wait for usermodehelper to finish and so on. Otherwise crypto will do all of these steps under the CPU hot-plug lock, and this looks like too much stuff to handle under the CPU hot-plug lock. Besides, this can end up in a deadlock when usermodehelper triggers a code path that attempts to lock the CPU hot-plug lock, which zram already holds. An example of such a deadlock:
- path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto and waits for modprobe to finish
  disksize_store
   zcomp_create
    __cpuhp_state_add_instance
     __cpuhp_state_add_instance_cpuslocked
      zcomp_cpu_up_prepare
       crypto_alloc_base
        crypto_alg_mod_lookup
         call_usermodehelper_exec
          wait_for_completion_killable
           do_wait_for_common
            schedule
- path B. async work kthread that brings in scsi device. It wants to register CPUHP states at some point, and it needs the CPU hot-plug lock for that, which is owned by zram.
  async_run_entry_fn
   scsi_probe_and_add_lun
    scsi_mq_alloc_queue
     blk_mq_init_queue
      blk_mq_init_allocated_queue
       blk_mq_realloc_hw_ctxs
        __cpuhp_state_add_instance
         __cpuhp_state_add_instance_cpuslocked
          mutex_lock
           schedule
- path C. modprobe sleeps, waiting for all async works to finish.
  load_module
   do_init_module
    async_synchronize_full
     async_synchronize_cookie_domain
      schedule
[senozhatsky@chromium.org: add comment] Link: https://lkml.kernel.org/r/20220624060606.1014474-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20220622023501.517125-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
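In short, the fix is to probe the compression algorithm before entering any CPU hot-plug locked section, along the lines of this hedged sketch (crypto_has_comp() is the real crypto API; the surrounding zcomp code is abbreviated):

  /* in zcomp_create(), before registering CPU hot-plug callbacks:
   * crypto_has_comp() may load modules via usermodehelper, so it must
   * run outside of the CPU hot-plug lock */
  if (!crypto_has_comp(comp_name, 0, 0))
  	return ERR_PTR(-EINVAL);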
-
Yun-Ze Li authored
Comments that mention mem_hotplug_end() are confusing as there is no function called mem_hotplug_end(). Fix them by replacing all the occurrences of mem_hotplug_end() in the comments with mem_hotplug_done(). [akpm@linux-foundation.org: grammatical fixes] Link: https://lkml.kernel.org/r/20220620071516.1286101-1-p76091292@gs.ncku.edu.tw Signed-off-by: Yun-Ze Li <p76091292@gs.ncku.edu.tw> Cc: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Vincent Whitchurch authored
Pss is the sum of the sizes of clean and dirty private pages, and the proportional sizes of clean and dirty shared pages:
  Private = Private_Dirty + Private_Clean
  Shared_Proportional = Shared_Dirty_Proportional + Shared_Clean_Proportional
  Pss = Private + Shared_Proportional
The Shared*Proportional fields are not present in smaps, so it is not always possible to determine how much of the Pss is from dirty pages and how much is from clean pages. This information can be useful for measuring memory usage for the purpose of optimisation, since clean pages can usually be discarded by the kernel immediately while dirty pages cannot. The smaps routines in the kernel already have access to this data, so add a Pss_Dirty field to show it to userspace. Pss_Clean is not added since it can be calculated from Pss and Pss_Dirty. Link: https://lkml.kernel.org/r/20220620081251.2928103-1-vincent.whitchurch@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
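For example, a small user-space program can now split Pss into its dirty and clean parts (this assumes a kernel with this patch, which is what exposes the Pss_Dirty field):

  #include <stdio.h>

  int main(void)
  {
  	FILE *f = fopen("/proc/self/smaps", "r");
  	char line[256];
  	long pss = 0, pss_dirty = 0, kb;

  	if (!f)
  		return 1;
  	while (fgets(line, sizeof(line), f)) {
  		if (sscanf(line, "Pss: %ld kB", &kb) == 1)
  			pss += kb;
  		else if (sscanf(line, "Pss_Dirty: %ld kB", &kb) == 1)
  			pss_dirty += kb;
  	}
  	fclose(f);
  	printf("Pss: %ld kB, Pss_Dirty: %ld kB, clean: %ld kB\n",
  	       pss, pss_dirty, pss - pss_dirty);
  	return 0;
  }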
-
Baolin Wang authored
According to previous discussion [1], there are so many levels of indenting to handle the hugetlb case when unmapping or migrating. We can combine folio_test_anon() and huge_pmd_unshare() to save one level of indenting, by adding a local variable and moving the VM_BUG_ON() a little forward. No intended functional changes in this patch. [1] https://lore.kernel.org/all/0b986dc4-5843-3e2d-c2df-5a2e9f13e6ab@oracle.com/ Link: https://lkml.kernel.org/r/28414b1b96f095e838c1e548074f8e0fc70d78cf.1655724713.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Pass index to pte_offset_map_lock() directly so the below calculation can be avoided. Rename orig_pte to ptep as it is not changed. Also use the helper is_swap_pte() to improve readability. No functional change intended. [akpm@linux-foundation.org: reduce scope of `ptep'] Link: https://lkml.kernel.org/r/20220618090527.37843-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Muchun Song authored
commit 641844f5 ("mm/hugetlb: introduce minimum hugepage order") fixed a static checker warning and introduced a global variable minimum_order to fix the warning. However, the local variable in dissolve_free_huge_pages() can be initialized to huge_page_order(&default_hstate) to fix the warning. So remove minimum_order to simplify the code. Link: https://lkml.kernel.org/r/20220616033846.96937-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
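A hedged sketch of the simplification (close to, but not necessarily identical to, the resulting code):

  /* inside dissolve_free_huge_pages(): derive the smallest hugepage order
   * locally instead of consulting the removed global minimum_order */
  unsigned int order = huge_page_order(&default_hstate);
  struct hstate *h;

  for_each_hstate(h)
  	order = min(order, huge_page_order(h));

  for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
  	rc = dissolve_free_huge_page(pfn_to_page(pfn));	/* as before */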
-
Muchun Song authored
For now, the feature hugetlb_free_vmemmap is not compatible with the feature memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes precedence over memory_hotplug.memmap_on_memory. However, some users want memory_hotplug.memmap_on_memory to take precedence over hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more likely to succeed in close-to-OOM situations. So the decision of making hugetlb_free_vmemmap take precedence is neither wise nor elegant. The proper approach is to have hugetlb_vmemmap.c check whether the section that the HugeTLB pages belong to can be optimized. If the section's vmemmap pages are allocated from the added memory block itself, hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise, do the optimization. Then both kernel parameters are compatible. So this patch introduces VmemmapSelfHosted to mask any non-optimizable vmemmap pages. The hugetlb_vmemmap code can use this flag to detect whether a vmemmap page can be optimized. [songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive] Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Co-developed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Muchun Song authored
Patch series "make hugetlb_optimize_vmemmap compatible with memmap_on_memory", v3. This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory. This patch (of 2): We are almost running out of section flags, only one bit is available in the worst case (powerpc with 256k pages). However, there are still some free bits (in ->section_mem_map) on other architectures (e.g. x86_64 has 10 bits available, arm64 has 8 bits available with worst case of 64K pages). We have hard coded those numbers in code, it is inconvenient to use those bits on other architectures except powerpc. So transfer those section flags to enumeration to make it easy to add new section flags in the future. Also, move SECTION_TAINT_ZONE_DEVICE into the scope of CONFIG_ZONE_DEVICE to save a bit on non-zone-device case. [songmuchun@bytedance.com: replace enum with defines per David] Link: https://lkml.kernel.org/r/20220620110616.12056-2-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-2-songmuchun@bytedance.comSigned-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers now have a folio, so convert the entire function to operate on folios. Link: https://lkml.kernel.org/r/20220617175020.717127-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All but one caller already has a folio, so convert it to use a folio. Link: https://lkml.kernel.org/r/20220617175020.717127-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
The only caller already has a folio, so push the folio->page conversion down a level. Link: https://lkml.kernel.org/r/20220617175020.717127-21-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers now have a folio, so push the folio->page conversion down to this function. [akpm@linux-foundation.org: uninline destroy_large_folio() to fix build issue] Link: https://lkml.kernel.org/r/20220617175020.717127-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All the callers now have a folio. Saves several calls to compound_head, totalling 502 bytes of text. Link: https://lkml.kernel.org/r/20220617175020.717127-19-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All the callers now have a folio, so pass it in. This doesn't save any text, but it does save a call to compound_head() as folio_test_hugetlb() does not contain a call like PageHuge() does. Link: https://lkml.kernel.org/r/20220617175020.717127-18-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Saves 56 bytes of text by removing a call to compound_head(). Link: https://lkml.kernel.org/r/20220617175020.717127-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Saves 11 bytes of text by removing a check of PageTail. Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Pages linked through the LRU list cannot be tail pages as ->compound_head is in a union with one of the words of the list_head, and they cannot be ZONE_DEVICE pages as ->pgmap is in a union with the same word. Saves 60 bytes of text by removing a call to page_is_fake_head(). Link: https://lkml.kernel.org/r/20220617175020.717127-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
This function was already calling compound_head(), but now it can cache the result of calling compound_head() and avoid calling it again. Saves 299 bytes of text by avoiding various calls to compound_page() and avoiding checks of PageTail. Link: https://lkml.kernel.org/r/20220617175020.717127-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Save a few calls to compound_head by converting the passed page to a folio. Reduces kernel text size by 74 bytes. Link: https://lkml.kernel.org/r/20220617175020.717127-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Do the per-cpu dereferencing of the fbatches once, which saves 14 bytes of text and several percpu relocations. Link: https://lkml.kernel.org/r/20220617175020.717127-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
The function is too long, so pull this complicated conditional out into cpu_needs_drain(). This ends up shrinking the text by 14 bytes, by allowing GCC to cache the result of calling per_cpu() instead of relocating each lookup individually. Link: https://lkml.kernel.org/r/20220617175020.717127-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
No change to generated code, but this struct no longer contains any pagevecs, and not all the folio batches it contains are lru. Link: https://lkml.kernel.org/r/20220617175020.717127-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Rename it to just 'activate', saving 696 bytes of text from removals of compound_page() and the pagevec_lru_move_fn() infrastructure. Inline need_activate_page_drain() into its only caller. Link: https://lkml.kernel.org/r/20220617175020.717127-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Using folios instead of pages removes several calls to compound_head(), shrinking the kernel by 1089 bytes of text. Link: https://lkml.kernel.org/r/20220617175020.717127-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Using folios instead of pages shrinks deactivate_page() and lru_deactivate_fn() by 778 bytes between them. Link: https://lkml.kernel.org/r/20220617175020.717127-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Use a folio throughout lru_deactivate_file_fn(), removing many hidden calls to compound_head(). Shrinks the kernel by 864 bytes of text. Link: https://lkml.kernel.org/r/20220617175020.717127-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-