- 09 Jun, 2023 40 commits
-
Thomas Gleixner authored
vmap blocks which have active mappings cannot be purged. Allocations which have been freed are accounted for in vmap_block::dirty_min/max, so that they can be detected in _vm_unmap_aliases() as potentially stale TLBs.

If there are several invocations of _vm_unmap_aliases() then each of them will flush the dirty range. That's pointless and just increases the probability of full TLB flushes.

Avoid that by resetting the flush range after accounting for it. That's safe versus other invocations of _vm_unmap_aliases() because this is all serialized with vmap_purge_lock.

Link: https://lkml.kernel.org/r/20230525124504.692056496@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
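A minimal sketch of the idea, with field names modeled on vmap_block in mm/vmalloc.c (illustrative, not the literal patch):

  /*
   * After folding a block's dirty window into the global flush range,
   * reset the window so a later _vm_unmap_aliases() invocation does not
   * flush the same addresses again. This runs under vmap_purge_lock, so
   * the reset cannot race with another invocation.
   */
  if (vb->dirty_min < vb->dirty_max) {
          unsigned long s = vb->va->va_start + (vb->dirty_min << PAGE_SHIFT);
          unsigned long e = vb->va->va_start + (vb->dirty_max << PAGE_SHIFT);

          start = min(start, s);
          end = max(end, e);
          /* Accounted for: forget the range. */
          vb->dirty_min = VMAP_BBMAP_BITS;
          vb->dirty_max = 0;
  }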
-
Thomas Gleixner authored
_vm_unmap_aliases() walks the per-CPU xarrays to find partially unmapped blocks and then walks the per-CPU free lists to purge fragmented blocks. Arguably that's a waste of CPU cycles and cache lines, as the full xarray walk already touches every block.

Avoid this double iteration:

- Split out the code to purge one block and the code to free the local purge list into helper functions.

- Try to purge the fragmented blocks in the xarray walk before looking at their dirty space.

Link: https://lkml.kernel.org/r/20230525124504.633469722@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
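Schematically, the split looks like the below; the helper names and the fragmentation condition are approximations of what lands in mm/vmalloc.c:

  /* Detach one fully fragmented, unused block onto a local purge list. */
  static bool purge_fragmented_block(struct vmap_block *vb,
                                     struct list_head *purge_list)
  {
          /* Purgeable: all space handed out and everything freed again. */
          if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
              vb->dirty == VMAP_BBMAP_BITS)
                  return false;

          list_add_tail(&vb->purge, purge_list);
          return true;
  }

  /* Free everything the xarray walk collected, in one pass. */
  static void free_purged_blocks(struct list_head *purge_list)
  {
          struct vmap_block *vb, *n_vb;

          list_for_each_entry_safe(vb, n_vb, purge_list, purge)
                  free_vmap_block(vb);
  }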
-
Thomas Gleixner authored
Patch series "mm/vmalloc: Assorted fixes and improvements", v2. this series addresses the following issues: 1) Prevent the stale TLB problem related to fully utilized vmap blocks 2) Avoid the double per CPU list walk in _vm_unmap_aliases() 3) Avoid flushing dirty space over and over 4) Add a lockless quickcheck in vb_alloc() and add missing READ/WRITE_ONCE() annotations 5) Prevent overeager purging of usable vmap_blocks if not under memory/address space pressure. This patch (of 6): _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a page are left in the system. This is required due to the lazy TLB flush mechanism in vmalloc. This is tried to achieve by walking the per CPU free lists, but those do not contain fully utilized vmap blocks because they are removed from the free list once the blocks free space became zero. When the block is not fully unmapped then it is not on the purge list either. So neither the per CPU list iteration nor the purge list walk find the block and if the page was mapped via such a block and the TLB has not yet been flushed, the guarantee of _vm_unmap_aliases() that there are no stale TLBs after returning is broken: x = vb_alloc() // Removes vmap_block from free list because vb->free became 0 vb_free(x) // Unmaps page and marks in dirty_min/max range // Block has still mappings and is not put on purge list // Page is reused vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL So instead of walking the per CPU free lists, walk the per CPU xarrays which hold pointers to _all_ active blocks in the system including those removed from the free lists. Link: https://lkml.kernel.org/r/20230525122342.109672430@linutronix.de Link: https://lkml.kernel.org/r/20230525124504.573987880@linutronix.de Fixes: db64fe02 ("mm: rewrite vmap layer") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jim Cromie authored
Drop the __init on kmemleak_test_init(). With it, the function's text is reclaimed after initialization, but then the symbol isn't available for "%pS" rendering, and the backtrace gets a bare pointer where the actual leak happened:

  unreferenced object 0xffff88800a2b0800 (size 1024):
    comm "modprobe", pid 413, jiffies 4294953430
    hex dump (first 32 bytes):
      73 02 00 00 75 01 00 68 02 00 00 01 00 00 00 04  s...u..h........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000fabad728>] kmalloc_trace+0x26/0x90
      [<00000000ef738764>] 0xffffffffc02350a2
      [<00000000004e5795>] do_one_initcall+0x43/0x210
      [<00000000d768905e>] do_init_module+0x4a/0x210
      [<0000000087135ab5>] __do_sys_finit_module+0x93/0xf0
      [<000000004fcb1fa2>] do_syscall_64+0x34/0x80
      [<00000000c73c8d9d>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

With __init gone, that trace entry renders like:

      [<00000000ef738764>] kmemleak_test_init+<offset>/<size>

Link: https://lkml.kernel.org/r/20230525174356.69711-1-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
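The change itself is a single annotation; schematically (declarations only):

  /* Before: the symbol is discarded after init, so %pS can't resolve it. */
  static int __init kmemleak_test_init(void);

  /* After: the symbol stays resident and backtraces render its name. */
  static int kmemleak_test_init(void);

  module_init(kmemleak_test_init);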
-
T.J. Alumbaugh authored
Avoid passing memcg* and pglist_data* to lru_gen_test_recent() since we only use the lruvec anyway.

Link: https://lkml.kernel.org/r/20230522112058.2965866-4-talumbau@google.com
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Reviewed-by: Yuanchu Xie <yuanchu@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
T.J. Alumbaugh authored
Add helpers to the page table walking code:

- Clarifies intent via the names "should_walk_mmu" and "should_clear_pmd_young"
- Avoids repeating the same logic in two places

Link: https://lkml.kernel.org/r/20230522112058.2965866-3-talumbau@google.com
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Reviewed-by: Yuanchu Xie <yuanchu@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
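A plausible shape for the two predicates, condensed from the MGLRU capability checks they replace (the exact conditions live in mm/vmscan.c):

  static bool should_walk_mmu(void)
  {
          /* Walk page tables only if hardware sets the accessed bit. */
          return arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK);
  }

  static bool should_clear_pmd_young(void)
  {
          /* Touch non-leaf PMD young bits only where the arch supports it. */
          return arch_has_hw_nonleaf_pmd_young() &&
                 get_cap(LRU_GEN_NONLEAF_YOUNG);
  }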
-
T.J. Alumbaugh authored
lru_gen_soft_reclaim() gets the lruvec from the memcg and node ID to keep a cleaner interface on the caller side.

Link: https://lkml.kernel.org/r/20230522112058.2965866-2-talumbau@google.com
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Reviewed-by: Yuanchu Xie <yuanchu@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
T.J. Alumbaugh authored
Use the DECLARE_BITMAP() macro where possible.

Link: https://lkml.kernel.org/r/20230522112058.2965866-1-talumbau@google.com
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
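For illustration (NR_BITS is a stand-in for whatever bound the call site uses):

  /* Before: open-coded storage sizing. */
  unsigned long bitmap[BITS_TO_LONGS(NR_BITS)];

  /* After: identical storage, but the intent is explicit. */
  DECLARE_BITMAP(bitmap, NR_BITS);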
-
Haifeng Xu authored
Since commit f079a020 ("selftests: memcg: factor out common parts of memory.{low,min} tests"), the value used in the second alloc_anon call has changed from 148M to 170M. Because memory.low allows reclaiming page cache in child cgroups, memory.current ends up close to 30M instead of 50M. Therefore, adjust the expected value for the parent cgroup.

Link: https://lkml.kernel.org/r/20230522095233.4246-2-haifeng.xu@shopee.com
Fixes: f079a020 ("selftests: memcg: factor out common parts of memory.{low,min} tests")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Haifeng Xu authored
Replace 'then' with 'than'.

Link: https://lkml.kernel.org/r/20230522095233.4246-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Andrew Morton authored
It is felt that the name mlock_future_check() is vague - it doesn't particularly convey the function's operation. mlock_future_ok() is a clearer name for a predicate function.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
In all but one instance, mlock_future_check() is treated as a boolean function despite returning an error code. In one instance, this error code is ignored and replaced with -ENOMEM. This is confusing, and the inversion of true -> failure, false -> success is not warranted.

Convert the function to a bool, lightly refactor, and return true if the check passes, false if not.

Link: https://lkml.kernel.org/r/20230522082412.56685-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
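The resulting predicate looks roughly like this (a reconstructed sketch; the rename to mlock_future_ok() happens in the follow-up patch above):

  /* Sketch: true if the mlock limit permits locking 'bytes' more. */
  static bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
                              unsigned long bytes)
  {
          unsigned long locked_pages, limit_pages;

          if (!(flags & VM_LOCKED) || capable(CAP_IPC_LOCK))
                  return true;

          locked_pages = bytes >> PAGE_SHIFT;
          locked_pages += mm->locked_vm;
          limit_pages = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

          return locked_pages <= limit_pages;
  }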
-
David Hildenbrand authored
Similar to the COW selftests, also use io_uring fixed buffers to test if long-term page pinning works as expected.

Link: https://lkml.kernel.org/r/20230519102723.185721-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
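The io_uring side of such a test hinges on buffer registration; a self-contained liburing sketch of the probe (not the selftest's literal code):

  #include <liburing.h>
  #include <sys/uio.h>

  /*
   * Registering a fixed buffer takes a FOLL_LONGTERM pin on the pages
   * backing iov_base; on a MAP_SHARED mapping of an ordinary filesystem
   * this is expected to fail, which is what the test asserts.
   */
  static int try_io_uring_fixed_buffer(void *mem, size_t size)
  {
          struct io_uring ring;
          struct iovec iov = { .iov_base = mem, .iov_len = size };
          int ret;

          ret = io_uring_queue_init(1, &ring, 0);
          if (ret < 0)
                  return ret;

          ret = io_uring_register_buffers(&ring, &iov, 1);  /* the pin */
          if (!ret)
                  io_uring_unregister_buffers(&ring);

          io_uring_queue_exit(&ring);
          return ret;  /* 0: pin worked; -errno: pin refused */
  }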
-
David Hildenbrand authored
Let's add a new test for checking whether GUP long-term page pinning works as expected (R/O vs. R/W, MAP_PRIVATE vs. MAP_SHARED, GUP vs. GUP-fast). Note that COW handling with long-term R/O pinning in private mappings, and pinning of anonymous memory in general, is tested by the COW selftest. This test, therefore, focuses on page pinning in file mappings.

The most interesting case is probably the "local tmpfile" case, as that will likely end up on a "real" filesystem such as ext4 or xfs, not on a virtual one like tmpfs or hugetlb where any long-term page pinning is always expected to succeed.

For now, only add tests that use the "/sys/kernel/debug/gup_test" interface. We'll add tests based on liburing separately next.

[akpm@linux-foundation.org: update .gitignore for gup_longterm, per Peter]
Link: https://lkml.kernel.org/r/20230519102723.185721-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
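A hedged sketch of driving that interface; the ioctl names, struct, and flags below are reproduced from memory of this series and should be checked against mm/gup_test.h:

  #include <fcntl.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <sys/ioctl.h>
  #include "../../../../mm/gup_test.h"  /* assumed selftest include path */

  /* Assumed interface: START takes a FOLL_LONGTERM pin, STOP drops it. */
  static int try_longterm_pin(int gup_fd, void *mem, size_t size,
                              bool rw, bool fast)
  {
          struct pin_longterm_test args = {
                  .addr = (unsigned long)mem,
                  .size = size,
                  .flags = (rw ? PIN_LONGTERM_TEST_FLAG_USE_WRITE : 0) |
                           (fast ? PIN_LONGTERM_TEST_FLAG_USE_FAST : 0),
          };

          if (ioctl(gup_fd, PIN_LONGTERM_TEST_START, &args))
                  return -1;      /* pinning refused */
          ioctl(gup_fd, PIN_LONGTERM_TEST_STOP);
          return 0;               /* pinning worked */
  }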
-
David Hildenbrand authored
Patch series "selftests/mm: new test for FOLL_LONGTERM on file mappings". Let's add some selftests to make sure that: * R/O long-term pinning always works of file mappings * R/W long-term pinning always works in MAP_PRIVATE file mappings * R/W long-term pinning only works in MAP_SHARED mappings with special filesystems (shmem, hugetlb) and fails with other filesystems (ext4, btrfs, xfs). The tests make use of the gup_test kernel module to trigger ordinary GUP and GUP-fast, and liburing (similar to our COW selftests). Test with memfd, memfd hugetlb, tmpfile() and mkstemp(). The latter usually gives us a "real" filesystem (ext4, btrfs, xfs) where long-term pinning is expected to fail. Note that these selftests don't contain any actual reproducers for data corruptions in case R/W long-term pinning on problematic filesystems "would" work. Maybe we can later come up with a racy !FOLL_LONGTERM reproducer that can reuse an existing interface to trigger short-term pinning (I'll look into that next). On current mm/mm-unstable: # ./gup_longterm # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB TAP version 13 1..50 # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd ok 1 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile ok 2 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile ok 3 Should have failed # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 4 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 5 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd ok 6 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile ok 7 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile ok 8 Should have failed # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 9 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 10 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd ok 11 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile ok 12 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile ok 13 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 14 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 15 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd ok 16 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile ok 17 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile ok 18 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 19 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 20 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd ok 21 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... 
with tmpfile ok 22 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile ok 23 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 24 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 25 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd ok 26 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile ok 27 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile ok 28 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 29 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 30 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd ok 31 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile ok 32 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile ok 33 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 34 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 35 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd ok 36 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile ok 37 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile ok 38 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 39 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 40 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd ok 41 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with tmpfile ok 42 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with local tmpfile ok 43 Should have failed # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 44 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 45 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd ok 46 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with tmpfile ok 47 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with local tmpfile ok 48 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 49 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 50 Should have worked # Totals: pass:50 fail:0 xfail:0 xpass:0 skip:0 error:0 This patch (of 3): Let's factor detection out into vm_util, to be reused by a new test. 
Link: https://lkml.kernel.org/r/20230519102723.185721-1-david@redhat.com Link: https://lkml.kernel.org/r/20230519102723.185721-2-david@redhat.comSigned-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
During stress testing with higher-order allocations, a deadlock scenario was observed in compaction: one GFP_NOFS allocation was sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in the system were busy with compactors spinning on buffer locks held by the sleeping GFP_NOFS allocation.

Reclaim is susceptible to this same deadlock; we fixed it by granting GFP_NOFS allocations additional LRU isolation headroom, to ensure they make forward progress while holding fs locks that other reclaimers might acquire. Do the same here.

This code has been like this since compaction was initially merged, and I only managed to trigger this with out-of-tree patches that dramatically increase the contexts that do GFP_NOFS compaction. While the issue is real, it seems theoretical in nature given existing allocation sites. Worth fixing now, but no Fixes tag or stable CC.

Link: https://lkml.kernel.org/r/20230519111359.40475-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
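The headroom trick, sketched against the reclaim-side precedent the message cites (simplified; the actual compaction code differs in detail):

  static bool too_many_isolated_sketch(pg_data_t *pgdat, gfp_t gfp_mask)
  {
          unsigned long inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
                                   node_page_state(pgdat, NR_INACTIVE_ANON);
          unsigned long isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
                                   node_page_state(pgdat, NR_ISOLATED_ANON);

          /* Full-capability contexts throttle 8x sooner than GFP_NOFS,
           * so lock-holding GFP_NOFS compactors keep making progress. */
          if ((gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
                  inactive >>= 3;

          return isolated > inactive;
  }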
-
Johannes Weiner authored
Since it only returns COMPACT_CONTINUE or COMPACT_SKIPPED now, a bool return value simplifies the callsites.

Link: https://lkml.kernel.org/r/20230602151204.GD161817@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
The watermark check in compaction_zonelist_suitable(), called from should_compact_retry(), is sandwiched between two watermark checks already: before, there are freelist attempts as part of direct reclaim and direct compaction; after, there is a last-minute freelist attempt in __alloc_pages_may_oom().

The check in compaction_zonelist_suitable() isn't necessary. Kill it.

Link: https://lkml.kernel.org/r/20230519123959.77335-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
Remove the is_via_compact_memory() checks from all paths not reachable via /proc/sys/vm/compact_memory.

Link: https://lkml.kernel.org/r/20230519123959.77335-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
__compaction_suitable() is supposed to check for available migration targets. However, it also checks whether the operation was requested via /proc/sys/vm/compact_memory, and whether the original allocation request can already succeed. These don't apply to all callsites.

Move the checks out to the callers, so that later patches can deal with them one by one. No functional change intended.

[hannes@cmpxchg.org: fix comment, per Vlastimil]
Link: https://lkml.kernel.org/r/20230602144942.GC161817@cmpxchg.org
Link: https://lkml.kernel.org/r/20230519123959.77335-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
The different branches for retry are unnecessarily complicated. There are really only three outcomes: progress (retry n times), skipped (retry if reclaim can help), failed (retry with higher priority).

Rearrange the branches and the retry counter to make it simpler.

[hannes@cmpxchg.org: restore behavior when hitting max_retries]
Link: https://lkml.kernel.org/r/20230602144705.GB161817@cmpxchg.org
Link: https://lkml.kernel.org/r/20230519123959.77335-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
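Condensed, the decision tree reads roughly like this (a hedged sketch of should_compact_retry() in mm/page_alloc.c; 'compaction_made_progress' stands in for the progress case):

  /* progress: a bounded number of straight retries */
  if (compaction_made_progress)
          return ++(*compaction_retries) <= MAX_COMPACT_RETRIES;

  /* skipped: retry only if reclaim can free enough order-0 pages */
  if (result == COMPACT_SKIPPED)
          return compaction_zonelist_suitable(ac, order, alloc_flags);

  /* failed: escalate priority and try again */
  if (*compact_priority > MIN_COMPACT_PRIORITY) {
          (*compact_priority)--;
          return true;
  }
  return false;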
-
Johannes Weiner authored
Patch series "mm: compaction: cleanups & simplifications". These compaction cleanups are split out from the huge page allocator series[1], as requested by reviewer feedback. [1] https://lore.kernel.org/linux-mm/20230418191313.268131-1-hannes@cmpxchg.org/ This patch (of 5): The compaction result helpers encode quirks that are specific to the allocator's retry logic. E.g. COMPACT_SUCCESS and COMPACT_COMPLETE actually represent failures that should be retried upon, and so on. I frequently found myself pulling up the helper implementation in order to understand and work on the retry logic. They're not quite clean abstractions; rather they split the retry logic into two locations. Remove the helpers and inline the checks. Then comment on the result interpretations directly where the decision making happens. Link: https://lkml.kernel.org/r/20230519123959.77335-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20230519123959.77335-2-hannes@cmpxchg.orgSigned-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Tom Rix authored
smatch reports:

  mm/page_alloc.c:247:5: warning: symbol 'sysctl_lowmem_reserve_ratio' was not declared. Should it be static?

This variable is only used in its defining file, so it should be static.

Link: https://lkml.kernel.org/r/20230518141119.927074-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
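The fix is one storage-class keyword (array initializers omitted here):

  /* mm/page_alloc.c, before: external linkage with no external users. */
  int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];

  /* After: internal linkage to match its single-file usage. */
  static int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];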
-
Liam R. Howlett authored
If the iterator has moved to the previous entry, then step forward one range, back to the gap.

Link: https://lkml.kernel.org/r/20230518145544.1722059-36-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Add functionality to the VMA iterator to advance and retreat one offset within the maple tree, regardless of the value contained. This can lead to less re-walking to find an area of interest, especially when there is nothing in that offset.

Link: https://lkml.kernel.org/r/20230518145544.1722059-35-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Now that the functions have changed the limits, update the testing of the maple tree to test these new settings.

Link: https://lkml.kernel.org/r/20230518145544.1722059-34-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
When there is a single entry tree (a range of 0-0 pointing to an entry), then ensure the limit is either 0-0 or 1-oo, depending on where the user walks. Ensure the correct node setting as well; either MAS_ROOT or MAS_NONE.

Link: https://lkml.kernel.org/r/20230518145544.1722059-33-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Some users of the maple tree may want to move to the previous range regardless of the value stored there. Add this interface as well as the 'find' variant to support walking to the first value, then iterating over the previous ranges.

Link: https://lkml.kernel.org/r/20230518145544.1722059-32-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Sometimes the user needs to revert to the previous slot, regardless of whether it is empty or not. Add an interface to go to the previous slot.

Since there can't be two consecutive NULLs in the tree, the mas_prev() function can be implemented by calling mas_prev_slot() a maximum of 2 times. Change the underlying interface to use mas_prev_slot() to align the code.

Link: https://lkml.kernel.org/r/20230518145544.1722059-31-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
These functions need to move for future use.

Link: https://lkml.kernel.org/r/20230518145544.1722059-30-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Some users of the maple tree may want to move to the next range in the tree, even if it stores a NULL. This family of functions provides that functionality by advancing one slot at a time and returning the result, while mas_contiguous() will iterate over the range and stop on encountering the first NULL.

Link: https://lkml.kernel.org/r/20230518145544.1722059-29-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Sometimes, during a tree walk, the user needs the next slot regardless of whether it is empty or not. Add an interface to get the next slot.

Since there are no consecutive NULLs allowed in the tree, the mas_next() function can only need to advance two slots at most. So use the new mas_next_slot() interface to align both implementations. Use this method for mas_find() as well.

Link: https://lkml.kernel.org/r/20230518145544.1722059-28-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
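The two-step bound can be illustrated like so (signatures approximate; mas_next_slot() as added by this patch):

  /*
   * Adjacent NULL ranges are coalesced in the tree, so if one slot step
   * lands on a NULL, the very next slot must hold an entry (modulo
   * hitting the limit).
   */
  static void *mas_next_entry_sketch(struct ma_state *mas, unsigned long max)
  {
          void *entry = mas_next_slot(mas, max);  /* may be a NULL range */

          if (!entry)
                  entry = mas_next_slot(mas, max); /* non-NULL or past max */
          return entry;
  }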
-
Liam R. Howlett authored
The empty area search will return -EINVAL if the search window is smaller than the requested size. Fix the test case to check for this error code.

Link: https://lkml.kernel.org/r/20230518145544.1722059-27-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
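A test-style sketch of the assertion (mtree_alloc_range() and MT_BUG_ON() as used by lib/test_maple_tree.c; the concrete numbers are illustrative):

  /* A 4096-slot request cannot fit in the 16-slot window [0, 15],
   * so the empty-area search must fail with -EINVAL. */
  unsigned long addr;
  int ret;

  ret = mtree_alloc_range(mt, &addr, xa_mk_value(0), 4096, 0, 15,
                          GFP_KERNEL);
  MT_BUG_ON(mt, ret != -EINVAL);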
-
Liam R. Howlett authored
Since the maple tree is inclusive in range, ensure that a range of 1 (min = max) works for searching for a gap in either direction, and make sure the size is at least 1 but not larger than the delta between min and max.

This commit also updates the testing. Unfortunately there isn't a way to safely update the tests and code without a test failure.

Link: https://lkml.kernel.org/r/20230518145544.1722059-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Keep a reference to the node when possible with mas_prev(). This will avoid re-walking the tree. In keeping a reference to the node, keep the last/index accurate to the range being referenced. This means the limit may be within the range, but the range may extend outside of the limit.

Also fix the single entry tree to respect the range (of 0), or set the node to MAS_NONE in the case of shifting beyond 0.

Link: https://lkml.kernel.org/r/20230518145544.1722059-25-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Clean up the mas_next() call to try and keep a node reference when possible. This will avoid re-walking the tree in most cases.

Also clean up the single entry tree handling to ensure index/last are consistent with what one would expect (returning NULL with a limit of 1-oo).

Link: https://lkml.kernel.org/r/20230518145544.1722059-24-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
The maple tree iterator clean up is incompatible with the way do_vmi_align_munmap() expects it to behave. Update the expected behaviour to match now, since the updated code also works with the current iterator.

Link: https://lkml.kernel.org/r/20230518145544.1722059-23-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
When a dead node is detected, the depth has already been set to 1, so reset it to 0.

Link: https://lkml.kernel.org/r/20230518145544.1722059-22-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
mas_destroy() currently checks if mas->node is MAS_START prior to calling mas_start(), but this is unnecessary, as mas_start() will do nothing if the node is anything but MAS_START.

Link: https://lkml.kernel.org/r/20230518145544.1722059-21-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
The test functions are not needed after the module is removed, so mark them as such. Add __exit to the module removal function. Some other variables have been marked as static const as well.

Link: https://lkml.kernel.org/r/20230518145544.1722059-20-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
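The standard annotation pattern this applies (declarations only; function names as in lib/test_maple_tree.c, if memory serves):

  static int __init maple_tree_seed(void);      /* init text, freed after load */
  static void __exit maple_tree_harvest(void);  /* kept only if unloading is possible */

  module_init(maple_tree_seed);
  module_exit(maple_tree_harvest);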
-