Commits · cb2d344c7551c250c07c009639eb81d54235ef8b · Kirill Smelkov / linux

25 Apr, 2022 13 commits

io_uring: helper for prep+queuing linked timeouts · cb2d344c

Pavel Begunkov authored Apr 15, 2022

We try to aggresively inline the submission path, so it's a good idea to
not pollute it with colder code. One of them is linked timeout
preparation + queue, which can be extracted into a function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ecf74df7ac77389b6d9211211ec4954e91de98ba.1650056133.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

cb2d344c

io_uring: inline io_free_req() · f5c6cf2a

Pavel Begunkov authored Apr 15, 2022

Inline io_free_req() into its only user and remove an underscore prefix
from __io_free_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ed114edef5c256a644f4839bb372df70d8df8e3f.1650056133.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f5c6cf2a

io_uring: kill io_put_req_deferred() · 4e118cd9

Pavel Begunkov authored Apr 15, 2022

We have several spots where a call to io_fill_cqe_req() is immediately
followed by io_put_req_deferred(). Replace them with
__io_req_complete_post() and get rid of io_put_req_deferred() and
io_fill_cqe_req().

> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  86942   13734       8  100684   1894c ./fs/io_uring.o
> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  86438   13654       8  100100   18704 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10672a538774ac8986bee6468d960527af59169d.1650056133.git.asml.silence@gmail.com
[axboe: fold in followup fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e118cd9

io_uring: minor refactoring for some tw handlers · 971cf9c1

Pavel Begunkov authored Apr 15, 2022

Get rid of some useless local variables
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7798327b684b7015f7e4300420142ddfcd317297.1650056133.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

971cf9c1

io_uring: clean poll tw PF_EXITING handling · f2219057

Pavel Begunkov authored Apr 15, 2022

When we meet PF_EXITING in io_poll_check_events(), don't overcomplicate
the code with io_poll_mark_cancelled() but just return -ECANCELED and
the callers will deal with the rest.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0cc981af82a5b193658f8f44397eeb3bf838b7b.1650056133.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f2219057

io_uring: optimise io_get_cqe() · d8da428b

Pavel Begunkov authored Apr 12, 2022

io_get_cqe() is expensive because of a bunch of loads, masking, etc.
However, most of the time we should have enough of entries in the CQ,
so we can cache two pointers representing a range of contiguous CQE
memory we can use. When the range is exhausted we'll go through a slower
path to set up a new range. When there are no CQEs avaliable, pointers
will naturally point to the same address.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/487eeef00f3146537b3d9c1a9cef2fc0b9a86f81.1649771823.git.asml.silence@gmail.com
[axboe: santinel -> sentinel]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d8da428b

io_uring: optimise submission left counting · 1cd15904

Pavel Begunkov authored Apr 12, 2022

Considering all inlining io_submit_sqe() is huge and usually ends up
calling some other functions.

We decrement @left in io_submit_sqes() just before calling
io_submit_sqe() and use it later after the call. Considering how huge
io_submit_sqe() is, there is not much hope @left will be treated
gracefully by compilers.

Decrement it after the call, not only it's easier on register spilling
and probably saves stack write/read, but also at least for x64 uses
CPU flags set by the dec instead of doing (read/write and tests).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/807f9a276b54ee8ff4e42e2b78721484f1c71743.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

1cd15904

io_uring: optimise submission loop invariant · 8e6971a8

Pavel Begunkov authored Apr 12, 2022

Instead of keeping @submitted in io_submit_sqes(), which for each
iteration requires comparison with the initial number of SQEs, store the
number of SQEs left to submit. We'll need nr only for when we're done
with SQE handling.

note: if we can't allocate a req for the first SQE we always has been
returning -EAGAIN to the userspace, save this behaviour by looking into
the cache in a slow path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c3b3df9aeae4c2f7a53fd8386385742e4e261e77.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

8e6971a8

io_uring: add helper to return req to cache list · fa05457a

Pavel Begunkov authored Apr 12, 2022

Don't hand code wq_stack_add_head() to ->free_list, which serves for
recycling io_kiocb, add a helper doing it for us.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f206f575486a8dd3d52f074ab37ed146b2d215b7.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

fa05457a

io_uring: helper for empty req cache checks · 88ab95be

Pavel Begunkov authored Apr 12, 2022

Add io_req_cache_empty(), which checks if there are requests in the
inline req cache or not. It'll be needed in the future, but also nicely
cleans up a few spots poking into ->free_list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b18662389f3fb483d0bd07906647f65f6037475a.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

88ab95be

io_uring: inline io_flush_cached_reqs · 23a5c43b

Pavel Begunkov authored Apr 12, 2022

io_flush_cached_reqs() isn't descriptive and has only one caller, inline
it into __io_alloc_req_refill().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec38abe65a883d9fe6b169793119ce86806655a4.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

23a5c43b

io_uring: shrink final link flush · e126391c

Pavel Begunkov authored Apr 12, 2022

All good users should not set IOSQE_IO_*LINK flags for the last request
of a link. io_uring flushes collected links at the end of submission,
but it's not the optimal way and so we don't care too much about it.
Replace io_queue_sqe() call with io_queue_sqe_fallback() as the former
one is inlined and will generate a bunch of extra code. This will also
help compilers with the submission path inlining.

> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  87265   13734       8  101007   18a8f ./fs/io_uring.o
> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  87073   13734       8  100815   189cf ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/01fb5e417ef49925d544a0b0bae30409845ed2b4.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e126391c

io_uring: memcpy CQE from req · 90e7c35f

Pavel Begunkov authored Apr 12, 2022

We can do CQE filling a bit more efficiently when req->cqe is fully
filled by memcpy()'ing it to the userspace instead of doing it field by
field. It's easier on register spilling, removes a couple of extra
loads/stores and write combines two u32 memory writes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ee3f514ff28b1fe3347a8eca93a9d91647f2eaad.1649771823.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

90e7c35f

24 Apr, 2022 27 commits

io_uring: explicitly keep a CQE in io_kiocb · cef216fc

Pavel Begunkov authored Apr 12, 2022

We already have req->{result,user_data,cflags}, which mimic struct
io_uring_cqe and are intended to store CQE data. Combine them into a
struct io_uring_cqe field.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1efe65d5005cd6a9ec3440767eb15a9fa9351cf.1649771823.git.asml.silence@gmail.com
[axboe: add mirror cqe to cater to fd union]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cef216fc

io_uring: rename io_sqe_file_register · 8b3171bd

Pavel Begunkov authored Apr 07, 2022

Rename io_sqe_file_register(), so the name better reflects what the
function is doing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5091518883786969e244d2f0854a47bbdaa5061.1649334991.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

8b3171bd

io_uring: deduplicate SCM accounting · 73b25d3b

Pavel Begunkov authored Apr 07, 2022

Merge io_sqe_file_register() and io_sqe_file_register(). The only
real difference left between them is from where we get an skb.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dddda3039c71fcbec24b3465cbe8c7e7ae7bb0e8.1649334991.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

73b25d3b

io_uring: don't pass around fixed index for scm · e390510a

Pavel Begunkov authored Apr 07, 2022

There is an old API nuisance where io_uring's SCM accounting functions
traverse fixed file tables and so requires them to be set in advance,
which leads to some implicit rules of how io_sqe_file_register() should
be used.

__io_sqe_files_scm() now works with only one file at a time, pass a file
directly and get rid of all fixed table dereferencing inside. Clean
io_sqe_file_register() callers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fb32031d892e61a7748c70da7999725d5e798671.1649334991.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e390510a

io_uring: refactor __io_sqe_files_scm · dca58c6a

Pavel Begunkov authored Apr 07, 2022

__io_sqe_files_scm() is now called only from one place passing a single
file, so nr argument can be killed and __io_sqe_files_scm() simplified.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66b492bc66dc8356d45d64076bb31d677d11a7c9.1649334991.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

dca58c6a

io_uring: uniform SCM accounting · a03a2a20

Pavel Begunkov authored Apr 07, 2022

Channel all SCM accounting through io_sqe_file_register(), so we do it
uniformely for updates and initial registration and can kill duplicated
code. Registration might be slightly slower in some case, but first we
skip most of SCM accounting now so it's not a problem. Moreover, it's
nicer for an empty set registration as we don't even try to allocate
skb for them anymore.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6c9afbeb22812777d0c43e52353b63db5b87ed1e.1649334991.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a03a2a20

io_uring: don't scm-account for non af_unix sockets · 1f59bc0f

Pavel Begunkov authored Apr 06, 2022

io_uring deals with file reference loops by registering all fixed files
in the SCM/GC infrastrucure. However, only a small subset of all file
types can keep long-term references to other files and those that don't
are not interesting for the garbage collector as they can't be in a
reference loop. They neither can be directly recycled by GC nor affect
loop searching.

Let's skip io_uring SCM accounting for loop-less files, i.e. all but
af_unix sockets, quite imroving fixed file updates performance and
greatly helpnig with memory footprint.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9c44ecf6e89d69130a8c4360cce2183ffc5ddd6f.1649277098.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

1f59bc0f

io_uring: move finish_wait() outside of loop in cqring_wait() · b4f20bb4

Jens Axboe authored Mar 25, 2022

We don't need to call this for every loop. This is particularly
troublesome if we are task_work intensive, and get woken more often than
we desire due to that.

Just do it at the end, that's always safe as we initialize the waitqueue
list head anyway. This can save a considerable amount of hammering on
the waitqueue lock, which is also hot from the request completion side.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b4f20bb4

io_uring: refactor io_req_add_compl_list() · 775a1f2f

Pavel Begunkov authored Mar 25, 2022

A small refactoring for io_req_add_compl_list() deduplicating some code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0a5272b45efe4ffc41cb79b99784e39c699aade.1648209006.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

775a1f2f

io_uring: silence io_for_each_link() warning · 963c6abb

Pavel Begunkov authored Mar 25, 2022

Some tooling keep complaining about self assignment in
io_for_each_link(), the code is correct but still let's workaround it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0de77b0b0f8309554ba6fba34327b7813bcc3ff.1648209006.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

963c6abb

io_uring: partially uninline io_put_task() · 9d170164

Pavel Begunkov authored Mar 25, 2022

In most cases io_put_task() is called from the submitter task and go
through a higly optimised fast path, which has to be inlined. The other
branch though is bulkier and we don't care about it as much because it
implies atomics and other heavy calls. Extract it into a helper, which
is expected not to be inlined.

[before] size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  89328   13646       8  102982   19246 ./fs/io_uring.o
[after] size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  89096   13646       8  102750   1915e ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dec213db0e0b8605132da81e0a0be687a4d140cb.1648209006.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

9d170164

io_uring: cleanup conditional submit locking · f8929630

Pavel Begunkov authored Mar 25, 2022

Refactor io_ring_submit_[un]lock(), make it accept issue_flags and
remove manual IO_URING_F_UNLOCKED checks. It also allows us to place
lockdep annotations inside instead of sprinkling them in a bunch of
places. There is only one user that doesn't fit now, so hand code
locking in __io_rsrc_put_work().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e55c2c06767676a801252e8094c9ab09912487a4.1648209006.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f8929630

io_uring: optimise mutex locking for submit+iopoll · d487b43c

Pavel Begunkov authored Mar 22, 2022

Both submittion and iopolling requires holding uring_lock. IOPOLL can
users do them together in a single syscall, however it would still do 2
pairs of lock/unlock. Optimise this case combining locking into one
lock/unlock pair, which especially nice for low QD.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/034b6c41658648ad3ad3c9485ac8eb546f010bc4.1647957378.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d487b43c

io_uring: pre-calculate syscall iopolling decision · 773697b6

Pavel Begunkov authored Mar 22, 2022

Syscall should only iopoll for events when it's a IOPOLL ring and is not
SQPOLL. Instead of check both flags every time we can save it in ring
flags so it's easier to use. We don't care much about an extra if there,
however it will be inconvenient to copy-paste this chunk with checks in
future patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7fd2f8fc2606305aa06dd8c0ff8f76a66b39c383.1647957378.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

773697b6

io_uring: split off IOPOLL argument verifiction · f81440d3

Pavel Begunkov authored Mar 22, 2022

IOPOLL doesn't use additional arguments like sigsets, but it still
needs some basic verification, which is currently done by
io_get_ext_arg(). This patch adds a separate function for the IOPOLL
path, which is a bit simpler and doesn't do extra. This prepares us for
further patches, which would have hurt inlining in the hot path otherwise.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71b23fca412e3374b74be7711cfd42a3d9d5dfe0.1647957378.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f81440d3

io_uring: clean up io_queue_next() · 57859f4d

Pavel Begunkov authored Mar 21, 2022

Move fast check out of io_queue_next(), it makes req->flags checks in
__io_submit_flush_completions() a bit clearer and grants us better
comtrol, e.g. can remove now not justified unlikely() in
__io_submit_flush_completions(). Also, we don't care about having this
check in io_free_req() as the function is a slow path and
io_req_find_next() handles it correctly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1f9e1cc80adbb11b37017d511df4a2c6141a3f08.1647897811.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

57859f4d

io_uring: move poll recycling later in compl flushing · b605a7fa

Pavel Begunkov authored Mar 21, 2022

There is a new (req->flags & REQ_F_POLLED) check in
__io_submit_flush_completions() for poll recycling, however
io_free_batch_list() is a much better place for it. First, we prefer it
after putting the last req ref just to avoid potential problems in the
future. Also, it'll enable the recycling for IOPOLL and also will place
it closer to all other req->flags bits clean up requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/31dfe1dafda66ba3ce36b301884ec7e162c777d1.1647897811.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b605a7fa

io_uring: optimise io_free_batch_list · a538be5b

Pavel Begunkov authored Mar 21, 2022

We do several req->flags checks in the fast path of
io_free_batch_list(). One explicit check of REQ_F_REFCOUNT, and two
other hidden in io_queue_next() and io_dismantle_req(). Moreover, there
is a io_req_put_rsrc_locked() call in between, so there is no hope
req->flags will be preserved in registers.

All those flags if not a slow path than definitely a slower path, so
put them all under a single flags mask check and save several mem
reloads and ifs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0fb493f73f2009aea395c570c2932fecaa4e1244.1647897811.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a538be5b

io_uring: refactor io_req_find_next · 7819a1f6

Pavel Begunkov authored Mar 21, 2022

Move the fast path from io_req_find_next() into callers. It prepares us
for further changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10bd0e564472dde0c7f8d90ae317d05356cd565a.1647897811.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

7819a1f6

io_uring: remove extra ifs around io_commit_cqring · 60053be8

Pavel Begunkov authored Mar 21, 2022

Now io_commit_cqring() is simple and it tolerates well being called
without a new CQE filled, so kill a bunch of not needed anymore
guards.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/36aed692dff402bba00a444a63a9cd2e97a340ea.1647897811.git.asml.silence@gmail.com
[axboe: fold in followup fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

60053be8

io_uring: small optimisation of tctx_task_work · 68ca8fc0

Pavel Begunkov authored Mar 21, 2022

There should be no completions stashed when we first get into
tctx_task_work(), so move completion flushing checks a bit later
after we had a chance to execute some task works.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c6765c804f3c438591b9825ab9c43d22039073c4.1647897811.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

68ca8fc0

Linux 5.18-rc4 · af2d861d
Linus Torvalds authored Apr 24, 2022

af2d861d

Merge tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 42740a2f

Linus Torvalds authored Apr 24, 2022

Pull scheduler fix from Borislav Petkov:

 - Fix a corner case when calculating sched runqueue variables

That fix also removes a check for a zero divisor in the code, without
mentioning it.  Vincent clarified that it's ok after I whined about it:

  https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/

* tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/pelt: Fix attach_entity_load_avg() corner case

42740a2f

Merge tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5206548f

Linus Torvalds authored Apr 24, 2022

Pull powerpc fixes from Michael Ellerman:

 - Partly revert a change to our timer_interrupt() that caused lockups
   with high res timers disabled.

 - Fix a bug in KVM TCE handling that could corrupt kernel memory.

 - Two commits fixing Power9/Power10 perf alternative event selection.

Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.

* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix 32bit compile
  powerpc/perf: Fix power10 event alternatives
  powerpc/perf: Fix power9 event alternatives
  KVM: PPC: Fix TCE handling for VFIO
  powerpc/time: Always set decrementer in timer_interrupt()

5206548f

Merge tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f48ffef1

Linus Torvalds authored Apr 24, 2022

Pull perf fixes from Borislav Petkov:

 - Add Sapphire Rapids CPU support

 - Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)

* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
  perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled

f48ffef1

Merge tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · b877ca4d

Linus Torvalds authored Apr 24, 2022

Pull EDAC fix from Borislav Petkov:

 - Read the reported error count from the proper register on
   synopsys_edac

* tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/synopsys: Read the error count from the correct register

b877ca4d

kvmalloc: use vmalloc_huge for vmalloc allocations · 9becb688

Linus Torvalds authored Apr 22, 2022

Since commit 559089e0 ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
opt-in strategy, because it caused a number of problems that weren't
noticed until x86 enabled it too.

One of the issues was fixed by Nick Piggin in commit 3b8000ae
("mm/vmalloc: huge vmalloc backing pages should be split rather than
compound"), but I'm still worried about page protection issues, and
VM_FLUSH_RESET_PERMS in particular.

However, like the hash table allocation case (commit f2edd118:
"page_alloc: use vmalloc_huge for large system hash"), the use of
kvmalloc() should be safe from any such games, since the returned
pointer might be a SLUB allocation, and as such no user should
reasonably be using it in any odd ways.

We also know that the allocations are fairly large, since it falls back
to the vmalloc case only when a kmalloc() fails.  So using a hugepage
mapping seems both safe and relevant.

This patch does show a weakness in the opt-in strategy: since the opt-in
flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
few of the usual interfaces actually expose it.

That's not much of an issue in this case that already used one of the
fairly specialized low-level vmalloc interfaces for the allocation, but
for a lot of other vmalloc() users that might want to opt in, it's going
to be very inconvenient.

We'll either have to fix any compatibility problems, or expose it in the
gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
vmalloc() users to use hugepage mappings.  That said, the cases that
really matter were probably already taken care of by the hash tabel
allocation.

Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/
Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9becb688