Commits · 76c68fbf1a1f97afed0c8f680ee4e3f4da3e720d · Kirill Smelkov / linux

09 May, 2022 25 commits

Stefan Roesch authored Apr 26, 2022

This enables large CQE's in the uring setup.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-12-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

76c68fbf

io_uring: support CQE32 in /proc info · f9b3dfcc

Stefan Roesch authored Apr 26, 2022

This exposes the extra1 and extra2 fields in the /proc output.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-11-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f9b3dfcc

io_uring: add tracing for additional CQE32 fields · c4bb964f

Stefan Roesch authored Apr 26, 2022

This adds tracing for the extra1 and extra2 fields.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c4bb964f

io_uring: overflow processing for CQE32 · e45a3e05

Stefan Roesch authored Apr 26, 2022

This adds the overflow processing for large CQE's.

This adds two parameters to the io_cqring_event_overflow function and
uses these fields to initialize the large CQE fields.

Allocate enough space for large CQE's in the overflow structure. If no
large CQE's are used, the size of the allocation is unchanged.

The cqe field can have a different size depending on whether it's a large
CQE or not. To be able to allocate different sizes, the two fields
in the structure are re-ordered.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-9-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e45a3e05
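
The allocation rule described in e45a3e05 can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct overflow_cqe`, `overflow_entry_size()` and `overflow_entry_alloc()` are hypothetical names, and the layout is a stand-in, not the kernel's actual structure. The point it shows is why the variable-sized part has to come last: the two extra 64-bit words are allocated only when large CQEs are in use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow entry; the variable-sized tail must be the last member. */
struct overflow_cqe {
	struct overflow_cqe *next;   /* list linkage comes first */
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t extra[];            /* extra1/extra2, present only for large CQEs */
};

/* Size of one overflow entry; unchanged when large CQEs are not used. */
static size_t overflow_entry_size(bool is_cqe32)
{
	size_t size = sizeof(struct overflow_cqe);

	if (is_cqe32)
		size += 2 * sizeof(uint64_t);
	return size;
}

/* Allocate and fill an entry; the two extra values are stored only for large CQEs. */
static struct overflow_cqe *overflow_entry_alloc(bool is_cqe32, uint64_t user_data,
						 int32_t res, uint32_t flags,
						 uint64_t extra1, uint64_t extra2)
{
	struct overflow_cqe *ocqe = malloc(overflow_entry_size(is_cqe32));

	if (!ocqe)
		return NULL;
	ocqe->next = NULL;
	ocqe->user_data = user_data;
	ocqe->res = res;
	ocqe->flags = flags;
	if (is_cqe32) {
		ocqe->extra[0] = extra1;
		ocqe->extra[1] = extra2;
	}
	return ocqe;
}
```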

io_uring: flush completions for CQE32 · 0e2e5c47

Stefan Roesch authored Apr 26, 2022

This flushes the completions according to their CQE type: the same
processing is done for the default CQE size, but for large CQE's the
extra1 and extra2 fields are filled in.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-8-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0e2e5c47

io_uring: modify io_get_cqe for CQE32 · 2fee6bc6

Stefan Roesch authored Apr 26, 2022

Modify accesses to the CQE array to take large CQE's into account. The
index needs to be shifted by one for large CQE's.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-7-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2fee6bc6
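
The index shift mentioned in 2fee6bc6 can be illustrated with a small sketch. It assumes the CQ array is addressed in units of 16-byte CQE slots and that the ring size is a power of two; `cqe_array_offset()` is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Map a free-running tail counter to an offset in the CQE array.
 * With large CQEs every entry occupies two 16-byte slots, so the
 * masked index is shifted left by one.
 */
static uint32_t cqe_array_offset(uint32_t tail, uint32_t ring_entries, bool is_cqe32)
{
	/* Assumption: ring_entries is a non-zero power of two. */
	assert(ring_entries != 0 && (ring_entries & (ring_entries - 1)) == 0);

	uint32_t slot = tail & (ring_entries - 1);

	return is_cqe32 ? slot << 1 : slot;
}
```

For an 8-entry ring and a tail of 5, the default layout uses offset 5, while the large-CQE layout uses offset 10.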

io_uring: add CQE32 completion processing · effcf8bd

Stefan Roesch authored Apr 26, 2022

This adds the completion processing for the large CQE's and makes sure
that the extra1 and extra2 fields are passed through.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-6-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

effcf8bd

io_uring: add CQE32 setup processing · 91658798

Stefan Roesch authored Apr 26, 2022

This adds two new functions to set up and fill the CQE32 result structure.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-5-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

91658798

io_uring: change ring size calculation for CQE32 · baf9cb64

Stefan Roesch authored Apr 26, 2022

This changes the function rings_size to take large CQE's into account.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-4-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

baf9cb64
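
As a rough sketch of what taking large CQEs into account means for the size calculation in baf9cb64: the CQ array needs twice as many bytes per entry. The helper below is hypothetical and covers only the CQ array part; the real rings_size also accounts for the ring header and the SQ index array, which are left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CQE_BYTES 16u   /* default CQE size: user_data + res + flags */

/*
 * Bytes needed for the CQ array alone.
 * Returns SIZE_MAX if the multiplication would overflow.
 */
static size_t cq_array_bytes(uint32_t cq_entries, bool is_cqe32)
{
	size_t per_entry = is_cqe32 ? 2 * CQE_BYTES : CQE_BYTES;

	if (cq_entries > SIZE_MAX / per_entry)
		return SIZE_MAX;
	return (size_t)cq_entries * per_entry;
}
```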

io_uring: store add. return values for CQE32 · 4e5bc0a9

Stefan Roesch authored Apr 26, 2022

This reuses the hash list node for the storage we need to hold the two
64-bit values that must be passed back.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e5bc0a9

io_uring: support CQE32 in io_uring_cqe · 7a51e5b4

Stefan Roesch authored Apr 26, 2022

This adds the big_cqe array to the struct io_uring_cqe to support large
CQE's.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7a51e5b4
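
The layout change in 7a51e5b4 can be pictured with a userspace stand-in. The struct below mirrors the idea of a 16-byte CQE followed by a flexible `big_cqe` array, using fixed-width types from `<stdint.h>` instead of the kernel's own types; `struct cqe_sketch` and `cqe_bytes()` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a CQE with an optional large tail. */
struct cqe_sketch {
	uint64_t user_data;   /* value passed in by the submitter */
	int32_t  res;         /* result code for this completion */
	uint32_t flags;
	uint64_t big_cqe[];   /* 16 extra bytes when large CQEs are enabled */
};

/* Bytes occupied by one completion entry in the ring. */
static size_t cqe_bytes(bool is_cqe32)
{
	return sizeof(struct cqe_sketch) + (is_cqe32 ? 2 * sizeof(uint64_t) : 0);
}
```

Because the flexible array member adds nothing to `sizeof`, code that only knows about the default 16-byte CQE keeps working unchanged.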

io_uring: add support for 128-byte SQEs · ebdeb7c0

Jens Axboe authored Mar 31, 2022

Normal SQEs are 64-bytes in length, which is fine for all the commands
we support. However, in preparation for supporting passthrough IO,
provide an option for setting up a ring with 128-byte SQEs.

We continue to use the same type for io_uring_sqe; it's marked and
commented with a zero sized array pad at the end. This provides up
to 80 bytes of data for a passthrough command - 64 bytes for the
extra added data, and 16 bytes available at the end of the existing
SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ebdeb7c0
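
The 80-byte figure in ebdeb7c0 follows from simple arithmetic, sketched below. The constant and function names are made up for illustration; only the numbers (64-byte SQE, 128-byte SQE, 16 free bytes at the end of the existing SQE) come from the commit message.

```c
enum {
	SQE_BYTES     = 64,   /* normal SQE size */
	SQE128_BYTES  = 128,  /* large SQE size */
	SQE_TAIL_FREE = 16,   /* bytes available at the end of the existing SQE */
};

/* Bytes available to a passthrough command for a given ring setup. */
static unsigned int passthrough_payload_bytes(int is_sqe128)
{
	unsigned int bytes = SQE_TAIL_FREE;

	if (is_sqe128)
		bytes += SQE128_BYTES - SQE_BYTES;   /* the extra added data */
	return bytes;
}
```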

Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-passthrough · b5ba65df

Jens Axboe authored May 09, 2022

* for-5.19/io_uring-socket:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr

b5ba65df

Merge branch 'for-5.19/io_uring' into for-5.19/io_uring-passthrough · 13086899

Jens Axboe authored May 09, 2022

* for-5.19/io_uring: (85 commits)
  io_uring: don't clear req->kbuf when buffer selection is done
  io_uring: eliminate the need to track provided buffer ID separately
  io_uring: move provided buffer state closer to submit state
  io_uring: move provided and fixed buffers into the same io_kiocb area
  io_uring: abstract out provided buffer list selection
  io_uring: never call io_buffer_select() for a buffer re-select
  io_uring: get rid of hashed provided buffer groups
  io_uring: always use req->buf_index for the provided buffer group
  io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set
  io_uring: kill io_rw_buffer_select() wrapper
  io_uring: make io_buffer_select() return the user address directly
  io_uring: kill io_recv_buffer_select() wrapper
  io_uring: use 'sr' vs 'req->sr_msg' consistently
  io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
  io_uring: check IOPOLL/ioprio support upfront
  io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread()
  io_uring: add IORING_SETUP_TASKRUN_FLAG
  io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
  io_uring: set task_work notify method at init time
  io-wq: use __set_notify_signal() to wake workers
  ...

13086899

io_uring: don't clear req->kbuf when buffer selection is done · 7ccba24d

Jens Axboe authored May 01, 2022

It's not needed as the REQ_F_BUFFER_SELECTED flag tracks the state of
whether or not kbuf is valid, so just drop it.
Suggested-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7ccba24d

io_uring: eliminate the need to track provided buffer ID separately · 1dbd023e

Jens Axboe authored May 01, 2022

We have io_kiocb->buf_index which is used for either fixed buffers, or
for provided buffers. For the latter, it's used to hold the buffer group
ID for buffer selection. Post selection, req->kbuf->bid is used to get
the buffer ID.

Store the buffer ID, when selected, in req->buf_index. If we do end up
recycling the buffer, reset it back to the buffer group ID.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1dbd023e
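
A minimal sketch of the field reuse described in 1dbd023e, assuming a request carries a single index field plus a "buffer selected" flag. The types and function names are hypothetical; the real io_kiocb holds far more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical provided buffer: knows its own ID and the group it belongs to. */
struct provided_buf {
	uint16_t bid;    /* buffer ID */
	uint16_t bgid;   /* buffer group ID */
};

/* Hypothetical request: one field serves as group ID or buffer ID. */
struct request {
	uint16_t buf_index;
	bool     buffer_selected;
};

/* On selection, the group ID in buf_index is replaced by the buffer ID. */
static void request_select_buffer(struct request *req, const struct provided_buf *buf)
{
	req->buf_index = buf->bid;
	req->buffer_selected = true;
}

/* On recycle, buf_index is reset to the group ID so selection can run again. */
static void request_recycle_buffer(struct request *req, const struct provided_buf *buf)
{
	req->buf_index = buf->bgid;
	req->buffer_selected = false;
}
```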

io_uring: move provided buffer state closer to submit state · 660cbfa2

Jens Axboe authored May 01, 2022

The timeout and other items that follow are less hot, so let's move the
provided buffer state above that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

660cbfa2

io_uring: move provided and fixed buffers into the same io_kiocb area · a4f8d94c

Jens Axboe authored Apr 30, 2022

These are mutually exclusive - if you use provided buffers, then you
cannot use fixed buffers and vice versa. Move them into the same spot
in the io_kiocb, which is also advantageous for provided buffers as
they get near the submit side hot cacheline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a4f8d94c

io_uring: abstract out provided buffer list selection · 149c69b0

Jens Axboe authored Apr 30, 2022

In preparation for providing another way to select a buffer, move the
existing logic into a helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

149c69b0

io_uring: never call io_buffer_select() for a buffer re-select · b66e65f4

Jens Axboe authored Apr 30, 2022

Callers already have room to store the addr and length information,
clean it up by having the caller just assign the previously provided
data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b66e65f4

io_uring: get rid of hashed provided buffer groups · 9cfc7e94

Jens Axboe authored May 01, 2022

Use a plain array for any group ID that's less than 64, and punt
anything beyond that to an xarray. 64 fits in a page even for 4KB
page sizes and with the planned additions.

This makes the expected group usage faster by avoiding a hash and lookup
to find our list, and it uses less memory upfront by not allocating any
memory for provided buffers unless it's actually being used.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9cfc7e94
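
The lookup scheme from 9cfc7e94 can be sketched in userspace as a lazily allocated array for small group IDs with a fallback container for everything else. A singly linked list stands in for the kernel's xarray here, and all names (`group_table`, `group_lookup`, `group_insert`) are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FAST_GROUPS 64   /* group IDs below this use the plain array */

struct buffer_list {
	uint32_t bgid;
	struct buffer_list *next;   /* used only on the slow path */
};

struct group_table {
	struct buffer_list **fast;  /* allocated on first use, not upfront */
	struct buffer_list *slow;   /* stand-in for the xarray */
};

static struct buffer_list *group_lookup(const struct group_table *t, uint32_t bgid)
{
	if (bgid < FAST_GROUPS)
		return t->fast ? t->fast[bgid] : NULL;   /* no hashing, direct index */

	for (struct buffer_list *bl = t->slow; bl; bl = bl->next)
		if (bl->bgid == bgid)
			return bl;
	return NULL;
}

/* Returns 0 on success, -1 if the fast array could not be allocated. */
static int group_insert(struct group_table *t, struct buffer_list *bl)
{
	if (bl->bgid < FAST_GROUPS) {
		if (!t->fast) {
			t->fast = calloc(FAST_GROUPS, sizeof(*t->fast));
			if (!t->fast)
				return -1;
		}
		t->fast[bl->bgid] = bl;
		return 0;
	}
	bl->next = t->slow;
	t->slow = bl;
	return 0;
}
```

The common case, a small group ID, costs a bounds check and one array index, and a ring that never registers provided buffers never allocates the array.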

io_uring: always use req->buf_index for the provided buffer group · 4e906702

Jens Axboe authored Apr 28, 2022

The read/write opcodes use it already, but the recv/recvmsg do not. If
we switch them over and read and validate this at init time while we're
checking if the opcode supports it anyway, then we can do it in one spot
and we don't have to pass in a separate group ID for io_buffer_select().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e906702

io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set · bb68d504

Jens Axboe authored Apr 29, 2022

There's no point in validity checking buf_index if the request doesn't
have REQ_F_BUFFER_SELECT set, as we will never use it for that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bb68d504

io_uring: kill io_rw_buffer_select() wrapper · e5b00349

Jens Axboe authored Apr 28, 2022

After the recent changes, this is a direct call to io_buffer_select()
anyway. With this change, there are no wrappers left for provided
buffer selection.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e5b00349

io_uring: make io_buffer_select() return the user address directly · c54d52c2

Jens Axboe authored Apr 28, 2022

There's no point in having callers provide a kbuf, we're just returning
the address anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c54d52c2

08 May, 2022 15 commits

Linux 5.18-rc6 · c5eb0a61

Linus Torvalds authored May 08, 2022

c5eb0a61

Merge tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · f002488d

Linus Torvalds authored May 08, 2022

Pull parisc architecture fixes from Helge Deller:
 "Some reverts of existing patches, which were necessary because of boot
  issues due to wrong CPU clock handling and cache issues which led to
  userspace segfaults with 32bit kernels. Dave has a whole bunch of
  upcoming cache fixes which I then plan to push in the next merge
  window.

  Other than that just small updates and fixes, e.g. defconfig updates,
  spelling fixes, a clocksource fix, boot topology fixes and a fix for
  /proc/cpuinfo output to satisfy lscpu"

* tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  Revert "parisc: Increase parisc_cache_flush_threshold setting"
  parisc: Mark cr16 clock unstable on all SMP machines
  parisc: Fix typos in comments
  parisc: Change MAX_ADDRESS to become unsigned long long
  parisc: Merge model and model name into one line in /proc/cpuinfo
  parisc: Re-enable GENERIC_CPU_DEVICES for !SMP
  parisc: Update 32- and 64-bit defconfigs
  parisc: Only list existing CPUs in cpu_possible_mask
  Revert "parisc: Fix patch code locking and flushing"
  Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
  Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"

f002488d

Merge tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e3de3a1c

Linus Torvalds authored May 08, 2022

Pull powerpc fixes from Michael Ellerman:

 - Fix the DWARF CFI in our VDSO time functions, allowing gdb to
   backtrace through them correctly.

 - Fix a buffer overflow in the papr_scm driver, only triggerable by
   hypervisor input.

 - A fix in the recently added QoS handling for VAS (used for
   communicating with coprocessors).

Thanks to Alan Modra, Haren Myneni, Kajol Jain, and Segher Boessenkool.

* tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
  powerpc/vdso: Fix incorrect CFI in gettimeofday.S
  powerpc/pseries/vas: Use QoS credits from the userspace

e3de3a1c

Merge tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 27b5d61c

Linus Torvalds authored May 08, 2022

Pull x86 fix from Thomas Gleixner:
 "A fix and an email address update:

   - Prevent FPU state corruption.

     The condition in irq_fpu_usable() grants FPU usage when the FPU is
     not used in the kernel. That's just wrong as it does not take the
     fpregs_lock()'ed regions into account. If FPU usage happens within
     such a region from interrupt context, then the FPU state gets
     corrupted.

     That's a long standing bug, which got unearthed by the recent
     changes to the random code.

   - Josh wants to use his kernel.org email address"

* tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Prevent FPU state corruption
  MAINTAINERS: Update Josh Poimboeuf's email address

27b5d61c

Merge tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea82593b

Linus Torvalds authored May 08, 2022

Pull timer fix from Thomas Gleixner:
 "A fix and an email address update:

   - Mark the NMI safe time accessors notrace to prevent tracer
     recursion when they are selected as trace clocks.

   - John Stultz has a new email address"

* tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Mark NMI safe time accessors as notrace
  MAINTAINERS: Update email address for John Stultz

ea82593b

Revert "parisc: Increase parisc_cache_flush_threshold setting" · ba0c0410

Helge Deller authored May 08, 2022

This reverts commit a58e9d09.

Triggers segfaults with 32-bit kernels on PA8500 machines.
Signed-off-by: Helge Deller <deller@gmx.de>

ba0c0410

Merge tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9692df05

Linus Torvalds authored May 08, 2022

Pull irq fix from Thomas Gleixner:
 "A fix for the threaded interrupt core.

  A quick sequence of request/free_irq() can result in a hang because
  the interrupt thread did not reach the thread function and got stopped
  in the kthread core already. That leaves a state active counter
  around which makes an invocation of synchronize_irq() on that
  interrupt hang forever.

  Ensure that the thread reached the thread function in request_irq() to
  prevent that"

* tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Synchronize interrupt thread startup

9692df05

Merge tag 'locking-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ede4c6d7

Linus Torvalds authored May 08, 2022

Pull locking fixlet from Thomas Gleixner:
 "Just an email address update for MAINTAINERS and mailmap"

* tag 'locking-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: MAINTAINERS, .mailmap: Update André's email address

ede4c6d7

parisc: Mark cr16 clock unstable on all SMP machines · 340233dc

Helge Deller authored May 08, 2022

The cr16 interval timers are not synchronized across CPUs, even with just
one dual-core CPU. This becomes visible if the machines have a longer
uptime.
Signed-off-by: Helge Deller <deller@gmx.de>

340233dc

parisc: Fix typos in comments · a65bcad5

Julia Lawall authored Apr 30, 2022

Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Helge Deller <deller@gmx.de>

a65bcad5

parisc: Change MAX_ADDRESS to become unsigned long long · 234ff4c5

Helge Deller authored Apr 05, 2022

Dave noticed that for the 32-bit kernel MAX_ADDRESS should be a ULL,
otherwise this define would become 0:
	MAX_ADDRESS   (1UL << MAX_ADDRBITS)
It has no real effect on the kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>

234ff4c5
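
The truncation noted in 234ff4c5 can be demonstrated portably. The sketch below assumes, purely for illustration, an address width of 32 bits; `ADDRBITS` and both helper names are hypothetical. Shifting a 32-bit value by 32 is undefined behaviour in C, so the 32-bit case is emulated by computing in 64 bits and truncating to 32.

```c
#include <stdint.h>

#define ADDRBITS 32   /* assumption for this example only */

/* What survives when the result has to fit in a 32-bit unsigned long. */
static uint32_t max_address_as_32bit_ul(void)
{
	return (uint32_t)(1ULL << ADDRBITS);   /* bit 32 is cut off */
}

/* The same expression with an unsigned long long constant. */
static uint64_t max_address_as_ull(void)
{
	return 1ULL << ADDRBITS;
}
```

With these assumptions the first helper yields 0 and the second yields 4294967296, which matches the commit's point that the define needs to be a ULL on 32-bit kernels.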

parisc: Merge model and model name into one line in /proc/cpuinfo · 5b89966b

Helge Deller authored Apr 03, 2022

The Linux tool "lscpu" shows double the number of CPUs if we have
"model" and "model name" in two different lines in /proc/cpuinfo.
This change combines the model and the model name into one line.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org

5b89966b

parisc: Re-enable GENERIC_CPU_DEVICES for !SMP · 1955c4f8

Helge Deller authored Apr 01, 2022

In commit 62773112 ("parisc: Switch from GENERIC_CPU_DEVICES to
GENERIC_ARCH_TOPOLOGY") GENERIC_CPU_DEVICES was unconditionally turned
off, but this triggers a warning in topology_add_dev(). Turning it back
on for the !SMP case avoids this warning.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 62773112 ("parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY")
Signed-off-by: Helge Deller <deller@gmx.de>

1955c4f8

parisc: Update 32- and 64-bit defconfigs · 7e93a3dd

Helge Deller authored Apr 01, 2022

Enable CONFIG_CGROUPS=y on 32-bit defconfig for systemd-support, and
enable CONFIG_NAMESPACES and CONFIG_USER_NS.
Signed-off-by: Helge Deller <deller@gmx.de>

7e93a3dd

parisc: Only list existing CPUs in cpu_possible_mask · 0921244f

Helge Deller authored Apr 01, 2022

The inventory knows which CPUs are in the system, so this bitmask should
be in cpu_possible_mask instead of the bitmask based on CONFIG_NR_CPUS.

Reset the cpu_possible_mask before scanning the system for CPUs, and
mark each existing CPU as possible during initialization of that CPU.

This also avoids warnings like the following later on:

 register_cpu_capacity_sysctl: too early to get CPU4 device!
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>

0921244f