- 23 May, 2024 1 commit
-
-
Yu Kuai authored
Writing 'power' and 'submit_queues' concurrently will trigger a kernel panic:

Test script:

  modprobe null_blk nr_devices=0
  mkdir -p /sys/kernel/config/nullb/nullb0
  while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
  while true; do echo 1 > power; echo 0 > power; done

Test result:

  BUG: kernel NULL pointer dereference, address: 0000000000000148
  Oops: 0000 [#1] PREEMPT SMP
  RIP: 0010:__lock_acquire+0x41d/0x28f0
  Call Trace:
   <TASK>
   lock_acquire+0x121/0x450
   down_write+0x5f/0x1d0
   simple_recursive_removal+0x12f/0x5c0
   blk_mq_debugfs_unregister_hctxs+0x7c/0x100
   blk_mq_update_nr_hw_queues+0x4a3/0x720
   nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
   nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
   configfs_write_iter+0x119/0x1e0
   vfs_write+0x326/0x730
   ksys_write+0x74/0x150

This is because del_gendisk() can run concurrently with blk_mq_update_nr_hw_queues():

  nullb_device_power_store      nullb_apply_submit_queues
   null_del_dev
    del_gendisk
                                 nullb_update_nr_hw_queues
                                  if (!dev->nullb)
                                  // still set while gendisk is deleted
                                   return 0
                                  blk_mq_update_nr_hw_queues
   dev->nullb = NULL

Fix this problem by reusing the global mutex to protect nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.

Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
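The serialization idea described above can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_dev`, `toy_disk`, `toy_lock`, `toy_power_store()` and `toy_submit_queues_store()` are hypothetical names, a pthread mutex stands in for the driver's global mutex, and none of this is the actual null_blk code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's objects; not the real null_blk types. */
struct toy_disk {
    int nr_hw_queues;
};

struct toy_dev {
    struct toy_disk *disk;   /* plays the role of dev->nullb */
    struct toy_disk storage; /* backing object used while powered on */
};

/* One global lock shared by both store paths (assumption for this sketch). */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* "power" store: create or tear down the disk while holding the lock. */
void toy_power_store(struct toy_dev *dev, int on)
{
    pthread_mutex_lock(&toy_lock);
    if (on && !dev->disk) {
        dev->storage.nr_hw_queues = 1;
        dev->disk = &dev->storage;
    } else if (!on) {
        dev->disk = NULL;
    }
    pthread_mutex_unlock(&toy_lock);
}

/*
 * "submit_queues" store: the NULL check and the update happen under the
 * same lock, so teardown cannot slip in between them.
 * Returns 1 if the queue count was updated, 0 if the device is powered off.
 */
int toy_submit_queues_store(struct toy_dev *dev, int nr)
{
    int updated = 0;

    pthread_mutex_lock(&toy_lock);
    if (dev->disk) {
        dev->disk->nr_hw_queues = nr;
        updated = 1;
    }
    pthread_mutex_unlock(&toy_lock);
    return updated;
}
```

Because both paths take the same lock, the check of the disk pointer and the use of the disk form one critical section, which is the property the commit message says was missing.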
-
- 22 May, 2024 2 commits
-
-
Dr. David Alan Gilbert authored
'avg_latency_bucket' is unused since commit bf20ab53 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20240522172458.334173-1-linux@treblig.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
With the following two conditions, bio will be lost:

1) blk plug is not enabled, for example, __blkdev_direct_IO_simple() and __blkdev_direct_IO_async();
2) bio plug is enabled, for example write IO for raid1/raid10 while bitmap is enabled;

The root cause is that blk_finish_plug() will add the bio to current->bio_list, while such bio will not be handled:

  __submit_bio_noacct
   current->bio_list = bio_list_on_stack;
   blk_start_plug

   do {
    dm_submit_bio
     md_handle_request
      raid10_write_request
       -> generate new bio for underlying disks
       raid1_add_bio_to_plug -> bio is added to plug
   } while ((bio = bio_list_pop(&bio_list_on_stack[0])))
   -> previous bio are all handled

   blk_finish_plug
    raid10_unplug
     raid1_submit_write
      submit_bio_noacct
       if (current->bio_list)
        bio_list_add(&current->bio_list[0], bio)
        -> add new bio

   current->bio_list = NULL
   -> new bio is lost

Fix the problem by moving the plug into the while loop, so that current->bio_list will still be handled after blk_finish_plug().

By the way, enabling the plug for raid1/raid10 in this case also prevents IO handling from being delayed into the daemon thread, which should also improve IO performance.

Fixes: 060406c6 ("block: add plug while submitting IO")
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+Xsmzy2G9YuEatfMT6qv1M--YdOCQ0g7z7OVmcTbBxQAg@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Link: https://lore.kernel.org/r/20240521200308.983986-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
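The ordering problem in this message can be modelled with a toy queue. The sketch below is not block layer code: `toy_list`, `toy_handle()`, `toy_finish_plug()` and `toy_submit()` are hypothetical names, and the rule that each original item plugs exactly one follow-up item is an assumption made only to keep the model small.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/* Hypothetical fixed-size FIFO standing in for a bio list. */
struct toy_list {
    int items[TOY_MAX];
    int head, tail;
};

void toy_push(struct toy_list *l, int v)
{
    assert(l->tail < TOY_MAX);
    l->items[l->tail++] = v;
}

/* Pops the oldest item into *v; returns 0 when the list is empty. */
int toy_pop(struct toy_list *l, int *v)
{
    if (l->head == l->tail)
        return 0;
    *v = l->items[l->head++];
    return 1;
}

/* Handling an original item (< 100) parks one follow-up item in the plug. */
void toy_handle(int v, struct toy_list *plug)
{
    if (v < 100)
        toy_push(plug, v + 100);
}

/* Finishing the plug moves parked items onto the on-stack list. */
void toy_finish_plug(struct toy_list *plug, struct toy_list *on_stack)
{
    int v;

    while (toy_pop(plug, &v))
        toy_push(on_stack, v);
}

/* Returns how many items were handled for the chosen plug placement. */
int toy_submit(int first, int plug_inside_loop)
{
    struct toy_list on_stack = { {0}, 0, 0 };
    struct toy_list plug = { {0}, 0, 0 };
    int v = first, handled = 0;

    do {
        toy_handle(v, &plug);
        handled++;
        if (plug_inside_loop)
            toy_finish_plug(&plug, &on_stack); /* drained by the next pop */
    } while (toy_pop(&on_stack, &v));

    if (!plug_inside_loop)
        toy_finish_plug(&plug, &on_stack); /* too late: nothing pops again */
    return handled;
}
```

With the plug finished after the loop, `toy_submit(1, 0)` handles only the original item and the follow-up is stranded on the on-stack list; with the plug finished inside the loop, `toy_submit(1, 1)` handles both.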
-
- 20 May, 2024 1 commit
-
-
Jeff Johnson authored
Fix the allmodconfig 'make W=1' issue: WARNING: modpost: missing MODULE_DESCRIPTION() in block/t10-pi.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240516-md-t10-pi-v1-1-44a3469374aa@quicinc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 May, 2024 1 commit
-
-
Ming Lei authored
Commit a46c2702 ("blk-mq: don't schedule block kworker on isolated CPUs") rules out isolated CPUs from hctx->cpumask, and hctx->cpumask should only be used for scheduling kworker. Add helper blk_mq_cpu_mapped_to_hctx() and apply it in the cpuhp handlers. This patch avoids forgetting to clear INACTIVE in the hctx state when an isolated CPU comes online, and fixes a hang when allocating requests from this hctx's tags. Cc: Raju Cheerla <rcheerla@redhat.com> Fixes: a46c2702 ("blk-mq: don't schedule block kworker on isolated CPUs") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240517020514.149771-1-ming.lei@redhat.com Tested-by: Raju Cheerla <rcheerla@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 May, 2024 3 commits
-
-
Waiman Long authored
During a cgroup_rstat_flush() call, the lowest level of nodes are flushed first before their parents. Since commit 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()"), iostat propagation was still done to the parent. Grandparent, however, may not get the iostat update if the parent has no blkg_iostat_set queued in its lhead lockless list. Fix this iostat propagation problem by queuing the parent's global blkg->iostat into one of its percpu lockless lists to make sure that the delta will always be propagated up to the grandparent and so on toward the root blkcg. Note that successive calls to __blkcg_rstat_flush() are serialized by the cgroup_rstat_lock. So no special barrier is used in the reading and writing of blkg->iostat.lqueued. Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()") Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com> Closes: https://lore.kernel.org/lkml/ZkO6l%2FODzadSgdhC@dschatzberg-fedora-PF3DHTBV/ Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240515143059.276677-1-longman@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
__blkcg_rstat_flush() can be run anytime, especially when blk_cgroup_bio_start is being executed. If WRITE of `->lqueued` is re-ordered with READ of 'bisc->lnode.next' in the loop of __blkcg_rstat_flush(), `next_bisc` can be assigned with one stat instance being added in blk_cgroup_bio_start(), then the local list in __blkcg_rstat_flush() could be corrupted. Fix the issue by adding one barrier. Cc: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240515013157.443672-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Since commit 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()"), each iostat instance is added to blkcg percpu list, so blkcg_reset_stats() can't reset the stat instance by memset(), otherwise the llist may be corrupted. Fix the issue by only resetting the counter part. Cc: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Jay Shin <jaeshin@redhat.com> Fixes: 3b8cc629 ("blk-cgroup: Optimize blkcg_rstat_flush()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20240515013157.443672-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
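Why a whole-struct memset() is dangerous once the object is linked into a list can be shown with a small userspace sketch. The types and the function `toy_reset_counters()` below are hypothetical and only mirror the shape of the problem (a list link embedded next to counters); they are not the blk-cgroup structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical singly linked list node, standing in for an llist node. */
struct toy_node {
    struct toy_node *next;
};

struct toy_counters {
    unsigned long long bytes;
    unsigned long long ios;
};

struct toy_iostat {
    struct toy_node lnode;    /* list linkage: must survive a reset */
    int queued;               /* list membership flag: must survive too */
    struct toy_counters cur;  /* counter part */
    struct toy_counters last; /* counter part */
};

/* Clear only the counters; leave the list linkage and flag untouched. */
void toy_reset_counters(struct toy_iostat *s)
{
    memset(&s->cur, 0, sizeof(s->cur));
    memset(&s->last, 0, sizeof(s->last));
}
```

A `memset(s, 0, sizeof(*s))` in place of the two targeted calls would also zero `lnode.next` and `queued`, cutting the object out of whatever list it was on while other code still expects it to be linked.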
-
- 15 May, 2024 1 commit
-
-
Justin Stitt authored
When running syzkaller with the newly reintroduced signed integer wrap sanitizer we encounter this splat:

  [ 366.015950] UBSAN: signed-integer-overflow in ../drivers/cdrom/cdrom.c:2361:33
  [ 366.021089] -9223372036854775808 - 346321 cannot be represented in type '__s64' (aka 'long long')
  [ 366.025894] program syz-executor.4 is using a deprecated SCSI ioctl, please convert it to SG_IO
  [ 366.027502] CPU: 5 PID: 28472 Comm: syz-executor.7 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
  [ 366.027512] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 366.027518] Call Trace:
  [ 366.027523] <TASK>
  [ 366.027533] dump_stack_lvl+0x93/0xd0
  [ 366.027899] handle_overflow+0x171/0x1b0
  [ 366.038787] ata1.00: invalid multi_count 32 ignored
  [ 366.043924] cdrom_ioctl+0x2c3f/0x2d10
  [ 366.063932] ? __pm_runtime_resume+0xe6/0x130
  [ 366.071923] sr_block_ioctl+0x15d/0x1d0
  [ 366.074624] ? __pfx_sr_block_ioctl+0x10/0x10
  [ 366.077642] blkdev_ioctl+0x419/0x500
  [ 366.080231] ? __pfx_blkdev_ioctl+0x10/0x10
  ...

Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang. It was re-enabled in the kernel with commit 557f8c58 ("ubsan: Reintroduce signed overflow sanitizer").

Let's rearrange the check to not perform any arithmetic, thus not tripping the sanitizer.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/354
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/lkml/20240507-b4-sio-ata1-v1-1-810ffac6080a@google.com
Reviewed-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/lkml/ZjqU0fbzHrlnad8D@equinox
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20240507222520.1445-2-phil@philpotter.co.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
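The general technique named here, replacing a subtraction-then-sign-test with a direct comparison, can be sketched as follows. The function name `toy_media_changed()` and its parameters are hypothetical; this is a minimal illustration of the pattern, not a copy of the cdrom.c change.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: reports whether the last recorded change is newer
 * than the timestamp supplied by the caller.
 *
 * A check written as (user_ts_ms - last_change_ms < 0) overflows for
 * inputs such as INT64_MIN - 346321. Comparing the two values directly
 * expresses the same intent without doing any arithmetic.
 */
int toy_media_changed(int64_t user_ts_ms, int64_t last_change_ms)
{
    return last_change_ms > user_ts_ms;
}
```

The comparison form stays well defined for every pair of 64-bit inputs, including the extreme value from the splat above.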
-
- 14 May, 2024 7 commits
-
-
git://git.infradead.org/nvme Jens Axboe authored
Pull NVMe updates and fixes from Keith: "nvme updates for Linux 6.10 - Fabrics connection retries (Daniel, Hannes) - Fabrics logging enhancements (Tokunori) - RDMA delete optimization (Sagi)" * tag 'nvme-6.10-2024-05-14' of git://git.infradead.org/nvme: nvme-rdma, nvme-tcp: include max reconnects for reconnect logging nvmet-rdma: Avoid o(n^2) loop in delete_ctrl nvme: do not retry authentication failures nvme-fabrics: short-circuit reconnect retries nvme: return kernel error codes for admin queue connect nvmet: return DHCHAP status codes from nvmet_setup_auth() nvmet: lock config semaphore when accessing DH-HMAC-CHAP key
-
Bart Van Assche authored
Both nbd_send_cmd() and nbd_handle_cmd() return either a negative error number or a positive blk_status_t value. nbd_queue_rq() converts these return values into a blk_status_t value. There is a bug in the conversion code: if nbd_send_cmd() returns BLK_STS_RESOURCE, nbd_queue_rq() should return BLK_STS_RESOURCE instead of BLK_STS_OK. Fix this, move the conversion code into nbd_handle_cmd() and fix the remaining sparse warnings.

This patch fixes the following sparse warnings:

  drivers/block/nbd.c:673:32: warning: incorrect type in return expression (different base types)
  drivers/block/nbd.c:673:32: expected int
  drivers/block/nbd.c:673:32: got restricted blk_status_t [usertype]
  drivers/block/nbd.c:714:48: warning: incorrect type in return expression (different base types)
  drivers/block/nbd.c:714:48: expected int
  drivers/block/nbd.c:714:48: got restricted blk_status_t [usertype]
  drivers/block/nbd.c:1120:21: warning: incorrect type in assignment (different base types)
  drivers/block/nbd.c:1120:21: expected int [assigned] ret
  drivers/block/nbd.c:1120:21: got restricted blk_status_t [usertype]
  drivers/block/nbd.c:1125:16: warning: incorrect type in return expression (different base types)
  drivers/block/nbd.c:1125:16: expected restricted blk_status_t
  drivers/block/nbd.c:1125:16: got int [assigned] ret

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-6-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
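The convention described above (negative errno or non-negative status in one int) can be sketched with toy values. Everything below is hypothetical: the `toy_status` enum does not reproduce the kernel's blk_status_t encoding, and `toy_to_status_buggy()` only illustrates one way such a conversion could lose a status; it is not the pre-fix nbd code.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch only. */
enum toy_status {
    TOY_STS_OK = 0,
    TOY_STS_RESOURCE = 1,
    TOY_STS_IOERR = 2
};

/*
 * 'ret' is either a negative errno-style value or a non-negative
 * toy_status. Negative values collapse to an I/O error; non-negative
 * values pass through unchanged, so TOY_STS_RESOURCE is preserved.
 */
enum toy_status toy_to_status(int ret)
{
    return ret < 0 ? TOY_STS_IOERR : (enum toy_status)ret;
}

/* Illustrative faulty variant: every non-negative value becomes OK. */
enum toy_status toy_to_status_buggy(int ret)
{
    return ret < 0 ? TOY_STS_IOERR : TOY_STS_OK;
}
```

Doing the conversion in one place, close to where the mixed value is produced, keeps the rest of the call chain working with a single type.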
-
Bart Van Assche authored
blk_rq_bytes() returns an unsigned int while 'size' has type unsigned long. This is confusing. Improve code readability by removing the local variable 'size'. Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-5-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Document locking assumptions with lockdep_assert_held() instead of source code comments. The advantage of lockdep_assert_held() is that it is verified at runtime if lockdep is enabled in the kernel config. Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
In Linux kernel code it is preferred not to use a cast when converting a void pointer to another pointer type. Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
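The style point can be shown in a few lines of plain C, where a `void *` converts implicitly to any object pointer type. The struct and function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-command data passed around as an opaque pointer. */
struct toy_cmd {
    int index;
};

int toy_cmd_index(void *data)
{
    /* No cast: C converts void * to struct toy_cmd * implicitly. */
    struct toy_cmd *cmd = data;

    return cmd->index;
}
```

Writing `(struct toy_cmd *)data` would compile to the same thing, but the cast adds noise and can hide a later change of the source type from the compiler's checks.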
-
Bart Van Assche authored
This patch fixes the following sparse warnings: drivers/block/nbd.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer drivers/block/nbd.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
The ramdisk memory utilization can only go up when data is written to new pages. Implement discard to provide the possibility to reduce memory usage for pages no longer in use. Aligned discards will free the associated pages, if any, and deterministically return zeroed data until written again. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240429102308.147627-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
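The behaviour described here (pages allocated on write, freed by aligned discard, reads of missing pages returning zeroes) can be modelled in userspace. The sketch below is a toy under stated assumptions: `toy_ramdisk` and its functions are hypothetical, the page size and page count are arbitrary, and the choice to free only pages that a range covers completely is an assumption of this model rather than a description of the driver.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  8u

/* Hypothetical ramdisk: a sparse array of lazily allocated pages. */
struct toy_ramdisk {
    unsigned char *pages[TOY_NR_PAGES];
};

/* Writes one byte, allocating the page on first use. Returns 0 or -1. */
int toy_write_byte(struct toy_ramdisk *rd, size_t off, unsigned char val)
{
    size_t idx = off / TOY_PAGE_SIZE;

    if (idx >= TOY_NR_PAGES)
        return -1;
    if (!rd->pages[idx]) {
        rd->pages[idx] = calloc(1, TOY_PAGE_SIZE);
        if (!rd->pages[idx])
            return -1;
    }
    rd->pages[idx][off % TOY_PAGE_SIZE] = val;
    return 0;
}

/* Reads one byte; a page that was never written or was discarded reads 0. */
unsigned char toy_read_byte(const struct toy_ramdisk *rd, size_t off)
{
    size_t idx = off / TOY_PAGE_SIZE;

    if (idx >= TOY_NR_PAGES || !rd->pages[idx])
        return 0;
    return rd->pages[idx][off % TOY_PAGE_SIZE];
}

/* Frees every page fully covered by [off, off + len). Returns pages freed. */
int toy_discard(struct toy_ramdisk *rd, size_t off, size_t len)
{
    size_t first = (off + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE; /* round up */
    size_t end = (off + len) / TOY_PAGE_SIZE;                 /* round down */
    int freed = 0;

    if (end > TOY_NR_PAGES)
        end = TOY_NR_PAGES;
    for (size_t i = first; i < end; i++) {
        if (rd->pages[i]) {
            free(rd->pages[i]);
            rd->pages[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

In this model memory use drops as soon as a page-aligned range is discarded, and a later read of that range returns zero until something is written again.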
-
- 13 May, 2024 24 commits
-
-
Bart Van Assche authored
Fix the following sparse warnings: drivers/block/null_blk/main.c:1243:35: warning: incorrect type in return expression (different base types) drivers/block/null_blk/main.c:1243:35: expected int drivers/block/null_blk/main.c:1243:35: got restricted blk_status_t drivers/block/null_blk/main.c:1291:30: warning: incorrect type in return expression (different base types) drivers/block/null_blk/main.c:1291:30: expected restricted blk_status_t drivers/block/null_blk/main.c:1291:30: got int Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510201816.24921-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
By default, this will be 511, as that's the block layer default. But drivers these days can support memory alignments that aren't tied to the sector sizes, instead just being limited by what the DMA engine supports. An example is NVMe, where it's generally set to a 32-bit or 64-bit boundary. As ublk itself doesn't really care, just set it low enough that we don't run into issues with NVMe where the required O_DIRECT memory alignment is now more restrictive on ublk than it is on the underlying device. This was triggered by spurious -EINVAL returns on O_DIRECT IO on a setup with ublk managing NVMe devices, which previously worked just fine on the NVMe device itself. With the alignment relaxed, the test works fine. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
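The effect of the alignment mask on an O_DIRECT-style buffer check can be sketched with simple arithmetic. The helper `toy_dma_aligned()` is hypothetical, and the mask value 3 (a 4-byte boundary) is used only as an example of a "low enough" setting; the message itself only names 511 as the default.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: a buffer address satisfies an alignment mask
 * (mask = alignment - 1) when none of the masked low bits are set.
 */
int toy_dma_aligned(uintptr_t addr, unsigned int mask)
{
    return (addr & mask) == 0;
}
```

A buffer at address 0x1004 is 4-byte aligned but not 512-byte aligned, so it fails the check with mask 511 and passes with mask 3. That mirrors the reported symptom: IO that the underlying device accepts is rejected by the stricter layer on top.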
-
Linus Torvalds authored
Merge tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform firmware updates from Tzung-Bi Shih: - Set driver owner in the core registration so that coreboot drivers don't need to set it individually * tag 'tag-chrome-platform-firmware-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: firmware: google: cbmem: drop driver owner initialization firmware: coreboot: store owner from modules with coreboot_driver_register()
-
Linus Torvalds authored
Merge tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Tzung-Bi Shih: "New: - Support Framework Laptop 13 and 16 (AMD Ryzen) Improvements: - Use sysfs_emit() instead of sprintf() for sysfs' show() Fixes: - Fix flex-array-member-not-at-end compiler warnings by using DEFINE_RAW_FLEX() - Add HAS_IOPORT dependencies - Fix long pending events during suspend after resume Misc cleanups: - Provide ID tables for avoiding fallback match - Replace deprecated UNIVERSAL_DEV_PM_OPS()" * tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (22 commits) platform/chrome: cros_ec: Handle events during suspend after resume completion platform/chrome: cros_ec_lpc: add quirks for the Framework Laptop (AMD) platform/chrome: cros_ec_lpc: add a "quirks" system platform/chrome: cros_ec_lpc: pass driver_data from DMI to the device platform/chrome: cros_ec_lpc: introduce a priv struct for the lpc device platform/chrome: add HAS_IOPORT dependencies platform/chrome: cros_hps_i2c: Replace deprecated UNIVERSAL_DEV_PM_OPS() platform/chrome: cros_kbd_led_backlight: provide ID table for avoiding fallback match platform/chrome: wilco_ec: core: provide ID table for avoiding fallback match platform/chrome: wilco_ec: event: remove redundant MODULE_ALIAS platform/chrome: wilco_ec: debugfs: provide ID table for avoiding fallback match platform/chrome: wilco_ec: telemetry: provide ID table for avoiding fallback match platform/chrome: cros_ec_vbc: provide ID table for avoiding fallback match platform/chrome: cros_ec_lightbar: provide ID table for avoiding fallback match platform/chrome: cros_ec_sysfs: provide ID table for avoiding fallback match platform/chrome: cros_ec_debugfs: provide ID table for avoiding fallback match platform/chrome: cros_ec_chardev: provide ID table for avoiding fallback match platform/chrome: cros_usbpd_notify: provide ID table for 
avoiding fallback match platform/chrome: cros_usbpd_logger: provide ID table for avoiding fallback match platform/chrome: cros_ec_sensorhub: provide ID table for avoiding fallback match ...
-
https://github.com/Rust-for-Linux/linux Linus Torvalds authored
Pull Rust updates from Miguel Ojeda: "The most notable change is the drop of the 'alloc' in-tree fork. This is nicely reflected in the diffstat as a ~10k lines drop. In turn, this makes the version upgrades way simpler and smaller in the future, e.g. the latest one in commit 56f64b37 ("rust: upgrade to Rust 1.78.0"). More importantly, this increases the chances that a newer compiler version just works, which in turn means supporting several compiler versions is easier now. Thus we will look into finally setting a minimum version in the near future. Toolchain and infrastructure: - Upgrade to Rust 1.78.0 This time around, due to how the kernel and Rust schedules have aligned, there are two upgrades in fact. These allow us to remove one more unstable feature ('offset_of') from the list, among other improvements - Drop 'alloc' in-tree fork of the standard library crate, which means all the unstable features used by 'alloc' (~30 language ones, ~60 library ones) are not a concern anymore - Support DWARFv5 via the '-Zdwarf-version' flag - Support zlib and zstd debuginfo compression via the '-Zdebuginfo-compression' flag 'kernel' crate: - Support allocation flags ('GFP_*'), particularly in 'Box' (via 'BoxExt'), 'Vec' (via 'VecExt'), 'Arc' and 'UniqueArc', as well as in the 'init' module APIs - Remove usage of the 'allocator_api' unstable feature - Remove 'try_' prefix in allocation APIs' names - Add 'VecExt' (an extension trait) to be able to drop the 'alloc' fork - Add the '{make,to}_{upper,lower}case()' methods to 'CStr'/'CString' - Add the 'as_ptr' method to 'ThisModule' - Add the 'from_raw' method to 'ArcBorrow' - Add the 'into_unique_or_drop' method to 'Arc' - Display column number in the 'dbg!' 
macro output by applying the equivalent change done to the standard library one - Migrate 'Work' to '#[pin_data]' thanks to the changes in the 'macros' crate, which allows to remove an unsafe call in its 'new' associated function - Prevent namespacing issues when using the '[try_][pin_]init!' macros by changing the generated name of guard variables - Make the 'get' method in 'Opaque' const - Implement the 'Default' trait for 'LockClassKey' - Remove unneeded 'kernel::prelude' imports from doctests - Remove redundant imports 'macros' crate: - Add 'decl_generics' to 'parse_generics()' to support default values, and use that to allow them in '#[pin_data]' Helpers: - Trivial English grammar fix Documentation: - Add section on Rust Kselftests to the 'Testing' document - Expand the 'Abstractions vs. bindings' section of the 'General Information' document" * tag 'rust-6.10' of https://github.com/Rust-for-Linux/linux: (31 commits) rust: alloc: fix dangling pointer in VecExt<T>::reserve() rust: upgrade to Rust 1.78.0 rust: kernel: remove redundant imports rust: sync: implement `Default` for `LockClassKey` docs: rust: extend abstraction and binding documentation docs: rust: Add instructions for the Rust kselftest rust: remove unneeded `kernel::prelude` imports from doctests rust: update `dbg!()` to format column number rust: helpers: Fix grammar in comment rust: init: change the generated name of guard variables rust: sync: add `Arc::into_unique_or_drop` rust: sync: add `ArcBorrow::from_raw` rust: types: Make Opaque::get const rust: kernel: remove usage of `allocator_api` unstable feature rust: init: update `init` module to take allocation flags rust: sync: update `Arc` and `UniqueArc` to take allocation flags rust: alloc: update `VecExt` to take allocation flags rust: alloc: introduce the `BoxExt` trait rust: alloc: introduce allocation flags rust: alloc: remove our fork of the `alloc` crate ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Linus Torvalds authored
Pull crypto updates from Herbert Xu: "API: - Remove crypto stats interface Algorithms: - Add faster AES-XTS on modern x86_64 CPUs - Forbid curves with order less than 224 bits in ecc (FIPS 186-5) - Add ECDSA NIST P521 Drivers: - Expose otp zone in atmel - Add dh fallback for primes > 4K in qat - Add interface for live migration in qat - Use dma for aes requests in starfive - Add full DMA support for stm32mpx in stm32 - Add Tegra Security Engine driver Others: - Introduce scope-based x509_certificate allocation" * tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (123 commits) crypto: atmel-sha204a - provide the otp content crypto: atmel-sha204a - add reading from otp zone crypto: atmel-i2c - rename read function crypto: atmel-i2c - add missing arg description crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy() crypto: sahara - use 'time_left' variable with wait_for_completion_timeout() crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout() crypto: caam - i.MX8ULP donot have CAAM page0 access crypto: caam - init-clk based on caam-page0-access crypto: starfive - Use fallback for unaligned dma access crypto: starfive - Do not free stack buffer crypto: starfive - Skip unneeded fallback allocation crypto: starfive - Skip dma setup for zeroed message crypto: hisilicon/sec2 - fix for register offset crypto: hisilicon/debugfs - mask the unnecessary info from the dump crypto: qat - specify firmware files for 402xx crypto: x86/aes-gcm - simplify GCM hash subkey derivation crypto: x86/aes-gcm - delete unused GCM assembly code crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath() hwrng: stm32 - repair clock handling ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Linus Torvalds authored
Pull hardening updates from Kees Cook: "The bulk of the changes here are related to refactoring and expanding the KUnit tests for string helper and fortify behavior. Some trivial strncpy replacements in fs/ were carried in my tree. Also some fixes to SCSI string handling were carried in my tree since the helper for those was introduced here. Beyond that, just little fixes all around: objtool getting confused about LKDTM+KCFI, preparing for future refactors (constification of sysctl tables, additional __counted_by annotations), a Clang UBSAN+i386 crash fix, and adding more options in the hardening.config Kconfig fragment. Summary: - selftests: Add str*cmp tests (Ivan Orlov) - __counted_by: provide UAPI for _le/_be variants (Erick Archer) - Various strncpy deprecation refactors (Justin Stitt) - stackleak: Use a copy of soon-to-be-const sysctl table (Thomas Weißschuh) - UBSAN: Work around i386 -regparm=3 bug with Clang prior to version 19 - Provide helper to deal with non-NUL-terminated string copying - SCSI: Fix older string copying bugs (with new helper) - selftests: Consolidate string helper behavioral tests - selftests: add memcpy() fortify tests - string: Add additional __realloc_size() annotations for "dup" helpers - LKDTM: Fix KCFI+rodata+objtool confusion - hardening.config: Enable KCFI" * tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits) uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be} stackleak: Use a copy of the ctl_table argument string: Add additional __realloc_size() annotations for "dup" helpers kunit/fortify: Fix replaced failure path to unbreak __alloc_size hardening: Enable KCFI and some other options lkdtm: Disable CFI checking for perms functions kunit/fortify: Add memcpy() tests kunit/fortify: Do not spam logs with fortify WARNs kunit/fortify: Rename tests to use recommended conventions init: replace deprecated strncpy with strscpy_pad kunit/fortify: Fix mismatched kvalloc()/vfree() 
usage scsi: qla2xxx: Avoid possible run-time warning with long model_num scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings fs: ecryptfs: replace deprecated strncpy with strscpy hfsplus: refactor copy_name to not use strncpy reiserfs: replace deprecated strncpy with scnprintf virt: acrn: replace deprecated strncpy with strscpy ubsan: Avoid i386 UBSAN handler crashes with Clang ubsan: Remove 1-element array usage in debug reporting ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Linus Torvalds authored
Pull execve updates from Kees Cook: - Provide knob to change (previously fixed) coredump NOTES size (Allen Pais) - Add sched_prepare_exec tracepoint (Marco Elver) - Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov) - Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh Balasubramanian) - Leave a gap between .bss and brk * tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs/coredump: Enable dynamic configuration of max file note size binfmt_elf_fdpic: fix /proc/<pid>/auxv binfmt_elf: Leave a gap between .bss and brk Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig tracing: Add sched_prepare_exec tracepoint
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Linus Torvalds authored
Pull seccomp update from Kees Cook: - Prepare for sysctl table constification * tag 'seccomp-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Constify sysctl subhelpers
-
git://git.kernel.dk/linux Linus Torvalds authored
Pull block updates from Jens Axboe: - Add a partscan attribute in sysfs, fixing an issue with systemd relying on an internal interface that went away. - Attempt #2 at making long running discards interruptible. The previous attempt went into 6.9, but we ended up mostly reverting it as it had issues. - Remove old ida_simple API in bcache - Support for zoned write plugging, greatly improving the performance on zoned devices. - Remove the old throttle low interface, which has been experimental since 2017 and never made it beyond that and isn't being used. - Remove page->index debugging checks in brd, as it hasn't caught anything and prepares us for removing in struct page. - MD pull request from Song - Don't schedule block workers on isolated CPUs * tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits) blk-throttle: delay initialization until configuration blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW block: fix that util can be greater than 100% block: support to account io_ticks precisely block: add plug while submitting IO bcache: fix variable length array abuse in btree_iter bcache: Remove usage of the deprecated ida_simple_xx() API md: Revert "md: Fix overflow in is_mddev_idle" blk-lib: check for kill signal in ioctl BLKDISCARD block: add a bio_await_chain helper block: add a blk_alloc_discard_bio helper block: add a bio_chain_and_submit helper block: move discard checks into the ioctl handler block: remove the discard_granularity check in __blkdev_issue_discard block/ioctl: prefer different overflow check null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION() block: fix and simplify blkdevparts= cmdline parsing block: refine the EOF check in blkdev_iomap_begin block: add a partscan sysfs attribute for disks block: add a disk_has_partscan helper ...
-
git://git.kernel.dk/linux Linus Torvalds authored
Pull io_uring updates from Jens Axboe: - Greatly improve send zerocopy performance, by enabling coalescing of sent buffers. MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the io_uring side did not. In local testing, the crossover point for send zerocopy being faster is now around 3000 byte packets, and it performs better than the sync syscall variants as well. This feature relies on a shared branch with net-next, which was pulled into both branches. - Unification of how async preparation is done across opcodes. Previously, opcodes that required extra memory for async retry would allocate that as needed, using on-stack state until that was the case. If async retry was needed, the on-stack state was adjusted appropriately for a retry and then copied to the allocated memory. This led to some fragile and ugly code, particularly for read/write handling, and made storage retries more difficult than they needed to be. Allocate the memory upfront, as it's cheap from our pools, and use that state consistently both initially and also from the retry side. - Move away from using remap_pfn_range() for mapping the rings. This is really not the right interface to use and can cause lifetime issues or leaks. Additionally, it means the ring sq/cq arrays need to be physically contiguous, which can cause problems in production with larger rings when services are restarted, as memory can be very fragmented at that point. Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply the same treatment to mapped ring provided buffers. This also helps unify the code we have dealing with allocating and mapping memory. Hard to see in the diffstat as we're adding a few features as well, but this kills about ~400 lines of code from the codebase as well. - Add support for bundles for send/recv. 
When used with provided buffers, bundles support sending or receiving more than one buffer at the time, improving the efficiency by only needing to call into the networking stack once for multiple sends or receives. - Tweaks for our accept operations, supporting both a DONTWAIT flag for skipping poll arm and retry if we can, and a POLLFIRST flag that the application can use to skip the initial accept attempt and rely purely on poll for triggering the operation. Both of these have identical flags on the receive side already. - Make the task_work ctx locking unconditional. We had various code paths here that would do a mix of lock/trylock and set the task_work state to whether or not it was locked. All of that goes away, we lock it unconditionally and get rid of the state flag indicating whether it's locked or not. The state struct still exists as an empty type, can go away in the future. - Add support for specifying NOP completion values, allowing it to be used for error handling testing. - Use set/test bit for io-wq worker flags. Not strictly needed, but also doesn't hurt and helps silence a KCSAN warning. - Cleanups for io-wq locking and work assignments, closing a tiny race where cancelations would not be able to find the work item reliably. 
- Misc fixes, cleanups, and improvements * tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits) io_uring: support to inject result for NOP io_uring: fail NOP if non-zero op flags is passed in io_uring/net: add IORING_ACCEPT_POLL_FIRST flag io_uring/net: add IORING_ACCEPT_DONTWAIT flag io_uring/filetable: don't unnecessarily clear/reset bitmap io_uring/io-wq: Use set_bit() and test_bit() at worker->flags io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring io_uring: Require zeroed sqe->len on provided-buffers send io_uring/notif: disable LAZY_WAKE for linked notifs io_uring/net: fix sendzc lazy wake polling io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it io_uring/rw: reinstate thread check for retries io_uring/notif: implement notification stacking io_uring/notif: simplify io_notif_flush() net: add callback for setting a ubuf_info to skb net: extend ubuf_info callback to ops structure io_uring/net: support bundles for recv io_uring/net: support bundles for send io_uring/kbuf: add helpers for getting/peeking multiple buffers io_uring/net: add provided buffer support for IORING_OP_SEND ...
-
Linus Torvalds authored
Pull vfs rw iterator updates from Christian Brauner: "The core fs signalfd, userfaultfd, and timerfd subsystems did still use f_op->read() instead of f_op->read_iter(). Convert them over since we should aim to get rid of f_op->read() at some point. Aside from that io_uring and others want to mark files as FMODE_NOWAIT so it can make use of per-IO nonblocking hints to enable more efficient IO. Converting those users to f_op->read_iter() allows them to be marked with FMODE_NOWAIT" * tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signalfd: convert to ->read_iter() userfaultfd: convert to ->read_iter() timerfd: convert to ->read_iter() new helper: copy_to_iter_full()
-
Linus Torvalds authored
Pull netfs updates from Christian Brauner: "This reworks the netfslib writeback implementation so that pages read from the cache are written to the cache through ->writepages(), thereby allowing the fscache page flag to be retired. The reworking also: - builds on top of the new writeback_iter() infrastructure - makes it possible to use vectored write RPCs as discontiguous streams of pages can be accommodated - makes it easier to do simultaneous content crypto and stream division - provides support for retrying writes and re-dividing a stream - replaces the ->launder_folio() op, so that ->writepages() is used instead - uses mempools to allocate the netfs_io_request and netfs_io_subrequest structs to avoid allocation failure in the writeback path Some code that uses the fscache page flag is retained for compatibility purposes with nfs and ceph. The code is switched to using the synonymous private_2 label instead and marked with deprecation comments. The merge commit contains additional details on the new algorithm that I've left out of here as it would probably be excessively detailed. 
On top of the netfslib infrastructure this contains the work to convert cifs over to netfslib" * tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) cifs: Enable large folio support cifs: Remove some code that's no longer used, part 3 cifs: Remove some code that's no longer used, part 2 cifs: Remove some code that's no longer used, part 1 cifs: Cut over to using netfslib cifs: Implement netfslib hooks cifs: Make add_credits_and_wake_if() clear deducted credits cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs cifs: Set zero_point in the copy_file_range() and remap_file_range() cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c cifs: Replace the writedata replay bool with a netfs sreq flag cifs: Make wait_mtu_credits take size_t args cifs: Use more fields from netfs_io_subrequest cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest cifs: Use alternative invalidation to using launder_folio netfs, afs: Use writeback retry to deal with alternate keys netfs: Miscellaneous tidy ups netfs: Remove the old writeback code netfs: Cut over to using new writeback code ...
-
Linus Torvalds authored
Pull vfs mount API conversions from Christian Brauner: "This converts qnx6, minix, debugfs, tracefs, freevxfs, and openpromfs to the new mount api, further reducing the number of filesystems relying on the legacy mount api" * tag 'vfs-6.10.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minix: convert minix to use the new mount api vfs: Convert tracefs to use the new mount API vfs: Convert debugfs to use the new mount API openpromfs: finish conversion to the new mount API freevxfs: Convert freevxfs to the new mount API. qnx6: convert qnx6 to use the new mount api
-
Linus Torvalds authored
Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from 
overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
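The F_DUPFD_QUERY fcntl() mentioned in the feature list above can be exercised from userspace roughly as follows. This is a minimal sketch, not taken from the series itself: the helper name `fds_share_open_file()` is hypothetical, and the fallback value `1024 + 3` is an assumption for toolchains whose headers predate the new command. On kernels without the command, the call is expected to fail, which the helper reports as -1.

```c
#include <fcntl.h>
#include <unistd.h>

/* Assumption: older libc headers may not define the command yet. */
#ifndef F_DUPFD_QUERY
#define F_DUPFD_QUERY (1024 + 3)
#endif

/*
 * Hypothetical helper: ask the kernel whether two descriptors refer to
 * the same open file description, i.e. whether one is a dup*() of the
 * other.
 *
 * Returns 1 if they do, 0 if they do not, and -1 if the query failed
 * (for example because the running kernel does not know the command).
 */
int fds_share_open_file(int fd_a, int fd_b)
{
	int ret = fcntl(fd_a, F_DUPFD_QUERY, fd_b);

	if (ret < 0)
		return -1;
	return ret ? 1 : 0;
}
```

A caller would typically treat -1 as "unknown" and fall back to whatever comparison it used before, such as kcmp().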
-
Linus Torvalds authored
Pull vfs iomap updates from Christian Brauner: "This contains a few cleanups to the iomap code. Nothing particularly stands out" * tag 'vfs-6.10.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: do some small logical cleanup in buffered write iomap: make iomap_write_end() return a boolean iomap: use a new variable to handle the written bytes in iomap_write_iter() iomap: don't increase i_size if it's not a write operation iomap: drop the write failure handles when unsharing and zeroing iomap: convert iomap_writepages to writeack_iter
-
Linus Torvalds authored
Pull documentation updates from Jonathan Corbet: "Another not-too-busy cycle for documentation, including: - Some build-system changes to detect the variable fonts installed by some distributions that can break the PDF build. - Various updates and additions to the Spanish, Chinese, Italian, and Japanese translations. - Update the stable-kernel rules to match modern practice ... and the usual array of corrections, updates, and typo fixes" * tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits) cgroup: Add documentation for missing zswap memory.stat kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning. docs:core-api: fixed typos and grammar in printk-index page Documentation: tracing: Fix spelling mistakes docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4 docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4 docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4 docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4 docs: stable-kernel-rules: fix typo sent->send docs/zh_CN: remove two inconsistent spaces docs: scripts/check-variable-fonts.sh: Improve commands for detection docs: stable-kernel-rules: create special tag to flag 'no backporting' docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.) docs: stable-kernel-rules: remove code-labels tags and a indention level docs: stable-kernel-rules: call mainline by its name and change example docs: stable-kernel-rules: reduce redundancy docs, kprobes: Add riscv as supported architecture Docs: typos/spelling docs: kernel_include.py: Cope with docutils 0.21 docs: ja_JP/howto: Catch up update in v6.8 ...
-
Linus Torvalds authored
Pull keys updates from Jarkko Sakkinen: - do not overwrite the key expiration once it is set - move key quota updates earlier into key_put(), instead of updating them in key_gc_unused_keys() * tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: keys: Fix overwrite of key expiration on instantiation keys: update key quotas in key_put()
-
Linus Torvalds authored
Pull TPM updates from Jarkko Sakkinen: "These are the changes for the TPM driver with a single major new feature: TPM bus encryption and integrity protection. The key pair on the TPM side is generated from a so-called null random seed per power on of the machine [1]. This supports the TPM encryption of the hard drive by adding a layer of protection against bus interposer attacks. Other than that, a few minor fixes and documentation for tpm_tis to clarify basics of TPM localities for future patch review discussions (will be extended and refined over time, just a seed)" Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1] * tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits) Documentation: tpm: Add TPM security docs toctree entry tpm: disable the TPM if NULL name changes Documentation: add tpm-security.rst tpm: add the null key name as a sysfs export KEYS: trusted: Add session encryption protection to the seal/unseal path tpm: add session encryption protection to tpm2_get_random() tpm: add hmac checks to tpm2_pcr_extend() tpm: Add the rest of the session HMAC API tpm: Add HMAC session name/handle append tpm: Add HMAC session start and end functions tpm: Add TCG mandated Key Derivation Functions (KDFs) tpm: Add NULL primary creation tpm: export the context save and load commands tpm: add buffer function to point to returned parameters crypto: lib - implement library version of AES in CFB mode KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers tpm: Add tpm_buf_read_{u8,u16,u32} tpm: TPM2B formatted buffers tpm: Store the length of the tpm_buf data separately. tpm: Update struct tpm_buf documentation comments ...
-
Linus Torvalds authored
Merge tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull trusted keys updates from Jarkko Sakkinen: "This contains a new key type for the Data Co-Processor (DCP), which is an IP core built into many NXP SoCs such as i.mx6ull" * tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: docs: trusted-encrypted: add DCP as new trust source docs: document DCP-backed trusted keys kernel params MAINTAINERS: add entry for DCP-based trusted keys KEYS: trusted: Introduce NXP DCP-backed trusted keys KEYS: trusted: improve scalability of trust source config crypto: mxs-dcp: Add support for hardware-bound keys
-
Linus Torvalds authored
Pull slab updates from Vlastimil Babka: "This time it's mostly random cleanups and fixes, with two performance fixes that might have significant impact, but limited to systems experiencing particular bad corner case scenarios rather than general performance improvements. The memcg hook changes are going through the mm tree due to dependencies. - Prevent stalls when reading /proc/slabinfo (Jianfeng Wang) This fixes the long-standing problem that can happen with workloads that have alloc/free patterns resulting in many partially used slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse the long partial slab list under spinlock with disabled irqs and thus can stall other processes or even trigger the lockup detection. The traversal is only done to count free objects so that <active_objs> column can be reported along with <num_objs>. To avoid affecting fast paths with another shared counter (attempted in the past) or complex partial list traversal schemes that allow rescheduling, the chosen solution resorts to approximation - when the partial list is over 10000 slabs long, we will only traverse first 5000 slabs from head and tail each and use the average of those to estimate the whole list. Both head and tail are used as the slabs near the head tend to have more free objects than the slabs towards the tail. It is expected the approximation should not break existing /proc/slabinfo consumers. The <num_objs> field is still accurate and reflects the overall kmem_cache footprint. The <active_objs> was already imprecise due to cpu and percpu-partial slabs, so can't be relied upon to determine exact cache usage. The difference between <active_objs> and <num_objs> is mainly useful to determine the slab fragmentation, and that will be possible even with the approximation in place.
- Prevent allocating many slabs when a NUMA node is full (Chen Jun) Currently, on NUMA systems with a node under significantly bigger pressure than other nodes, the fallback strategy may result in each kmalloc_node() that can't be satisfied from the preferred node, to allocate a new slab on a fallback node, and not reuse the slabs already on that node's partial list. This is now fixed and partial lists of fallback nodes are checked even for kmalloc_node() allocations. It's still preferred to allocate a new slab on the requested node before a fallback, but only with a GFP_NOWAIT attempt, which will fail quickly when the node is under significant memory pressure. - More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee) - Fix slub_kunit self-test with hardened freelists (Guenter Roeck) - Mark racy accesses for KCSAN (linke li) - Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)" * tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slub: remove the check for NULL kmalloc_caches mm/slub: create kmalloc 96 and 192 caches regardless cache size order mm/slub: mark racy access on slab->freelist slub: use count_partial_free_approx() in slab_out_of_memory() slub: introduce count_partial_free_approx() slub: Set __GFP_COMP in kmem_cache by default mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc() mm/slub: correct comment in do_slab_free() mm/slub, kunit: Use inverted data to corrupt kmem cache mm/slub: simplify get_partial_node() mm/slub: add slub_get_cpu_partial() helper mm/slub: remove the check of !kmem_cache_has_cpu_partial() mm/slub: Reduce memory consumption in extreme scenarios mm/slub: mark racy accesses on slab->slabs mm/slub: remove dummy slabinfo functions
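The /proc/slabinfo approximation described in the slab pull message (scan at most 5000 slabs from each end of a long partial list, then scale up) can be illustrated in plain userspace C. This is only a sketch under stated assumptions: the partial list is modelled as an array rather than a linked list, `struct slab_sketch` and the function name are hypothetical, and the constant name `MAX_PARTIAL_TO_SCAN` is chosen here for illustration. It is not the kernel's count_partial_free_approx().

```c
#include <stddef.h>

/* Threshold from the pull message: lists longer than this are sampled. */
#define MAX_PARTIAL_TO_SCAN 10000

/* Hypothetical stand-in for a partially used slab. */
struct slab_sketch {
	unsigned int objects;	/* total objects in the slab */
	unsigned int inuse;	/* objects currently allocated */
};

/*
 * Count free objects on a partial list. Short lists are counted exactly.
 * Long lists are estimated: sum the free objects in the first and last
 * MAX_PARTIAL_TO_SCAN / 2 slabs, then scale by list length / slabs scanned.
 */
unsigned long count_free_approx_sketch(const struct slab_sketch *partial,
				       size_t nr_partial)
{
	unsigned long free_objs = 0;
	size_t half = MAX_PARTIAL_TO_SCAN / 2;
	size_t i;

	if (nr_partial <= MAX_PARTIAL_TO_SCAN) {
		for (i = 0; i < nr_partial; i++)
			free_objs += partial[i].objects - partial[i].inuse;
		return free_objs;
	}

	/* Sample from the head of the list. */
	for (i = 0; i < half; i++)
		free_objs += partial[i].objects - partial[i].inuse;

	/* Sample from the tail of the list. */
	for (i = 0; i < half; i++)
		free_objs += partial[nr_partial - 1 - i].objects -
			     partial[nr_partial - 1 - i].inuse;

	/* Extrapolate the sampled total to the whole list. */
	return free_objs * nr_partial / (2 * half);
}
```

Sampling both ends matters because, as the message notes, slabs near the head tend to have more free objects than slabs near the tail, so a head-only sample would overestimate.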
-
Linus Torvalds authored
Pull kcsan update from Paul McKenney: "Introduce __data_racy type qualifier This adds a __data_racy type qualifier that enables kernel developers to inform KCSAN that a given variable is a shared variable without needing to mark each and every access. This allows pre-KCSAN code to be correctly (if approximately) instrumented with very little effort, and also provides people reading the code a clear indication that the variable is in fact shared. In addition, it permits incremental transition to per-access KCSAN marking, so that (for example) a given subsystem can be transitioned one variable at a time, while avoiding large numbers of KCSAN warnings during this transition" * tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan, compiler_types: Introduce __data_racy type qualifier
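To give a feel for how a per-variable qualifier differs from per-access marking, here is a small userspace sketch. Everything in it is an assumption made for illustration: `data_racy_sketch` is a hypothetical stand-in for `__data_racy`, and expanding it to `volatile` only when a thread sanitizer is active is a guess at the mechanism, not a quotation of the kernel's definition. The struct and function names are likewise invented.

```c
/*
 * Hypothetical stand-in for the kernel qualifier: under a thread
 * sanitizer build the field is made volatile, otherwise the macro
 * expands to nothing and the field is an ordinary member.
 */
#ifdef __SANITIZE_THREAD__
#define data_racy_sketch volatile
#else
#define data_racy_sketch
#endif

struct stats_sketch {
	/* Marked once at the declaration; accesses need no annotation. */
	unsigned long data_racy_sketch hits;
	/* Unmarked field: each racy access would need its own marking. */
	unsigned long total;
};

/* Plain accesses to the marked field, with no per-access annotation. */
void stats_record_hit(struct stats_sketch *s)
{
	s->hits++;
}

unsigned long stats_read_hits(const struct stats_sketch *s)
{
	return s->hits;
}
```

The point of the design is visible in the struct: the declaration documents that `hits` is shared, and converting the next field later is a one-line change.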
-
Linus Torvalds authored
Pull LKMM documentation updates from Paul McKenney: "This upgrades LKMM documentation, perhaps most notably adding a number of litmus tests illustrating cmpxchg() ordering properties. TL;DR: Failing cmpxchg() operations provide no ordering" * tag 'lkmm.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus Documentation/atomic_t: Emphasize that failed atomic operations give no ordering Documentation/litmus-tests: Demonstrate unordered failing cmpxchg Documentation/litmus-tests: Add locking tests to README
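The "failing cmpxchg() operations provide no ordering" rule has a rough analogue in C11 atomics, where the failure ordering of a compare-exchange is specified separately. The sketch below is a userspace illustration only: it uses C11 `<stdatomic.h>` rather than the kernel's cmpxchg(), the C11 memory model is not LKMM, and the function name `try_claim_sketch()` is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to claim ownership by changing *owner from 0 to 'me'.
 *
 * Success path: fully ordered (memory_order_seq_cst).
 * Failure path: relaxed, i.e. the failed operation orders nothing.
 * A caller that needs ordering even on failure must ask for it
 * explicitly, modelled here with a separate fence.
 */
bool try_claim_sketch(atomic_int *owner, int me)
{
	int expected = 0;

	if (atomic_compare_exchange_strong_explicit(owner, &expected, me,
						    memory_order_seq_cst,
						    memory_order_relaxed))
		return true;

	/* Explicit ordering for the failure case, if it is required. */
	atomic_thread_fence(memory_order_seq_cst);
	return false;
}
```

If the failure path needs no ordering, the fence can simply be dropped; the sketch keeps it to show where the ordering would have to come from.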
-
Linus Torvalds authored
Pull cmpxchg updates from Paul McKenney: "Provide one-byte and two-byte cmpxchg() support on sparc32, parisc, and csky This provides native one-byte and two-byte cmpxchg() support for sparc32 and parisc, courtesy of Al Viro. This support is provided by the same hashed-array-of-locks technique used for the other atomic operations provided for these two platforms. There is also emulated one-byte cmpxchg() support for csky using a new cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate the one-byte variant. Similar patches for emulation of one-byte cmpxchg() for arc, sh, and xtensa have not yet received maintainer acks, so they are slated for the v6.11 merge window" * tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: csky: Emulate one-byte cmpxchg lib: Add one-byte emulation function parisc: add u16 support to cmpxchg() parisc: add missing export of __cmpxchg_u8() parisc: unify implementations of __cmpxchg_u{8,32,64} parisc: __cmpxchg_u32(): lift conversion into the callers sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes sparc32: unify __cmpxchg_u{32,64} sparc32: make the first argument of __cmpxchg_u64() volatile u64 * sparc32: make __cmpxchg_u32() return u32
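The idea of emulating a one-byte cmpxchg() with a four-byte one can be sketched in userspace C. This is an illustration under assumptions, not the kernel's cmpxchg_emu_u8(): it relies on the GCC/Clang `__atomic` builtins, the function name is hypothetical, and it assumes the byte lives inside a naturally aligned 32-bit object.

```c
#include <stdint.h>
#include <string.h>

/*
 * Emulate a one-byte compare-and-exchange using a four-byte one.
 *
 * 'p' must point into a naturally aligned uint32_t object. Returns the
 * byte value observed at *p: equal to 'old' if the exchange happened,
 * something else if it did not.
 */
uint8_t cmpxchg_emu_u8_sketch(volatile uint8_t *p, uint8_t old,
			      uint8_t new_val)
{
	/* Round the address down to the enclosing 32-bit word. */
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	/* Offset of the target byte within that word, in memory order. */
	unsigned int idx = (unsigned int)((uintptr_t)p & 0x3);
	uint32_t cur = __atomic_load_n(p32, __ATOMIC_RELAXED);

	for (;;) {
		uint8_t bytes[4];
		uint32_t want;

		/* View the word as bytes in memory order (endian-neutral). */
		memcpy(bytes, &cur, sizeof(bytes));
		if (bytes[idx] != old)
			return bytes[idx];	/* compare failed, no store */

		/* Build the new word: only the target byte changes. */
		bytes[idx] = new_val;
		memcpy(&want, bytes, sizeof(want));

		/* On failure 'cur' is refreshed and the loop retries. */
		if (__atomic_compare_exchange_n(p32, &cur, want, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return old;
	}
}
```

The retry loop is needed because the word-sized exchange can fail when a neighbouring byte changes, even though the target byte still holds the expected value.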
-