- 08 May, 2024 9 commits
-
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
- fix assorted (harmless) off-by-one errors - we were inconsistent on whether out->pos stays <= out->size on overflow; now it does, and printbuf.overflow exists to indicate if a printbuf has overflowed - factor out printbuf_advance_pos() - printbuf_nul_terminate_reserved(); use this to reduce the number of printbuf_make_room() calls Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We need to be able to test these paths in dry run mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
When a superblock write is silently dropped or it's been modified by another process we need to know which device it was. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Fix another shift-by-64 by factoring out a common helper for bch2_bkey_format_invalid() and bformat_needs_redo() (where it was already fixed). Reported-by: syzbot+9833a1d29d4a44361e2c@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Btree nodes are log structured; thus, we need to emit whiteouts when we're deleting a key that's been written out to disk. k->needs_whiteout tracks whether a key will need a whiteout when it's deleted, and this requires some careful handling; e.g. the key we're deleting may not have been written out to disk, but it may have overwritten a key that was - thus we need to carry this flag around on overwrites. Invariants: There may be multiple key for the same position in a given node (because of overwrites), but only one of them will be a live (non deleted) key, and only one key for a given position will have the needs_whiteout flag set. Additionally, we don't want to carry around whiteouts that need to be written in the main searchable part of a btree node - btree_iter_peek() will have to skip past them, and this can lead to an O(n^2) issues when doing sequential deletions (e.g. inode rm/truncate). So there's a separate region in the btree node buffer for unwritten whiteouts; these are merge sorted with the rest of the keys we're writing in the btree node write path. The unwritten whiteouts was a later optimization that bch2_sort_keys() didn't take into account; the unwritten whiteouts area means that we never have deleted keys with needs_whiteout set in the main searchable part of a btree node. That means we can simplify and optimize some sort paths, and eliminate an assertion that syzbot found: - Unless we're in the btree node write path, it's always ok to drop whiteouts when sorting - When sorting for a btree node write, we drop the whiteout if it's not from the unwritten whiteouts area, or if it's overwritten by a real key at the same position. This completely eliminates some tricky logic for propagating the needs_whiteout flag: syzbot was able to hit the assertion that checked that there shouldn't be more than one key at the same pos with needs_whiteout set, likely due to a combination of flipping on needs_whiteout on all written keys (they need whiteouts if overwritten), combined with not always dropping unneeded whiteouts, and the tricky logic in the sort path for preserving needs_whiteout that wasn't really needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
- 07 May, 2024 2 commits
-
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_write_super() was looping over online devices multiple times - dropping and retaking io_ref each time. This meant it could race with device removal; it could increment the sequence number on a device but fail to write it - and then if the device was re-added, it would get confused the next time around thinking a superblock write was silently dropped. Fix this by taking io_ref once, and stashing pointers to online devices in a darray. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
- 06 May, 2024 18 commits
-
-
Kent Overstreet authored
Define a constant for the max superblock size, to avoid a too-large shift. Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_fs_quota_read_inode() wasn't entirely updated to the bch2_snapshot_tree() helper, which takes rcu lock. Reported-by: syzbot+a3a9a61224ed3b7f0010@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Ancient versions of bcachefs produced packed formats that could represent keys that our in memory format cannot represent; bformat_needs_redo() has some tricky shifts to check for this sort of overflow. Reported-by: syzbot+594427aebfefeebe91c6@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
For forwards compatibility we have to allow unknown key types, and only run the checks that make sense against them. Fix a missing guard on k.k->type being known. Reported-by: syzbot+ae4dc916da3ce51f284f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We were forgetting to check for jset entries that overrun the end of the section - both in validate and to_text(); to_text() needs to be safe for types that fail to validate. Reported-by: syzbot+c48865e11e7e893ec4ab@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Reed Riley authored
filefrag (and potentially other utilities that call fiemap) sometimes pass ULONG_MAX as the length. fiemap_prep clamps excessively large lengths - but the calculation of end can overflow if it occurs before calling fiemap_prep. When this happens, filefrag assumes it has read to the end and exits. Signed-off-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
The bucket_gens array is a single array allocation (one byte per bucket), and kernel allocations are still limited to INT_MAX. Check this limit to avoid failing the bucket_gens array allocation. Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_get_next_dev() and bch2_get_next_online_dev() iterate over devices, dropping and taking refs as they go; we can't access the previous device (for ca->dev_idx) after we've dropped our ref to it, unless we take rcu_read_lock() first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_dev_lookup() is supposed to take a ref on the device it returns, but for_each_member_device() takes refs as it iterates, for_each_member_device_rcu() does not. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Normally this is initialized in __bch2_write(), which is executed in a loop, but the inline data path skips this. Reported-by: syzbot+fd3ccb331eb21f05d13b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Reported-by: syzbot+66b9b74f6520068596a9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We don't want the assert when we're checking if the backpointer is valid. Reported-by: syzbot+bf7215c0525098e7747a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Reported-by: syzbot+3333603f569fc2ef258c@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We're using mutex_lock() inside a wait_event() conditional - prepare_to_wait() has already flipped task state, so potentially blocking ops need annotation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
- 29 Apr, 2024 3 commits
-
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
- 28 Apr, 2024 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Ingo Molnar: - Fix EEVDF corner cases - Fix two nohz_full= related bugs that can cause boot crashes and warnings * tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU sched/isolation: Prevent boot crash when the boot CPU is nohz_full sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf() sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr sched/eevdf: Always update V if se->on_rq when reweighting
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: - Make the CPU_MITIGATIONS=n interaction with conflicting mitigation-enabling boot parameters a bit saner. - Re-enable CPU mitigations by default on non-x86 - Fix TDX shared bit propagation on mprotect() - Fix potential show_regs() system hang when PKE initialization is not fully finished yet. - Add the 0x10-0x1f model IDs to the Zen5 range - Harden #VC instruction emulation some more * tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n cpu: Re-enable CPU mitigations by default for !X86 architectures x86/tdx: Preserve shared bit on mprotect() x86/cpu: Fix check for RDPKRU in __show_regs() x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fix from Ingo Molnar: "Fix a double free bug in the init error path of the GICv3 irqchip driver" * tag 'irq-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Prevent double free on error
-
Oleg Nesterov authored
housekeeping_setup() checks cpumask_intersects(present, online) to ensure that the kernel will have at least one housekeeping CPU after smp_init(), but this doesn't work if the maxcpus= kernel parameter limits the number of processors available after bootup. For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at boot time on a virtual machine with 4 CPUs. Change housekeeping_setup() to use cpumask_first_and() and check that the returned CPU number is valid and less than setup_max_cpus. Another corner case is "nohz_full=0" on a machine with a single CPU or with the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the WARN_ON(tick_nohz_full_running) in tick_sched_do_timer(). And how should the kernel interpret the "nohz_full=" parameter? It should be silently ignored, but currently cpulist_parse() happily returns the empty cpumask and this leads to the same problem. Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask) and do nothing in this case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20240413141746.GA10008@redhat.com
-
Oleg Nesterov authored
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not include the boot CPU, which is no longer true after: 08ae95f4 ("nohz_full: Allow the boot CPU to be nohz_full"). However after: aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work") the kernel will crash at boot time in this case; housekeeping_any_cpu() returns an invalid CPU number until smp_init() brings the first housekeeping CPU up. Change housekeeping_any_cpu() to check the result of cpumask_any_and() and return smp_processor_id() in this case. This is just the simple and backportable workaround which fixes the symptom, but smp_processor_id() at boot time should be safe at least for type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu logic. There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to offline tick_do_timer_cpu (the 1st online housekeeping CPU). Fixes: aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work") Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
-
- 27 Apr, 2024 2 commits
-
-
https://github.com/Rust-for-Linux/linuxLinus Torvalds authored
Pull Rust fixes from Miguel Ojeda: - Soundness: make internal functions generated by the 'module!' macro inaccessible, do not implement 'Zeroable' for 'Infallible' and require 'Send' for the 'Module' trait. - Build: avoid errors with "empty" files and workaround 'rustdoc' ICE. - Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'. - Code docs: remove non-existing key from 'module!' macro example. - Docs: trivial rendering fix in arch table. * tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux: rust: remove `params` from `module` macro example kbuild: rust: force `alloc` extern to allow "empty" Rust files kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE rust: kernel: require `Send` for `Module` implementations rust: phy: implement `Send` for `Registration` rust: make mutually exclusive with CFI_CLANG rust: macros: fix soundness issue in `module!` macro rust: init: remove impl Zeroable for Infallible docs: rust: fix improper rendering in Arch Support page rust: don't select CONSTRUCTORS
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - A fix for TASK_SIZE on rv64/NOMMU, to reflect the lack of user/kernel separation - A fix to avoid loading rv64/NOMMU kernel past the start of RAM - A fix for RISCV_HWPROBE_EXT_ZVFHMIN on ilp32 to avoid signed integer overflow in the bitmask - The sud_test kselftest has been fixed to properly swizzle the syscall number into the return register, which are not the same on RISC-V - A fix for a build warning in the perf tools on rv32 - A fix for the CBO selftests, to avoid non-constants leaking into the inline asm - A pair of fixes for T-Head PBMT errata probing, which has been renamed MAE by the vendor * tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2 perf riscv: Fix the warning due to the incompatible type riscv: T-Head: Test availability bit before enabling MAE errata riscv: thead: Rename T-Head PBMT to MAE selftests: sud_test: return correct emulated syscall value on RISC-V riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN riscv: Fix loading 64-bit NOMMU kernels past the start of RAM riscv: Fix TASK_SIZE on 64-bit NOMMU
-