- 17 Aug, 2021 16 commits
-
-
Peter Zijlstra authored
RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
-
Thomas Gleixner authored
Prepare for reusing the inner functions of rtmutex for RT lock substitutions: introduce kernel/locking/rtmutex_api.c and move them there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.726560996@linutronix.de
-
Thomas Gleixner authored
Allows the compiler to generate better code depending on the architecture. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.668958502@linutronix.de
-
Sebastian Andrzej Siewior authored
Inlines are type-safe... Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.610830960@linutronix.de
-
Peter Zijlstra authored
There are no more users left. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.552218335@linutronix.de
-
Peter Zijlstra authored
The only user of rt_mutex_is_locked() is an anti-pattern, remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.491442626@linutronix.de
-
Thomas Gleixner authored
The RT specific spin/rwlock implementation requires special handling of the to be woken waiters. Provide a WAKE_Q_HEAD_INITIALIZER(), which can be used by the rtmutex code to implement an RT aware wake_q derivative. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.429918071@linutronix.de
-
Thomas Gleixner authored
RT enabled kernels substitute spin/rwlocks with 'sleeping' variants based on rtmutexes. Blocking on such a lock is similar to preemption versus: - I/O scheduling and worker handling, because these functions might block on another substituted lock, or come from a lock contention within these functions. - RCU considers this like a preemption, because the task might be in a read side critical section. Add a separate scheduling point for this, and hand a new scheduling mode argument to __schedule() which allows, along with separate mode masks, to handle this gracefully from within the scheduler, without proliferating that to other subsystems like RCU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.372319055@linutronix.de
-
Thomas Gleixner authored
PREEMPT_RT needs to hand a special state into __schedule() when a task blocks on a 'sleeping' spin/rwlock. This is required to handle rcu_note_context_switch() correctly without having special casing in the RCU code. From an RCU point of view the blocking on the sleeping spinlock is equivalent to preemption, because the task might be in a read side critical section. schedule_debug() also has a check which would trigger with the !preempt case, but that could be handled differently. To avoid adding another argument and extra checks which cannot be optimized out by the compiler, the following solution has been chosen: - Replace the boolean 'preempt' argument with an unsigned integer 'sched_mode' argument and define constants to hand in: (0 == no preemption, 1 = preemption). - Add two masks to apply on that mode: one for the debug/rcu invocations, and one for the actual scheduling decision. For a non RT kernel these masks are UINT_MAX, i.e. all bits are set, which allows the compiler to optimize the AND operation out, because it is not masking out anything. IOW, it's not different from the boolean. RT enabled kernels will define these masks separately. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.315473019@linutronix.de
-
Thomas Gleixner authored
Waiting for spinlocks and rwlocks on non RT enabled kernels is task::state preserving. Any wakeup which matches the state is valid. RT enabled kernels substitutes them with 'sleeping' spinlocks. This creates an issue vs. task::__state. In order to block on the lock, the task has to overwrite task::__state and a consecutive wakeup issued by the unlocker sets the state back to TASK_RUNNING. As a consequence the task loses the state which was set before the lock acquire and also any regular wakeup targeted at the task while it is blocked on the lock. To handle this gracefully, add a 'saved_state' member to task_struct which is used in the following way: 1) When a task blocks on a 'sleeping' spinlock, the current state is saved in task::saved_state before it is set to TASK_RTLOCK_WAIT. 2) When the task unblocks and after acquiring the lock, it restores the saved state. 3) When a regular wakeup happens for a task while it is blocked then the state change of that wakeup is redirected to operate on task::saved_state. This is also required when the task state is running because the task might have been woken up from the lock wait and has not yet restored the saved state. To make it complete, provide the necessary helpers to save and restore the saved state along with the necessary documentation how the RT lock blocking is supposed to work. For non-RT kernels there is no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.258751046@linutronix.de
-
Thomas Gleixner authored
In order to avoid more duplicate implementations for the debug and non-debug variants of the state change macros, split the debug portion out and make that conditional on CONFIG_DEBUG_ATOMIC_SLEEP=y. Suggested-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.200898048@linutronix.deSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Thomas Gleixner authored
RT kernels have an extra quirk for try_to_wake_up() to handle task state preservation across periods of blocking on a 'sleeping' spin/rwlock. For this to function correctly and under all circumstances try_to_wake_up() must be able to identify whether the wakeup is lock related or not and whether the task is waiting for a lock or not. The original approach was to use a special wake_flag argument for try_to_wake_up() and just use TASK_UNINTERRUPTIBLE for the tasks wait state and the try_to_wake_up() state argument. This works in principle, but due to the fact that try_to_wake_up() cannot determine whether the task is waiting for an RT lock wakeup or for a regular wakeup it's suboptimal. RT kernels save the original task state when blocking on an RT lock and restore it when the lock has been acquired. Any non lock related wakeup is checked against the saved state and if it matches the saved state is set to running so that the wakeup is not lost when the state is restored. While the necessary logic for the wake_flag based solution is trivial, the downside is that any regular wakeup with TASK_UNINTERRUPTIBLE in the state argument set will wake the task despite the fact that it is still blocked on the lock. That's not a fatal problem as the lock wait has do deal with spurious wakeups anyway, but it introduces unnecessary latencies. Introduce the TASK_RTLOCK_WAIT state bit which will be set when a task blocks on an RT lock. The lock wakeup will use wake_up_state(TASK_RTLOCK_WAIT), so both the waiting state and the wakeup state are distinguishable, which avoids spurious wakeups and allows better analysis. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.144989915@linutronix.deSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Thomas Gleixner authored
RT kernels have a slightly more complicated handling of wakeups due to 'sleeping' spin/rwlocks. If a task is blocked on such a lock then the original state of the task is preserved over the blocking period, and any regular (non lock related) wakeup has to be targeted at the saved state to ensure that these wakeups are not lost. Once the task acquires the lock it restores the task state from the saved state. To avoid cluttering try_to_wake_up() with that logic, split the wakeup state check out into an inline helper and use it at both places where task::__state is checked against the state argument of try_to_wake_up(). No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.088945085@linutronix.de
-
Thomas Gleixner authored
RT mutexes belong to the LD_WAIT_SLEEP class. Make them so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de
-
Thomas Gleixner authored
If CONFIG_DEBUG_LOCK_ALLOC=y is enabled then local_lock_t has an 'owner' member which is checked for consistency, but nothing initialized it to zero explicitly. The static initializer does so implicit, and the run time allocated per CPU storage is usually zero initialized as well, but relying on that is not really good practice. Fixes: 91710728 ("locking: Introduce local_lock()") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211301.969975279@linutronix.de
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 15 Aug, 2021 12 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks). - Fix critical and debug interrupts on BookE, seen as crashes when using ptrace. - Fix an oops when running an SMP kernel on a UP system. - Update pseries LPAR security flavor after partition migration. - Fix an oops when using kprobes on BookE. - Fix oops on 32-bit pmac by not calling do_IRQ() from timer_interrupt(). - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9). Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui, Radu Rendec, Srikar Dronamraju, and Stan Johnson. * tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xive: Do not skip CPU-less nodes when creating the IPIs powerpc/interrupt: Do not call single_step_exception() from other exceptions powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt() powerpc/kprobes: Fix kprobe Oops happens in booke powerpc/pseries: Fix update of LPAR security flavor after LPM powerpc/smp: Fix OOPS in topology_init() powerpc/32: Fix critical and debug interrupts on BOOKE powerpc/32s: Fix napping restore in data storage interrupt (DSI)
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fix from Borislav Petkov: - Fix a CONFIG symbol's spelling * tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Use the correct rtmutex debugging config option
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Borislav Petkov: "A batch of fixes for the arm64 stub image loader: - fix a logic bug that can make the random page allocator fail spuriously - force reallocation of the Image when it overlaps with firmware reserved memory regions - fix an oversight that defeated on optimization introduced earlier where images loaded at a suitable offset are never moved if booting without randomization - complain about images that were not loaded at the right offset by the firmware image loader" * tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: arm64: Double check image alignment at entry efi/libstub: arm64: Warn when efi_random_alloc() fails efi/libstub: arm64: Relax 2M alignment again for relocatable kernels efi/libstub: arm64: Force Image reallocation if BSS was not reserved arm64: efi: kaslr: Fix occasional random alloc (and boot) failure
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: "Two fixes: - An objdump checker fix to ignore parenthesized strings in the objdump version - Fix resctrl default monitoring groups reporting when new subgroups get created" * tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix default monitoring groups reporting x86/tools: Fix objdump version check again
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM fixes from Paolo Bonzini: "ARM: - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM x86: - Fixes for the new MMU, especially a memory leak on hosts with <39 physical address bits - Remove bogus EFER.NX checks on 32-bit non-PAE hosts - WAITPKG fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault KVM: x86: remove dead initialization KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE KVM: arm64: Fix off-by-one in range_is_memory
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Three minor fixes, all in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Fix incorrectly assigned error return and check scsi: storvsc: Log TEST_UNIT_READY errors as warnings scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds authored
Pull libnvdimm fixes from Dan Williams: "A couple of fixes for long standing bugs, a warning fixup, and some miscellaneous dax cleanups. The bugs were recently found due to new platforms looking to use the ACPI NFIT "virtual" device definition, and new error injection capabilities to trigger error responses to label area requests. Ira's cleanups have been long pending, I neglected to send them earlier, and see no harm in including them now. This has all appeared in -next with no reported issues. Summary: - Fix support for NFIT "virtual" ranges (BIOS-defined memory disks) - Fix recovery from failed label storage areas on NVDIMM devices - Miscellaneous cleanups from Ira's investigation of dax_direct_access paths preparing for stray-write protection" * tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: tools/testing/nvdimm: Fix missing 'fallthrough' warning libnvdimm/region: Fix label activation vs errors ACPI: NFIT: Fix support for virtual SPA ranges dax: Ensure errno is returned from dax_direct_access fs/dax: Clarify nr_pages to dax_direct_access() fs/fuse: Remove unneeded kaddr parameter
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fix from Greg KH: "A single revert of a commit that caused problems in 5.14-rc5 for 5.14-rc6. It has been in linux-next almost all week, and has resolved the issues that were reported on lots of different systems that were not the platform that the change was originally tested on (gotta love SoC cores used in multiple devices from multiple vendors...)" * tag 'usb-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "usb: dwc3: gadget: Use list_replace_init() before traversing lists"
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull IIO driver fixes from Greg KH: "Here are some small IIO driver fixes for reported problems for 5.14-rc6 (no staging driver fixes at the moment). All of them resolve reported issues and have been in linux-next all week with no reported problems. Full details are in the shortlog" * tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: Fix incorrect exit of for-loop iio: humidity: hdc100x: Add margin to the conversion time dt-bindings: iio: st: Remove wrong items length check iio: accel: fxls8962af: fix i2c dependency iio: adis: set GPIO reset pin direction iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels iio: accel: fxls8962af: fix potential use of uninitialized symbol
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "One driver bugfix, a documentation bugfix, and an "uninitialized data" leak fix for the core" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Documentation: i2c: add i2c-sysfs into index i2c: dev: zero out array used for i2c reads from userspace i2c: iproc: fix race between client unreg and tasklet
-
- 14 Aug, 2021 12 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen fixes from Juergen Gross: "A small cleanup patch and a fix of a rare race in the Xen evtchn driver" * tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: Fix race in set_evtchn_to_irq xen/events: remove redundant initialization of variable irq
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - avoid passing -mno-relax to compilers that don't support it - a comment fix * tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
-
git://git.infradead.org/users/hch/configfsLinus Torvalds authored
Pull configfs fix from Christoph Hellwig: - fix to revert to the historic write behavior (Bart Van Assche) * tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs: configfs: restore the kernel v5.13 text attribute write behavior
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "7 patches. Subsystems affected by this patch series: mm (kasan, mm/slub, mm/madvise, and memcg), and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: use PFN_PHYS() in devmem_is_allowed() mm/memcg: fix incorrect flushing of lruvec data in obj_stock mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE) mm: slub: fix slub_debug disabling for list of slabs slub: fix kmalloc_pagealloc_invalid_free unit test kasan, slub: reset tag when printing address kasan, kmemleak: reset tags when scanning block
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close, and one for the 'modefromsid' mount option (when 'idsfromsid' not specified)" * tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Call close synchronously during unlink/rename/lease break. cifs: Handle race conditions during rename cifs: use the correct max-length for dentry_path_raw() cifs: create sd context must be a multiple of 8
-
Linus Torvalds authored
Merge tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: "A single patch to sgx test to fix Q1 and Q2 calculation" * tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
-
Liang Wang authored
The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or the physical address may overflow and be truncated. We found this bug when mapping a high addresses through devmem tool, when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned. This bug was initially introduced from v2.6.37, and the function was moved to lib in v5.11. Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com Fixes: 087aaffc ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: 527701ed ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang <wangliang101@huawei.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Liang Wang <wangliang101@huawei.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Waiman Long authored
When mod_objcg_state() is called with a pgdat that is different from that in the obj_stock, the old lruvec data cached in obj_stock are flushed out. Unfortunately, they were flushed to the new pgdat and so the data go to the wrong node. This will screw up the slab data reported in /sys/devices/system/node/node*/meminfo. Fix that by flushing the data to the cached pgdat instead. Link: https://lkml.kernel.org/r/20210802143834.30578-1-longman@redhat.com Fixes: 68ac5b3c ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp") Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Chris Down <chris@chrisdown.name> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
Doing some extended tests and polishing the man page update for MADV_POPULATE_(READ|WRITE), I realized that we end up converting also SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another madvise() user error. We want to report only problematic mappings and permission problems that the user could have know as -EINVAL. Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL, but instead indicate -EFAULT to user space. While we could also convert it to -ENOMEM, using -EFAULT looks more helpful when user space might want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is not part of an final Linux release and we can still adjust the behavior. Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com Fixes: 4ca9b385 ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Vijayanand Jitta reports: Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would want to disable slub_debug for few slabs. Using boot parameter with slub_debug=-,slab_name syntax doesn't work as expected i.e; only disabling debugging for the specified list of slabs. Instead it disables debugging for all slabs, which is wrong. This patch fixes it by delaying the moment when the global slub_debug flags variable is updated. In case a "slub_debug=-,slab_name" has been passed, the global flags remain as initialized (depending on CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0. Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.czSigned-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org> Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shakeel Butt authored
The unit test kmalloc_pagealloc_invalid_free makes sure that for the higher order slub allocation which goes to page allocator, the free is called with the correct address i.e. the virtual address of the head page. Commit f227f0fa ("slub: fix unreclaimable slab stat for bulk free") unified the free code paths for page allocator based slub allocations but instead of using the address passed by the caller, it extracted the address from the page. Thus making the unit test kmalloc_pagealloc_invalid_free moot. So, fix this by using the address passed by the caller. Should we fix this? I think yes because dev expect kasan to catch these type of programming bugs. Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com Fixes: f227f0fa ("slub: fix unreclaimable slab stat for bulk free") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kuan-Ying Lee authored
The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata. Reset the tag before we access the metadata. Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata") Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-