- 24 Mar, 2024 4 commits
-
-
Tom Lendacky authored
When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e0 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
-
Tony Luck authored
This one is the regular laptop CPU. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
-
Adamos Ttofari authored
Commit 67236547 ("x86/fpu: Update XFD state where required") and commit 8bf26758 ("x86/fpu: Add XFD state to fpstate") introduced a per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in order to avoid unnecessary writes to the MSR. On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which wipes out any stale state. But the per CPU cached xfd value is not reset, which brings them out of sync. As a consequence a subsequent xfd_update_state() might fail to update the MSR which in turn can result in XRSTOR raising a #NM in kernel space, which crashes the kernel. To fix this, introduce xfd_set_state() to write xfd_state together with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD. Fixes: 67236547 ("x86/fpu: Update XFD state where required") Signed-off-by: Adamos Ttofari <attofari@amazon.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
-
Tony Luck authored
The memory bandwidth software controller uses 2^20 units rather than 10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M" Linux define for 0x00100000. Update the documentation to use MiB when describing this feature. It's too late to fix the mount option "mba_MBps" as that is now an established user interface. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
-
- 23 Mar, 2024 4 commits
-
-
Thomas Gleixner authored
The APIC address is registered twice. First during the early detection and afterwards when actually scanning the table for APIC IDs. The APIC and topology core warn about the second attempt. Restrict it to the early detection call. Fixes: 81287ad6 ("x86/apic: Sanitize APIC address setup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
-
Thomas Gleixner authored
If there is no local APIC enumerated and registered then the topology bitmaps are empty. Therefore, topology_init_possible_cpus() will die with a division by zero exception. Prevent this by registering a fake APIC id to populate the topology bitmap. This also allows to use all topology query interfaces unconditionally. It does not affect the actual APIC code because either the local APIC address was not registered or no local APIC could be detected. Fixes: f1f758a8 ("x86/topology: Add a mechanism to track topology via APIC IDs") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
-
Thomas Gleixner authored
The local APICs have not yet been enumerated so the logical ID evaluation from the topology bitmaps does not work and would return an error code. Skip the evaluation during the early boot CPUID evaluation and only apply it on the final run. Fixes: 380414be ("x86/cpu/topology: Use topology logical mapping mechanism") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
-
Thomas Gleixner authored
The boot sequence evaluates CPUID information twice: 1) During early boot 2) When finalizing the early setup right before mitigations are selected and alternatives are patched. In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance. Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore. This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose. Fixes: 71eb4893 ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
-
- 22 Mar, 2024 3 commits
-
-
Masami Hiramatsu (Google) authored
Read from an unsafe address with copy_from_kernel_nofault() in arch_adjust_kprobe_addr() because this function is used before checking the address is in text or not. Syzcaller bot found a bug and reported the case if user specifies inaccessible data area, arch_adjust_kprobe_addr() will cause a kernel panic. [ mingo: Clarified the comment. ] Fixes: cc66bb91 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes") Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com> Tested-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
-
Anton Altaparmakov authored
Since: 7ee18d67 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue: unreferenced object 0xf68241e0 (size 32): comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s) hex dump (first 32 bytes): 00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)........... 00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B.............. backtrace: [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260 [<ea65e13b>] __kmalloc+0x54/0x160 [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100 [<46635aff>] pm_check_save_msr+0x63/0x80 [<6b6bb938>] do_one_initcall+0x41/0x1f0 [<3f3add60>] kernel_init_freeable+0x199/0x1e8 [<3b538fde>] kernel_init+0x1a/0x110 [<938ae2b2>] ret_from_fork+0x1c/0x28 Which is a false positive. Reproducer: - Run rsync of whole kernel tree (multiple times if needed). - start a kmemleak scan - Note this is just an example: a lot of our internal tests hit these. The root cause is similar to the fix in: b0b592cf x86/pm: Fix false positive kmemleak report in msr_build_context() ie. the alignment within the packed struct saved_context which has everything unaligned as there is only "u16 gs;" at start of struct where in the past there were four u16 there thus aligning everything afterwards. The issue is with the fact that Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c) so when the struct members are not aligned it doesn't see them. Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures. Fixes: 7ee18d67 ("x86/power: Make restore_processor_context() sane") Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
-
Dave Young authored
crashkernel reservation failed on a Thinkpad t440s laptop recently. Actually the memblock reservation succeeded, but later insert_resource() failed. Test steps: kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */ kexec reboot -> dmesg|grep "crashkernel reserved"; crashkernel memory range like below reserved successfully: 0x00000000d0000000 - 0x00000000da000000 But no such "Crash kernel" region in /proc/iomem The background story: Currently the E820 code reserves setup_data regions for both the current kernel and the kexec kernel, and it inserts them into the resources list. Before the kexec kernel reboots nobody passes the old setup_data, and kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed. Thus the old setup data memory is not used at all. Due to old kernel updates the kexec e820 table as well so kexec kernel sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data regions are inserted into resources list in the kexec kernel by e820__reserve_resources(). Note, due to no setup_data is passed in for those old regions they are not early reserved (by function early_reserve_memory), and the crashkernel memblock reservation will just treat them as usable memory and it could reserve the crashkernel region which overlaps with the old setup_data regions. And just like the bug I noticed here, kdump insert_resource failed because e820__reserve_resources has added the overlapped chunks in /proc/iomem already. Finally, looking at the code, the old setup_data regions are not used at all as no setup_data is passed in by the kexec boot loader. Although something like SETUP_PCI etc could be needed, kexec should pass the info as new setup_data so that kexec kernel can take care of them. This should be taken care of in other separate patches if needed. Thus drop the useless buggy code here. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
-
- 21 Mar, 2024 1 commit
-
-
Masahiro Yamada authored
Kconfig emits a warning for the following command: $ make ARCH=x86_64 tinyconfig ... .config:1380:warning: override: UNWINDER_GUESS changes choice state When X86_64=y, the unwinder is exclusively selected from the following three options: - UNWINDER_ORC - UNWINDER_FRAME_POINTER - UNWINDER_GUESS However, arch/x86/configs/tiny.config only specifies the values of the last two. UNWINDER_ORC must be explicitly disabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240320154313.612342-1-masahiroy@kernel.org
-
- 18 Mar, 2024 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fix from Ingo Molnar: "A RISC-V irqchip driver fix" * tag 'irq-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-intc: Fix use of AIA interrupts 32-63 on riscv32
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Two regression fixes that had been introduced in this merge window, additional HD-audio quirks, and a further enhancement for the new kunit" * tag 'sound-fix-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: core: add kunitconfig ALSA: hda/realtek: add in quirk for Acer Swift Go 16 - SFG16-71 Revert "ALSA: usb-audio: Name feature ctl using output if input is PCM" ALSA: timer: Fix missing irq-disable at closing ALSA: hda/realtek: Add quirk for Lenovo Yoga 9 14IMH9
-
- 17 Mar, 2024 10 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linuxLinus Torvalds authored
Pull i3c updates from Alexandre Belloni: "Not much this cycle with only three patches. Core: - i3c_bus_type is now const Drivers: - dw: disabling IBI is only allowed when hot join and SIR are disabled" * tag 'i3c/for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: Make i3c_bus_type const i3c: dw: Disable IBI IRQ depends on hot-join and SIR enabling dt-bindings: i3c: drop "master" node name suffix
-
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efiLinus Torvalds authored
Pull EFI fix from Ard Biesheuvel: "This fixes an oversight on my part in the recent EFI stub rework for x86, which is needed to get Linux/x86 distro builds signed again for secure boot by Microsoft. For this reason, most of this work is being backported to v6.1, which is therefore also affected by this regression. - Explicitly wipe BSS in the native EFI entrypoint, so that globals shared with the legacy decompressor are zero-initialized correctly" * tag 'efi-fixes-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Clear decompressor BSS in native EFI entrypoint
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Ingo Molnar: "Fix timer migration bug that can result in long bootup delays and other oddities" * tag 'timers-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer/migration: Remove buggy early return on deactivation
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 perf event fixes from Ingo Molnar: - Work around AMD erratum to filter out bogus LBR stack entries - Fix incorrect PMU reset that can result in warnings (or worse) during suspend/hibernation * tag 'perf-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Avoid register reset when CPU is dead perf/x86/amd/lbr: Discard erroneous branch entries
-
git://www.linux-watchdog.org/linux-watchdogLinus Torvalds authored
Pull watchdog updates from Wim Van Sebroeck: - Remove usage of the deprecated ida_simple_xx() API - Add kernel-doc for wdt_set_timeout() - Add support for R-Car V4M, StarFive's JH8100 and sam9x7-wdt - Fixes and small improvements * tag 'linux-watchdog-6.9-rc1' of git://www.linux-watchdog.org/linux-watchdog: watchdog: intel-mid_wdt: Get platform data via dev_get_platdata() watchdog: intel-mid_wdt: Don't use "proxy" headers watchdog: intel-mid_wdt: Remove unused intel-mid.h dt-bindings: watchdog: sama5d4-wdt: add compatible for sam9x7-wdt dt-bindings: watchdog: sprd,sp9860-wdt: convert to YAML dt-bindings: watchdog: starfive,jh7100-wdt: Add compatible for JH8100 watchdog: stm32_iwdg: initialize default timeout dt-bindings: watchdog: arm,sp805: document the reset signal watchdog: sp805_wdt: deassert the reset if available watchdog/hpwdt: Support Suspend and Resume dt-bindings: watchdog: renesas-wdt: Add support for R-Car V4M watchdog: starfive: check watchdog status before enabling in system resume watchdog: starfive: Check pm_runtime_enabled() before decrementing usage counter watchdog: qcom: fine tune the max timeout value calculation watchdog: Add kernel-doc for wdt_set_timeout() watchdog: core: Remove usage of the deprecated ida_simple_xx() API
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linuxLinus Torvalds authored
Pull PCMCIA updates from Dominik Brodowski: "Mark some structs 'const' now that the driver core supports it (Ricardo B Marliere)" * tag 'pcmcia-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: cs: make pcmcia_socket_class constant pcmcia: ds: make pcmcia_bus_type const
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input updates from Dmitry Torokhov: - a new driver for Goodix Berlin I2C and SPI touch controllers - support for IQS7222D v1.1 and v1.2 in iqs7222 driver - support for IST3032C and IST3038B parts in Imagis touchscreen driver - support for touch keys for Imagis touchscreen controllers - support for Snakebyte GAMEPADs in xpad driver - various cleanups and conversions to yaml for device tree bindings - assorted fixes and cleanups - old Synaptics navpoint driver has been removed since the only board that used it (HP iPAQ hx4700) was removed a while ago. * tag 'input-for-v6.9-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (37 commits) Input: xpad - add support for Snakebyte GAMEPADs dt-bindings: input: samsung,s3c6410-keypad: convert to DT Schema Input: imagis - add touch key support dt-bindings: input: imagis: Document touch keys Input: imagis - use FIELD_GET where applicable Input: make input_class constant dt-bindings: input: atmel,captouch: convert bindings to YAML Input: iqs7222 - add support for IQS7222D v1.1 and v1.2 dt-bindings: input: allwinner,sun4i-a10-lrad: drop redundant type from label Input: serio - make serio_bus const Input: synaptics-rmi4 - make rmi_bus_type const Input: xilinx_ps2 - fix kernel-doc for xps2_of_probe function input/touchscreen: imagis: add support for IST3032C dt-bindings: input/touchscreen: imagis: add compatible for IST3032C input/touchscreen: imagis: Add support for Imagis IST3038B dt-bindings: input/touchscreen: Add compatible for IST3038B input/touchscreen: imagis: Correct the maximum touch area value Input: leds - change config symbol dependency for audio mute trigger Input: ti_am335x_tsc - remove redundant assignment to variable config Input: xpad - sort xpad_device by vendor and product ID ...
-
Takashi Sakamoto authored
It is helpful to add .kunitconfig if we work with the tools provided by KUnit project. The file describes the series of kernel configurations to satisfy the dependency to build the target test. For example: $ ./tools/testing/kunit/kunit.py run --arch=arm64 --cross_compile=aarch64-linux-gnu- --kunitconfig=sound/core/ [11:35:13] Configuring KUnit Kernel ... Regenerating .config ... Populating config with: $ make ARCH=arm64 O=.kunit olddefconfig CROSS_COMPILE=aarch64-linux-gnu- [11:35:19] Building KUnit Kernel ... Populating config with: $ make ARCH=arm64 O=.kunit olddefconfig CROSS_COMPILE=aarch64-linux-gnu- Building with: $ make ARCH=arm64 O=.kunit --jobs=8 CROSS_COMPILE=aarch64-linux-gnu- [11:37:35] Starting KUnit Kernel (1/1)... [11:37:35] ============================================================ Running tests with: $ qemu-system-aarch64 -nodefaults -m 1024 -kernel .kunit/arch/arm64/boot/Image.gz -append 'kunit.enable=1 console=ttyAMA0 kunit_shutdown=reboot' -no-reboot -nographic -serial stdio -machine virt -cpu max,pauth-impdef=on [11:37:35] ============== sound-core-test (10 subtests) =============== [11:37:35] [PASSED] test_phys_format_size [11:37:35] [PASSED] test_format_width [11:37:35] [PASSED] test_format_endianness [11:37:35] [PASSED] test_format_signed [11:37:35] [PASSED] test_format_fill_silence [11:37:35] [PASSED] test_playback_avail [11:37:35] [PASSED] test_capture_avail [11:37:35] [PASSED] test_card_set_id [11:37:35] [PASSED] test_pcm_format_name [11:37:35] [PASSED] test_card_add_component [11:37:35] ================= [PASSED] sound-core-test ================= [11:37:35] ============================================================ [11:37:35] Testing complete. Ran 10 tests: passed: 10 [11:37:35] Elapsed time: 142.333s total, 5.617s configuring, 136.047s building, 0.630s running Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Message-ID: <20240317024050.588370-1-o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Ian Murphy authored
Keyboard has an LED that is ON/OFF when mic is muted/active - LED is controlled by GPIO pin - Patch enables led to appear in /sys/class/leds/ as hda::micmute - Enables LED when mic is MUTED - Disables LED when mic is active [ fixed white spaces by tiwai ] Signed-off-by: Ian Murphy <iano200@gmail.com> Message-ID: <20240316094157.13890-1-iano200@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Takashi Iwai authored
This reverts commit 1601cd53. This fix is applied globally to all devices, and it may change the existing control names. When the devices are managed with the fixed configuration like UCM, such control name mismatch may lead to significant regressions. For avoiding that kind of regression, we would need to apply such changes conditionally, but it'd take time to settle down. While the original fix is a good thing in general, in order to address the regression, let's revert the change for now. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218605Reported-and-tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Message-ID: <20240316083744.28126-1-tiwai@suse.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
- 16 Mar, 2024 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI updates from James Bottomley: "Only a couple of driver updates this time (lpfc and mpt3sas) plus the usual assorted minor fixes and updates. The major core update is a set of patches moving retries out of the drivers and into the core" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (84 commits) scsi: core: Constify the struct device_type usage scsi: libfc: replace deprecated strncpy() with memcpy() scsi: lpfc: Replace deprecated strncpy() with strscpy() scsi: bfa: Fix function pointer type mismatch for state machines scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfn scsi: bfa: Remove additional unnecessary struct declarations scsi: csiostor: Avoid function pointer casts scsi: qla1280: Remove redundant assignment to variable 'mr' scsi: core: Make scsi_bus_type const scsi: core: Really include kunit tests with SCSI_LIB_KUNIT_TEST scsi: target: tcm_loop: Make tcm_loop_lld_bus const scsi: scsi_debug: Make pseudo_lld_bus const scsi: iscsi: Make iscsi_flashnode_bus const scsi: fcoe: Make fcoe_bus_type const scsi: lpfc: Copyright updates for 14.4.0.0 patches scsi: lpfc: Update lpfc version to 14.4.0.0 scsi: lpfc: Change lpfc_vport load_flag member into a bitmask scsi: lpfc: Change lpfc_vport fc_flag member into a bitmask scsi: lpfc: Protect vport fc_nodes list with an explicit spin lock scsi: lpfc: Change nlp state statistic counters into atomic_t ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linuxLinus Torvalds authored
Pull parisc architecture updates and fixes from Helge Deller: "Fixes for the IPv4 and IPv6 checksum functions, a fix for the 64-bit unaligned memory exception handler and various code cleanups. Most of the patches are tagged for stable series. - Fix inline assembly in ipv4 and ipv6 checksum functions (Guenter Roeck) - Rewrite 64-bit inline assembly of emulate_ldd() (Guenter Roeck) - Do not clobber carry/borrow bits in tophys and tovirt macros (John David Anglin) - Warn when kernel accesses unaligned memory" * tag 'parisc-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: led: Convert to platform remove callback returning void parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds parisc: Fix csum_ipv6_magic on 64-bit systems parisc: Fix csum_ipv6_magic on 32-bit systems parisc: Fix ip_fast_csum parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros parisc/unaligned: Rewrite 64-bit inline assembly of emulate_ldd() parisc: make parisc_bus_type const parisc: avoid c23 'nullptr' idenitifier parisc: Show kernel unaligned memory accesses parisc: Use irq_enter_rcu() to fix warning at kernel/context_tracking.c:367
-
Frederic Weisbecker authored
When a CPU enters into idle and deactivates itself from the timer migration hierarchy without any global timer of its own to propagate, the group event of that CPU is set to "ignore" and tmigr_update_events() accordingly performs an early return without considering timers queued by other CPUs. If the hierarchy has a single level, and the CPU is the last one to enter idle, it will ignore others' global timers, as in the following layout: [GRP0:0] migrator = 0 active = 0 nextevt = T0i / \ 0 1 active (T0i) idle (T1) 0) CPU 0 is active thus its event is ignored (the letter 'i') and so are upper levels' events. CPU 1 is idle and has the timer T1 enqueued. [GRP0:0] migrator = NONE active = NONE nextevt = T0i / \ 0 1 idle (T0i) idle (T1) 1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is pushed as its next expiry and its own event kept as "ignore". As a result tmigr_update_events() ignores T1 and CPU 0 goes to idle with T1 unhandled. This isn't proper to single level hierarchy though. A similar issue, although slightly different, may arise on multi-level: [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0i, T0:1 / \ [GRP0:0] [GRP0:1] migrator = 0 migrator = NONE active = 0 active = NONE nextevt = T0i nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 active idle idle idle 0) CPU 0 is active thus its event is ignored (the letter 'i') and so are upper levels' events. CPU 1 is idle and has the timer T1 enqueued. CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2 [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0i, T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T0i nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 idle idle idle idle 1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is pushed as its next expiry and its own event kept as "ignore". As a result tmigr_update_events() ignores T1. The change only propagated up to 1st level so far. [GRP1:0] migrator = NONE active = NONE nextevt = T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T0i nextevt = T2 / \ / \ 0 (T0i) 1 (T1) 2 (T2) 3 idle idle idle idle 2) The change now propagates up to the top. tmigr_update_events() finds that the child event is ignored and thus removes it. The top level next event is now T2 which is returned to CPU 0 as its next effective expiry to take account for as the global idle migrator. However T1 has been ignored along the way, leaving it unhandled. Fix those issues with removing the buggy related early return. Ignored child events must not prevent from evaluating the other events within the same group. Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/ZfOhB9ZByTZcBy4u@lothringen
-
git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds authored
Pull NFS client updates from Trond Myklebust: "Highlights include: Bugfixes: - Fix for an Oops in the NFSv4.2 listxattr handler - Correct an incorrect buffer size in listxattr - Fix for an Oops in the pNFS flexfiles layout - Fix a refcount leak in NFS O_DIRECT writes - Fix missing locking in NFS O_DIRECT - Avoid an infinite loop in pnfs_update_layout - Fix an overflow in the RPC waitqueue queue length counter - Ensure that pNFS I/O is also protected by TLS when xprtsec is specified by the mount options - Fix a leaked folio lock in the netfs read code - Fix a potential deadlock in fscache - Allow setting the fscache uniquifier in NFSv4 - Fix an off by one in root_nfs_cat() - Fix another off by one in rpc_sockaddr2uaddr() - nfs4_do_open() can incorrectly trigger state recovery - Various fixes for connection shutdown Features and cleanups: - Ensure that containers only see their own RPC and NFS stats - Enable nconnect for RDMA - Remove dead code from nfs_writepage_locked() - Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO, and mount options" * tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits) nfs: fix panic when nfs4_ff_layout_prepare_ds() fails NFS: trace the uniquifier of fscache NFS: Read unlock folio on nfs_page_create_from_folio() error NFS: remove unused variable nfs_rpcstat nfs: fix UAF in direct writes nfs: properly protect nfs_direct_req fields NFS: enable nconnect for RDMA NFSv4: nfs4_do_open() is incorrectly triggering state recovery NFS: avoid infinite loop in pnfs_update_layout. NFS: remove sync_mode test from nfs_writepage_locked() NFSv4.1/pnfs: fix NFS with TLS in pnfs NFS: Fix an off by one in root_nfs_cat() nfs: make the rpc_stat per net namespace nfs: expose /proc/net/sunrpc/nfs in net namespaces sunrpc: add a struct rpc_stats arg to rpc_create_args nfs: remove unused NFS_CALL macro NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int nfs: fix regression in handling of fsc= option in NFSv4 ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phyLinus Torvalds authored
Pull phy updates from Vinod Koul: "New hardware support: - Qualcomm X1E80100 PCIe phy support, SM8550 PCIe1 PHY, SC7180 UFS PHY and SDM630 USBC support - Rockchip HDMI/eDP Combo PHY driver - Mediatek MT8365 CSI phy driver Updates: - Rework on Qualcomm phy PCS registers and type-c handling - Cadence torrent phy updates for multilink configuration - TI gmii resume support" * tag 'phy-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (41 commits) phy: constify of_phandle_args in xlate phy: ti: tusb1210: Define device IDs phy: ti: tusb1210: Use temporary variable for struct device phy: rockchip: Add Samsung HDMI/eDP Combo PHY driver dt-bindings: phy: Add Rockchip HDMI/eDP Combo PHY schema phy: ti: gmii-sel: add resume support phy: mtk-mipi-csi: add driver for CSI phy dt-bindings: phy: add mediatek MIPI CD-PHY module v0.5 phy: cadence-torrent: Add USXGMII(156.25MHz) + SGMII/QSGMII(100MHz) multilink config for TI J7200 dt-bindings: phy: cadence-torrent: Add a separate compatible for TI J7200 phy: cadence-torrent: Add USXGMII(156.25MHz) + SGMII/QSGMII(100MHz) multilink configuration phy: cadence-torrent: Add PCIe(100MHz) + USXGMII(156.25MHz) multilink configuration dt-bindings: phy: cadence-torrent: Add optional input reference clock for PLL1 phy: qcom-qmp-ufs: Switch to devm_clk_bulk_get_all() API dt-bindings: phy: qmp-ufs: Fix PHY clocks phy: qcom: sgmii-eth: move PCS registers to separate header phy: qcom: sgmii-eth: use existing register definitions phy: qcom: qmp-usbc: drop has_pwrdn_delay handling phy: qcom: qmp: move common bits definitions to common header phy: qcom: qmp: split DP PHY registers to separate headers ...
-
Linus Torvalds authored
Merge tag 'firewire-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire updates from Takashi Sakamoto: "Small changes to string processing in device attribute" * tag 'firewire-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: fix build failure due to the caller of fw_csr_string() firewire: Convert snprintf/sprintf to sysfs_emit firewire: Kill unnecessary buf check in device_attribute.show
-
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds authored
Pull CXL updates from Dan Williams: "CXL has mechanisms to enumerate the performance characteristics of memory devices. Those mechanisms allow Linux to build the equivalent of ACPI SRAT, SLIT, and HMAT tables dynamically at runtime. That capability is necessary because static ACPI can not represent dynamic CXL configurations (and reconfigurations). So, building on the v6.8 work to add "Quality of Service" enumeration, this update plumbs CXL "access coordinates" (read/write access latency and bandwidth) in all the same places that ACPI HMAT feeds similar data. Follow-on patches from the -mm side can then use that data to feed mechanisms like mm/memory-tiers.c. Greg has acked the touch to drivers/base/. The other feature update this cycle is support for CXL error injection via the ACPI EINJ module. That facility enables injection of bus protocol errors provided the user knows the magic address values to insert in the interface. To hide that magic, and make this easier to use, new error injection attributes were added to CXL debugfs. That interface injects the errors relative to a CXL object rather than require user tooling to know how to lookup and inject RCRB (Root Complex Register Block) addresses into the raw EINJ debugfs interface. It received some helpful review comments from Tony, but no explicit acks from the ACPI side. The primary user visible change for existing EINJ users is that they may find that einj.ko was already loaded by cxl_core.ko. Previously, einj.ko was only loaded on demand. The usual collection of miscellaneous cleanups are also present this cycle. Summary: - Supplement ACPI HMAT reported memory performance with native CXL memory performance enumeration - Add support for CXL error injection via the ACPI EINJ mechanism - Cleanup CXL DOE and CDAT integration - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) Documentation/ABI/testing/debugfs-cxl: Fix "Unexpected indentation" lib/firmware_table: Provide buffer length argument to cdat_table_parse() cxl/pci: Get rid of pointer arithmetic reading CDAT table cxl/pci: Rename DOE mailbox handle to doe_mb cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location cxl/core: Add CXL EINJ debugfs files EINJ, Documentation: Update EINJ kernel doc EINJ: Add CXL error type support EINJ: Migrate to a platform driver cxl/region: Deal with numa nodes not enumerated by SRAT cxl/region: Add memory hotplug notifier for cxl region cxl/region: Add sysfs attribute for locality attributes of CXL regions cxl/region: Calculate performance data for a region cxl: Set cxlmd->endpoint before adding port device cxl: Move QoS class to be calculated from the nearest CPU cxl: Split out host bridge access coordinates cxl: Split out combine_coordinates() for common shared usage ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes ACPI: HMAT: Introduce 2 levels of generic port access class base/node / ACPI: Enumerate node access class for 'struct access_coordinate' ...
-
Josef Bacik authored
We've been seeing the following panic in production BUG: kernel NULL pointer dereference, address: 0000000000000065 PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0 RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles] Call Trace: <TASK> ? __die+0x78/0xc0 ? page_fault_oops+0x286/0x380 ? __rpc_execute+0x2c3/0x470 [sunrpc] ? rpc_new_task+0x42/0x1c0 [sunrpc] ? exc_page_fault+0x5d/0x110 ? asm_exc_page_fault+0x22/0x30 ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles] ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles] ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles] pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4] pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4] ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles] nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles] ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles] __nfs_pageio_add_request+0x154/0x6c0 [nfs] nfs_pageio_add_request+0x26b/0x380 [nfs] nfs_do_writepage+0x111/0x1e0 [nfs] nfs_writepages_callback+0xf/0x30 [nfs] write_cache_pages+0x17f/0x380 ? nfs_pageio_init_write+0x50/0x50 [nfs] ? nfs_writepages+0x6d/0x210 [nfs] ? nfs_writepages+0x6d/0x210 [nfs] nfs_writepages+0x125/0x210 [nfs] do_writepages+0x67/0x220 ? generic_perform_write+0x14b/0x210 filemap_fdatawrite_wbc+0x5b/0x80 file_write_and_wait_range+0x6d/0xc0 nfs_file_fsync+0x81/0x170 [nfs] ? nfs_file_mmap+0x60/0x60 [nfs] __x64_sys_fsync+0x53/0x90 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Inspecting the core with drgn I was able to pull this >>> prog.crashed_thread().stack_trace()[0] #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27 >>> prog.crashed_thread().stack_trace()[0]['idx'] (u32)1 >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds (struct nfs4_ff_layout_ds *)0xffffffffffffffed This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds() which could error out initializing the mirror_ds, and then we go to clean it all up and our check is only for if (!mirror->mirror_ds). This is inconsistent with the rest of the users of mirror_ds, which have if (IS_ERR_OR_NULL(mirror_ds)) to keep from tripping over this exact scenario. Fix this up in ff_layout_cancel_io() to make sure we don't panic when we get an error. I also spot checked all the other instances of checking mirror_ds and we appear to be doing the correct checks everywhere, only unconditionally dereferencing mirror_ds when we know it would be valid. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Fixes: b739a5bd ("NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Borislav Petkov (AMD) authored
Update them to the correct revision numbers. Fixes: 522b1d69 ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc updates from Michael Ellerman: - Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc - Add support for recognising the Power11 architected and raw PVRs - Add support for nr_cpus=n on the command line where the boot CPU is >= n - Add ppcxx_allmodconfig targets for all 32-bit sub-arches - Other small features, cleanups and fixes Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, and Wen Xiong. * tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits) powerpc/macio: Make remove callback of macio driver void returned powerpc/83xx: Fix build failure with FPU=n powerpc/64s: Fix get_hugepd_cache_index() build failure powerpc/4xx: Fix warp_gpio_leds build failure powerpc/amigaone: Make several functions static powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc. macintosh/adb: make adb_dev_class constant powerpc: xor_vmx: Add '-mhard-float' to CFLAGS powerpc/fsl: Fix mfpmr() asm constraint error powerpc: Remove cpu-as-y completely powerpc/fsl: Modernise mt/mfpmr powerpc/fsl: Fix mfpmr build errors with newer binutils powerpc/64s: Use .machine power4 around dcbt powerpc/64s: Move dcbt/dcbtst sequence into a macro powerpc/mm: Code cleanup for __hash_page_thp powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks powerpc/irq: Allow softirq to hardirq stack transition powerpc: Stop using of_root powerpc/machdep: Define 'compatibles' property in ppc_md and use it of: Reimplement of_machine_is_compatible() using of_machine_compatible_match() ...
-
Oliver Upton authored
This reverts commits 99101dda and b80b701d. Linus reports that the sysreg reserved bit checks in KVM have led to build failures, arising from commit fdd867fe ("arm64/sysreg: Add register fields for ID_AA64DFR1_EL1") giving meaning to fields that were previously RES0. Of course, this is a genuine issue, since KVM's sysreg emulation depends heavily on the definition of reserved fields. But at this point the build breakage is far more offensive, and the right course of action is to revert and retry later. All of these build-time assertions were on by default before commit 99101dda ("KVM: arm64: Make build-time check of RES0/RES1 bits optional"), so deliberately revert it all atomically to avoid introducing further breakage of bisection. Link: https://lore.kernel.org/all/CAHk-=whCvkhc8BbFOUf1ddOsgSGgEjwoKv77=HEY1UiVCydGqw@mail.gmail.com/Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Mar, 2024 5 commits
-
-
Dmitry Torokhov authored
Prepare input updates for 6.9 merge window.
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull block fixes from Jens Axboe: - Revert of a change for mq-deadline that went into the 6.8 release, causing a performance regression for some (Bart) - Revert of the interruptible discard handling. This needs more work since the ioctl and fs path aren't properly split, and will happen for the 6.10 kernel release. For 6.9, do the minimal revert (Christoph) - Fix for an issue with the timestamp caching code (me) - kerneldoc fix (Jiapeng) * tag 'block-6.9-20240315' of git://git.kernel.dk/linux: block: fix mismatched kerneldoc function name Revert "blk-lib: check for kill signal" Revert "block/mq-deadline: use correct way to throttling write requests" block: limit block time caching to in_task() context
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto updates from Herbert Xu: "API: - Avoid unnecessary copying in scomp for trivial SG lists Algorithms: - Optimise NEON CCM implementation on ARM64 Drivers: - Add queue stop/query debugfs support in hisilicon/qm - Intel qat updates and cleanups" * tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (79 commits) Revert "crypto: remove CONFIG_CRYPTO_STATS" crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem crypto: tcrypt - add ffdhe2048(dh) test crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags hwrng: hisi - use dev_err_probe MAINTAINERS: Remove T Ambarus from few mchp entries crypto: iaa - Fix comp/decomp delay statistics crypto: iaa - Fix async_disable descriptor leak dt-bindings: rng: atmel,at91-trng: add sam9x7 TRNG dt-bindings: crypto: add sam9x7 in Atmel TDES dt-bindings: crypto: add sam9x7 in Atmel SHA dt-bindings: crypto: add sam9x7 in Atmel AES crypto: remove CONFIG_CRYPTO_STATS crypto: dh - Make public key test FIPS-only crypto: rockchip - fix to check return value crypto: jitter - fix CRYPTO_JITTERENTROPY help text crypto: qat - make ring to service map common for QAT GEN4 crypto: qat - fix ring to service map for dcc in 420xx crypto: qat - fix ring to service map for dcc in 4xxx ...
-
https://github.com/awilliam/linux-vfioLinus Torvalds authored
Pull VFIO updates from Alex Williamson: - Add warning in unlikely case that device is not captured with driver_override (Kunwu Chan) - Error handling improvements in mlx5-vfio-pci to detect firmware tracking object error states, logging of firmware error syndrom, and releasing of firmware resources in aborted migration sequence (Yishai Hadas) - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson) - Make the mdev_bus_type const and also make the class struct const for a couple of the vfio-mdev sample drivers (Ricardo B. Marliere) - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's Grace-Hopper superchip. During initialization of the chip-to-chip interconnect in this hardware module, the PCI BARs of the device become unused in favor of a faster, coherent mechanism for exposing device memory. This driver primarily changes the VFIO representation of the device to masquerade this coherent aperture to replace the physical PCI BARs for userspace drivers. This also incorporates use of a new vma flag allowing KVM to use write combining attributes for uncached device memory (Ankit Agrawal) - Reset fixes and cleanups for the pds-vfio-pci driver. Save and restore files were previously leaked if the device didn't pass through an error state, this is resolved and later re-fixed to prevent access to the now freed files. Reset handling is also refactored to remove the complicated deferred reset mechanism (Brett Creeley) - Remove some references to pl330 in the vfio-platform amba driver (Geert Uytterhoeven) - Remove twice redundant and ugly code to unpin incidental pins of the zero-page (Alex Williamson) - Deferred reset logic is also removed from the hisi-acc-vfio-pci driver as a simplification (Shameer Kolothum) - Enforce that mlx5-vfio-pci devices must support PRE_COPY and remove resulting unnecessary code. There is no device firmware that has been available publicly without this support (Yishai Hadas) - Switch over to using the .remove_new callback for vfio-platform in support of the broader transition for a void remove function (Uwe Kleine-König) - Resolve multiple issues in interrupt code for VFIO bus drivers that allow calling eventfd_signal() on a NULL context. This also remove a potential race in INTx setup on certain hardware for vfio-pci, races with various mechanisms to mask INTx, and leaked virqfds in vfio-platform (Alex Williamson) * tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits) vfio/fsl-mc: Block calling interrupt handler without trigger vfio/platform: Create persistent IRQ handlers vfio/platform: Disable virqfds on cleanup vfio/pci: Create persistent INTx handler vfio: Introduce interface to flush virqfd inject workqueue vfio/pci: Lock external INTx masking ops vfio/pci: Disable auto-enable of exclusive INTx IRQ vfio/pds: Refactor/simplify reset logic vfio/pds: Make sure migration file isn't accessed after reset vfio/platform: Convert to platform remove callback returning void vfio/mlx5: Enforce PRE_COPY support vfio/mbochs: make mbochs_class constant vfio/mdpy: make mdpy_class constant hisi_acc_vfio_pci: Remove the deferred_reset logic Revert "vfio/type1: Unpin zero pages" vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached vfio: amba: Rename pl330_ids[] to vfio_amba_ids[] vfio/pds: Always clear the save/restore FDs on reset vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper vfio/pci: rename and export range_intersect_range ...
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
-