1. 21 Oct, 2024 5 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d1293776
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM64:
      
         - Fix the guest view of the ID registers, making the relevant fields
           writable from userspace (affecting ID_AA64DFR0_EL1 and
           ID_AA64PFR1_EL1)
      
         - Correctly expose S1PIE to guests, fixing a regression introduced in
           6.12-rc1 with the S1POE support
      
         - Fix the recycling of stage-2 shadow MMUs by tracking the context
           (are we allowed to block or not) as well as the recycling state
      
         - Address a couple of issues with the vgic when userspace
           misconfigures the emulation, resulting in various splats. Headaches
           courtesy of our Syzkaller friends
      
         - Stop wasting space in the HYP idmap, as we are dangerously close to
           the 4kB limit, and this has already exploded in -next
      
         - Fix another race in vgic_init()
      
         - Fix a UBSAN error when faking the cache topology with MTE enabled
      
        RISCV:
      
         - RISCV: KVM: use raw_spinlock for critical section in imsic
      
        x86:
      
         - A bandaid for lack of XCR0 setup in selftests, which causes trouble
           if the compiler is configured to have x86-64-v3 (with AVX) as the
           default ISA. Proper XCR0 setup will come in the next merge window.
      
         - Fix an issue where KVM would not ignore low bits of the nested CR3
           and potentially leak up to 31 bytes out of the guest memory's
           bounds
      
         - Fix a case in which an out-of-date cached value for the segments
           could be returned by KVM_GET_SREGS.
      
         - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
      
         - Override MTRR state for KVM confidential guests, making it WB by
           default as is already the case for Hyper-V guests.
      
        Generic:
      
         - Remove a couple of unused functions"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
        RISCV: KVM: use raw_spinlock for critical section in imsic
        KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
        KVM: selftests: x86: Avoid using SSE/AVX instructions
        KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
        KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
        KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
        KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
        KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
        x86/kvm: Override default caching mode for SEV-SNP and TDX
        KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
        KVM: Remove unused kvm_vcpu_gfn_to_pfn
        KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
        KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
        KVM: arm64: Fix shift-out-of-bounds bug
        KVM: arm64: Shave a few bytes from the EL2 idmap code
        KVM: arm64: Don't eagerly teardown the vgic on init error
        KVM: arm64: Expose S1PIE to guests
        KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
        KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
        KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
        ...
      d1293776
    • Merge tag 'probes-fixes-v6.12-rc4' of... · c1bc09d7
      Linus Torvalds authored
      Merge tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull uprobe fix from Masami Hiramatsu:
      
       - uprobe: avoid out-of-bounds memory access of fetching args
      
         Uprobe trace events can cause out-of-bounds memory access when
         fetching user-space data that is bigger than one page, because the
         local CPU buffer size is not checked when reading the data. This
         fix checks the size of the data read and cuts it down to the local
         CPU buffer size.
      
      * tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        uprobe: avoid out-of-bounds memory access of fetching args
      c1bc09d7
    • Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 7166c326
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "afs:
         - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock
      
       netfs:
         - Drop the references to a folio immediately after the folio has been
           extracted to prevent races with future I/O collection
      
         - Fix a documentation build error
      
         - Downgrade the i_rwsem for buffered writes to fix a cifs reported
           performance regression when switching to netfslib
      
        vfs:
         - Explicitly return -E2BIG from openat2() if the specified size is
           unexpectedly large. This aligns openat2() with other extensible
           struct based system calls
      
         - When copying a mount namespace ensure that we only try to remove
           the new copy from the mount namespace rbtree if it has already been
           added to it
      
        nilfs:
         - Clear the buffer delay flag when clearing the buffer state flags
           when a buffer head is discarded to prevent a kernel oops
      
        ocfs2:
         - Fix an uninitialized value warning in ocfs2_setattr()
      
        proc:
         - Fix a kernel doc warning"
      
      * tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        proc: Fix W=1 build kernel-doc warning
        afs: Fix lock recursion
        fs: Fix uninitialized value issue in from_kuid and from_kgid
        fs: don't try and remove empty rbtree node
        netfs: Downgrade i_rwsem for a buffered write
        nilfs2: fix kernel bug due to missing clearing of buffer delay flag
        openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
        netfs: fix documentation build error
        netfs: In readahead, put the folio refs as soon extracted
      7166c326
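The openat2() item above describes a size-validation policy for extensible-struct system calls. A minimal sketch of such a policy follows, assuming a 4096-byte page and a 24-byte first struct version. `check_struct_size()`, `SKETCH_PAGE_SIZE` and `SKETCH_MIN_SIZE` are hypothetical names used for illustration, and the too-small case returning -EINVAL is an assumption rather than something stated in the merge message; this is not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>

/* Assumptions for illustration only: page size and minimum struct size. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MIN_SIZE  24u

/*
 * Validate the user-supplied struct size before copying anything.
 * Returns 0 when the size is acceptable, or a negative errno value.
 */
static int check_struct_size(size_t usize)
{
    if (usize < SKETCH_MIN_SIZE)
        return -EINVAL;   /* assumed: smaller than the first struct version */
    if (usize > SKETCH_PAGE_SIZE)
        return -E2BIG;    /* unexpectedly large, rejected explicitly */
    return 0;
}
```

A size between the two limits is accepted here; whether any trailing bytes beyond the known struct must be zero is a separate check that this sketch leaves out.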
    • Merge tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · a777c32c
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix a regression in mpi that broke RSA"
      
      * tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue
      a777c32c
    • uprobe: avoid out-of-bounds memory access of fetching args · 373b9338
      Qiao Ma authored
      Uprobe needs to fetch args into a percpu buffer and then copy them to
      the ring buffer, to avoid problems with non-atomic context.
      
      User-space strings and arrays can sometimes be very large, but the
      percpu buffer is only one page in size. store_trace_args() does not
      check whether the data exceeds a single page, which causes an
      out-of-bounds memory access.
      
      It can be reproduced with the following steps:
      1. build the kernel with CONFIG_KASAN enabled
      2. save the following program as test.c
      
      ```
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      // If the string length is larger than MAX_STRING_SIZE, fetch_store_strlen()
      // returns 0, which makes __get_data_size() return a shorter size, and
      // store_trace_args() will not trigger the out-of-bounds access.
      // So keep the string length below 4096.
      #define STRLEN 4093
      
      void generate_string(char *str, int n)
      {
          int i;
          for (i = 0; i < n; ++i)
          {
              char c = i % 26 + 'a';
              str[i] = c;
          }
          str[n-1] = '\0';
      }
      
      void print_string(char *str)
      {
          printf("%s\n", str);
      }
      
      int main()
      {
          char tmp[STRLEN];
      
          generate_string(tmp, STRLEN);
          print_string(tmp);
      
          return 0;
      }
      ```
      3. compile program
      `gcc -o test test.c`
      
      4. get the offset of `print_string()`
      ```
      objdump -t test | grep -w print_string
      0000000000401199 g     F .text  000000000000001b              print_string
      ```
      
      5. configure uprobe with offset 0x1199
      ```
      off=0x1199
      
      cd /sys/kernel/debug/tracing/
      echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events
      echo 1 > events/uprobes/enable
      echo 1 > tracing_on
      ```
      
      6. run `test`, and KASAN will report an error.
      ==================================================================
      BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0
      Write of size 8 at addr ffff88812311c004 by task test/499
      CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18
      Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x55/0x70
       print_address_description.constprop.0+0x27/0x310
       kasan_report+0x10f/0x120
       ? strncpy_from_user+0x1d6/0x1f0
       strncpy_from_user+0x1d6/0x1f0
       ? rmqueue.constprop.0+0x70d/0x2ad0
       process_fetch_insn+0xb26/0x1470
       ? __pfx_process_fetch_insn+0x10/0x10
       ? _raw_spin_lock+0x85/0xe0
       ? __pfx__raw_spin_lock+0x10/0x10
       ? __pte_offset_map+0x1f/0x2d0
       ? unwind_next_frame+0xc5f/0x1f80
       ? arch_stack_walk+0x68/0xf0
       ? is_bpf_text_address+0x23/0x30
       ? kernel_text_address.part.0+0xbb/0xd0
       ? __kernel_text_address+0x66/0xb0
       ? unwind_get_return_address+0x5e/0xa0
       ? __pfx_stack_trace_consume_entry+0x10/0x10
       ? arch_stack_walk+0xa2/0xf0
       ? _raw_spin_lock_irqsave+0x8b/0xf0
       ? __pfx__raw_spin_lock_irqsave+0x10/0x10
       ? depot_alloc_stack+0x4c/0x1f0
       ? _raw_spin_unlock_irqrestore+0xe/0x30
       ? stack_depot_save_flags+0x35d/0x4f0
       ? kasan_save_stack+0x34/0x50
       ? kasan_save_stack+0x24/0x50
       ? mutex_lock+0x91/0xe0
       ? __pfx_mutex_lock+0x10/0x10
       prepare_uprobe_buffer.part.0+0x2cd/0x500
       uprobe_dispatcher+0x2c3/0x6a0
       ? __pfx_uprobe_dispatcher+0x10/0x10
       ? __kasan_slab_alloc+0x4d/0x90
       handler_chain+0xdd/0x3e0
       handle_swbp+0x26e/0x3d0
       ? __pfx_handle_swbp+0x10/0x10
       ? uprobe_pre_sstep_notifier+0x151/0x1b0
       irqentry_exit_to_user_mode+0xe2/0x1b0
       asm_exc_int3+0x39/0x40
      RIP: 0033:0x401199
      Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce
      RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206
      RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2
      RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0
      RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20
      R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040
      R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      This commit keeps the buffer's maxlen below one page in size to prevent
      out-of-bounds memory accesses in store_trace_args().
      
      Link: https://lore.kernel.org/all/20241015060148.1108331-1-mqaio@linux.alibaba.com/
      
      Fixes: dcad1a20 ("tracing/uprobes: Fetch args before reserving a ring buffer")
      Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      373b9338
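To illustrate the bounds check described in the uprobe commit above, the sketch below clamps a requested copy length to the size of a fixed scratch buffer before copying. It is a user-space model only, not the kernel patch: `SCRATCH_SIZE`, `clamp_to_scratch()` and `fetch_into_scratch()` are hypothetical names, and the 4096-byte size is an assumed page size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the one-page per-CPU buffer (assumed 4096 bytes). */
#define SCRATCH_SIZE 4096

/* Clamp a requested length so that it never exceeds the scratch buffer. */
static size_t clamp_to_scratch(size_t requested)
{
    return requested > SCRATCH_SIZE ? SCRATCH_SIZE : requested;
}

/*
 * Copy at most SCRATCH_SIZE bytes from src into buf.
 * Returns the number of bytes actually copied.
 */
static size_t fetch_into_scratch(char *buf, const char *src, size_t requested)
{
    size_t n = clamp_to_scratch(requested);

    memcpy(buf, src, n);
    return n;
}
```

In the reproducer, two fetches of a string of about 4093 bytes plus the command name add up to more than one page, which is the case the clamp is meant to catch; the excess data is simply truncated.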
  2. 20 Oct, 2024 25 commits
    • Linux 6.12-rc4 · 42f7652d
      Linus Torvalds authored
      42f7652d
    • Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · d7f513ae
      Linus Torvalds authored
      Pull bluetooth fixes from Luiz Augusto Von Dentz:
      
       - ISO: Fix multiple init when debugfs is disabled
      
       - Call iso_exit() on module unload
      
       - Remove debugfs directory on module init failure
      
       - btusb: Fix not being able to reconnect after suspend
      
       - btusb: Fix regression with fake CSR controllers 0a12:0001
      
       - bnep: fix wild-memory-access in proto_unregister
      
      Note: normally the bluetooth fixes go through the networking tree, but
      this missed the weekly merge, and two of the commits fix regressions
      that have caused a fair amount of noise and have now hit stable too:
      
        https://lore.kernel.org/all/4e1977ca-6166-4891-965e-34a6f319035f@leemhuis.info/
      
      So I'm pulling it directly just to expedite things and not miss yet
      another -rc release. This is not meant to become a new pattern.
      
      * tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
        Bluetooth: bnep: fix wild-memory-access in proto_unregister
        Bluetooth: btusb: Fix not being able to reconnect after suspend
        Bluetooth: Remove debugfs directory on module init failure
        Bluetooth: Call iso_exit() on module unload
        Bluetooth: ISO: Fix multiple init when debugfs is disabled
      d7f513ae
    • Merge tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · dd4f5037
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Mostly error path fixes, but one pretty serious interrupt problem in
        the Ocelot driver as well:
      
         - Fix two error paths and a missing semicolon in the Intel driver
      
         - Add a missing ACPI ID for the Intel Panther Lake
      
         - Check return value of devm_kasprintf() in the Apple and STM32
           drivers
      
         - Add a missing mutex_destroy() in the aw9523 driver
      
         - Fix a double free in cv1800_pctrl_dt_node_to_map() in the Sophgo
           driver
      
         - Fix a double free in ma35_pinctrl_dt_node_to_map_func() in the
           Nuvoton driver
      
         - Fix a bug in the Ocelot interrupt handler making the system hang"
      
      * tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: ocelot: fix system hang on level based interrupts
        pinctrl: nuvoton: fix a double free in ma35_pinctrl_dt_node_to_map_func()
        pinctrl: sophgo: fix double free in cv1800_pctrl_dt_node_to_map()
        pinctrl: intel: platform: Add Panther Lake to the list of supported
        pinctrl: aw9523: add missing mutex_destroy
        pinctrl: stm32: check devm_kasprintf() returned value
        pinctrl: apple: check devm_kasprintf() returned value
        pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment
        pinctrl: intel: platform: fix error path in device_for_each_child_node()
      dd4f5037
    • Merge tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · c5522822
      Linus Torvalds authored
      Pull misc driver fixes from Greg KH:
       "Here are a number of small char/misc/iio driver fixes for 6.12-rc4:
      
         - loads of small iio driver fixes for reported problems
      
         - parport driver out-of-bounds fix
      
         - Kconfig description and MAINTAINERS file updates
      
        All of these, except for the Kconfig and MAINTAINERS file updates, have
        been in linux-next all week. Those other two are just documentation
        changes with no runtime impact and were merged on Friday"
      
      * tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (39 commits)
        misc: rtsx: list supported models in Kconfig help
        MAINTAINERS: Remove some entries due to various compliance requirements.
        misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for OTP device
        misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for EEPROM device
        parport: Proper fix for array out-of-bounds access
        iio: frequency: admv4420: fix missing select REMAP_SPI in Kconfig
        iio: frequency: {admv4420,adrf6780}: format Kconfig entries
        iio: adc: ad4695: Add missing Kconfig select
        iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
        iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()
        iioc: dac: ltc2664: Fix span variable usage in ltc2664_channel_config()
        iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig
        iio: dac: ltc1660: add missing select REGMAP_SPI in Kconfig
        iio: dac: ad5770r: add missing select REGMAP_SPI in Kconfig
        iio: amplifiers: ada4250: add missing select REGMAP_SPI in Kconfig
        iio: frequency: adf4377: add missing select REMAP_SPI in Kconfig
        iio: resolver: ad2s1210: add missing select (TRIGGERED_)BUFFER in Kconfig
        iio: resolver: ad2s1210 add missing select REGMAP in Kconfig
        iio: proximity: mb1232: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
        iio: pressure: bm1390: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
        ...
      c5522822
    • Merge tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · c01ac4b9
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small tty and serial driver fixes for 6.12-rc4:
      
         - qcom-geni serial driver fixes, wow what a mess of a UART chip that
           thing is...
      
         - vt infoleak fix for odd font sizes
      
         - imx serial driver bugfix
      
         - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in
           that piece of code
      
        All of these have been in linux-next for over a week with no reported
        issues"
      
      * tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: qcom-geni: rename suspend functions
        serial: qcom-geni: drop unused receive parameter
        serial: qcom-geni: drop flip buffer WARN()
        serial: qcom-geni: fix rx cancel dma status bit
        serial: qcom-geni: fix receiver enable
        serial: qcom-geni: fix dma rx cancellation
        serial: qcom-geni: fix shutdown race
        serial: qcom-geni: revert broken hibernation support
        serial: qcom-geni: fix polled console initialisation
        serial: imx: Update mctrl old_status on RTSD interrupt
        tty: n_gsm: Fix use-after-free in gsm_cleanup_mux
        vt: prevent kernel-infoleak in con_font_get()
      c01ac4b9
    • Merge tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · b68c1895
      Linus Torvalds authored
      Pull USB driver fixes from Greg KH:
       "Here are some small USB driver fixes and new device ids for 6.12-rc4:
      
         - xhci driver fixes for a number of reported issues
      
         - new usb-serial driver ids
      
         - dwc3 driver fixes for reported problems.
      
         - usb gadget driver fixes for reported problems
      
         - typec driver fixes
      
         - MAINTAINER file updates
      
        All of these have been in linux-next this week with no reported issues"
      
      * tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: option: add Telit FN920C04 MBIM compositions
        USB: serial: option: add support for Quectel EG916Q-GL
        xhci: dbc: honor usb transfer size boundaries.
        usb: xhci: Fix handling errors mid TD followed by other errors
        xhci: Mitigate failed set dequeue pointer commands
        xhci: Fix incorrect stream context type macro
        USB: gadget: dummy-hcd: Fix "task hung" problem
        usb: gadget: f_uac2: fix return value for UAC2_ATTRIBUTE_STRING store
        usb: dwc3: core: Fix system suspend on TI AM62 platforms
        xhci: tegra: fix checked USB2 port number
        usb: dwc3: Wait for EndXfer completion before restoring GUSB2PHYCFG
        usb: typec: qcom-pmic-typec: fix sink status being overwritten with RP_DEF
        usb: typec: altmode should keep reference to parent
        MAINTAINERS: usb: raw-gadget: add bug tracker link
        MAINTAINERS: Add an entry for the LJCA drivers
      b68c1895
    • Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · db87114d
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Explicitly disable the TSC deadline timer when going idle to address
         some CPU errata in that area
      
       - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
         late microcode loading path
      
       - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
         clearing the buffers while registers still contain sensitive data,
         for the RFDS mitigation
      
       - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
         path on 32-bit
      
       - Fix parsing issues of memory bandwidth specification in sysfs for
         resctrl's memory bandwidth allocation feature
      
       - Other small cleanups and improvements
      
      * tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Always explicitly disarm TSC-deadline timer
        x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
        x86/bugs: Use code segment selector for VERW operand
        x86/entry_32: Clear CPU buffers after register restore in NMI return
        x86/entry_32: Do not clobber user EFLAGS.ZF
        x86/resctrl: Annotate get_mem_config() functions as __init
        x86/resctrl: Avoid overflow in MB settings in bw_validate()
        x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
      db87114d
    • Merge tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 949c9ef5
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Fix a case for sifive-plic where an interrupt gets disabled *and*
         masked and remains masked when it gets reenabled later
      
       - Plug a small race in GIC-v4 where userspace can force an affinity
         change of a virtual CPU (vPE) in its unmapping path
      
       - Do not mix the two sets of ocelot irqchip's registers in the mask
         calculation of the main interrupt sticky register
      
       - Other smaller fixlets and cleanups
      
      * tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/renesas-rzg2l: Fix missing put_device
        irqchip/riscv-intc: Fix SMP=n boot with ACPI
        irqchip/sifive-plic: Unmask interrupt in plic_irq_enable()
        irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
        irqchip/sifive-plic: Return error code on failure
        irqchip/riscv-imsic: Fix output text of base address
        irqchip/ocelot: Comment sticky register clearing code
        irqchip/ocelot: Fix trigger register address
        irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI
      949c9ef5
    • Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b4d2501
      Linus Torvalds authored
      Pull scheduling fixes from Borislav Petkov:
      
       - Add PREEMPT_RT maintainers
      
       - Fix another aspect of delayed dequeued tasks wrt determining their
         state, i.e., whether they're runnable or blocked
      
       - Handle delayed dequeued tasks and their migration wrt PSI properly
      
       - Fix the situation where a delayed dequeue task gets enqueued into a
         new class, which should not happen
      
       - Fix a case where memory allocation would happen while the runqueue
         lock is held, which is a no-no
      
       - Do not over-schedule when tasks with shorter slices preempt the
         currently running task
      
       - Make sure delayed-dequeue entities are properly handled before
         unthrottling
      
       - Other smaller cleanups and improvements
      
      * tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Add an entry for PREEMPT_RT.
        sched/fair: Fix external p->on_rq users
        sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
        sched/core: Dequeue PSI signals for blocked tasks that are delayed
        sched: Fix delayed_dequeue vs switched_from_fair()
        sched/core: Disable page allocation in task_tick_mm_cid()
        sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
        sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
        sched: Fix sched_delayed vs cfs_bandwidth
      2b4d2501
    • Merge tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · a5ee44c8
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "A single fix for a build failure introduced this merge window"
      
      * tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: Remove dependency between pciback and privcmd
      a5ee44c8
    • Merge tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping · 10e93e19
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Just another small tracing fix from Sean"
      
      * tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
      10e93e19
    • Merge tag 'kvmarm-fixes-6.12-3' of... · e9001a38
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.12, take #3
      
      - Stop wasting space in the HYP idmap, as we are dangerously close
        to the 4kB limit, and this has already exploded in -next
      
      - Fix another race in vgic_init()
      
      - Fix a UBSAN error when faking the cache topology with MTE
        enabled
      e9001a38
    • Merge tag 'kvmarm-fixes-6.12-2' of... · ddd5c582
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.12, take #2
      
      - Fix the guest view of the ID registers, making the relevant fields
        writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
      
      - Correctly expose S1PIE to guests, fixing a regression introduced
        in 6.12-rc1 with the S1POE support
      
      - Fix the recycling of stage-2 shadow MMUs by tracking the context
        (are we allowed to block or not) as well as the recycling state
      
      - Address a couple of issues with the vgic when userspace misconfigures
        the emulation, resulting in various splats. Headaches courtesy
        of our Syzkaller friends
      ddd5c582
    • RISCV: KVM: use raw_spinlock for critical section in imsic · 3ec4350d
      Cyan Yang authored
      The external interrupt updating procedure in imsic is already
      protected by a spinlock. But since it must never be preempted, switch
      to raw_spinlock so that it cannot be preempted even when PREEMPT_RT
      is enabled.
      Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
      Reviewed-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      Message-ID: <20240919160126.44487-1-cyan.yang@sifive.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3ec4350d
    • KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups · 773cca18
      Sean Christopherson authored
      When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the
      walk based on the number of array _entries_, not the size in bytes of
      the array.  Iterating based on the total size of the array can result in
      false passes, e.g. if the random data beyond the array happens to match
      a CPUID entry's function and index.
      
      Fixes: fb18d053 ("selftest: kvm: x86: test KVM_GET_CPUID2 and guest visible CPUIDs against KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-ID: <20241003234337.273364-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      773cca18
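The difference between walking an array by entry count and by byte size, as described in the selftest commit above, can be sketched as follows. The `cpuid_entry` struct, the `mangled` table and its values, and `is_mangled()` are hypothetical and simplified for illustration; they are not the selftest's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry type: 8 bytes per entry. */
struct cpuid_entry {
    uint32_t function;
    uint32_t index;
};

/* Number of elements in an array, as opposed to its size in bytes. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative table; the values are placeholders. */
static const struct cpuid_entry mangled[] = {
    { 0x0b, 0 },
    { 0x0d, 1 },
    { 0x1f, 0 },
};

/* Return 1 if (function, index) is in the table, else 0. */
static int is_mangled(uint32_t function, uint32_t index)
{
    size_t i;

    /*
     * The bound is the entry count (3). Using sizeof(mangled), which is
     * 24 with this layout, would read 21 entries past the end.
     */
    for (i = 0; i < ARRAY_SIZE(mangled); i++) {
        if (mangled[i].function == function && mangled[i].index == index)
            return 1;
    }
    return 0;
}
```

With a byte-sized bound, the loop could match on whatever data follows the array, which is how the false passes mentioned in the commit message can arise.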
    • KVM: selftests: x86: Avoid using SSE/AVX instructions · 9a400068
      Vitaly Kuznetsov authored
      Some distros switched gcc to '-march=x86-64-v3' by default and while it's
      hard to find a CPU which doesn't support it today, many KVM selftests fail
      with
      
        ==== Test Assertion Failure ====
          lib/x86_64/processor.c:570: Unhandled exception in guest
          pid=72747 tid=72747 errno=4 - Interrupted system call
          Unhandled exception '0x6' at guest RIP '0x4104f7'
      
      The failure is easy to reproduce elsewhere with
      
         $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test
      
      The root cause of the problem seems to be that with '-march=x86-64-v3' GCC
      uses AVX* instructions (VMOVQ in the example above) and without prior
      XSETBV() in the guest this results in #UD. It is certainly possible to add
      it there, e.g. the following saves the day as well:
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-ID: <20240920154422.2890096-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a400068
    • KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory · f559b2e9
      Sean Christopherson authored
      Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
      4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
      enforce 32-byte alignment of nCR3.
      
      In the absolute worst case scenario, failure to ignore bits 4:0 can result
      in an out-of-bounds read, e.g. if the target page is at the end of a
      memslot, and the VMM isn't using guard pages.
      
      Per the APM:
      
        The CR3 register points to the base address of the page-directory-pointer
        table. The page-directory-pointer table is aligned on a 32-byte boundary,
        with the low 5 address bits 4:0 assumed to be 0.
      
      And the SDM's much more explicit:
      
        4:0    Ignored
      
      Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow
      that is broken.
      
      Fixes: e4e517b4 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
      Reported-by: Kirk Swidowski <swidowski@google.com>
      Cc: Andy Nguyen <theflow@google.com>
      Cc: 3pvd <3pvd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009140838.1036226-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f559b2e9
    • Maxim Levitsky's avatar
      KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() · 731285fb
      Maxim Levitsky authored
      Reset the segment cache after segment initialization in vmx_vcpu_reset()
      to harden KVM against caching stale/uninitialized data.  Without the
      recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
      scenario is possible:
      
       - vCPU is just created, and the vCPU thread is preempted before
         SS.AR_BYTES is written in vmx_vcpu_reset().
      
       - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
         vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
      
       - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
         invoke vmx_segment_cache_clear() to invalidate the cache.
      
      As a result, KVM retains a stale value in the cache, which can be read,
      e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
      segment cache is reset on each VM-Exit, but if the userspace VMM (e.g. KVM
      selftests) reads and writes system registers just after the vCPU was
      created, _without_ modifying SS.AR_BYTES, userspace will write back the
      stale '0' value and ultimately will trigger a VM-Entry failure due to
      incorrect SS segment type.
      
      Invalidating the cache after writing the VMCS doesn't address the general
      issue of cache accesses from IRQ context being unsafe, but it does prevent
      KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
      bug that results in an unsafe cache access.
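
The stale-cache scenario can be modelled with a toy cache in front of a single field. This is a simplified sketch, not the VMX code: `struct seg_model`, `model_read_ss_ar()` and `model_write_ss_ar()` are hypothetical names, and one `bool` stands in for the per-field valid bits of the real segment cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one backing field plus a cached copy (names are hypothetical). */
struct seg_model {
    uint32_t vmcs_ss_ar;    /* stand-in for SS.AR_BYTES in the VMCS */
    uint32_t cached_ss_ar;  /* cached copy handed out to readers */
    bool cache_valid;
};

/* Read through the cache, filling it on the first access. */
static inline uint32_t model_read_ss_ar(struct seg_model *m)
{
    if (!m->cache_valid) {
        m->cached_ss_ar = m->vmcs_ss_ar;
        m->cache_valid = true;
    }
    return m->cached_ss_ar;
}

/* Write the backing field; optionally invalidate the cache afterwards. */
static inline void model_write_ss_ar(struct seg_model *m, uint32_t val,
                                     bool clear_cache)
{
    m->vmcs_ss_ar = val;
    if (clear_cache)
        m->cache_valid = false;
}
```

A read that happens before the first write caches '0'. A later write without invalidation leaves readers seeing that '0', while a write followed by invalidation makes the next read return the new value.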
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields")
      [sean: rework changelog to account for previous patch]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009175002.1118178-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      731285fb
    • Sean Christopherson's avatar
      KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL · 5a279842
      Sean Christopherson authored
      Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that
      it applies to moved memslots as well as deleted memslots, to avoid KVM's
      "fast zap" terminology (which has no meaning for userspace), and to reword
      the documented targeted zap behavior to specifically say that KVM _may_
      zap a subset of all SPTEs.  As evidenced by the fix to zap non-leaf SPTEs
      with gPTEs, formally documenting KVM's exact internal behavior is risky
      and unnecessary.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009192345.1148353-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5a279842
    • Sean Christopherson's avatar
      KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() · 28cf4978
      Sean Christopherson authored
      Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
      mmu_invalidate_in_progress is elevated, or that the range is being zapped
      due to memslot removal (loosely detected by slots_lock being held).
      Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
      as KVM's page fault path snapshots state before acquiring mmu_lock, and
      thus can create SPTEs with stale information if vCPUs aren't forced to
      retry faults (due to seeing an in-progress or past MMU invalidation).
      
      Memslot removal is a special case, as the memslot is retrieved outside of
      mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
      instead relies on SRCU synchronization to ensure any in-flight page faults
      are fully resolved before zapping SPTEs.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20241009192345.1148353-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      28cf4978
    • Sean Christopherson's avatar
      KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot · 58a20a94
      Sean Christopherson authored
      When performing a targeted zap on memslot removal, zap only MMU pages that
      shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
      unnecessary.  Furthermore, for_each_gfn_valid_sp() arguably shouldn't
      exist, because it doesn't do what most people would expect it to do.
      The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
      means that the exact gfn comparison will not get a match, even when a SP
      does "cover" a gfn, or was even created specifically for a gfn.
      
      For memslot deletion specifically, KVM's behavior will vary significantly
      based on the size and alignment of a memslot, and in weird ways.  E.g. for
      a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
      it's only 4KiB aligned.  And as described below, zapping SPs in the
      aligned case overzaps for direct MMUs, as odds are good the upper-level
      SPs are serving other memslots.
      
      To iterate over all potentially-relevant gfns, KVM would need to make a
      pass over the hash table for each level, with the gfn used for lookup
      rounded for said level.  And then check that the SP is of the correct
      level, too, e.g. to avoid over-zapping.
      
      But even then, KVM would massively overzap, as processing every level is
      all but guaranteed to zap SPs that serve other memslots, especially if the
      memslot being removed is relatively small.  KVM could mitigate that issue
      by processing only levels that can be possible guest huge pages, i.e. are
      less likely to be re-used for other memslots, but while somewhat logical,
      that's quite arbitrary and would be a bit of a mess to implement.
      
      So, zap only SPs with gPTEs, as the resulting behavior is easy to describe,
      is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that
      absolutely must be zapped.
      
      Cc: Yan Zhao <yan.y.zhao@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
      Tested-by: Yan Zhao <yan.y.zhao@intel.com>
      Message-ID: <20241009192345.1148353-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      58a20a94
    • Kirill A. Shutemov's avatar
      x86/kvm: Override default caching mode for SEV-SNP and TDX · 8e690b81
      Kirill A. Shutemov authored
      AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
      advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
      CR0.CD clear).
      
      This results in guests using uncached mappings where they shouldn't and
      pmd/pud_set_huge() failures due to non-uniform memory type reported by
      mtrr_type_lookup().
      
      Override MTRR state, making it WB by default as the kernel does for
      Hyper-V guests.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Suggested-by: Binbin Wu <binbin.wu@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8e690b81
    • Dr. David Alan Gilbert's avatar
      KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic · bc07eea2
      Dr. David Alan Gilbert authored
      The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
      1bbc60d0 ("KVM: x86/mmu: Remove MMU auditing")
      
      Remove it.
      Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
      Message-ID: <20241001141354.18009-3-linux@treblig.org>
      [Adjust Documentation/virt/kvm/locking.rst. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bc07eea2
    • Dr. David Alan Gilbert's avatar
      KVM: Remove unused kvm_vcpu_gfn_to_pfn · 88a387cf
      Dr. David Alan Gilbert authored
      The last use of kvm_vcpu_gfn_to_pfn was removed by commit
      b1624f99 ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")
      
      Remove it.
      Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
      Message-ID: <20241001141354.18009-2-linux@treblig.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      88a387cf
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux · 715ca9dd
      Linus Torvalds authored
      Pull one more io_uring fix from Jens Axboe:
       "Fix for a regression introduced in 6.12-rc2, where a condition check
        was negated and hence -EAGAIN would bubble back up to userspace
        rather than trigger a retry condition"
      
      * tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux:
        io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()
      715ca9dd
  3. 19 Oct, 2024 9 commits
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 531643fc
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Fixes all in drivers. The largest is the mpi3mr which corrects a phy
        count limit that should only apply to the controller but was being
        incorrectly applied to expander phys"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target: core: Fix null-ptr-deref in target_alloc_device()
        scsi: mpi3mr: Validate SAS port assignments
        scsi: ufs: core: Set SDEV_OFFLINE when UFS is shut down
        scsi: ufs: core: Requeue aborted request
        scsi: ufs: core: Fix the issue of ICU failure
      531643fc
    • Linus Torvalds's avatar
      Merge tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 06526daa
      Linus Torvalds authored
      Pull ftrace fixes from Steven Rostedt:
       "A couple of fixes to function graph infrastructure:
      
         - Fix allocation of idle shadow stacks during hotplug
      
           If function graph tracing is started while a CPU is offline and
           that CPU comes online during the trace, the idle task that
           represents the CPU will not get a shadow stack allocated for it.
           This means all function graph hooks that happen while that idle
           task is running (including in interrupt mode) will have all their
           events dropped.
      
           Switch over to the CPU hotplug mechanism, which gives any newly
           onlined CPU a callback that can allocate the shadow stack for
           its idle task.
      
         - Fix allocation size of the ret_stack_list array
      
           When function graph tracing converted over to allowing more than
           one user at a time, it had to convert its shadow stack from an
           array of ret_stack structures to an array of unsigned longs. The
           shadow stacks are allocated in batches of 32 at a time and assigned
           to every running task. The batch is held by the ret_stack_list
           array.
      
           But when the conversion happened, instead of allocating an array of
           32 pointers, it was allocated as a ret_stack itself (PAGE_SIZE).
           This ret_stack_list gets passed to a function that iterates over
           what it believes is its size defined by the
           FTRACE_RETSTACK_ALLOC_SIZE macro (which is 32).
      
           Luckily (PAGE_SIZE) is greater than 32 * sizeof(long), otherwise
           this would have been an array overflow. This still should be fixed
           and the ret_stack_list should be allocated to the size it is
           expected to be as someday it may end up being bigger than
           SHADOW_STACK_SIZE"
      
      * tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        fgraph: Allocate ret_stack_list with proper size
        fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks
      06526daa
    • Linus Torvalds's avatar
      Merge tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe · 8203ca38
      Linus Torvalds authored
      Pull ipe fixes from Fan Wu:
       "This addresses several issues identified by Luca when attempting to
        enable IPE on Debian and systemd:
      
         - address issues with IPE policy update errors and policy update
           version check, improving the clarity of error messages for better
           understanding by userspace programs.
      
         - enable IPE policies to be signed by secondary and platform
           keyrings, facilitating broader use across general Linux
           distributions like Debian.
      
         - update the IPE entry in the MAINTAINERS file to reflect the new
           tree URL and my updated email from kernel.org"
      
      * tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
        MAINTAINERS: update IPE tree url and Fan Wu's email
        ipe: fallback to platform keyring also if key in trusted keyring is rejected
        ipe: allow secondary and platform keyrings to install/update policies
        ipe: also reject policy updates with the same version
        ipe: return -ESTALE instead of -EINVAL on update when new policy has a lower version
      8203ca38
    • Linus Torvalds's avatar
      Merge tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · f9e48255
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
      
       - a fix for Zinitix driver to not fail probing if the property enabling
         touch keys functionality is not defined. Support for touch keys was
         added in 6.12 merge window so this issue does not affect users of
         released kernels
      
       - a couple new vendor/device IDs in xpad driver to enable support for
         more hardware
      
      * tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: zinitix - don't fail if linux,keycodes prop is absent
        Input: xpad - add support for MSI Claw A1M
        Input: xpad - add support for 8BitDo Ultimate 2C Wireless Controller
      f9e48255
    • Linus Torvalds's avatar
      Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux · 9197b73f
      Linus Torvalds authored
      Pull 9p fixes from Dominique Martinet:
       "Mashed-up update that I sat on too long:
      
         - fix for multiple slabs created with the same name
      
         - enable multipage folios
      
         - theoretical fix to also look for opened fids by inode if none was
           found by dentry"
      
      [ Enabling multi-page folios should have been done during the merge
        window, but it's a one-liner, and the actual meat of the enablement
        is in netfs and already in use for other filesystems...  - Linus ]
      
      * tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
        9p: Avoid creating multiple slab caches with the same name
        9p: Enable multipage folios
        9p: v9fs_fid_find: also lookup by inode if not found dentry
      9197b73f
    • Linus Torvalds's avatar
      Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux · 4e6bd4a3
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
       "Toolchain and infrastructure:
      
         - Fix several issues with the 'rustc-option' macro. It includes a
           refactor from Masahiro of three '{cc,rust}-*' macros, which is not
           a fix but avoids repeating the same commands (which would be
           several lines in the case of 'rustc-option').
      
         - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
           includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
           a fix but is needed for the actual fix.
      
        And a trivial grammar fix"
      
      * tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
        cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
        kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
        kbuild: fix issues with rustc-option
        kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
        lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
      4e6bd4a3
    • Jens Axboe's avatar
      io_uring/rw: fix wrong NOWAIT check in io_rw_init_file() · ae6a888a
      Jens Axboe authored
      A previous commit improved how !FMODE_NOWAIT is dealt with, but
      inadvertently negated a check whilst doing so. This caused -EAGAIN to be
      returned from reading files with O_NONBLOCK set. Fix up the check for
      REQ_F_SUPPORT_NOWAIT.
      Reported-by: Julian Orth <ju.orth@gmail.com>
      Link: https://github.com/axboe/liburing/issues/1270
      Fixes: f7c91343 ("io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ae6a888a
    • Steven Rostedt's avatar
      fgraph: Allocate ret_stack_list with proper size · fae4078c
      Steven Rostedt authored
      The ret_stack_list is an array of ret_stack shadow stacks for the function
      graph usage. When the first function graph is enabled, all tasks in the
      system get a shadow stack. The ret_stack_list is a 32 element array of
      pointers to these shadow stacks. It allocates the shadow stack in batches
      (32 stacks at a time), assigns them to running tasks, and continues until
      all tasks are covered.
      
      When the function graph shadow stack changed from an array of
      ftrace_ret_stack structures to an array of longs, the allocation of
      ret_stack_list went from allocating an array of 32 elements to just a
      block defined by SHADOW_STACK_SIZE. Luckily, that's defined as PAGE_SIZE
      and is much more than enough to hold 32 pointers. But it is way overkill
      for the amount needed to allocate.
      
      Change the allocation of ret_stack_list back to a kcalloc() of
      FTRACE_RETSTACK_ALLOC_SIZE pointers.
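
A user-space analogue of the sizing fix, assuming `calloc()` as a stand-in for the kernel's `kcalloc()`. The helper names `alloc_ret_stack_list()` and `ret_stack_list_bytes()` are hypothetical, and the value 32 for `FTRACE_RETSTACK_ALLOC_SIZE` is taken from the description above. The point is that the list only needs room for one pointer per shadow stack in a batch, not a whole shadow stack's worth of memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define FTRACE_RETSTACK_ALLOC_SIZE 32  /* shadow stacks handed out per batch */

/* Bytes actually needed for the list: one pointer per batch slot. */
static inline size_t ret_stack_list_bytes(void)
{
    return FTRACE_RETSTACK_ALLOC_SIZE * sizeof(unsigned long *);
}

/*
 * Allocate a zeroed array of FTRACE_RETSTACK_ALLOC_SIZE pointers.
 * calloc() here stands in for kcalloc(); the caller frees the result.
 */
static inline unsigned long **alloc_ret_stack_list(void)
{
    return calloc(FTRACE_RETSTACK_ALLOC_SIZE, sizeof(unsigned long *));
}
```

On a 64-bit system this is 256 bytes, compared with the PAGE_SIZE block that was being allocated before.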
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Link: https://lore.kernel.org/20241018215212.23f13f40@rorschach
      Fixes: 42675b72 ("function_graph: Convert ret_stack to a series of longs")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      fae4078c
    • Steven Rostedt's avatar
      fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks · 2c02f737
      Steven Rostedt authored
      The function graph infrastructure allocates a shadow stack for every task
      when enabled. This includes the idle tasks. The first time the function
      graph is invoked, the shadow stacks are created and never freed until the
      task exits. This includes the idle tasks.
      
      Only the idle tasks that were for online CPUs had their shadow stacks
      created when function graph tracing started. If function graph tracing is
      enabled and a CPU comes online, the idle task representing that CPU will
      not have its shadow stack created, and all function graph tracing for that
      idle task will be silently dropped.
      
      Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
      This will include idle tasks for CPUs that come online during tracing.
      
      This issue can be reproduced by:
      
       # cd /sys/kernel/tracing
       # echo 0 > /sys/devices/system/cpu/cpu1/online
       # echo 0 > set_ftrace_pid
       # echo function_graph > current_tracer
       # echo 1 > options/funcgraph-proc
       # echo 1 > /sys/devices/system/cpu/cpu1/online
       # grep '<idle>' per_cpu/cpu1/trace | head
      
      Before, nothing would show up.
      
      After:
       1)    <idle>-0    |   0.811 us    |                        __enqueue_entity();
       1)    <idle>-0    |   5.626 us    |                      } /* enqueue_entity */
       1)    <idle>-0    |               |                      dl_server_update_idle_time() {
       1)    <idle>-0    |               |                        dl_scaled_delta_exec() {
       1)    <idle>-0    |   0.450 us    |                          arch_scale_cpu_capacity();
       1)    <idle>-0    |   1.242 us    |                        }
       1)    <idle>-0    |   1.908 us    |                      }
       1)    <idle>-0    |               |                      dl_server_start() {
       1)    <idle>-0    |               |                        enqueue_dl_entity() {
       1)    <idle>-0    |               |                          task_contending() {
      
      Note, if tracing stops and restarts, the old way would then initialize
      the onlined CPUs.
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/20241018214300.6df82178@rorschach
      Fixes: 868baf07 ("ftrace: Fix memory leak with function graph and cpu hotplug")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      2c02f737
  4. 18 Oct, 2024 1 commit
    • Linus Torvalds's avatar
      Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3d5ad2d4
      Linus Torvalds authored
      Pull bpf fixes from Daniel Borkmann:
      
       - Fix BPF verifier to not affect subreg_def marks in its range
         propagation (Eduard Zingerman)
      
       - Fix a truncation bug in the BPF verifier's handling of
         coerce_reg_to_size_sx (Dimitar Kanaliev)
      
       - Fix the BPF verifier's delta propagation between linked registers
         under 32-bit addition (Daniel Borkmann)
      
       - Fix a NULL pointer dereference in BPF devmap due to missing rxq
         information (Florian Kauer)
      
       - Fix a memory leak in bpf_core_apply (Jiri Olsa)
      
       - Fix a UBSAN-reported array-index-out-of-bounds in BTF parsing for
         arrays of nested structs (Hou Tao)
      
       - Fix build ID fetching where memory areas backing the file were
         created with memfd_secret (Andrii Nakryiko)
      
       - Fix BPF task iterator tid filtering which was incorrectly using pid
         instead of tid (Jordan Rome)
      
       - Several fixes for BPF sockmap and BPF sockhash redirection in
         combination with vsocks (Michal Luczaj)
      
       - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
      
       - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
         of an infinite BPF tailcall (Pu Lehui)
      
       - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
         be resolved (Thomas Weißschuh)
      
       - Fix a bug in kfunc BTF caching for modules where the wrong BTF object
         was returned (Toke Høiland-Jørgensen)
      
       - Fix a BPF selftest compilation error in cgroup-related tests with
         musl libc (Tony Ambardar)
      
       - Several fixes to BPF link info dumps to fill missing fields (Tyrone
         Wu)
      
       - Add BPF selftests for kfuncs from multiple modules, checking that the
         correct kfuncs are called (Simon Sundberg)
      
       - Ensure that internal and user-facing bpf_redirect flags don't overlap
         (Toke Høiland-Jørgensen)
      
       - Switch to use kvzalloc to allocate BPF verifier environment (Rik van
         Riel)
      
       - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
         under RT (Wander Lairson Costa)
      
      * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
        lib/buildid: Handle memfd_secret() files in build_id_parse()
        selftests/bpf: Add test case for delta propagation
        bpf: Fix print_reg_state's constant scalar dump
        bpf: Fix incorrect delta propagation between linked registers
        bpf: Properly test iter/task tid filtering
        bpf: Fix iter/task tid filtering
        riscv, bpf: Make BPF_CMPXCHG fully ordered
        bpf, vsock: Drop static vsock_bpf_prot initialization
        vsock: Update msg_count on read_skb()
        vsock: Update rx_bytes on read_skb()
        bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
        selftests/bpf: Add asserts for netfilter link info
        bpf: Fix link info netfilter flags to populate defrag flag
        selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
        selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
        bpf: Fix truncation bug in coerce_reg_to_size_sx()
        selftests/bpf: Assert link info uprobe_multi count & path_size if unset
        bpf: Fix unpopulated path_size when uprobe_multi fields unset
        selftests/bpf: Fix cross-compiling urandom_read
        selftests/bpf: Add test for kfunc module order
        ...
      3d5ad2d4