1. 30 Nov, 2021 8 commits
    • KVM: x86: Use a stable condition around all VT-d PI paths · 53b7ca1a
      Paolo Bonzini authored
      Currently, checks for whether VT-d PI can be used refer to the current
      status of the feature in the current vCPU; or they more or less pick
      vCPU 0 in case a specific vCPU is not available.
      
      However, these checks do not attempt to synchronize with changes to
      the IRTE.  In particular, there is no path that updates the IRTE when
      APICv is re-activated on vCPU 0; and there is no path to wake up a CPU
      that has APICv disabled, if the wakeup occurs because of an IRTE
      that points to a posted interrupt.
      
      To fix this, always go through the VT-d PI path as long as there are
      assigned devices and APICv is available on both the host and the VM side.
      Since the relevant condition was copied over three times, take the hint
      and factor it into a separate function.

      Suggested-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
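A minimal C sketch of the factored-out condition, assuming a simplified model: the structs and the can_use_vtd_pi() name are hypothetical stand-ins, not KVM's actual types or helper. It illustrates that the predicate reads only VM-wide and host-wide state, so every VT-d PI path gets the same answer regardless of whether APICv is currently active on a particular vCPU.

```c
#include <stdbool.h>

/* Hypothetical host-wide capability (illustrative, not KVM's struct). */
struct host_caps {
	bool apicv_enabled;	/* APICv available on the host */
};

/* Hypothetical VM-wide state (illustrative, not KVM's struct). */
struct vm_model {
	int assigned_devices;	/* number of assigned (passthrough) devices */
	bool apicv_allowed;	/* APICv available on the VM side */
};

/*
 * One stable predicate shared by all VT-d PI paths. It deliberately does
 * not look at any per-vCPU "APICv currently active" flag, which can be
 * toggled at runtime and would make different paths disagree.
 */
static bool can_use_vtd_pi(const struct host_caps *host,
			   const struct vm_model *vm)
{
	return vm->assigned_devices > 0 &&
	       vm->apicv_allowed &&
	       host->apicv_enabled;
}
```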
    • KVM: x86: check PIR even for vCPUs with disabled APICv · 37c4dbf3
      Paolo Bonzini authored
      The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
      if APICv is disabled on the vCPU that receives it.  In that case, the
      interrupt will just cause a vmexit and leave the ON bit set together
      with the PIR bit corresponding to the interrupt.
      
      Right now, the interrupt would not be delivered until APICv is re-enabled.
      However, fixing this is just a matter of always doing the PIR->IRR
      synchronization, even if the vCPU has temporarily disabled APICv.
      
      This is not a problem for performance; if anything, it is an
      improvement.  First, in the common case where vcpu->arch.apicv_active is
      true, one fewer check has to be performed.  Second, static_call_cond will
      elide the function call if APICv is not present or disabled.  Finally,
      for AMD hardware we can remove the sync_pir_to_irr callback: it is only
      needed for apic_has_interrupt_for_ppr, and that function already has a
      fallback for !APICv.
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
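The PIR->IRR synchronization can be sketched as below, assuming a simplified software model of the posted-interrupt descriptor (256 PIR bits plus the ON bit) and of the IRR; all struct and function names are hypothetical. pi_desc_init() and post_interrupt() stand in for descriptor setup and for the IOMMU posting an interrupt. The sync consumes ON, drains PIR into IRR, and reports the highest pending vector; nothing in it depends on APICv being active on the vCPU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a posted-interrupt descriptor. */
struct pi_desc_model {
	_Atomic uint64_t pir[4];	/* one request bit per vector 0..255 */
	atomic_bool on;			/* outstanding-notification bit */
};

/* Hypothetical model of the virtual APIC's 256-bit IRR. */
struct apic_irr_model {
	uint64_t irr[4];
};

static void pi_desc_init(struct pi_desc_model *pi)
{
	for (int i = 0; i < 4; i++)
		atomic_init(&pi->pir[i], 0);
	atomic_init(&pi->on, false);
}

/* Stand-in for the IOMMU posting `vector`: set the PIR bit, then ON. */
static void post_interrupt(struct pi_desc_model *pi, int vector)
{
	atomic_fetch_or(&pi->pir[vector / 64], UINT64_C(1) << (vector % 64));
	atomic_store(&pi->on, true);
}

/*
 * Drain PIR into IRR if ON was set, then return the highest pending
 * vector in IRR, or -1 if none is pending.
 */
static int sync_pir_to_irr(struct pi_desc_model *pi,
			   struct apic_irr_model *apic)
{
	if (atomic_exchange(&pi->on, false)) {
		for (int i = 0; i < 4; i++)
			apic->irr[i] |= atomic_exchange(&pi->pir[i], 0);
	}

	for (int i = 3; i >= 0; i--) {
		uint64_t w = apic->irr[i];

		if (w) {
			int bit = 63;

			while (!(w & (UINT64_C(1) << bit)))
				bit--;
			return i * 64 + bit;
		}
	}
	return -1;
}
```

In this model a bit posted concurrently with a sync is never lost: it is either drained by the current sync or left in PIR with ON set for the next one.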
    • KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled · 7e1901f6
      Paolo Bonzini authored
      If APICv is disabled for this vCPU, assigned devices may still attempt to
      post interrupts.  In that case, we need to cancel the vmentry and deliver
      the interrupt with KVM_REQ_EVENT.  Extend the existing code that handles
      injection of L1 interrupts into L2 to cover this case as well.
      
      vmx_hwapic_irr_update is only called when APICv is active so it would be
      confusing to add a check for vcpu->arch.apicv_active in there.  Instead,
      just use vmx_set_rvi directly in vmx_sync_pir_to_irr.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123004311.2954158-3-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
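A hypothetical sketch of the delivery decision described above, with KVM's vmx_set_rvi() and the KVM_REQ_EVENT request reduced to enum values. The in_l2 flag models the existing L1-into-L2 case that the commit extends; the exact conditions in KVM may differ from this simplification.

```c
#include <stdbool.h>

/* Hypothetical outcomes after syncing PIR into IRR before VM entry. */
enum post_sync_action {
	ACTION_NONE,		/* nothing new needs software delivery */
	ACTION_SET_RVI,		/* APICv active: let hardware deliver via RVI */
	ACTION_REQUEST_EVENT,	/* cancel entry, inject via KVM_REQ_EVENT */
};

/*
 * in_l2: vCPU is running a nested (L2) guest.
 * apicv_active: APICv currently active on this vCPU.
 * got_new: the sync moved at least one new bit from PIR into IRR.
 */
static enum post_sync_action after_sync(bool in_l2, bool apicv_active,
					bool got_new)
{
	if (!in_l2 && apicv_active)
		return ACTION_SET_RVI;

	/* L2, or APICv disabled: hardware will not deliver it for us. */
	return got_new ? ACTION_REQUEST_EVENT : ACTION_NONE;
}
```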
    • KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem · 81835ee1
      Maciej S. Szmigiero authored
      A kvm_page_table_test run with its default settings fails on VMX due to
      memory region add failure:
      > ==== Test Assertion Failure ====
      >  lib/kvm_util.c:952: ret == 0
      >  pid=10538 tid=10538 errno=17 - File exists
      >     1  0x00000000004057d1: vm_userspace_mem_region_add at kvm_util.c:947
      >     2  0x0000000000401ee9: pre_init_before_test at kvm_page_table_test.c:302
      >     3   (inlined by) run_test at kvm_page_table_test.c:374
      >     4  0x0000000000409754: for_each_guest_mode at guest_modes.c:53
      >     5  0x0000000000401860: main at kvm_page_table_test.c:500
      >     6  0x00007f82ae2d8554: ?? ??:0
      >     7  0x0000000000401894: _start at ??:?
      >  KVM_SET_USER_MEMORY_REGION IOCTL failed,
      >  rc: -1 errno: 17
      >  slot: 1 flags: 0x0
      >  guest_phys_addr: 0xc0000000 size: 0x40000000
      
      This is because the memory range that this test is trying to add
      (0x0c0000000 - 0x100000000) conflicts with the LAPIC mapping at 0x0fee00000.
      
      Looking at the code, it seems that the guest_test_*phys*_mem variable gets
      mistakenly overwritten with guest_test_*virt*_mem while trying to adjust
      the former for alignment.
      With the correct variable adjusted, this test runs successfully.

      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <52e487458c3172923549bbcf9dfccfbe6faea60b.1637940473.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
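A small C sketch of the alignment and overlap arithmetic, assuming a power-of-two alignment. align_down(), range_contains(), and the LAPIC_DEFAULT_BASE macro are illustrative helpers, not necessarily what kvm_page_table_test.c uses; the addresses come from the failure log above, and the comment contrasts the buggy and fixed assignments as the commit message describes them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural default xAPIC base address (macro name is illustrative). */
#define LAPIC_DEFAULT_BASE UINT64_C(0xfee00000)

/*
 * Round addr down to a multiple of alignment (power of two assumed).
 *
 * Pattern described in the commit (hypothetical spelling):
 *   buggy: guest_test_phys_mem = align_down(guest_test_virt_mem, alignment);
 *   fixed: guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
 */
static uint64_t align_down(uint64_t addr, uint64_t alignment)
{
	return addr & ~(alignment - 1);
}

/* True if addr lies in [start, start + size). */
static bool range_contains(uint64_t start, uint64_t size, uint64_t addr)
{
	return addr >= start && addr - start < size;
}
```

With the values from the log, the slot at 0xc0000000 of size 0x40000000 contains 0xfee00000, which matches the conflict the commit describes.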
    • KVM: x86/mmu: Handle "default" period when selectively waking kthread · f47491d7
      Sean Christopherson authored
      Account for the '0' being a default, "let KVM choose" period, when
      determining whether or not the recovery worker needs to be awakened in
      response to userspace reducing the period.  Failure to do so results in
      the worker not being awakened properly, e.g. when changing the period
      from '0' to any small-ish value.
      
      Fixes: 4dfe4f40 ("kvm: x86: mmu: Make NX huge page recovery period configurable")
      Cc: stable@vger.kernel.org
      Cc: Junaid Shahid <junaids@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120015706.3830341-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
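A minimal sketch of the corrected wake-up check, assuming a hypothetical fixed default of 60000 ms for the "let KVM choose" value (KVM computes its real default, which is not stated here). Both periods are mapped to their effective values before comparing; the naive comparison is kept for contrast. Real KVM also considers whether recovery is enabled at all, which this sketch omits.

```c
#include <stdbool.h>

/* Hypothetical default used when userspace writes 0 ("let KVM choose"). */
#define HYPOTHETICAL_DEFAULT_PERIOD_MS 60000u

/* Map the user-visible period to the period the worker actually uses. */
static unsigned int effective_period(unsigned int period_ms)
{
	return period_ms ? period_ms : HYPOTHETICAL_DEFAULT_PERIOD_MS;
}

/* Broken check: treats 0 as "shortest", so 0 -> 100 never wakes. */
static bool naive_should_wake(unsigned int old_ms, unsigned int new_ms)
{
	return new_ms < old_ms;
}

/* Wake the worker only if the effective period actually shrank. */
static bool should_wake_worker(unsigned int old_ms, unsigned int new_ms)
{
	return effective_period(new_ms) < effective_period(old_ms);
}
```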
    • KVM: MMU: shadow nested paging does not have PKU · 28f091bc
      Paolo Bonzini authored
      Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
      incorrect interpretations of the nested hypervisor's page tables.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path · 4b85c921
      Sean Christopherson authored
      Drop the "flush" param and return values to/from the TDP MMU's helper for
      zapping collapsible SPTEs.  Because the helper runs with mmu_lock held
      for read, not write, it uses tdp_mmu_zap_spte_atomic(), and the atomic
      zap handles the necessary remote TLB flush.
      
      Similarly, because mmu_lock is dropped and re-acquired between zapping
      legacy MMUs and zapping TDP MMUs, kvm_mmu_zap_collapsible_sptes() must
      handle remote TLB flushes from the legacy MMU before calling into the TDP
      MMU.
      
      Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120045046.3940942-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
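A simplified C11 model of why an atomic zap needs no separate "flush" flag: the zap freezes the SPTE with a compare-and-exchange, flushes remote TLBs itself, and only then publishes the non-present value. MODEL_REMOVED_SPTE, the flush counter, and the function names are hypothetical; KVM's tdp_mmu_zap_spte_atomic() does more than this sketch shows.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel marking an SPTE as frozen while being zapped. */
#define MODEL_REMOVED_SPTE UINT64_C(0x5a0)

static int remote_tlb_flushes;	/* counts simulated remote TLB flushes */

static void model_flush_remote_tlbs(void)
{
	remote_tlb_flushes++;
}

/*
 * Zap one SPTE while other threads may modify it concurrently (as when
 * mmu_lock is held for read). Returns false if the SPTE no longer holds
 * old_spte, in which case nothing is changed and no flush is done.
 */
static bool zap_spte_atomic(_Atomic uint64_t *sptep, uint64_t old_spte)
{
	/* Freeze the entry; fail if someone else changed it first. */
	if (!atomic_compare_exchange_strong(sptep, &old_spte,
					    MODEL_REMOVED_SPTE))
		return false;

	/* No vCPU may keep using the stale translation. */
	model_flush_remote_tlbs();

	/* Publish the final, non-present value. */
	atomic_store(sptep, 0);
	return true;
}
```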
    • KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping · 75333772
      Sean Christopherson authored
      Use the yield-safe variant of the TDP MMU iterator when handling an
      unmapping event from the MMU notifier, as most occurrences of the event
      allow yielding.
      
      Fixes: e1eed584 ("KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120015008.3780032-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 26 Nov, 2021 17 commits
  3. 25 Nov, 2021 1 commit
  4. 24 Nov, 2021 2 commits
  5. 22 Nov, 2021 2 commits
  6. 21 Nov, 2021 5 commits
    • Linux 5.16-rc2 · 13605725
      Linus Torvalds authored
    • Merge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 40c93d7f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Move the command line preparation and the early command line parsing
         earlier so that the command line parameters which affect
         early_reserve_memory(), e.g. efi=nosftreserve, are taken into
         account. This was broken when the invocation of
         early_reserve_memory() was moved recently.
      
       - Use an atomic type for the SGX page accounting, which is read and
         written locklessly, to plug various race conditions related to it.
      
      * tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sgx: Fix free page accounting
        x86/boot: Pull up cmdline preparation and early param parsing
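A sketch of lockless free-page accounting with C11 atomics, assuming a hypothetical counter name and a simple watermark reader. With a plain long, concurrent increments and decrements can lose updates because each is a separate load, modify, and store; atomic_fetch_add() and atomic_fetch_sub() make each update indivisible, and readers see a consistent value without taking a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical free-page counter; static storage starts it at zero. */
static atomic_long nr_free_pages_model;

/* Called when a page goes back on a free list. */
static void page_freed(void)
{
	atomic_fetch_add(&nr_free_pages_model, 1);
}

/* Called when a page is taken from a free list. */
static void page_allocated(void)
{
	atomic_fetch_sub(&nr_free_pages_model, 1);
}

/* Lockless reader, e.g. a reclaim watermark check. */
static bool below_watermark(long watermark)
{
	return atomic_load(&nr_free_pages_model) < watermark;
}
```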
    • Merge tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af16bdea
      Linus Torvalds authored
      Pull x86 perf fixes from Thomas Gleixner:
      
       - Remove unneeded PEBS disabling when taking LBR snapshots to prevent an
         unchecked MSR access error.
      
       - Fix IIO event constraints for Snowridge and Skylake server chips.
      
      * tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/perf: Fix snapshot_branch_stack warning in VM
        perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
        perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
        perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
    • Merge tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 75603b14
      Linus Torvalds authored
      Pull more powerpc fixes from Michael Ellerman:
      
       - Fix a bug in copying of sigset_t for 32-bit systems, which caused X
         to not start.
      
       - Fix handling of shared LSIs (rare) with the xive interrupt controller
         (Power9/10).
      
       - Fix missing TOC setup in some KVM code, which could result in oopses
         depending on kernel data layout.
      
       - Fix DMA mapping when we have persistent memory and only one DMA
         window available.
      
       - Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a
         recent fix.
      
       - A couple of other minor fixes.
      
      Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater,
      Christian Zigotzky, Christophe Leroy, Daniel Axtens, Finn Thain, Greg
      Kurz, Masahiro Yamada, Nicholas Piggin, and Uwe Kleine-König.
      
      * tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/xive: Change IRQ domain to a tree domain
        powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
        powerpc/signal32: Fix sigset_t copy
        powerpc/book3e: Fix TLBCAM preset at boot
        powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
        powerpc/pseries/ddw: simplify enable_ddw()
        powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
        powerpc/pseries: Fix numa FORM2 parsing fallback code
        powerpc/pseries: rename numa_dist_table to form2_distances
        powerpc: clean vdso32 and vdso64 directories
        powerpc/83xx/mpc8349emitx: Drop unused variable
        KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
    • pstore/blk: Use "%lu" to format unsigned long · 61eb495c
      Geert Uytterhoeven authored
      On 32-bit:
      
          fs/pstore/blk.c: In function ‘__best_effort_init’:
          include/linux/kern_levels.h:5:18: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 3 has type ‘long unsigned int’ [-Wformat=]
              5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
                |                  ^~~~~~
          include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’
             14 | #define KERN_INFO KERN_SOH "6" /* informational */
                |                   ^~~~~~~~
          include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’
            373 |  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
                |         ^~~~~~~~~
          fs/pstore/blk.c:314:3: note: in expansion of macro ‘pr_info’
            314 |   pr_info("attached %s (%zu) (no dedicated panic_write!)\n",
                |   ^~~~~~~
      
      Cc: stable@vger.kernel.org
      Fixes: 7bb9557b ("pstore/blk: Use the normal block device I/O path")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210629103700.1935012-1-geert@linux-m68k.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
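A minimal illustration of the rule behind this fix: each conversion specifier must match its argument type, %zu for size_t and %lu for unsigned long. On 32-bit Linux targets size_t is typically unsigned int, so passing an unsigned long to %zu trips -Wformat even though both types are 32 bits wide there. The function name and message text are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format one unsigned long and one size_t, each with its matching
 * conversion, so the call is warning-free on 32-bit and 64-bit targets.
 */
static int format_sizes(char *buf, size_t len, unsigned long kib,
			size_t bytes)
{
	return snprintf(buf, len, "attached (%lu KiB) (%zu bytes)", kib, bytes);
}
```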
  7. 20 Nov, 2021 5 commits
    • Merge branch 'akpm' (patches from Andrew) · 923dcc5e
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "15 patches.
      
        Subsystems affected by this patch series: ipc, hexagon, mm (swap,
        slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        proc/vmcore: fix clearing user buffer by properly using clear_user()
        kmap_local: don't assume kmap PTEs are linear arrays in memory
        mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
        mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
        kasan: test: silence intentional read overflow warnings
        hugetlb, userfaultfd: fix reservation restore on userfaultfd error
        hugetlb: fix hugetlb cgroup refcounting during mremap
        mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
        hexagon: ignore vmlinux.lds
        hexagon: clean up timer-regs.h
        hexagon: export raw I/O routines for modules
        mm: emit the "free" trace report before freeing memory in kmem_cache_free()
        shm: extend forced shm destroy to support objects from several IPC nses
        ipc: WARN if trying to remove ipc object which is absent
        mm/swap.c:put_pages_list(): reinitialise the page list
    • Merge tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block · 61564e7b
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Flip a cap check to avoid a selinux error (Alistair)
      
       - Fix for a regression this merge window where we can miss a queue ref
         put (me)
      
       - Un-mark pstore-blk as broken, as the condition that triggered that
         change has been rectified (Kees)
      
       - Queue quiesce and sync fixes (Ming)
      
       - FUA insertion fix (Ming)
      
       - blk-cgroup error path put fix (Yu)
      
      * tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block:
        blk-mq: don't insert FUA request with data into scheduler queue
        blk-cgroup: fix missing put device in error path from blkg_conf_pref()
        block: avoid to quiesce queue in elevator_init_mq
        Revert "mark pstore-blk as broken"
        blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
        block: fix missing queue put in error path
        block: Check ADMIN before NICE for IOPRIO_CLASS_RT
    • Merge tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b100274c
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "There is an ACPI stubs fix which is ACKed by the ACPI maintainer for
        merging through my tree.
      
        One item stands out and that is that I delete the <linux/sdb.h> header
        that is used by nothing. I deleted this subsystem (through the GPIO
        tree) a while back so I feel responsible for tidying up the floor.
      
        Other than that, it is the usual mistakes: a bit noisy around build
        issues and Kconfig, then driver fixes.
      
        Specifics:
      
         - Fix some stubs causing compile issues for ACPI.
      
         - Fix some wakeups on AMD IRQs shared between GPIO and SCI.
      
         - Fix a build warning in the Tegra driver.
      
         - Fix a Kconfig issue in the Qualcomm driver.
      
         - Add a missing include to the RALink driver.
      
         - Return a valid type for the Apple pinctrl IRQs.
      
         - Implement some Qualcomm SDM845 dual-edge errata.
      
         - Remove the unused <linux/sdb.h> header. (The subsystem was once
           deleted by the pinctrl maintainer...)
      
         - Fix a duplicate initializer in the Tegra driver.
      
         - Fix register offsets for UFS and SDC in the Qualcomm SM8350 driver"
      
      * tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: qcom: sm8350: Correct UFS and SDC offsets
        pinctrl: tegra194: remove duplicate initializer again
        Remove unused header <linux/sdb.h>
        pinctrl: qcom: sdm845: Enable dual edge errata
        pinctrl: apple: Always return valid type in apple_gpio_irq_type
        pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'
        pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP
        pinctrl: tegra: Return const pointer from tegra_pinctrl_get_group()
        pinctrl: amd: Fix wakeups when IRQ is shared with SCI
        ACPI: Add stubs for wakeup handler functions
    • Merge tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 6b38e2fb
      Linus Torvalds authored
      Pull s390 updates from Heiko Carstens:
      
       - Add missing Kconfig option for ftrace direct multi sample, so it can
         be compiled again, and also add s390 support for this sample.
      
       - Update Christian Borntraeger's email address.
      
       - Various fixes for memory layout setup. Among other things, this
         makes it possible to load shared DCSS segments again.
      
       - Fix copy to user space of swapped kdump oldmem.
      
       - Remove -mstack-guard and -mstack-size compile options when building
         vdso binaries. This can happen when CONFIG_VMAP_STACK is disabled and
         results in broken vdso code which causes more or less random
         exceptions. Also remove the not needed -nostdlib option.
      
       - Fix memory leak on cpu hotplug and return code handling in kexec
         code.
      
       - Wire up futex_waitv system call.
      
       - Replace snprintf with sysfs_emit where appropriate.
      
      * tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        ftrace/samples: add s390 support for ftrace direct multi sample
        ftrace/samples: add missing Kconfig option for ftrace direct multi sample
        MAINTAINERS: update email address of Christian Borntraeger
        s390/kexec: fix memory leak of ipl report buffer
        s390/kexec: fix return code handling
        s390/dump: fix copying to user-space of swapped kdump oldmem
        s390: wire up sys_futex_waitv system call
        s390/vdso: filter out -mstack-guard and -mstack-size
        s390/vdso: remove -nostdlib compiler flag
        s390: replace snprintf in show functions with sysfs_emit
        s390/boot: simplify and fix kernel memory layout setup
        s390/setup: re-arrange memblock setup
        s390/setup: avoid using memblock_enforce_memory_limit
        s390/setup: avoid reserving memory above identity mapping
    • Merge tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · b38bfc74
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small cifs/smb3 fixes: two to address minor coverity issues and
        one cleanup"
      
      * tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: introduce cifs_ses_mark_for_reconnect() helper
        cifs: protect srv_count with cifs_tcp_ses_lock
        cifs: move debug print out of spinlock