1. 10 Feb, 2022 25 commits
  2. 08 Feb, 2022 9 commits
    • Maxim Levitsky's avatar
      KVM: x86: SVM: move avic definitions from AMD's spec to svm.h · 39150352
      Maxim Levitsky authored
      asm/svm.h is the correct place for all values that are defined in
      the SVM spec, and that includes AVIC.
      
      Also add some values from the spec that were not defined before
      and will be soon useful.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      39150352
    • Maxim Levitsky's avatar
      KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it · 755c2bf8
      Maxim Levitsky authored
      kvm_apic_update_apicv is called when AVIC is still active, thus IRR bits
      can be set by the CPU after it is called, and don't cause the irr_pending
      to be set to true.
      
      Also logic in avic_kick_target_vcpu doesn't expect a race with this
      function so to make it simple, just keep irr_pending set to true and
      let the next interrupt injection to the guest clear it.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220207155447.840194-9-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      755c2bf8
    • Maxim Levitsky's avatar
      KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them · 2b0ecccb
      Maxim Levitsky authored
      Fix a corner case in which the L1 hypervisor intercepts
      interrupts (INTERCEPT_INTR) and either doesn't set
      virtual interrupt masking (V_INTR_MASKING) or enters a
      nested guest with EFLAGS.IF disabled prior to the entry.
      
      In this case, despite the fact that L1 intercepts the interrupts,
      KVM still needs to set up an interrupt window to wait before
      injecting the INTR vmexit.
      
      Currently the KVM instead enters an endless loop of 'req_immediate_exit'.
      
      Exactly the same issue also happens for SMIs and NMI.
      Fix this as well.
      
      Note that on VMX, this case is impossible as there is only
      'vmexit on external interrupts' execution control which either set,
      in which case both host and guest's EFLAGS.IF
      are ignored, or not set, in which case no VMexits are delivered.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220207155447.840194-8-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2b0ecccb
    • Maxim Levitsky's avatar
      KVM: x86: nSVM: expose clean bit support to the guest · 91f673b3
      Maxim Levitsky authored
      KVM already honours few clean bits thus it makes sense
      to let the nested guest know about it.
      
      Note that KVM also doesn't check if the hardware supports
      clean bits, and therefore nested KVM was
      already setting clean bits and L0 KVM
      was already honouring them.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220207155447.840194-6-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      91f673b3
    • Maxim Levitsky's avatar
      KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM · 759cbd59
      Maxim Levitsky authored
      While RSM induced VM entries are not full VM entries,
      they still need to be followed by actual VM entry to complete it,
      unlike setting the nested state.
      
      This patch fixes boot of hyperv and SMM enabled
      windows VM running nested on KVM, which fail due
      to this issue combined with lack of dirty bit setting.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Message-Id: <20220207155447.840194-5-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      759cbd59
    • Maxim Levitsky's avatar
      KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state · e8efa4ff
      Maxim Levitsky authored
      While usually, restoring the smm state makes the KVM enter
      the nested guest thus a different vmcb (vmcb02 vs vmcb01),
      KVM should still mark it as dirty, since hardware
      can in theory cache multiple vmcbs.
      
      Failure to do so, combined with lack of setting the
      nested_run_pending (which is fixed in the next patch),
      might make KVM re-enter vmcb01, which was just exited from,
      with completely different set of guest state registers
      (SMM vs non SMM) and without proper dirty bits set,
      which results in the CPU reusing stale IDTR pointer
      which leads to a guest shutdown on any interrupt.
      
      On the real hardware this usually doesn't happen,
      but when running nested, L0's KVM does check and
      honour few dirty bits, causing this issue to happen.
      
      This patch fixes boot of hyperv and SMM enabled
      windows VM running nested on KVM.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Message-Id: <20220207155447.840194-4-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e8efa4ff
    • Maxim Levitsky's avatar
      KVM: x86: nSVM: fix potential NULL derefernce on nested migration · e1779c27
      Maxim Levitsky authored
      Turns out that due to review feedback and/or rebases
      I accidentally moved the call to nested_svm_load_cr3 to be too early,
      before the NPT is enabled, which is very wrong to do.
      
      KVM can't even access guest memory at that point as nested NPT
      is needed for that, and of course it won't initialize the walk_mmu,
      which is main issue the patch was addressing.
      
      Fix this for real.
      
      Fixes: 232f75d3 ("KVM: nSVM: call nested_svm_load_cr3 on nested state load")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220207155447.840194-3-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e1779c27
    • Maxim Levitsky's avatar
      KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case · c53bbe21
      Maxim Levitsky authored
      When the guest doesn't enable paging, and NPT/EPT is disabled, we
      use guest't paging CR3's as KVM's shadow paging pointer and
      we are technically in direct mode as if we were to use NPT/EPT.
      
      In direct mode we create SPTEs with user mode permissions
      because usually in the direct mode the NPT/EPT doesn't
      need to restrict access based on guest CPL
      (there are MBE/GMET extenstions for that but KVM doesn't use them).
      
      In this special "use guest paging as direct" mode however,
      and if CR4.SMAP/CR4.SMEP are enabled, that will make the CPU
      fault on each access and KVM will enter endless loop of page faults.
      
      Since page protection doesn't have any meaning in !PG case,
      just don't passthrough these bits.
      
      The fix is the same as was done for VMX in commit:
      commit 656ec4a4 ("KVM: VMX: fix SMEP and SMAP without EPT")
      
      This fixes the boot of windows 10 without NPT for good.
      (Without this patch, BSP boots, but APs were stuck in endless
      loop of page faults, causing the VM boot with 1 CPU)
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Message-Id: <20220207155447.840194-2-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c53bbe21
    • Sean Christopherson's avatar
      Revert "svm: Add warning message for AVIC IPI invalid target" · dd4589ee
      Sean Christopherson authored
      Remove a WARN on an "AVIC IPI invalid target" exit, the WARN is trivial
      to trigger from guest as it will fail on any destination APIC ID that
      doesn't exist from the guest's perspective.
      
      Don't bother recording anything in the kernel log, the common tracepoint
      for kvm_avic_incomplete_ipi() is sufficient for debugging.
      
      This reverts commit 37ef0c44.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      dd4589ee
  3. 06 Feb, 2022 6 commits
    • Linus Torvalds's avatar
      Linux 5.17-rc3 · dfd42fac
      Linus Torvalds authored
      dfd42fac
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · d8ad2ce8
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Various bug fixes for ext4 fast commit and inline data handling.
      
        Also fix regression introduced as part of moving to the new mount API"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        fs/ext4: fix comments mentioning i_mutex
        ext4: fix incorrect type issue during replay_del_range
        jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}()
        ext4: fix potential NULL pointer dereference in ext4_fill_super()
        jbd2: refactor wait logic for transaction updates into a common function
        jbd2: cleanup unused functions declarations from jbd2.h
        ext4: fix error handling in ext4_fc_record_modified_inode()
        ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
        ext4: fix error handling in ext4_restore_inline_data()
        ext4: fast commit may miss file actions
        ext4: fast commit may not fallback for ineligible commit
        ext4: modify the logic of ext4_mb_new_blocks_simple
        ext4: prevent used blocks from being allocated during fast commit replay
      d8ad2ce8
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-06' of... · 18118a42
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix display of grouped aliased events in 'perf stat'.
      
       - Add missing branch_sample_type to perf_event_attr__fprintf().
      
       - Apply correct label to user/kernel symbols in branch mode.
      
       - Fix 'perf ftrace' system_wide tracing, it has to be set before
         creating the maps.
      
       - Return error if procfs isn't mounted for PID namespaces when
         synthesizing records for pre-existing processes.
      
       - Set error stream of objdump process for 'perf annotate' TUI, to avoid
         garbling the screen.
      
       - Add missing arm64 support to perf_mmap__read_self(), the kernel part
         got into 5.17.
      
       - Check for NULL pointer before dereference writing debug info about a
         sample.
      
       - Update UAPI copies for asound, perf_event, prctl and kvm headers.
      
       - Fix a typo in bpf_counter_cgroup.c.
      
      * tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf ftrace: system_wide collection is not effective by default
        libperf: Add arm64 support to perf_mmap__read_self()
        tools include UAPI: Sync sound/asound.h copy with the kernel sources
        perf stat: Fix display of grouped aliased events
        perf tools: Apply correct label to user/kernel symbols in branch mode
        perf bpf: Fix a typo in bpf_counter_cgroup.c
        perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
        perf session: Check for NULL pointer before dereference
        perf annotate: Set error stream of objdump process for TUI
        perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers UAPI: Sync linux/prctl.h with the kernel sources
        perf beauty: Make the prctl arg regexp more strict to cope with PR_SET_VMA
        tools headers cpufeatures: Sync with the kernel sources
        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
        tools include UAPI: Sync sound/asound.h copy with the kernel sources
      18118a42
    • Linus Torvalds's avatar
      Merge tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c3bf8a14
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Intel/PT: filters could crash the kernel
      
       - Intel: default disable the PMU for SMM, some new-ish EFI firmware has
         started using CPL3 and the PMU CPL filters don't discriminate against
         SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
         cycles.
      
       - Fixup for perf_event_attr::sig_data
      
      * tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/pt: Fix crash with stop filters in single-range mode
        perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
        selftests/perf_events: Test modification of perf_event_attr::sig_data
        perf: Copy perf_event_attr::sig_data on modification
        x86/perf: Default set FREEZE_ON_SMI for all
      c3bf8a14
    • Linus Torvalds's avatar
      Merge tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aeabe1e0
      Linus Torvalds authored
      Pull objtool fix from Borislav Petkov:
       "Fix a potential truncated string warning triggered by gcc12"
      
      * tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Fix truncated string warning
      aeabe1e0
    • Linus Torvalds's avatar
      Merge tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b72e40b1
      Linus Torvalds authored
      Pull irq fix from Borislav Petkov:
       "Remove a bogus warning introduced by the recent PCI MSI irq affinity
        overhaul"
      
      * tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
      b72e40b1