1. 21 Jul, 2023 3 commits
    • KVM: Add GDS_NO support to KVM · 81ac7e5d
      Daniel Sneddon authored
      
      Gather Data Sampling (GDS) is a transient execution attack using
      gather instructions from the AVX2 and AVX512 extensions. This attack
      allows malicious code to infer data that was previously stored in
      vector registers. Systems that are not vulnerable to GDS will set the
      GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM
      guests that would otherwise assume they are running on a vulnerable
      system when the host is, in fact, not affected. Guests running on
      affected hosts where the mitigation is enabled are protected as if
      they were running on an unaffected system.
      
      On all hosts that are not affected or that are mitigated, set the
      GDS_NO bit.
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    • x86/speculation: Add Kconfig option for GDS · 53cf5797
      Daniel Sneddon authored
      
      Gather Data Sampling (GDS) is mitigated in microcode. However, on
      systems that haven't received the updated microcode, disabling AVX
      can act as a mitigation. Add a Kconfig option that uses the microcode
      mitigation if available and disables AVX otherwise. Setting this
      option has no effect on systems not affected by GDS. This is the
      equivalent of setting gather_data_sampling=force.
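      The option might look roughly like the following Kconfig fragment
      (the symbol name and wording are an assumption based on this
      commit's description, not a verbatim copy of the tree):

```
config GDS_FORCE_MITIGATION
	bool "Force GDS Mitigation"
	depends on CPU_SUP_INTEL
	default n
	help
	  Enable the mitigation for Gather Data Sampling (GDS) even when
	  updated microcode is not present. On affected systems without
	  the microcode mitigation, this disables AVX as a fallback.
	  Equivalent to booting with gather_data_sampling=force.
```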
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    • x86/speculation: Add force option to GDS mitigation · 553a5c03
      Daniel Sneddon authored
      The Gather Data Sampling (GDS) vulnerability allows malicious software
      to infer stale data previously stored in vector registers. This may
      include sensitive data such as cryptographic keys. GDS is mitigated in
      microcode, and systems with up-to-date microcode are protected by
      default. However, any affected system that is running with older
      microcode will still be vulnerable to GDS attacks.
      
      Since the gather instructions used by the attacker are part of the
      AVX2 and AVX512 extensions, disabling these extensions prevents the
      gather instructions from being executed, thereby mitigating GDS.
      Disabling AVX2 alone would be sufficient, but there is no control
      with that granularity: clearing XCR0[2] disables all of AVX, with
      no option to disable just AVX2.
      
      Add a kernel parameter gather_data_sampling=force that will enable the
      microcode mitigation if available, otherwise it will disable AVX on
      affected systems.
      
      This option is ignored if mitigations=off is given on the kernel
      command line.
      
      This is a *big* hammer.  It is known to break buggy userspace that
      uses incomplete, buggy AVX enumeration.  Unfortunately, such userspace
      does exist in the wild:
      
      	https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html

      [ dhansen: add some more ominous warnings about disabling AVX ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
  2. 19 Jul, 2023 1 commit
    • x86/speculation: Add Gather Data Sampling mitigation · 8974eb58
      Daniel Sneddon authored
      
      Gather Data Sampling (GDS) is a hardware vulnerability which allows
      unprivileged speculative access to data which was previously stored in
      vector registers.
      
      Intel processors that support AVX2 and AVX512 have gather instructions
      that fetch non-contiguous data elements from memory. On vulnerable
      hardware, when a gather instruction is transiently executed and
      encounters a fault, stale data from architectural or internal vector
      registers may get transiently stored to the destination vector
      register allowing an attacker to infer the stale data using typical
      side channel techniques like cache timing attacks.
      
      This mitigation is different from many earlier ones for two reasons.
      First, it is enabled by default and a bit must be set to *DISABLE* it.
      This is the opposite of normal mitigation polarity. This means GDS can
      be mitigated simply by updating microcode and leaving the new control
      bit alone.
      
      Second, GDS has a "lock" bit. This lock bit is there because the
      mitigation affects the hardware security features KeyLocker and SGX.
      It needs to be enabled and *STAY* enabled for these features to be
      mitigated against GDS.
      
      The mitigation is enabled in the microcode by default. Disable it by
      setting gather_data_sampling=off or by disabling all mitigations with
      mitigations=off. The mitigation status can be checked by reading:
      
          /sys/devices/system/cpu/vulnerabilities/gather_data_sampling
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
  3. 16 Jun, 2023 1 commit
  4. 16 Mar, 2023 1 commit
  5. 27 Feb, 2023 1 commit
  6. 25 Jan, 2023 1 commit
  7. 13 Jan, 2023 1 commit
  8. 12 Jan, 2023 1 commit
  9. 10 Jan, 2023 1 commit
  10. 04 Jan, 2023 1 commit
  11. 02 Dec, 2022 1 commit
  12. 22 Nov, 2022 1 commit
  13. 09 Nov, 2022 2 commits
    • x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callers · bd3d394e
      Paolo Bonzini authored
      
      x86_virt_spec_ctrl only deals with the paravirtualized
      MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL
      anymore; remove the corresponding, unused argument.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly · 9f2febf3
      Paolo Bonzini authored
      Restoration of the host IA32_SPEC_CTRL value is probably too late
      with respect to the return thunk training sequence.
      
      With respect to the user/kernel boundary, AMD says, "If software chooses
      to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel
      exit), software should set STIBP to 1 before executing the return thunk
      training sequence." I assume the same requirements apply to the guest/host
      boundary. The return thunk training sequence is in vmenter.S, quite close
      to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's
      IA32_SPEC_CTRL value is not restored until much later.
      
      To avoid this, move the restoration of host SPEC_CTRL to assembly and,
      for consistency, move the restoration of the guest SPEC_CTRL as well.
      This is not particularly difficult, apart from some care to cover both
      32- and 64-bit, and to share code between SEV-ES and normal vmentry.
      
      Cc: stable@vger.kernel.org
      Fixes: a149180f ("x86: Add magic AMD return-thunk")
      Suggested-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. 17 Oct, 2022 3 commits
  15. 18 Aug, 2022 1 commit
  16. 08 Aug, 2022 1 commit
  17. 03 Aug, 2022 1 commit
    • x86/speculation: Add RSB VM Exit protections · 2b129932
      Daniel Sneddon authored
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
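      The lighter-weight sequence amounts to one unbalanced CALL whose
      return slot is poisoned with a trap, followed by a serializing
      LFENCE. An assembly sketch of the idea (not the kernel's exact
      macro):

```
	call 1f		/* refill the single clobbered RSB entry */
	int3		/* speculation trap: a mispredicted RET lands here */
1:	add $8, %rsp	/* drop the return address; the INT3 is never
			 * architecturally executed */
	lfence		/* ensure the CALL retires before any host RET */
```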
      
        [ bp: Massage, incorporate review comments from Andy Cooper. ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
  18. 29 Jul, 2022 1 commit
    • x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available · 571c30b1
      Thadeu Lima de Souza Cascardo authored
      Some cloud hypervisors do not provide IBPB on very recent processors,
      including AMD processors affected by Retbleed.
      
      Using IBPB before firmware calls on such systems would cause a GPF at boot
      like the one below. Do not enable such calls when IBPB support is not
      present.
      
        EFI Variables Facility v0.08 2004-May-17
        general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
        Workqueue: efi_rts_wq efi_call_rts
        RIP: 0010:efi_call_rts
        Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
        RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246
        RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049
        RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300
        RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000
        R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78
        R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048
        FS:  0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0
        Call Trace:
         <TASK>
         ? __wake_up
         process_one_work
         worker_thread
         ? rescuer_thread
         kthread
         ? kthread_complete_and_exit
         ret_from_fork
         </TASK>
        Modules linked in:
      
      Fixes: 28a99e95 ("x86/amd: Use IBPB for firmware calls")
      Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
  19. 20 Jul, 2022 1 commit
  20. 18 Jul, 2022 1 commit
  21. 16 Jul, 2022 1 commit
  22. 14 Jul, 2022 1 commit
  23. 09 Jul, 2022 1 commit
    • x86/speculation: Disable RRSBA behavior · 4ad3278d
      Pawan Gupta authored
      
      Some Intel processors may use alternate predictors for RETs on
      RSB-underflow. This condition may be vulnerable to Branch History
      Injection (BHI) and intramode-BTI.
      
      The kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
      eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs
      against such attacks. However, on RSB underflow, RET target
      prediction may fall back to alternate predictors. As a result, a
      RET's predicted target may be influenced by branch history.
      
      A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
      behavior when in kernel mode. When set, RETs will not take predictions
      from alternate predictors, hence mitigating RETs as well. Support for
      this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit 2).
      
      For spectre v2 mitigation, when a user selects a mitigation that
      protects indirect CALLs and JMPs against BHI and intramode-BTI, set
      RRSBA_DIS_S also to protect RETs for RSB-underflow case.
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
  24. 08 Jul, 2022 1 commit
  25. 29 Jun, 2022 1 commit
  26. 27 Jun, 2022 10 commits