1. 17 Apr, 2021 5 commits
  2. 15 Apr, 2021 2 commits
  3. 12 Apr, 2021 1 commit
  4. 08 Apr, 2021 1 commit
    • x86/sgx: Do not update sgx_nr_free_pages in sgx_setup_epc_section() · ae40aaf6
      Jarkko Sakkinen authored
      The commit in Fixes: changed the SGX EPC page sanitization to end up in
      sgx_free_epc_page() which puts clean and sanitized pages on the free
      list.
      
      This was done for the reason that it is best to keep the logic to assign
      available-for-use EPC pages to the correct NUMA lists in a single
      location.
      
      sgx_nr_free_pages is also incremented by sgx_free_epc_page(), but the
      pages being added per EPC section in sgx_setup_epc_section() do not
      belong to the free list yet because they haven't been sanitized. They
      land on the dirty list first, and the sanitization happens later when
      ksgxd starts massaging them.
      
      So remove that increment from sgx_setup_epc_section() and let
      sgx_free_epc_page() be the only place that updates the counter.
      
       [ bp: Sanitize commit message too. ]
      
      Fixes: 51ab30eb ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20210408092924.7032-1-jarkko@kernel.org
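The accounting rule described in the commit above can be sketched as a small user-space model. This is not kernel code: the `epc_model` structure and every `model_*` function are hypothetical names made up for illustration, and the fixed array of eight pages is an arbitrary assumption. The point it shows is that setup only queues dirty pages, and the counter moves in exactly one place.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum page_state { PAGE_DIRTY, PAGE_FREE };

struct epc_model {
	enum page_state state[8];
	size_t nr_pages;       /* pages known to the model */
	size_t nr_free_pages;  /* should equal the number of PAGE_FREE entries */
};

/* Setup only queues pages as dirty; it never touches nr_free_pages. */
static void model_setup_section(struct epc_model *m, size_t count)
{
	assert(count <= sizeof(m->state) / sizeof(m->state[0]));
	m->nr_pages = count;
	m->nr_free_pages = 0;
	for (size_t i = 0; i < count; i++)
		m->state[i] = PAGE_DIRTY;
}

/* The single place where a page becomes free and the counter moves. */
static void model_free_page(struct epc_model *m, size_t idx)
{
	assert(idx < m->nr_pages && m->state[idx] == PAGE_DIRTY);
	m->state[idx] = PAGE_FREE;
	m->nr_free_pages++;
}

/* Stand-in for the sanitizer thread: clean each dirty page, then free it. */
static void model_sanitize_all(struct epc_model *m)
{
	for (size_t i = 0; i < m->nr_pages; i++)
		if (m->state[i] == PAGE_DIRTY)
			model_free_page(m, i);
}
```

In this model, the counter stays at zero after `model_setup_section()` and only reaches the page count once `model_sanitize_all()` has run. Incrementing it in setup as well would count every page twice.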
  5. 06 Apr, 2021 10 commits
  6. 02 Apr, 2021 2 commits
  7. 01 Apr, 2021 3 commits
  8. 30 Mar, 2021 3 commits
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Sean Christopherson authored
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980
      Sean Christopherson authored
      Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
      does the flush itself if and only if it yields (which it will never do in
      this particular scenario), and otherwise expects the caller to do the
      flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
      no flush will occur.
      
      Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-3-seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap · a835429c
      Sean Christopherson authored
      When flushing a range of GFNs across multiple roots, ensure any pending
      flush from a previous root is honored before yielding while walking the
      tables of the current root.
      
      Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
      "flush" with the result to avoid redundant flushes.  zap_gfn_range()
      preserves and returns the incoming "flush", unless of course the flush was
      performed prior to yielding and no new flush was triggered.
      
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Cc: stable@vger.kernel.org
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
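The flush bookkeeping described in the three KVM commits above can be illustrated with a toy model. It is a sketch under stated assumptions, not the KVM implementation: `root_model`, `model_zap_root()`, `model_zap_all()` and the `yield_at` knob are hypothetical, and "flushing" is just a counter. It shows the two rules from the commit messages: a pending flush, including one inherited from a previous root, is performed before yielding, and the caller honors the returned "flush needed" value.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are not KVM's. */
struct root_model {
	int nr_entries;  /* entries to zap in this root */
	int yield_at;    /* yield before zapping this entry index, -1 = never */
};

static int tlb_flushes;  /* number of flushes performed by the model */

static void model_flush(void)
{
	tlb_flushes++;
}

/* Zap one root; takes the incoming "flush" and returns the updated one. */
static bool model_zap_root(const struct root_model *root, bool flush)
{
	for (int i = 0; i < root->nr_entries; i++) {
		if (i == root->yield_at) {
			/* Honor any pending flush before the lock is dropped. */
			if (flush) {
				model_flush();
				flush = false;
			}
			assert(!flush);  /* never yield with stale entries */
			/* The lock would be released and reacquired here. */
		}
		flush = true;  /* zapping an entry makes a flush necessary */
	}
	return flush;
}

/* Caller: carry "flush" across roots and flush once at the end if needed. */
static void model_zap_all(const struct root_model *roots, int nr_roots)
{
	bool flush = false;

	for (int i = 0; i < nr_roots; i++)
		flush = model_zap_root(&roots[i], flush);
	if (flush)
		model_flush();
}
```

Overwriting the caller's local `flush` with the return value, rather than OR-ing it in, is what avoids a redundant flush when the helper already flushed before yielding.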
  9. 26 Mar, 2021 2 commits
    • x86/sgx: Add SGX_CHILD_PRESENT hardware error code · 231d3dbd
      Sean Christopherson authored
      SGX driver can accurately track how enclave pages are used.  This
      enables SECS to be specifically targeted and EREMOVE'd only after all
      child pages have been EREMOVE'd.  This ensures that SGX driver will
      never encounter SGX_CHILD_PRESENT in normal operation.
      
      Virtual EPC is different.  The host does not track how EPC pages are
      used by the guest, so it cannot guarantee EREMOVE success.  It might,
      for instance, encounter a SECS with a non-zero child count.
      
      Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
      the SGX virtualization driver to handle recoverable EREMOVE errors when
      sanitizing EPC pages after they are freed.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
      Link: https://lkml.kernel.org/r/050b198e882afde7e6eba8e6a0d4da39161dbb5a.1616136308.git.kai.huang@intel.com
    • x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() · b0c7459b
      Kai Huang authored
      EREMOVE takes a page and removes any association between that page and
      an enclave. It must be run on a page before it can be added into another
      enclave. Currently, EREMOVE is run as part of pages being freed into the
      SGX page allocator. It is not expected to fail, as it would indicate a
      use-after-free of EPC pages. Rather than add the page back to the pool
      of available EPC pages, the kernel intentionally leaks the page to avoid
      additional errors in the future.
      
      However, KVM does not track how guest pages are used, which means that
      SGX virtualization use of EREMOVE might fail. Specifically, it is
      legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
      KVM guest, because KVM/kernel doesn't track SECS pages.
      
      To allow SGX/KVM to introduce a more permissive EREMOVE helper and
      to let the SGX virtualization code use the allocator directly, break
      out the EREMOVE call from the SGX page allocator. Rename the original
      sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
      it is used to free an EPC page assigned to a host enclave. Replace
      sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
      there's no functional change.
      
      At the same time, improve the error message when EREMOVE fails, and
      add documentation to explain to the user what that failure means and
      what to do when it happens.
      
       [ bp: Massage commit message, fix typos and sanitize text, simplify. ]
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
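The split described in the commit above, a plain allocator free plus a host-enclave wrapper that runs EREMOVE first, can be sketched as follows. This is a user-space toy, not the kernel functions: `model_page`, `model_eremove()` and the status values are hypothetical, and the "child count" check stands in for the hardware behaviour behind SGX_CHILD_PRESENT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: illustrative status codes, not the hardware values. */
enum model_status { MODEL_OK = 0, MODEL_CHILD_PRESENT = 1 };

struct model_page {
	int nr_children;  /* non-zero only for a SECS-like parent page */
	bool in_pool;     /* true once returned to the allocator */
	bool leaked;      /* true if intentionally kept out of the pool */
};

/* Stand-in for EREMOVE: refuses a parent page that still has children. */
static enum model_status model_eremove(const struct model_page *page)
{
	return page->nr_children ? MODEL_CHILD_PRESENT : MODEL_OK;
}

/* Plain allocator free: no EREMOVE, the caller guarantees a clean page. */
static void model_free_epc_page(struct model_page *page)
{
	assert(!page->in_pool);
	page->in_pool = true;
}

/* Host-enclave path: EREMOVE first; on failure leak the page on purpose. */
static void model_encl_free_epc_page(struct model_page *page)
{
	if (model_eremove(page) != MODEL_OK) {
		page->leaked = true;
		return;
	}
	model_free_epc_page(page);
}
```

With this separation, a virtualization caller could use its own, more permissive EREMOVE handling and then call the plain free directly, while host enclaves keep the strict wrapper.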
  10. 25 Mar, 2021 3 commits
  11. 24 Mar, 2021 5 commits
  12. 19 Mar, 2021 2 commits
    • selftests/sgx: Improve error detection and messages · 4284f7ac
      Dave Hansen authored
      The SGX device file (/dev/sgx_enclave) is unusual in that it requires
      execute permissions.  It has to be both "chmod +x" *and* be on a
      filesystem without 'noexec'.
      
      In the future, udev and systemd should get updates to set up systems
      automatically.  But, for now, nobody's systems do this automatically,
      and everybody gets error messages like this when running ./test_sgx:
      
      	0x0000000000000000 0x0000000000002000 0x03
      	0x0000000000002000 0x0000000000001000 0x05
      	0x0000000000003000 0x0000000000003000 0x03
      	mmap() failed, errno=1.
      
      That isn't very user friendly, even for forgetful kernel developers.
      
      Further, the test case is rather haphazard about its use of fprintf()
      versus perror().
      
      Improve the error messages.  Use perror() where possible.  Lastly,
      do some sanity checks on opening and mmap()ing the device file so
      that we can get a decent error message out to the user.
      
      Now, if your user doesn't have permission, you'll get the following:
      
      	$ ls -l /dev/sgx_enclave
      	crw------- 1 root root 10, 126 Mar 18 11:29 /dev/sgx_enclave
      	$ ./test_sgx
      	Unable to open /dev/sgx_enclave: Permission denied
      
      If you then 'chown dave:dave /dev/sgx_enclave' (or whatever), but
      you leave execute permissions off, you'll get:
      
      	$ ls -l /dev/sgx_enclave
      	crw------- 1 dave dave 10, 126 Mar 18 11:29 /dev/sgx_enclave
      	$ ./test_sgx
      	no execute permissions on device file
      
      If you fix that with "chmod ug+x /dev/sgx_enclave" but you leave /dev as
      noexec, you'll get this:
      
      	$ mount | grep "/dev .*noexec"
      	udev on /dev type devtmpfs (rw,nosuid,noexec,...)
      	$ ./test_sgx
      	ERROR: mmap for exec: Operation not permitted
      	mmap() succeeded for PROT_READ, but failed for PROT_EXEC
      	check that user has execute permissions on /dev/sgx_enclave and
      	that /dev does not have noexec set: 'mount | grep "/dev .*noexec"'
      
      That can be fixed with:
      
      	mount -o remount,exec /dev
      
      Hopefully, the combination of better error messages and the search
      engines indexing this message will help people fix their systems
      until we do this properly.
      
       [ bp: Improve error messages more. ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Link: https://lore.kernel.org/r/20210318194301.11D9A984@viggo.jf.intel.com
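The sequence of sanity checks described in the commit above (open the device, check the execute bits, then try mmap() for read and for exec) might look roughly like this in user space. It is a sketch, not the selftest's code: `check_device()` and the `dev_check` result codes are hypothetical names, and mapping a single page with MAP_SHARED is an assumption made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum dev_check {
	DEV_OK,
	DEV_OPEN_FAILED,
	DEV_NO_EXEC_BITS,
	DEV_MMAP_READ_FAILED,
	DEV_MMAP_EXEC_FAILED,
};

static enum dev_check check_device(const char *path)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	enum dev_check ret = DEV_OK;
	struct stat sb;
	void *ptr;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0) {
		perror(path);  /* e.g. "Permission denied" */
		return DEV_OPEN_FAILED;
	}

	/* The device node itself needs at least one execute bit. */
	if (fstat(fd, &sb) < 0) {
		perror("fstat");
		ret = DEV_OPEN_FAILED;
		goto out;
	}
	if (!(sb.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))) {
		fprintf(stderr, "no execute permissions on %s\n", path);
		ret = DEV_NO_EXEC_BITS;
		goto out;
	}

	/* A read-only mapping separates generic mmap() failures from exec ones. */
	ptr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap for read");
		ret = DEV_MMAP_READ_FAILED;
		goto out;
	}
	munmap(ptr, len);

	/* PROT_EXEC fails with EPERM when the filesystem is mounted noexec. */
	ptr = mmap(NULL, len, PROT_EXEC, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap for exec");
		fprintf(stderr, "check execute permissions on %s and that its "
			"filesystem is not mounted noexec\n", path);
		ret = DEV_MMAP_EXEC_FAILED;
		goto out;
	}
	munmap(ptr, len);
out:
	close(fd);
	return ret;
}
```

A caller would pass the device path, for example `check_device("/dev/sgx_enclave")`, and stop the test early on anything other than `DEV_OK`.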
    • x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() · 901ddbb9
      Jarkko Sakkinen authored
      Background
      ==========
      
      SGX enclave memory is enumerated by the processor in contiguous physical
      ranges called Enclave Page Cache (EPC) sections.  Currently, there is a
      free list per section, but allocations simply target the lowest-numbered
      sections.  This is functional, but has no NUMA awareness.
      
      Fortunately, EPC sections are covered by entries in the ACPI SRAT table.
      These entries allow each EPC section to be associated with a NUMA node,
      just like normal RAM.
      
      Solution
      ========
      
      Implement a NUMA-aware enclave page allocator.  Mirror the buddy allocator
      and maintain a list of enclave pages for each NUMA node.  Attempt to
      allocate enclave memory first from local nodes, then fall back to other
      nodes.
      
      Note that the fallback is not as sophisticated as the buddy allocator
      and is itself not aware of NUMA distances.  When a node's free list is
      empty, it searches for the next-highest node with enclave pages (and
      will wrap if necessary).  This could be improved in the future.
      
      Other
      =====
      
      NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node().
      
       [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because
         callers do not expect that and that leads to a NULL ptr deref. ]
      
       [ dhansen: Fix an uninitialized 'nid' variable in
         __sgx_alloc_epc_page() as

         Reported-by: kernel test robot <lkp@intel.com>

         to avoid any potential allocations from the wrong NUMA node or even
         premature allocation failures. ]
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
      Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com
      Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com
      Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
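The fallback order described in the commit above (local node first, then the next-highest node with enclave pages, wrapping if necessary) can be sketched with a small helper. This is a toy model rather than the kernel allocator: `model_pick_node()`, `MODEL_NR_NODES` and the per-node free-page array are hypothetical, and a node with a zero count stands in for both "node has no EPC" and "node's free list is empty".

```c
#include <assert.h>

#define MODEL_NR_NODES 4  /* arbitrary node count for the sketch */

/*
 * Return the node to allocate from, or -1 if every node is empty.
 * The search is distance-unaware: it simply walks upward from the
 * local node and wraps around to node 0.
 */
static int model_pick_node(const int free_pages[MODEL_NR_NODES], int local_nid)
{
	assert(local_nid >= 0 && local_nid < MODEL_NR_NODES);

	for (int step = 0; step < MODEL_NR_NODES; step++) {
		int nid = (local_nid + step) % MODEL_NR_NODES;

		if (free_pages[nid] > 0)
			return nid;
	}
	return -1;
}
```

Because the walk ignores NUMA distances, a request from node 1 with free pages only on node 0 visits nodes 2 and 3 before wrapping, which is the limitation the commit message notes could be improved later.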
  13. 18 Mar, 2021 1 commit