Commit 8cfbdbdc authored by Paul Mackerras, committed by Michael Ellerman

KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages

Commit 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_ua_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:

  qemu-system-ppc64: VFIO_MAP_DMA: -22
  qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)
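
Those failures come from the containment check that commit 76fa4975 added:
the IOMMU page being mapped must be no larger than the smallest host page
backing the registered region, which is what mem->pageshift is meant to
record. With mem->pageshift stuck at 0, no IOMMU page size can pass. A
stand-alone paraphrase of that check (illustrative only, not the kernel
code; the shift of 16 corresponds to the 64K / 0x10000 mappings in the log
above):

  #include <stdio.h>

  #define EFAULT 14

  /* Paraphrase of the containment check: reject the mapping if the
   * requested IOMMU page is larger than the smallest host page backing
   * the registered memory (mem->pageshift). */
  static int check_contained(unsigned int iommu_pageshift,
                             unsigned int mem_pageshift)
  {
          return iommu_pageshift > mem_pageshift ? -EFAULT : 0;
  }

  int main(void)
  {
          /* Buggy case: mem->pageshift collapsed to 0, so even a 64K
           * (shift 16) IOMMU page is rejected. */
          printf("mem->pageshift == 0:  %d\n", check_contained(16, 0));

          /* Intended case: mem->pageshift is at least PAGE_SHIFT. */
          printf("mem->pageshift == 16: %d\n", check_contained(16, 16));
          return 0;
  }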

The logic errors in mm_iommu_get() are:

  (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
      call (meaning that find_linux_pte() returns the pte for the
      first address in the range, not the address we are currently up
      to);
  (b) use of 'pageshift' as the variable to receive the hugepage shift
      returned by find_linux_pte() - for a normal page this gets set
      to 0, leading to us setting mem->pageshift to 0 when we conclude
      that the pte returned by find_linux_pte() didn't match the page
      we were looking at;
  (c) comparing 'compshift', which is a page order, i.e. the log base 2
      of the number of pages, with 'pageshift', which is the log base 2
      of the number of bytes (see the short example after this list).
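
To make the unit mismatch in (c) concrete, here is a small stand-alone
example (illustrative only; the values are hypothetical and assume 64K
base pages, i.e. PAGE_SHIFT == 16, and a compound page of 2^8 base pages):

  #include <stdio.h>

  int main(void)
  {
          unsigned int page_shift = 16;  /* log2 of the base page size in bytes */
          unsigned int compshift  = 8;   /* compound_order(): log2 of a page count */
          unsigned int pteshift   = 24;  /* hugepage shift: log2 of a byte count */

          /* Old check: compares a page order with a byte shift, so it is
           * never true for any huge page. */
          printf("compshift == pteshift:              %d\n",
                 compshift == pteshift);

          /* Fixed check: convert the page order to a byte shift first. */
          printf("pteshift == compshift + PAGE_SHIFT: %d\n",
                 pteshift == compshift + page_shift);
          return 0;
  }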

To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.

The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.

Fixes: 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
parent f08d08f3
@@ -129,6 +129,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	long i, j, ret = 0, locked_entries = 0;
 	unsigned int pageshift;
 	unsigned long flags;
+	unsigned long cur_ua;
 	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
@@ -177,7 +178,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	}
 
 	for (i = 0; i < entries; ++i) {
-		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+		cur_ua = ua + (i << PAGE_SHIFT);
+		if (1 != get_user_pages_fast(cur_ua,
 					1/* pages */, 1/* iswrite */, &page)) {
 			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
@@ -196,7 +198,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		if (is_migrate_cma_page(page)) {
 			if (mm_iommu_move_page_from_cma(page))
 				goto populate;
-			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+			if (1 != get_user_pages_fast(cur_ua,
 						1/* pages */, 1/* iswrite */,
 						&page)) {
 				ret = -EFAULT;
@@ -210,20 +212,21 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		}
 populate:
 		pageshift = PAGE_SHIFT;
-		if (PageCompound(page)) {
+		if (mem->pageshift > PAGE_SHIFT && PageCompound(page)) {
 			pte_t *pte;
 			struct page *head = compound_head(page);
 			unsigned int compshift = compound_order(head);
+			unsigned int pteshift;
 
 			local_irq_save(flags); /* disables as well */
-			pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
-			local_irq_restore(flags);
+			pte = find_linux_pte(mm->pgd, cur_ua, NULL, &pteshift);
 
 			/* Double check it is still the same pinned page */
 			if (pte && pte_page(*pte) == head &&
-					pageshift == compshift)
-				pageshift = max_t(unsigned int, pageshift,
+					pteshift == compshift + PAGE_SHIFT)
+				pageshift = max_t(unsigned int, pteshift,
 						PAGE_SHIFT);
+			local_irq_restore(flags);
 		}
 		mem->pageshift = min(mem->pageshift, pageshift);
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;