1. 05 Mar, 2019 1 commit
    • powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration · 35f2806b
      Aneesh Kumar K.V authored
      We added runtime allocation of 16G pages in commit 4ae279c2
      ("powerpc/mm/hugetlb: Allow runtime allocation of 16G."). That was done
      to enable 16G allocation on PowerNV and KVM configs. In the KVM
      case, we would mostly need the entire guest RAM backed by 16G
      hugetlb pages for this to work. PAPR does support partial backing of
      guest RAM with hugepages via the ibm,expected#pages attribute of the
      memory node in the device tree. This means the rest of the guest RAM
      won't be backed by 16G contiguous pages in the host, and hence a hash
      page table insertion can fail in that case.
      
      An example error message looks like this:
      
        hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback
        hash-mmu:     trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386
        readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2
      
      This patch addresses that by preventing runtime allocation of 16G
      hugepages in LPAR config. To allocate 16G hugetlb pages, one needs to
      use the kernel command line: hugepagesz=16G hugepages=<number of 16G pages>
      
      With radix translation mode we don't run into this issue.
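The rule described above can be sketched as a small predicate. This is an illustrative userspace sketch, not the code from the patch: `struct platform`, `runtime_alloc_allowed()` and `SZ_16G` are hypothetical names, and exempting radix LPARs is an assumption based on the remark that radix translation does not hit the issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the platform, for illustration only. */
struct platform {
	bool is_lpar;    /* running as a guest under a PAPR hypervisor */
	bool radix_mmu;  /* radix translation instead of hash */
};

#define SZ_16G (16ULL << 30)

/*
 * Return true when a huge page of the given size may be allocated at
 * runtime. Only the 16G size on a hash-MMU LPAR is refused; in that
 * configuration the pages would have to be reserved at boot with
 * hugepagesz=16G hugepages=<n>.
 */
static bool runtime_alloc_allowed(const struct platform *p,
				  unsigned long long page_size)
{
	assert(p != NULL);
	if (page_size == SZ_16G && p->is_lpar && !p->radix_mmu)
		return false;
	return true;
}
```

Under these assumptions, a hash-MMU LPAR is refused for 16G pages, while PowerNV and radix configurations are still allowed.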
      
      This change will prevent runtime allocation of 16G hugetlb pages on
      KVM with hash translation mode. However, with current upstream it
      was observed that a 16G hugetlbfs-backed guest doesn't boot at all.
      
      We observe boot failure with the below message:
        [131354.647546] KVM: map_vrma at 0 failed, ret=-4
      
      That means this patch does not result in an observable regression.
      Once we fix the boot issue with 16G hugetlb-backed memory, we need to
      use the ibm,expected#pages memory node attribute to indicate 16G page
      reservation to the guest. This will also enable partial backing of
      guest RAM with 16G pages.
      
      Fixes: 4ae279c2 ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 03 Mar, 2019 1 commit
    • powerpc/32: Clear on-stack exception marker upon exception return · 9580b71b
      Christophe Leroy authored
      Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
      to avoid a confusing stack trace like the one below.
      
        Call Trace:
        [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895130] memchr+0x24/0x74
        [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
        --- interrupt: c0e9df00 at 0x400f330
            LR = init_stack+0x1f00/0x2000
        [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
      
      With this patch the trace becomes:
      
        Call Trace:
        [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895150] memchr+0x24/0x74
        [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
        [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
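
Why a stale marker produces the spurious "--- interrupt:" line can be illustrated with a toy model. This is a minimal sketch under assumptions, not the kernel's entry code: the array `stack`, the slot offset `MARKER_SLOT`, the `MARKER` value and all three functions are hypothetical stand-ins, and the unwinder is reduced to a single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MARKER      0x72656773u  /* stand-in value for the marker */
#define STACK_WORDS 16
#define MARKER_SLOT 2            /* hypothetical marker offset within a frame */

static uint32_t stack[STACK_WORDS];

/* Exception entry: build a frame at `base` and tag it with the marker. */
static void exception_enter(unsigned base)
{
	assert(base + MARKER_SLOT < STACK_WORDS);
	stack[base + MARKER_SLOT] = MARKER;
}

/* Exception exit; `clear` selects the behaviour with the fix applied. */
static void exception_exit(unsigned base, bool clear)
{
	assert(base + MARKER_SLOT < STACK_WORDS);
	if (clear)
		stack[base + MARKER_SLOT] = 0;
}

/* The check an unwinder would use to decide whether a frame is an exception frame. */
static bool looks_like_exception_frame(unsigned base)
{
	return stack[base + MARKER_SLOT] == MARKER;
}
```

In this model, an ordinary function frame that later reuses the same stack area without writing that slot is still reported as an exception frame unless the exit path cleared the marker.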
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 02 Mar, 2019 6 commits
  4. 01 Mar, 2019 1 commit
  5. 28 Feb, 2019 1 commit
    • powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables · 11f5acce
      Alexey Kardashevskiy authored
      We store 2 multilevel tables in iommu_table: one for the hardware and
      one with the corresponding userspace addresses. Before allocating
      the tables, the iommu_table_group_ops::get_table_size() hook returns
      the combined size of the two, and the VFIO SPAPR TCE IOMMU driver
      adjusts the locked_vm counter correctly. When the table is actually
      allocated, the amount of allocated memory is stored in
      iommu_table::it_allocated_size and used to decrement the locked_vm
      counter when we release the memory used by the table.
      .get_table_size() and .create_table() calculate it independently, but
      the result is expected to be the same.
      
      However, the allocator does not add the userspace table size to
      .it_allocated_size, so when we destroy the table because of VFIO PCI
      unplug (i.e. the VFIO container is gone but userspace keeps running),
      we decrement locked_vm by only half the size of the memory we are
      releasing.
      
      To make things worse, since we enabled on-demand allocation of
      indirect levels, it_allocated_size contains only the amount of memory
      actually allocated at table creation time, which can be just a
      fraction of the total. This is not a problem when incrementing
      locked_vm (as the get_table_size() value is used) but it is when
      decrementing.
      
      As a result, we leak locked_vm and may not be able to allocate more
      IOMMU tables after a few iterations of hotplug/unplug.
      
      This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
      hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
      have a single place which calculates the maximum memory a table can
      occupy. The original meaning of it_allocated_size is somewhat lost now
      though.
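
The accounting mismatch and the effect of the change can be sketched with a simplified model. This is an illustration under assumptions, not the driver code: `struct table`, its `hw_size` and `uas_size` fields, `plug_unplug()` and the `fixed` flag are hypothetical, and all sizes are arbitrary byte counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the table; sizes in bytes. */
struct table {
	unsigned long hw_size;   /* full size of the hardware table */
	unsigned long uas_size;  /* full size of the userspace-address table */
	unsigned long it_allocated_size;
};

/* Maximum memory the table can occupy: both tables, all levels. */
static unsigned long get_table_size(const struct table *t)
{
	return t->hw_size + t->uas_size;
}

/*
 * Create the table. With `fixed` set, it_allocated_size is taken from
 * get_table_size(); otherwise only the memory allocated up front
 * (`initial`) is recorded, as in the behaviour being fixed.
 */
static void create_table(struct table *t, unsigned long initial, bool fixed)
{
	t->it_allocated_size = fixed ? get_table_size(t) : initial;
}

/* One plug/unplug cycle; returns the locked_vm value left behind. */
static unsigned long plug_unplug(unsigned long locked_vm, struct table *t,
				 unsigned long initial, bool fixed)
{
	locked_vm += get_table_size(t);     /* charged before allocation */
	create_table(t, initial, fixed);
	assert(locked_vm >= t->it_allocated_size);
	locked_vm -= t->it_allocated_size;  /* released on destroy */
	return locked_vm;
}
```

In this model each cycle without the fix leaves `get_table_size() - initial` bytes charged to locked_vm, and the leftover accumulates across cycles; with the fix the counter returns to its starting value.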
      
      We do not drop it_allocated_size altogether here, and we do not call
      get_table_size() from vfio_iommu_spapr_tce.c when decrementing
      locked_vm, as we may have multiple IOMMU groups per container and even
      though they are all supposed to have the same get_table_size()
      implementation, there is a small chance of failure or confusion.
      
      Fixes: 090bad39 ("powerpc/powernv: Add indirect levels to it_userspace")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 27 Feb, 2019 2 commits
  7. 26 Feb, 2019 9 commits
  8. 25 Feb, 2019 6 commits
  9. 23 Feb, 2019 13 commits