1. 28 May, 2020 31 commits
  2. 26 May, 2020 9 commits
    • powerpc/xive: Clear the page tables for the ESB IO mapping · a101950f
      Cédric Le Goater authored
      Commit 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the
      machine crash handler") fixed an issue in the FW assisted dump of
      machines using hash MMU and the XIVE interrupt mode under the POWER
      hypervisor. It forced the mapping of the ESB page of interrupts being
      mapped in the Linux IRQ number space to make sure the 'crash kexec'
      sequence worked during such an event. But it didn't handle the
      un-mapping.
      
      This mapping is now blocking the removal of a passthrough IO adapter
      under the POWER hypervisor because it expects the guest OS to have
      cleared all page table entries related to the adapter. If some are
      still present, the RTAS call which isolates the PCI slot returns error
      9001 "valid outstanding translations".
      
      Remove these mappings in the IRQ data cleanup routine.
      
      Under KVM, this cleanup is not required because the ESB pages for the
      adapter interrupts are un-mapped from the guest by the hypervisor in
      the KVM XIVE native device. This is now redundant but it's harmless.
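
      A hedged sketch of what such a cleanup could look like in the IRQ data
      cleanup routine, assuming the xive_irq_data fields (eoi_mmio, trig_mmio,
      esb_shift) and the unmap_kernel_range() helper available in kernels of
      this era; illustrative only, not necessarily the exact patch:

      	/* Sketch: clear the kernel page table entries backing the ESB
      	 * mappings before tearing them down, so the hypervisor no longer
      	 * sees outstanding translations when the PCI slot is isolated. */
      	void xive_cleanup_irq_data(struct xive_irq_data *xd)
      	{
      		if (xd->eoi_mmio) {
      			unmap_kernel_range((unsigned long)xd->eoi_mmio,
      					   1u << xd->esb_shift);
      			iounmap(xd->eoi_mmio);
      			if (xd->eoi_mmio == xd->trig_mmio)
      				xd->trig_mmio = NULL;
      			xd->eoi_mmio = NULL;
      		}
      		if (xd->trig_mmio) {
      			unmap_kernel_range((unsigned long)xd->trig_mmio,
      					   1u << xd->esb_shift);
      			iounmap(xd->trig_mmio);
      			xd->trig_mmio = NULL;
      		}
      	}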
      
      Fixes: 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the machine crash handler")
      Cc: stable@vger.kernel.org # v5.5+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
      a101950f
    • powerpc: Add ppc_inst_as_u64() · 16ef9767
      Michael Ellerman authored
      The code patching code wants to get the value of a struct ppc_inst as
      a u64 when the instruction is prefixed, so we can pass the u64 down to
      __put_user_asm() and write it with a single store.
      
      The optprobes code wants to load a struct ppc_inst as an immediate
      into a register, so it is useful to have it as a u64 in order to use
      the existing helper function.
      
      Currently this is a bit awkward because the value differs based on the
      CPU endianness, so add a helper to do the conversion.
      
      This fixes the usage in arch_prepare_optimized_kprobe() which was
      previously incorrect on big endian.
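
      A minimal sketch of such a helper, assuming the existing ppc_inst_val()
      and ppc_inst_suffix() accessors for the two instruction words
      (illustrative, not necessarily the exact implementation):

      	/* Pack a (possibly prefixed) instruction into a u64 so that a
      	 * single 64-bit store writes the prefix word at the lower address,
      	 * whatever the CPU endianness. */
      	static inline u64 ppc_inst_as_u64(struct ppc_inst x)
      	{
      	#ifdef CONFIG_CPU_LITTLE_ENDIAN
      		/* LE: the low 32 bits land at the lower address */
      		return (u64)ppc_inst_suffix(x) << 32 | ppc_inst_val(x);
      	#else
      		/* BE: the high 32 bits land at the lower address */
      		return (u64)ppc_inst_val(x) << 32 | ppc_inst_suffix(x);
      	#endif
      	}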
      
      Fixes: 650b55b7 ("powerpc: Add prefixed instructions to instruction data type")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Jordan Niethe <jniethe5@gmail.com>
      Link: https://lore.kernel.org/r/20200526072630.2487363-1-mpe@ellerman.id.au
      16ef9767
    • powerpc: Add ppc_inst_next() · c5ff46d6
      Michael Ellerman authored
      In a few places we want to calculate the address of the next
      instruction. Previously that was simple: we just added 4 bytes, or if
      using a u32 pointer, incremented it by 1.
      
      But prefixed instructions make it more complicated, we need to advance
      by either 4 or 8 bytes depending on the actual instruction. We also
      can't do pointer arithmetic using struct ppc_inst, because it is
      always 8 bytes in size on 64-bit, even though we might only need to
      advance by 4 bytes.
      
      So add a ppc_inst_next() helper which calculates the location of the
      next instruction, if the given instruction was located at the given
      address. Note the instruction doesn't need to actually be at the
      address in memory.
      
      Although it would seem natural for the instruction value to be passed
      by value, that makes it too easy to write a loop that reads off the
      end of a page, eg:
      
      	for (; src < end; src = ppc_inst_next(src, *src),
      			  dest = ppc_inst_next(dest, *dest))
      
      As noticed by Christophe and Jordan, if end is the exact end of a
      page, and the next page is not mapped, this will fault, because *dest
      will read 8 bytes, 4 bytes into the next page.
      
      So the value is passed by reference instead, and the helper is careful
      to use ppc_inst_read() on it.
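
      A hedged sketch of the helper's shape, assuming ppc_inst_read() and
      ppc_inst_len() are available (illustrative, not necessarily the exact
      implementation):

      	/* Address of the instruction following the one whose value is at
      	 * 'value', as if it were located at 'location'.  ppc_inst_read()
      	 * only touches the suffix word when the prefix word says the
      	 * instruction is 8 bytes long, so nothing is read past a 4-byte
      	 * instruction sitting at the end of a page. */
      	#define ppc_inst_next(location, value)				\
      	({								\
      		struct ppc_inst __tmp = ppc_inst_read(value);		\
      		(void *)(location) + ppc_inst_len(__tmp);		\
      	})

      With that, the loop above can simply pass the pointers themselves,
      eg. src = ppc_inst_next(src, src), without ever reading past 'end'.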
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
      Link: https://lore.kernel.org/r/20200522133318.1681406-1-mpe@ellerman.id.au
      c5ff46d6
    • Merge branch 'fixes' into next · baddc87d
      Michael Ellerman authored
      Merge our fixes branch from this cycle. It contains several important
      fixes we need in next for testing purposes, and also some that will
      conflict with upcoming changes.
      baddc87d
    • Merge "Use hugepages to map kernel mem on 8xx" into next · bb5f33c0
      Michael Ellerman authored
      Merge Christophe's large series to use huge pages for the linear
      mapping on 8xx.
      
      From his cover letter:
      
      The main purpose of this big series is to:
      - reorganise huge page handling to avoid using mm_slices.
      - use huge pages to map kernel memory on the 8xx.
      
      The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M.
      It uses 2-level page tables, the PGD having 1024 entries, each entry
      covering 4M of address space. Then each page table has 1024 entries.
      
      At present, page sizes are managed in PGD entries, implying the use
      of mm_slices, as we can't mix several page sizes in one page table.
      
      The first purpose of this series is to reorganise things so that
      standard page tables can also handle 512k pages. This is done by
      adding a new _PAGE_HUGE flag which will be copied into the Level 1
      entry in the TLB miss handler. That done, we have 2 types of pages:
      - PGD entries to regular page tables handling 4k/16k and 512k pages
      - PGD entries to hugepd tables handling 8M pages.
      
      There is no need to mix 8M pages with other sizes, because an 8M page
      uses more than what a single PGD entry covers.
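
      A back-of-the-envelope illustration of that geometry (standalone
      user-space arithmetic only, using the 8xx numbers quoted above):

      	#include <stdio.h>

      	#define PGD_ENTRIES	1024
      	#define PTE_ENTRIES	1024
      	#define BASE_PAGE	(4UL << 10)			/* 4k */
      	#define PGDIR_SPAN	(PTE_ENTRIES * BASE_PAGE)	/* 4M per PGD entry */

      	int main(void)
      	{
      		printf("one PGD entry covers %lu MB\n", PGDIR_SPAN >> 20);
      		printf("whole PGD covers %llu GB\n",
      		       (unsigned long long)PGD_ENTRIES * PGDIR_SPAN >> 30);
      		/* An 8M page spans two whole PGD entries, so it cannot live
      		 * inside a regular page table and needs hugepd entries. */
      		printf("an 8M page spans %lu PGD entries\n",
      		       (8UL << 20) / PGDIR_SPAN);
      		return 0;
      	}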
      
      Then comes the second purpose of this series. At present, the 8xx
      implements special handling in the TLB miss handlers in order to
      transparently map the kernel linear address space and the IMMR using
      huge pages, by building the TLB entries in assembly at the time of
      the exception.
      
      As mm_slices is only for user space pages, and because it would anyway
      not be convenient to slice the kernel address space, it was not
      possible to use huge pages for the kernel address space. But after
      step one of the series, the page tables are flexible enough to allow it.
      
      This series drops all assembly 'just in time' handling of huge pages
      and uses huge pages in the page tables instead.
      
      Once the above is done, then comes the icing on the cake:
      - Use huge pages for KASAN shadow mapping
      - Allow pinned TLBs with strict kernel rwx
      - Allow pinned TLBs with debug pagealloc
      
      Then, last but not least, those modifications for the 8xx allow the
      following improvements on book3s/32:
      - Mapping KASAN shadow with BATs
      - Allowing BATs with debug pagealloc
      
      All this considerably simplifies the TLB miss handlers and the
      associated initialisation. The overhead of reading page tables is
      negligible compared to the reduction of the miss handlers.
      
      While touching pte_update(), some cleanup was done there too.
      
      Tested widely on 8xx and 832x. Boot tested on QEMU MAC99.
      bb5f33c0
    • powerpc/32s: Implement dedicated kasan_init_region() · 7974c473
      Christophe Leroy authored
      Implement a kasan_init_region() dedicated to book3s/32 that
      allocates KASAN regions using BATs.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/709e821602b48a1d7c211a9b156da26db98c3e9d.1589866984.git.christophe.leroy@csgroup.eu
      7974c473
    • powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC · 2b279c03
      Christophe Leroy authored
      DEBUG_PAGEALLOC only manages RW data.
      
      Text and RO data can still be mapped with BATs.
      
      In order to map with BATs, also enforce data alignment. It is set by
      default to 256M, which is a good compromise for keeping enough BATs
      available for KASAN and the IMMR as well.
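
      To see the compromise, here is a hypothetical, standalone illustration
      assuming book3s/32 BATs map naturally aligned power-of-two blocks from
      128k up to 256M: with the data boundary pushed up to 256M, the text+RO
      region below it is covered by very few BATs, leaving entries free for
      KASAN and the IMMR.

      	#include <stdio.h>

      	/* Greedy count of BAT entries needed to cover 'size' bytes, using
      	 * power-of-two blocks capped at 256M (alignment details ignored). */
      	static unsigned int bats_needed(unsigned long size)
      	{
      		unsigned int count = 0;

      		while (size) {
      			unsigned long block = 256UL << 20;	/* largest BAT block */

      			while (block > size && block > (128UL << 10))
      				block >>= 1;
      			size -= (block > size) ? size : block;
      			count++;
      		}
      		return count;
      	}

      	int main(void)
      	{
      		printf("24M of text+RO data: %u BATs\n", bats_needed(24UL << 20));  /* 2 */
      		printf("a full 256M region:  %u BATs\n", bats_needed(256UL << 20)); /* 1 */
      		return 0;
      	}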
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/fd29c1718ee44d82115d0e835ced808eb4ccbf51.1589866984.git.christophe.leroy@csgroup.eu
      2b279c03
    • powerpc/8xx: Implement dedicated kasan_init_region() · a2feeb2c
      Christophe Leroy authored
      Implement a kasan_init_region() dedicated to 8xx that
      allocates KASAN regions using huge pages.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/d2d60202a8821dc81cffe6ff59cc13c15b7e4bb6.1589866984.git.christophe.leroy@csgroup.eu
      a2feeb2c
    • powerpc/8xx: Allow large TLBs with DEBUG_PAGEALLOC · fcdafd10
      Christophe Leroy authored
      DEBUG_PAGEALLOC only manages RW data.
      
      Text and RO data can still be mapped with hugepages and pinned TLB.
      
      In order to map with hugepages, also enforce a 512kB data alignment
      minimum. That's a trade-off between size and speed, taking into
      account that DEBUG_PAGEALLOC is a debug option. The alignment is
      still tunable anyway.
      
      We also allow tuning the alignment for book3s, to limit the complexity
      of the test in Kconfig, which will anyway disappear in the following
      patches once DEBUG_PAGEALLOC is handled together with BATs.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/c13256f2d356a316715da61fe089b3623ef217a5.1589866984.git.christophe.leroy@csgroup.eu
      fcdafd10