1. 13 Nov, 2017 5 commits
  2. 12 Nov, 2017 13 commits
  3. 11 Nov, 2017 18 commits
  4. 10 Nov, 2017 4 commits
    • Nicholas Piggin's avatar
      powerpc/64: Set DSCR default initially from SPR · 1696d0fb
      Nicholas Piggin authored
      Take the DSCR value set by firmware as the dscr_default value,
      rather than zero.
      
      POWER9 recommends DSCR default to a non-zero value.
      Signed-off-by: default avatarFrom: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Make record_spr_defaults() __init]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      1696d0fb
    • Nicholas Piggin's avatar
      powerpc/powernv: Avoid waiting for secondary hold spinloop with OPAL · 339a3293
      Nicholas Piggin authored
      OPAL boot does not insert secondaries at 0x60 to wait at the secondary
      hold spinloop. Instead they are started later, and inserted at
      generic_secondary_smp_init(), which is after the secondary hold
      spinloop.
      
      Avoid waiting on this spinloop when booting with OPAL firmware. This
      wait always times out that case.
      
      This saves 100ms boot time on powernv, and 10s of seconds of real time
      when booting on the simulator in SMP.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      339a3293
    • Nicholas Piggin's avatar
      powerpc/64s/radix: Improve TLB flushing for page table freeing · 0b2f5a8a
      Nicholas Piggin authored
      Unmaps that free page tables always flush the entire PID, which is
      sub-optimal. Provide TLB range flushing with an additional PWC flush
      that can be use for va range invalidations with PWC flush.
      
           Time to munmap N pages of memory including last level page table
           teardown (after mmap, touch), local invalidate:
           N           1       2      4      8     16     32     64
           vanilla  3.2us  3.3us  3.4us  3.6us  4.1us  5.2us  7.2us
           patched  1.4us  1.5us  1.7us  1.9us  2.6us  3.7us  6.2us
      
           Global invalidate:
           N           1       2      4      8     16      32     64
           vanilla  2.2us  2.3us  2.4us  2.6us  3.2us   4.1us  6.2us
           patched  2.1us  2.5us  3.4us  5.2us  8.7us  15.7us  6.2us
      
      Local invalidates get much better across the board. Global ones have
      the same issue where multiple tlbies for va flush do get slower than
      the single tlbie to invalidate the PID. None of this test captures
      the TLB benefits of avoiding killing everything.
      
      Global gets worse, but it is brought in to line with global invalidate
      for munmap()s that do not free page tables.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0b2f5a8a
    • Nicholas Piggin's avatar
      powerpc/64s/radix: Introduce local single page ceiling for TLB range flush · f6f27951
      Nicholas Piggin authored
      The single page flush ceiling is the cut-off point at which we switch
      from invalidating individual pages, to invalidating the entire process
      address space in response to a range flush.
      
      Introduce a local variant of this heuristic because local and global
      tlbie have significantly different properties:
      - Local tlbiel requires 128 instructions to invalidate a PID, global
        tlbie only 1 instruction.
      - Global tlbie instructions are expensive broadcast operations.
      
      The local ceiling has been made much higher, 2x the number of
      instructions required to invalidate the entire PID (i.e., 256 pages).
      
           Time to mprotect N pages of memory (after mmap, touch), local invalidate:
           N           32     34      64     128     256     512
           vanilla  7.4us  9.0us  14.6us  26.4us  50.2us  98.3us
           patched  7.4us  7.8us  13.8us  26.4us  51.9us  98.3us
      
      The behaviour of both is identical at N=32 and N=512. Between there,
      the vanilla kernel does a PID invalidate and the patched kernel does
      a va range invalidate.
      
      At N=128, these require the same number of tlbiel instructions, so
      the patched version can be sen to be cheaper when < 128, and more
      expensive when > 128. However this does not well capture the cost
      of invalidated TLB.
      
      The additional cost at 256 pages does not seem prohibitive. It may
      be the case that increasing the limit further would continue to be
      beneficial to avoid invalidating all of the process's TLB entries.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f6f27951