1. 27 Jun, 2017 5 commits
    • powerpc/powernv/pci: Dynamically allocate PHB diag data · 5cb1f8fd
      Russell Currey authored
      Diagnostic data for PHBs currently works by allocating a fixed-size buffer.
      This is simple, but it either wastes memory (though only a few kilobytes)
      or, in the case of PHB4, isn't large enough to fit the whole data blob.
      
      For machines that don't describe the diagnostic data size in the device
      tree, use the hardcoded buffer size as before.  For those that do, only
      allocate exactly what's needed.
      
      In the special case of P7IOC (which has two types of diag data), the size
      of the larger one should be specified in the device tree.
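      As a rough illustration of the allocation policy above, here is a userspace C sketch. The struct, phb_diag_alloc() and DEFAULT_DIAG_BUF_SIZE (including its 8192-byte value) are hypothetical stand-ins, not the powernv code; a NULL dt_size models firmware that does not describe the size in the device tree.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fallback standing in for the old hardcoded buffer size. */
#define DEFAULT_DIAG_BUF_SIZE 8192u

struct phb_diag {
	uint32_t size; /* bytes actually allocated */
	void *data;    /* diagnostic data blob */
};

/*
 * dt_size points at the size read from the device tree, or is NULL when
 * the firmware does not describe it (hypothetical interface). Returns 0
 * on success and -1 if the allocation fails.
 */
static int phb_diag_alloc(struct phb_diag *diag, const uint32_t *dt_size)
{
	diag->size = dt_size ? *dt_size : DEFAULT_DIAG_BUF_SIZE;
	diag->data = calloc(1, diag->size);
	return diag->data ? 0 : -1;
}
```

      For P7IOC, the size passed in would be that of the larger of its two diag data types.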

      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/pci: Reduce spam when dumping PEST · 31bbd45a
      Russell Currey authored
      Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs
      are affected, especially in the case where the whole PHB is frozen and 512
      lines get printed.  Check for duplicates when dumping the PEST to reduce
      useless output.
      
      For example:
      
          PE[0f8] A/B: 9700002600000000 80000080d00000f8
          PE[0f9] A/B: 8000000000000000 0000000000000000
          PE[..0fe] A/B: as above
          PE[0ff] A/B: 8440002b00000000 0000000000000000
      
      instead of:
      
          PE[0f8] A/B: 9700002600000000 80000080d00000f8
          PE[0f9] A/B: 8000000000000000 0000000000000000
          PE[0fa] A/B: 8000000000000000 0000000000000000
          PE[0fb] A/B: 8000000000000000 0000000000000000
          PE[0fc] A/B: 8000000000000000 0000000000000000
          PE[0fd] A/B: 8000000000000000 0000000000000000
          PE[0fe] A/B: 8000000000000000 0000000000000000
          PE[0ff] A/B: 8440002b00000000 0000000000000000
      
      and you can imagine how much worse it can get for 512 PEs.
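      A minimal sketch of the duplicate collapsing, written as userspace C that formats into a buffer instead of calling the kernel's logging functions. dump_pest() is a hypothetical name, and unlike a real dump it prints every entry rather than filtering on PE state. Fed the A/B values from the example above (re-indexed from 0), it produces the collapsed form.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a PEST dump into out (outsz bytes), collapsing each run of
 * identical A/B pairs into one "as above" line that names the last PE
 * of the run. Returns the number of lines generated. Output is truncated,
 * but still NUL-terminated, if out is too small.
 */
static int dump_pest(const uint64_t *pest_a, const uint64_t *pest_b, int n,
		     char *out, size_t outsz)
{
	int lines = 0;
	int i = 0;

	if (outsz == 0)
		return 0;
	out[0] = '\0';

	while (i < n) {
		int j = i;

		/* Extend j to the last entry identical to entry i. */
		while (j + 1 < n && pest_a[j + 1] == pest_a[i] &&
		       pest_b[j + 1] == pest_b[i])
			j++;

		size_t len = strlen(out);
		snprintf(out + len, outsz - len, "PE[%03x] A/B: %016llx %016llx\n",
			 (unsigned int)i, (unsigned long long)pest_a[i],
			 (unsigned long long)pest_b[i]);
		lines++;

		/* One summary line replaces the rest of the run. */
		if (j > i) {
			len = strlen(out);
			snprintf(out + len, outsz - len, "PE[..%03x] A/B: as above\n",
				 (unsigned int)j);
			lines++;
		}
		i = j + 1;
	}
	return lines;
}
```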

      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Fix comment · 2bafb7ff
      Michael Neuling authored
      Update the comment to use the real function name.

      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix asm offsets to point to actual FP and VMX regs · aa9a9516
      Michael Neuling authored
      The asm code assumes the FP regs are at the start of fp_state. While
      this is true now, it may not always be the case and there is nothing
      enforcing it.
      
      This fixes the asm-offsets to point to the actual FP registers inside
      the fp_state.  Similarly for VMX.
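      To see why pointing at the container is fragile, the sketch below uses hypothetical structs: fp_state_later imagines a field inserted ahead of the registers, and two 64-bit slots per register is an assumption. It contrasts the offset of the whole fp_state with the offset of the fpr array itself, in the spirit of what asm-offsets generates; it is not the kernel's asm-offsets.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with the registers first: both offsets agree. */
struct fp_state_now {
	uint64_t fpr[32][2]; /* FP registers, two 64-bit slots each (assumed) */
	uint64_t fpscr;      /* FP status and control register */
};

/* Hypothetical future layout with a field ahead of the registers. */
struct fp_state_later {
	uint64_t fpscr;
	uint64_t fpr[32][2];
};

struct thread_sketch {
	uint64_t ksp; /* unrelated leading field */
	struct fp_state_later fp_state;
};

/* Old style: offset of the container, silently wrong once fpr moves. */
static const size_t thread_fpstate_old = offsetof(struct thread_sketch, fp_state);

/* New style: offset of the registers the asm actually loads and stores. */
static const size_t thread_fpstate_new =
	offsetof(struct thread_sketch, fp_state.fpr);
```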

      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix /proc/cpuinfo revision for POWER9 DD2 · 64ebb9a2
      Michael Neuling authored
      The P9 PVR bits 12-15 don't indicate a revision but instead different
      chip configurations.  From BookIV we have:
         Bits 12-15   Configuration
             0    :   Scale out 12 cores
             1    :   Scale out 24 cores
             2    :   Scale up  12 cores
             3    :   Scale up  24 cores
      
      DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale
      out 24 core" configuration (ie. SMT4 not SMT8), which results in a PVR
      of 0x004e1200. As a result, /proc/cpuinfo incorrectly reports the
      revision as "18.0".
      
      This patch fixes this by using only the bits relevant to the major
      revision (ie. bits 8-11) on POWER9, masking off the configuration bits.
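      The decoding can be sketched in C as follows. pvr_revision() and p9_chip_config() are hypothetical helpers modeled on the description above, not a copy of the kernel's /proc/cpuinfo code. For the PVR 0x004e1200, the old mask yields 18.0, the POWER9 mask yields 2.0, and bits 12-15 give configuration 1 (scale out 24 cores).

```c
#include <stdbool.h>
#include <stdint.h>

/* Split the low 16 bits of a PVR into the major/minor numbers shown
 * in /proc/cpuinfo. */
static void pvr_revision(uint32_t pvr, bool is_power9,
			 unsigned int *maj, unsigned int *min)
{
	if (is_power9)
		/* Bits 12-15 are the chip configuration, not the revision. */
		*maj = (pvr >> 8) & 0x0F;
	else
		*maj = (pvr >> 8) & 0xFF;
	*min = pvr & 0xFF;
}

/* POWER9 chip configuration from bits 12-15 (see the table above). */
static unsigned int p9_chip_config(uint32_t pvr)
{
	return (pvr >> 12) & 0x0F;
}
```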

      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 23 Jun, 2017 1 commit
    • powerpc/mm: Trace tlbie(l) instructions · 0428491c
      Balbir Singh authored
      Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
      Entry (Local)) instructions.
      
      The tlbie instruction has changed over the years, so not all versions
      accept the same operands. Use the ISA v3 field operands because they are
      the most verbose; we may change them in the future.
      
      Example output:
      
        qemu-system-ppc-5371  [016]  1412.369519: tlbie:
        	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0
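      For illustration, a userspace mirror of the fields in the example output might look like this. struct tlbie_event and format_tlbie() are hypothetical; the kernel defines the tracepoint with its TRACE_EVENT machinery rather than a helper like this. Given the values from the example, it reproduces the line shown.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the recorded fields (ISA v3 operand names). */
struct tlbie_event {
	unsigned long lpid;
	int local;          /* 1 for tlbiel, 0 for tlbie */
	uint64_t rb, rs;
	unsigned int ric, prs, r;
};

/* Format one event in the style of the example output above. */
static int format_tlbie(char *buf, size_t len, const struct tlbie_event *e)
{
	return snprintf(buf, len,
			"tlbie with lpid %lu, local %d, rb=%llx, rs=%llx, ric=%u prs=%u r=%u",
			e->lpid, e->local, (unsigned long long)e->rb,
			(unsigned long long)e->rs, e->ric, e->prs, e->r);
}
```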

      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Add some missing trace_tlbie()s, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 22 Jun, 2017 1 commit
    • powerpc: Convert VDSO update function to use new update_vsyscall interface · d4cfb113
      Paul Mackerras authored
      This converts the powerpc VDSO time update function to use the new
      interface introduced in commit 576094b7 ("time: Introduce new
      GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
      us the time as of the last update in seconds and whole nanoseconds,
      with the new interface we get the nanoseconds part effectively in
      a binary fixed-point format with tk->tkr_mono.shift bits to the
      right of the binary point.
      
      With the old interface, the fractional nanoseconds got truncated,
      meaning that the value returned by the VDSO clock_gettime function
      would have about 1ns of jitter in it compared to the value computed
      by the generic timekeeping code in the kernel.
      
      The powerpc VDSO time functions (clock_gettime and gettimeofday)
      already work in units of 2^-32 seconds, or 0.23283 ns, because that
      makes it simple to split the result into seconds and fractional
      seconds, and represent the fractional seconds in either microseconds
      or nanoseconds.  This is good enough accuracy for now, so this patch
      avoids changing how the VDSO works or the interface in the VDSO data
      page.
      
      This patch converts the powerpc update_vsyscall_old to be called
      update_vsyscall and use the new interface.  We convert the fractional
      second to units of 2^-32 seconds without truncating to whole nanoseconds.
      (There is still a conversion to whole nanoseconds for any legacy users
      of the vdso_data/systemcfg stamp_xtime field.)
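      The conversion from fixed-point nanoseconds to units of 2^-32 seconds can be sketched as below. This illustrates the arithmetic only, assuming xtime_nsec holds nanoseconds with `shift` fractional bits and shift <= 32; the function names are hypothetical. For 1.5 ns (384 with an 8-bit shift), the new path gives 6 units (about 1.40 ns), while truncating to whole nanoseconds first gives 4 (about 0.93 ns).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * xtime_nsec: nanoseconds with `shift` fractional bits. Returns the
 * fractional second in units of 2^-32 s without truncating to whole ns.
 * Assumes shift <= 32 and xtime_nsec < NSEC_PER_SEC << shift, so the
 * intermediate value stays below 2^62.
 */
static uint64_t frac_sec_new(uint64_t xtime_nsec, unsigned int shift)
{
	return (xtime_nsec << (32 - shift)) / NSEC_PER_SEC;
}

/* Old behaviour: drop the fractional nanoseconds before converting. */
static uint64_t frac_sec_old(uint64_t xtime_nsec, unsigned int shift)
{
	uint64_t nsec = xtime_nsec >> shift; /* whole nanoseconds only */

	return (nsec << 32) / NSEC_PER_SEC;
}
```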
      
      In addition, this improves the accuracy of the computation of tb_to_xs
      for those systems with high-frequency timebase clocks (>= 268.5 MHz)
      by doing the right shift in two parts, one before the multiplication and
      one after, rather than doing all of it before the multiplication.
      (We can't do all of the right shift after the multiplication unless we
      use 128-bit arithmetic.)
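      The trade-off behind splitting the shift can be shown with small made-up numbers: compute (a * b) >> s when a full-width product might not fit in 64 bits. This generic sketch is not the kernel's tb_to_xs computation; s1 is the part of the shift done before the multiplication, chosen in practice just large enough to keep the product within 64 bits. With a = 1023, b = 1000 and s = 10, the exact result is 999; shifting everything first gives 0, while shifting 5 bits before and 5 after gives 968.

```c
#include <stdint.h>

/* All of the shift before the multiply: the low bits of a are lost. */
static uint64_t mul_shift_before(uint64_t a, uint64_t b, unsigned int s)
{
	return (a >> s) * b;
}

/*
 * Split the shift: s1 bits before the multiply (to keep the product
 * within 64 bits), the remaining s - s1 bits after it. Requires s1 <= s.
 */
static uint64_t mul_shift_split(uint64_t a, uint64_t b, unsigned int s,
				unsigned int s1)
{
	return ((a >> s1) * b) >> (s - s1);
}
```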

      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 21 Jun, 2017 4 commits
  5. 20 Jun, 2017 9 commits
  6. 19 Jun, 2017 9 commits
  7. 15 Jun, 2017 9 commits
    • drivers/watchdog/Kconfig: Update CONFIG_WATCHDOG_RTAS dependencies · 42bed042
      Murilo Opsfelder Araujo authored
      drivers/watchdog/wdrtas.c uses symbols defined in arch/powerpc/kernel/rtas.c,
      which are exported only if CONFIG_PPC_RTAS is selected. Building wdrtas.c
      without setting CONFIG_PPC_RTAS produces the following errors:
      
          ERROR: ".rtas_token" [drivers/watchdog/wdrtas.ko] undefined!
          ERROR: "rtas_data_buf" [drivers/watchdog/wdrtas.ko] undefined!
          ERROR: "rtas_data_buf_lock" [drivers/watchdog/wdrtas.ko] undefined!
          ERROR: ".rtas_get_sensor" [drivers/watchdog/wdrtas.ko] undefined!
          ERROR: ".rtas_call" [drivers/watchdog/wdrtas.ko] undefined!
      
      This was identified during a randconfig build where CONFIG_WATCHDOG_RTAS=m and
      CONFIG_PPC_RTAS was not set. Logs are here:
      
          http://kisskb.ellerman.id.au/kisskb/buildresult/12982152/
      
      This patch fixes the issue by updating CONFIG_WATCHDOG_RTAS to depend on just
      CONFIG_PPC_RTAS, removing COMPILE_TEST entirely.

      Signed-off-by: Murilo Opsfelder Araujo <mopsfelder@gmail.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Avoid cpabort in context switch when possible · 07d2a628
      Nicholas Piggin authored
      The ISA v3.0B copy-paste facility only requires cpabort when switching
      to a process that has foreign real addresses mapped (direct access to
      accelerators), to clear a potential copy buffer filled by a previous
      thread. There is no accelerator driver implemented yet, so cpabort can
      be removed. It can be re-added when a driver is implemented.
      
      POWER9 DD1 requires the copy buffer to always be cleared on context
      switch, but if accelerators are not in use, then an unpaired copy from
      a dummy region is sufficient to clear data out of the copy buffer.
      
      This increases context switch performance by about 5% on POWER9.
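      The rules described above can be summarised as a small decision function. This is a hypothetical C sketch: the enum, copybuf_on_switch() and its parameters are illustrative names, and the real change is made in the low-level context switch code, not in a helper like this.

```c
#include <stdbool.h>

enum copybuf_action {
	COPYBUF_NONE,       /* nothing to do on this switch */
	COPYBUF_DUMMY_COPY, /* unpaired copy from a dummy region */
	COPYBUF_CPABORT,    /* full cpabort */
};

/*
 * next_has_foreign_mappings: the incoming process has foreign real
 * addresses (accelerators) mapped; no such driver exists yet, so this
 * is always false for now.
 */
static enum copybuf_action copybuf_on_switch(bool cpu_is_p9_dd1,
					     bool next_has_foreign_mappings)
{
	if (next_has_foreign_mappings)
		return COPYBUF_CPABORT;
	/* DD1 must always clear the copy buffer; a dummy copy suffices. */
	if (cpu_is_p9_dd1)
		return COPYBUF_DUMMY_COPY;
	return COPYBUF_NONE;
}
```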

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Drop explicit hwsync in context switch · 9145effd
      Nicholas Piggin authored
      The sync (aka. hwsync, aka. heavyweight sync) in the context switch
      code, which prevents MMIO accesses from being reordered from the point
      of view of a single process if it gets migrated to a different CPU, is
      not required because an hwsync is already performed earlier in the
      context switch path.

      Add a comment documenting this so it stays clear if anything changes on
      the scheduler or powerpc side. Remove the hwsync from _switch.
      
      This improves context switch performance by 2-3% on POWER8.

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Drop reservation-clearing ldarx in context switch · 837e72f7
      Nicholas Piggin authored
      There is no need to explicitly break the reservation in _switch,
      because we are guaranteed that the context switch path will include a
      larx/stcx.
      
      Comment the guarantee and remove the reservation clear from _switch.
      
      This is worth 1-2% in context switch performance.

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Leave interrupts hard enabled in context switch for radix · e4c0fc5f
      Nicholas Piggin authored
      Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
      hard disabled interrupts over the low level context switch, because
      the SLB management can't cope with a PMU interrupt accessing the stack
      in that window.
      
      Radix based kernel mapping does not use the SLB so it does not require
      interrupts hard disabled here.
      
      This is worth 1-2% in context switch performance on POWER9.

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Avoid restore_math call if possible in syscall exit · bc4f65e4
      Nicholas Piggin authored
      The syscall exit code that branches to restore_math is quite heavy on
      Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
      FP and vector can get caught here if the kernel ever uses FP or vector.
      Lazy-FP/vec context switching also trips this case.
      
      So check for lazy FP and vector before switching RI for restore_math.
      Move most of this case out of line.
      
      For threads that do want to restore math registers, the MSR switches are
      still suboptimal. Future direction may be to use a soft-RI bit to avoid
      MSR switches in kernel (similar to soft-EE), but for now at least the
      no-restore case is improved.
      
      POWER9 context switch rate increases by about 5% due to sched_yield(2)
      return performance. I haven't constructed a test to measure the syscall
      cost.
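      The check that now precedes the restore_math path can be sketched in C. The load_fp/load_vec fields and the MSR bit positions (FP at bit 13, VEC at bit 25) are assumptions for illustration, and need_restore_math() is a hypothetical helper; the actual change is in the assembly syscall exit path.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR facility bits (positions assumed for this sketch). */
#define MSR_FP  (1ULL << 13)
#define MSR_VEC (1ULL << 25)

/*
 * Hypothetical lazy-restore state: nonzero means the thread recently
 * used the facility and wants it reloaded on return to userspace.
 */
struct thread_math {
	uint8_t load_fp;
	uint8_t load_vec;
};

/*
 * Take the expensive restore_math path (two mtmsr on Book3S) only when a
 * facility is off in the user MSR and the thread wants it restored.
 */
static bool need_restore_math(uint64_t user_msr, const struct thread_math *t)
{
	bool fp_wanted = !(user_msr & MSR_FP) && t->load_fp;
	bool vec_wanted = !(user_msr & MSR_VEC) && t->load_vec;

	return fp_wanted || vec_wanted;
}
```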

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Optimize hypercall/syscall entry · acd7d8ce
      Nicholas Piggin authored
      After bc355125 ("powerpc/64: Allow for relocation-on interrupts from
      guest to host"), a getppid() system call goes from 307 cycles to 358
      cycles (+17%) on POWER8. This is due in significant part to the scratch
      SPR used by the hypercall check.

      It turns out there are some volatile registers common to both system
      call and hypercall (in particular, r12, cr0, ctr), which can be used to
      avoid the SPR and some other overheads. This brings getppid to 320 cycles
      (+4% over the original 307).

      Hcall entry performance, measured by running "sc 1" in guest userspace,
      goes from 854 cycles before this patch to 826 cycles after, which is also
      a small win.

      POWER9 syscall performance improves by about the same amount; hcall
      performance was not tested.
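      The quoted percentages are relative to the original 307-cycle getppid() cost. A trivial helper makes the arithmetic explicit: (358 - 307) / 307 is about +16.6%, (320 - 307) / 307 is about +4.2%, and the hcall change (826 - 854) / 854 is about -3.3%.

```c
/* Percentage change from `before` to `after` (e.g. cycle counts). */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```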

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Only add X for pages overlapping kernel text · 9abcc981
      Michael Ellerman authored
      Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
      should check if the page overlaps the kernel text and only then add
      PAGE_KERNEL_X.
      
      Note that we still use 1G pages if they're available, so this will
      typically still result in a 1G executable page at KERNELBASE. So this fix is
      primarily useful for catching stray branches to high linear mapping addresses.
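      The overlap test can be sketched as a half-open interval check. TEXT_START, TEXT_END and the 16MB text size are hypothetical values, not real kernel symbols or sizes; with them, a 1G mapping at KERNELBASE stays executable while the 1G mapping at 0xc000000040000000 used in the xmon example does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel text bounds, for illustration only. */
#define TEXT_START 0xc000000000000000ULL
#define TEXT_END   0xc000000001000000ULL /* assumed 16MB of text */

#define SZ_1G (1ULL << 30)

/* True if [start, end) intersects [TEXT_START, TEXT_END). */
static bool overlaps_kernel_text(uint64_t start, uint64_t end)
{
	return start < TEXT_END && TEXT_START < end;
}

/* Only mappings that touch kernel text get execute permission. */
static bool mapping_is_executable(uint64_t start, uint64_t size)
{
	return overlaps_kernel_text(start, start + size);
}
```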
      
      Without this patch, we can execute at 1G in xmon using:
      
        0:mon> m c000000040000000
        c000000040000000  00 l
        c000000040000000  00000000 01006038
        c000000040000004  00000000 2000804e
        c000000040000008  00000000 x
        0:mon> di c000000040000000
        c000000040000000  38600001      li      r3,1
        c000000040000004  4e800020      blr
        0:mon> p c000000040000000
        return value is 0x1
      
      After this patch, we get a 400 exception as expected:
      
        0:mon> p c000000040000000
        *** 400 exception occurred
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
    • Revert "powerpc: Handle simultaneous interrupts at once" · 0edc2ca9
      Michael Ellerman authored
      This reverts commit 45cb08f4.
      
      For some reason this is causing IRQ problems on Freescale Book3E
      machines, eg on my p5020ds:
      
        irq 25: nobody cared (try booting with the "irqpoll" option)
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4 #624
        Call Trace:
        [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
        [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
        [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
        [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
        [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
        [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
        [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
        [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
        [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
        [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
        [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
        --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
            LR = .fsl_elbc_run_command+0xc8/0x23c
        [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
        [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
        [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
        ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
        [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
        [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
        [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
        [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
        [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
        [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
        [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
        [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
        [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
        [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
        [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
        [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
        handlers:
        [<c000000000ed85c8>] .fsl_lbc_ctrl_irq
        Disabling IRQ #25
      
      Ben also had concerns with the implementation being potentially slow on
      some PICs, so revert it for now.

      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 06 Jun, 2017 2 commits