1. 30 Apr, 2019 4 commits
    • Michael Ellerman's avatar
      Merge branch 'topic/ppc-kvm' into next · bdc7c970
      Michael Ellerman authored
      Merge our topic branch shared with KVM. In particular this includes the
      rewrite of the idle code into C.
      bdc7c970
    • Michael Ellerman's avatar
      powerpc/powernv/idle: Restore AMR/UAMOR/AMOR/IAMR after idle · e9cef018
      Michael Ellerman authored
      This is an implementation of commits 53a712ba
      ("powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle") and
      a3f3072d ("powerpc/powernv/idle: Restore IAMR after idle") using
      the new C-based idle code.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Extract from Nick's patch]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e9cef018
    • Nicholas Piggin's avatar
      powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Nicholas Piggin authored
      Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
      speific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing architected
      idle instructions, and restoring the stack and reloading GPRs then
      returning to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C code, there are some
      significant differences:
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimization too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
      
      Performance measured with context switch ping-pong on different
      threads or cores, is possibly improved a small amount, 1-3% depending
      on stop state and core vs thread test for shallow states. Deep states
      it's in the noise compared with other latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      10d91611
    • Nicholas Piggin's avatar
      powerpc/watchdog: Use hrtimers for per-CPU heartbeat · 7ae3f6e1
      Nicholas Piggin authored
      Using a jiffies timer creates a dependency on the tick_do_timer_cpu
      incrementing jiffies. If that CPU has locked up and jiffies is not
      incrementing, the watchdog heartbeat timer for all CPUs stops and
      creates false positives and confusing warnings on local CPUs, and
      also causes the SMP detector to stop, so the root cause is never
      detected.
      
      Fix this by using hrtimer based timers for the watchdog heartbeat,
      like the generic kernel hardlockup detector.
      
      Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reported-by: default avatarRavikumar Bangoria <ravi.bangoria@in.ibm.com>
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Tested-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Reported-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      7ae3f6e1
  2. 29 Apr, 2019 1 commit
    • Nathan Fontenot's avatar
      powerpc/pseries: Track LMB nid instead of using device tree · b2d3b5ee
      Nathan Fontenot authored
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for a
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node in the device tree to
      remove the LMB from, the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node a LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b2d3b5ee
  3. 28 Apr, 2019 1 commit
  4. 21 Apr, 2019 32 commits
  5. 20 Apr, 2019 2 commits
    • Michael Neuling's avatar
      powerpc: Add force enable of DAWR on P9 option · c1fe190c
      Michael Neuling authored
      This adds a flag so that the DAWR can be enabled on P9 via:
        echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous
      
      The DAWR was previously force disabled on POWER9 in:
        96541531 powerpc: Disable DAWR in the base POWER9 CPU features
      Also see Documentation/powerpc/DAWR-POWER9.txt
      
      This is a dangerous setting, USE AT YOUR OWN RISK.
      
      Some users may not care about a bad user crashing their box
      (ie. single user/desktop systems) and really want the DAWR.  This
      allows them to force enable DAWR.
      
      This flag can also be used to disable DAWR access. Once this is
      cleared, all DAWR access should be cleared immediately and your
      machine once again safe from crashing.
      
      Userspace may get confused by toggling this. If DAWR is force
      enabled/disabled between getting the number of breakpoints (via
      PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an
      inconsistent view of what's available. Similarly for guests.
      
      For the DAWR to be enabled in a KVM guest, the DAWR needs to be force
      enabled in the host AND the guest. For this reason, this won't work on
      POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the
      dawr_enable_dangerous file will fail if the hypervisor doesn't support
      writing the DAWR.
      
      To double check the DAWR is working, run this kernel selftest:
        tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
      Any errors/failures/skips mean something is wrong.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c1fe190c
    • Nathan Lynch's avatar
      powerpc/numa: document topology_updates_enabled, disable by default · 558f8649
      Nathan Lynch authored
      Changing the NUMA associations for CPUs and memory at runtime is
      basically unsupported by the core mm, scheduler etc. We see all manner
      of crashes, warnings and instability when the pseries code tries to do
      this. Disable this behavior by default, and document the switch a bit.
      Signed-off-by: default avatarNathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      558f8649