1. 27 Mar, 2018 4 commits
    • Madhavan Srinivasan's avatar
      powerpc/perf: Prevent kernel address leak via perf_get_data_addr() · cd1231d7
      Madhavan Srinivasan authored
      Sampled Data Address Register (SDAR) is a 64-bit register that
      contains the effective address of the storage operand of an
      instruction that was being executed, possibly out-of-order, at or
      around the time that the Performance Monitor alert occurred.
      
      In certain scenario SDAR happen to contain the kernel address even for
      userspace only sampling. Add checks to prevent it.
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      cd1231d7
    • Madhavan Srinivasan's avatar
      powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer · bb19af81
      Madhavan Srinivasan authored
      The current Branch History Rolling Buffer (BHRB) code does not check
      for any privilege levels before updating the data from BHRB. This
      could leak kernel addresses to userspace even when profiling only with
      userspace privileges. Add proper checks to prevent it.
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      bb19af81
    • Michael Ellerman's avatar
      powerpc/perf: Fix kernel address leak via sampling registers · e1ebd0e5
      Michael Ellerman authored
      Current code in power_pmu_disable() does not clear the sampling
      registers like Sampling Instruction Address Register (SIAR) and
      Sampling Data Address Register (SDAR) after disabling the PMU. Since
      these are userspace readable and could contain kernel addresses, add
      code to explicitly clear the content of these registers.
      
      Also add a "context synchronizing instruction" to enforce no further
      updates to these registers as suggested by Power ISA v3.0B. From
      section 9.4, on page 1108:
      
        "If an mtspr instruction is executed that changes the value of a
        Performance Monitor register other than SIAR, SDAR, and SIER, the
        change is not guaranteed to have taken effect until after a
        subsequent context synchronizing instruction has been executed (see
        Chapter 11. "Synchronization Requirements for Context Alterations"
        on page 1133)."
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Massage change log and add ISA reference]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e1ebd0e5
    • Paul Mackerras's avatar
      powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9 · dbfcf3cb
      Paul Mackerras authored
      On POWER9, since commit cc3d2940 ("powerpc/64: Enable use of radix
      MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and
      HPT bits in the client-architecture-support (CAS) vector, which tells
      the hypervisor that we can do either radix or HPT.  According to PAPR,
      if we use this combination we are promising to do a H_REGISTER_PROC_TBL
      hcall later on to let the hypervisor know whether we are doing radix
      or HPT.  We currently do this call if we are doing radix but not if
      we are doing HPT.  If the hypervisor is able to support both radix
      and HPT guests, it would be entitled to defer allocation of the HPT
      until the H_REGISTER_PROC_TBL call, and to fail any attempts to create
      HPTEs until the H_REGISTER_PROC_TBL call.  Thus we need to do a
      H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may
      crash at boot time.
      
      This adds the code to call H_REGISTER_PROC_TBL in this case, before
      we attempt to create any HPT entries using H_ENTER.
      
      Fixes: cc3d2940 ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      dbfcf3cb
  2. 23 Mar, 2018 9 commits
    • Michael Ellerman's avatar
      Merge branch 'topic/ppc-kvm' into next · a26cf1c9
      Michael Ellerman authored
      This brings in two series from Paul, one of which touches KVM code and
      may need to be merged into the kvm-ppc tree to resolve conflicts.
      a26cf1c9
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state · 681c617b
      Paul Mackerras authored
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where the contents of the TEXASR can get corrupted while a thread is
      in fake suspend state.  The workaround is for the instruction emulation
      code to use the value saved at the most recent guest exit in real
      suspend mode.  We achieve this by simply not saving the TEXASR into
      the vcpu struct on an exit in fake suspend state.  We also have to
      take care to set the orig_texasr field only on guest exit in real
      suspend state.
      
      This also means that on guest entry in fake suspend state, TEXASR
      will be restored to the value it had on the last exit in real suspend
      state, effectively counteracting any hardware-caused corruption.  This
      works because TEXASR may not be written in suspend state.
      
      With this, the guest might see the wrong values in TEXASR if it reads
      it while in suspend state, but will see the correct value in
      non-transactional state (e.g. after a treclaim), and treclaim will
      work correctly.
      
      With this workaround, the code will actually run slightly faster, and
      will operate correctly on systems without the TEXASR bug (since TEXASR
      may not be written in suspend state, and is only changed by failure
      recording, which will have already been done before we get into fake
      suspend state).  Therefore these changes are not made subject to a CPU
      feature bit.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      681c617b
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode · 87a11bb6
      Suraj Jitindar Singh authored
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where a treclaim performed in fake suspend mode can cause subsequent
      reads from the XER register to return inconsistent values for the SO
      (summary overflow) bit.  The inconsistent SO bit state can potentially
      be observed on any thread in the core.  We have to do the treclaim
      because that is the only way to get the thread out of suspend state
      (fake or real) and into non-transactional state.
      
      The workaround for the bug is to force the core into SMT4 mode before
      doing the treclaim.  This patch adds the code to do that, conditional
      on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      87a11bb6
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 · 4bb3c7a0
      Paul Mackerras authored
      POWER9 has hardware bugs relating to transactional memory and thread
      reconfiguration (changes to hardware SMT mode).  Specifically, the core
      does not have enough storage to store a complete checkpoint of all the
      architected state for all four threads.  The DD2.2 version of POWER9
      includes hardware modifications designed to allow hypervisor software
      to implement workarounds for these problems.  This patch implements
      those workarounds in KVM code so that KVM guests see a full, working
      transactional memory implementation.
      
      The problems center around the use of TM suspended state, where the
      CPU has a checkpointed state but execution is not transactional.  The
      workaround is to implement a "fake suspend" state, which looks to the
      guest like suspended state but the CPU does not store a checkpoint.
      In this state, any instruction that would cause a transition to
      transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
      checkpointed state (treclaim) causes a "soft patch" interrupt (vector
      0x1500) to the hypervisor so that it can be emulated.  The trechkpt
      instruction also causes a soft patch interrupt.
      
      On POWER9 DD2.2, we avoid returning to the guest in any state which
      would require a checkpoint to be present.  The trechkpt in the guest
      entry path which would normally create that checkpoint is replaced by
      either a transition to fake suspend state, if the guest is in suspend
      state, or a rollback to the pre-transactional state if the guest is in
      transactional state.  Fake suspend state is indicated by a flag in the
      PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
      reads back as 0.
      
      On exit from the guest, if the guest is in fake suspend state, we still
      do the treclaim instruction as we would in real suspend state, in order
      to get into non-transactional state, but we do not save the resulting
      register state since there was no checkpoint.
      
      Emulation of the instructions that cause a softpatch interrupt is
      handled in two paths.  If the guest is in real suspend mode, we call
      kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
      transitioning to transactional state.  This is called before we do the
      treclaim in the guest exit path; because we haven't done treclaim, we
      can get back to the guest with the transaction still active.  If the
      instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
      handle, or if the guest is in fake suspend state, then we proceed to
      do the complete guest exit path and subsequently call
      kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
      all the cases including the cases that generate program interrupts
      (illegal instruction or TM Bad Thing) and facility unavailable
      interrupts.
      
      The emulation is reasonably straightforward and is mostly concerned
      with checking for exception conditions and updating the state of
      registers such as MSR and CR0.  The treclaim emulation takes care to
      ensure that the TEXASR register gets updated as if it were the guest
      treclaim instruction that had done failure recording, not the treclaim
      done in hypervisor state in the guest exit path.
      
      With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
      transactional memory is not available to host userspace.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      4bb3c7a0
    • Paul Mackerras's avatar
      powerpc/powernv: Provide a way to force a core into SMT4 mode · 7672691a
      Paul Mackerras authored
      POWER9 processors up to and including "Nimbus" v2.2 have hardware
      bugs relating to transactional memory and thread reconfiguration.
      One of these bugs has a workaround which is to get the core into
      SMT4 state temporarily.  This workaround is only needed when
      running bare-metal.
      
      This patch provides a function which gets the core into SMT4 mode
      by preventing threads from going to a stop state, and waking up
      those which are already in a stop state.  Once at least 3 threads
      are not in a stop state, the core will be in SMT4 and we can
      continue.
      
      To do this, we add a "dont_stop" flag to the paca to tell the
      thread not to go into a stop state.  If this flag is set,
      power9_idle_stop() just returns immediately with a return value
      of 0.  The pnv_power9_force_smt4_catch() function does the following:
      
      1. Set the dont_stop flag for each thread in the core, except
         ourselves (in fact we use an atomic_inc() in case more than
         one thread is calling this function concurrently).
      2. See how many threads are awake, indicated by their
         requested_psscr field in the paca being 0.  If this is at
         least 3, skip to step 5.
      3. Send a doorbell interrupt to each thread that was seen as
         being in a stop state in step 2.
      4. Until at least 3 threads are awake, scan the threads to which
         we sent a doorbell interrupt and check if they are awake now.
      
      This relies on the following properties:
      
      - Once dont_stop is non-zero, requested_psccr can't go from zero to
        non-zero, except transiently (and without the thread doing stop).
      - requested_psscr being zero guarantees that the thread isn't in
        a state-losing stop state where thread reconfiguration could occur.
      - Doing stop with a PSSCR value of 0 won't be a state-losing stop
        and thus won't allow thread reconfiguration.
      - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
        must be in SMT4 mode, since SMT modes are powers of 2.
      
      This does add a sync to power9_idle_stop(), which is necessary to
      provide the correct ordering between setting requested_psscr and
      checking dont_stop.  The overhead of the sync should be unnoticeable
      compared to the latency of going into and out of a stop state.
      
      Because some objected to incurring this extra latency on systems where
      the XER[SO] bug is not relevant, I have put the test in
      power9_idle_stop inside a feature section.  This means that
      pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
      without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
      probably hang the system.
      
      In order to cater for uses where the caller has an operation that
      has to be done while the core is in SMT4, the core continues to be
      kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
      until the pnv_power9_force_smt4_release() function is called.
      It undoes the effect of step 1 above and allows the other threads
      to go into a stop state.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      7672691a
    • Paul Mackerras's avatar
      powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2 · b5af4f27
      Paul Mackerras authored
      This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
      processors which will be used to enable the hypervisor to assist
      hardware with the handling of checkpointed register values while the
      CPU is in suspend state, in order to work around hardware bugs.  The
      hardware assistance for these workarounds introduced a new hardware
      bug relating to the XER[SO] bit.  We add a separate feature bit for
      this bug in case future chips fix it while still requiring the
      hypervisor assistance with suspend state.
      
      When the dt_cpu_ftrs subsystem is in use, the software assistance can
      be enabled using a "tm-suspend-hypervisor-assist" node in the device
      tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for
      the XER[SO] bug.  In the absence of such nodes, a quirk enables both
      for POWER9 "Nimbus" DD2.2 processors.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b5af4f27
    • Paul Mackerras's avatar
      powerpc: Free up CPU feature bits on 64-bit machines · 9bbf0b57
      Paul Mackerras authored
      This moves all the CPU feature bits that are only used on 32-bit
      machines to the top 20 bits of the CPU feature word and arranges
      for them to be defined only in 32-bit builds.  The features that
      are common to 32-bit and 64-bit machines are moved to bits 0-11
      of the CPU feature word.  This means that for 64-bit platforms,
      bits 44-63 can now be used for new features that only exist on
      64-bit machines.  (These bit numbers are counting from the right,
      i.e. the LSB is bit 0.)
      
      Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high
      16 bits, we have to adjust some assembly code.  Also, CPU_FTR_EMB_HV
      moved from the high 16 bits to the low 16 bits.
      
      Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only
      64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is
      a CPU execution mode as opposed to being a page attribute.
      
      With this we now have 20 free CPU feature bits on 64-bit machines.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9bbf0b57
    • Paul Mackerras's avatar
      powerpc: Book E: Remove unused CPU_FTR_L2CSR bit · dd0efb3f
      Paul Mackerras authored
      The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the
      bit.
      
      The last usage was removed in 86d63363 ("powerpc/e500mc: Remove
      dead L2 flushing code in idle_e500.S") (Jun 2015).
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      dd0efb3f
    • Paul Mackerras's avatar
      powerpc: Use feature bit for RTC presence rather than timebase presence · c0d64cf9
      Paul Mackerras authored
      All PowerPC CPUs other than the original PPC601 have a timebase
      register rather than the "real-time clock" (RTC) register that the
      PPC601 (and the original POWER and POWER2 CPUs) had.  Currently
      we have a CPU feature bit to indicate the presence of the timebase,
      but it makes more sense to use a bit to indicate the unusual
      situation rather than the common situation.  This therefore defines
      a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and
      arranges for it to be set on PPC601 systems.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c0d64cf9
  3. 20 Mar, 2018 6 commits
  4. 14 Mar, 2018 6 commits
  5. 13 Mar, 2018 15 commits