1. 02 Apr, 2020 4 commits
    • Anju T Sudhakar's avatar
      powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events. · a36e8ba6
      Anju T Sudhakar authored
      IMC(In-memory Collection Counters) does performance monitoring in
      two different modes, i.e accumulation mode(core-imc and thread-imc events),
      and trace mode(trace-imc events). A cpu thread can either be in
      accumulation-mode or trace-mode at a time and this is done via the LDBAR
      register in POWER architecture. The current design does not address the
      races between thread-imc and trace-imc events.
      
      Patch implements a global id and lock to avoid the races between
      core, trace and thread imc events. With this global id-lock
      implementation, the system can either run core, thread or trace imc
      events at a time. i.e. to run any core-imc events, thread/trace imc events
      should not be enabled/monitored.
      Signed-off-by: default avatarAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
      a36e8ba6
    • Ganesh Goudar's avatar
      powerpc/pseries: Fix MCE handling on pseries · a95a0a16
      Ganesh Goudar authored
      MCE handling on pSeries platform fails as recent rework to use common
      code for pSeries and PowerNV in machine check error handling tries to
      access per-cpu variables in realmode. The per-cpu variables may be
      outside the RMO region on pSeries platform and needs translation to be
      enabled for access. Just moving these per-cpu variable into RMO region
      did'nt help because we queue some work to workqueues in real mode, which
      again tries to touch per-cpu variables. Also fwnmi_release_errinfo()
      cannot be called when translation is not enabled.
      
      This patch fixes this by enabling translation in the exception handler
      when all required real mode handling is done. This change only affects
      the pSeries platform.
      
      Without this fix below kernel crash is seen on injecting
      SLB multihit:
      
      BUG: Unable to handle kernel data access on read at 0xc00000027b205950
      Faulting instruction address: 0xc00000000003b7e0
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: mcetest_slb(OE+) af_packet(E) xt_tcpudp(E) ip6t_rpfilter(E) ip6t_REJECT(E) ipt_REJECT(E) xt_conntrack(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) xfs(E) ibmveth(E) vmx_crypto(E) gf128mul(E) uio_pdrv_genirq(E) uio(E) crct10dif_vpmsum(E) rtc_generic(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) sr_mod(E) sd_mod(E) cdrom(E) ibmvscsi(E) scsi_transport_srp(E) crc32c_vpmsum(E) dm_mod(E) sg(E) scsi_mod(E)
      CPU: 34 PID: 8154 Comm: insmod Kdump: loaded Tainted: G OE 5.5.0-mahesh #1
      NIP: c00000000003b7e0 LR: c0000000000f2218 CTR: 0000000000000000
      REGS: c000000007dcb960 TRAP: 0300 Tainted: G OE (5.5.0-mahesh)
      MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28002428 XER: 20040000
      CFAR: c0000000000f2214 DAR: c00000027b205950 DSISR: 40000000 IRQMASK: 0
      GPR00: c0000000000f2218 c000000007dcbbf0 c000000001544800 c000000007dcbd70
      GPR04: 0000000000000001 c000000007dcbc98 c008000000d00258 c0080000011c0000
      GPR08: 0000000000000000 0000000300000003 c000000001035950 0000000003000048
      GPR12: 000000027a1d0000 c000000007f9c000 0000000000000558 0000000000000000
      GPR16: 0000000000000540 c008000001110000 c008000001110540 0000000000000000
      GPR20: c00000000022af10 c00000025480fd70 c008000001280000 c00000004bfbb300
      GPR24: c000000001442330 c00800000800000d c008000008000000 4009287a77000510
      GPR28: 0000000000000000 0000000000000002 c000000001033d30 0000000000000001
      NIP [c00000000003b7e0] save_mce_event+0x30/0x240
      LR [c0000000000f2218] pseries_machine_check_realmode+0x2c8/0x4f0
      Call Trace:
      Instruction dump:
      3c4c0151 38429050 7c0802a6 60000000 fbc1fff0 fbe1fff8 f821ffd1 3d42ffaf
      3fc2ffaf e98d0030 394a1150 3bdef530 <7d6a62aa> 1d2b0048 2f8b0063 380b0001
      ---[ end trace 46fd63f36bbdd940 ]---
      
      Fixes: 9ca766f9 ("powerpc/64s/pseries: machine check convert to use common event code")
      Reviewed-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200320110119.10207-1-ganeshgr@linux.ibm.com
      a95a0a16
    • Michael Ellerman's avatar
      selftests/eeh: Skip ahci adapters · bbe9064f
      Michael Ellerman authored
      The ahci driver doesn't support error recovery, and if your root
      filesystem is attached to it the eeh-basic.sh test will likely kill
      your machine.
      
      So skip any device we see using the ahci driver.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200326061144.2006522-1-mpe@ellerman.id.au
      bbe9064f
    • Nicholas Piggin's avatar
      powerpc/64s: Fix doorbell wakeup msgclr optimisation · 0c89649a
      Nicholas Piggin authored
      Commit 3282a3da ("powerpc/64: Implement soft interrupt replay in C")
      broke the doorbell wakeup optimisation introduced by commit a9af97aa
      ("powerpc/64s: msgclr when handling doorbell exceptions from system
      reset").
      
      This patch restores the msgclr, in C code. It's now done in the system
      reset wakeup path rather than doorbell interrupt replay where it used
      to be, because it is always the right thing to do in the wakeup case,
      but it may be rarely of use in other interrupt replay situations in
      which case it's wasted work - we would have to run measurements to see
      if that was a worthwhile optimisation, and I suspect it would not be.
      
      The results are similar to those in the original commit, test on POWER8
      of context_switch selftests benchmark with polling idle disabled (e.g.,
      always nap, giving cross-CPU IPIs) gives the following results:
      
                                        broken           patched
        Different threads, same core:   317k/s           375k/s    +18.7%
        Different cores:                280k/s           282k/s     +1.0%
      
      Fixes: 3282a3da ("powerpc/64: Implement soft interrupt replay in C")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200402121212.1118218-1-npiggin@gmail.com
      0c89649a
  2. 01 Apr, 2020 36 commits