1. 10 Aug, 2018 6 commits
    • powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements · ced1bf52
      Hari Bathini authored
      With dynamic memory allocation support for the crash memory ranges array,
      there is no hard limit on the number of crash memory ranges the kernel
      could export, but the program header count could overflow in the
      /proc/vmcore ELF file when each memory range is exported as a PT_LOAD
      segment. Reduce the likelihood of such a scenario by folding adjacent
      crash memory ranges, which minimizes the total number of PT_LOAD segments.
      Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
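      A minimal C sketch of the folding idea, assuming a hypothetical
      struct mem_range and add_range() helper (this is not the fadump code
      itself): a range that starts exactly where the previous entry ends
      extends that entry instead of creating a new one, so each run of
      contiguous memory needs only one PT_LOAD program header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type covering [base, base + size). */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Append [base, base + size) to ranges[], folding it into the last entry
 * when the two are adjacent. Returns the new number of entries. The caller
 * must guarantee room for one more entry.
 */
static size_t add_range(struct mem_range *ranges, size_t nr,
			uint64_t base, uint64_t size)
{
	if (nr > 0) {
		struct mem_range *last = &ranges[nr - 1];

		if (last->base + last->size == base) {
			/* Adjacent: grow the previous range, no new segment. */
			last->size += size;
			return nr;
		}
	}
	ranges[nr].base = base;
	ranges[nr].size = size;
	return nr + 1;
}
```

      For example, adding 0x0+0x1000, 0x1000+0x1000 and 0x4000+0x2000 leaves
      two entries: 0x0+0x2000 and 0x4000+0x2000.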
    • powerpc/fadump: handle crash memory ranges array index overflow · 1bd6a1c4
      Hari Bathini authored
      Crash memory ranges is an array of memory ranges of the crashing kernel
      to be exported as a dump via the /proc/vmcore file. The size of the array
      is set based on INIT_MEMBLOCK_REGIONS, which works in most cases, where
      the memblock memory regions count is less than the INIT_MEMBLOCK_REGIONS
      value. But this count can grow beyond the INIT_MEMBLOCK_REGIONS value
      since commit 142b45a7 ("memblock: Add array resizing support").
      
      On large memory systems with a few DLPAR operations, the memblock memory
      regions count could be larger than the INIT_MEMBLOCK_REGIONS value. On
      such systems, registering fadump results in a crash or other system
      failures like the one below:
      
        task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
        NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
        REGS: c00000000b73b570 TRAP: 0300   Tainted: G          L   X  (4.4.140+)
        MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22004484  XER: 20000000
        CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
        ...
        NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
        LR [c0000000000f9e58] resched_curr+0x138/0x160
        Call Trace:
          resched_curr+0x138/0x160 (unreliable)
          check_preempt_curr+0xc8/0xf0
          ttwu_do_wakeup+0x38/0x150
          try_to_wake_up+0x224/0x4d0
          __wake_up_common+0x94/0x100
          ep_poll_callback+0xac/0x1c0
          __wake_up_common+0x94/0x100
          __wake_up_sync_key+0x70/0xa0
          sock_def_readable+0x58/0xa0
          unix_stream_sendmsg+0x2dc/0x4c0
          sock_sendmsg+0x68/0xa0
          ___sys_sendmsg+0x2cc/0x2e0
          __sys_sendmsg+0x5c/0xc0
          SyS_socketcall+0x36c/0x3f0
          system_call+0x3c/0x100
      
      because the array index is not checked for overflow while setting up the
      crash memory ranges, which causes memory corruption. To resolve this
      issue, dynamically allocate memory for crash memory ranges and resize it
      incrementally, in units of the page size, on hitting the array size limit.
      
      Fixes: 2df173d9 ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
      Cc: stable@vger.kernel.org # v3.4+
      Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      [mpe: Just use PAGE_SIZE directly, fixup variable placement]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
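      A userspace sketch of the approach, assuming hypothetical names
      (struct range_array, grow_ranges(), add_range()) and realloc() in place
      of the kernel allocator: the index is checked before every write, and
      the array grows by one page worth of entries whenever it is full.
      SKETCH_PAGE_SIZE is an assumed 4K value; the patch itself uses PAGE_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K; the kernel uses PAGE_SIZE */

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical growable array of crash memory ranges. */
struct range_array {
	struct mem_range *ranges;
	size_t nr;	/* entries in use */
	size_t max;	/* entries allocated */
};

/* Grow the array by one page worth of entries. Returns 0 or -1. */
static int grow_ranges(struct range_array *a)
{
	size_t new_bytes = a->max * sizeof(*a->ranges) + SKETCH_PAGE_SIZE;
	struct mem_range *p = realloc(a->ranges, new_bytes);

	if (!p)
		return -1;	/* old array is still valid and still owned */
	a->ranges = p;
	a->max = new_bytes / sizeof(*a->ranges);
	return 0;
}

/* Append a range, checking the index before writing: the missing check. */
static int add_range(struct range_array *a, uint64_t base, uint64_t size)
{
	if (a->nr == a->max && grow_ranges(a) != 0)
		return -1;
	a->ranges[a->nr].base = base;
	a->ranges[a->nr].size = size;
	a->nr++;
	return 0;
}
```

      With 16-byte entries, each resize adds 256 slots, so 300 ranges end up
      in a 512-entry array after two reallocations.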
    • powerpc/cpm1: fix compilation error with CONFIG_PPC_EARLY_DEBUG_CPM · 6bd6d867
      Christophe Leroy authored
      commit e8cb7a55 ("powerpc: remove superflous inclusions of
      asm/fixmap.h") removed the inclusion of asm/fixmap.h from files not
      using objects from that file.
      
      However, asm/mmu-8xx.h contains a call to __fix_to_virt(). The proper
      way would be to include asm/fixmap.h in asm/mmu-8xx.h, but that creates
      an inclusion loop.
      
      So we have to keep asm/fixmap.h in sysdev/cpm_common.c for
      CONFIG_PPC_EARLY_DEBUG_CPM. Otherwise the build fails:
      
        CC      arch/powerpc/sysdev/cpm_common.o
      In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
                       from ./arch/powerpc/include/asm/reg_8xx.h:8,
                       from ./arch/powerpc/include/asm/reg.h:29,
                       from ./arch/powerpc/include/asm/processor.h:13,
                       from ./arch/powerpc/include/asm/thread_info.h:28,
                       from ./include/linux/thread_info.h:38,
                       from ./arch/powerpc/include/asm/ptrace.h:159,
                       from ./arch/powerpc/include/asm/hw_irq.h:12,
                       from ./arch/powerpc/include/asm/irqflags.h:12,
                       from ./include/linux/irqflags.h:16,
                       from ./include/asm-generic/cmpxchg-local.h:6,
                       from ./arch/powerpc/include/asm/cmpxchg.h:537,
                       from ./arch/powerpc/include/asm/atomic.h:11,
                       from ./include/linux/atomic.h:5,
                       from ./include/linux/mutex.h:18,
                       from ./include/linux/kernfs.h:13,
                       from ./include/linux/sysfs.h:16,
                       from ./include/linux/kobject.h:20,
                       from ./include/linux/device.h:16,
                       from ./include/linux/node.h:18,
                       from ./include/linux/cpu.h:17,
                       from ./include/linux/of_device.h:5,
                       from arch/powerpc/sysdev/cpm_common.c:21:
      arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
      ./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
       #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                               ^
      arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
             VIRT_IMMR_BASE);
             ^
      ./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared (first use in this function)
       #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                                             ^
      arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
             VIRT_IMMR_BASE);
             ^
      ./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier is reported only once for each function it appears in
       #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                                             ^
      arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
             VIRT_IMMR_BASE);
             ^
      cc1: all warnings being treated as errors
      make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1
      
      Fixes: e8cb7a55 ("powerpc: remove superflous inclusions of asm/fixmap.h")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix size calculation using resource_size() · c42d3be0
      Dan Carpenter authored
      The problem is that the calculation should be "end - start + 1", but the
      plus one is missing.
      
      Fixes: 8626816e ("powerpc: add support for MPIC message register API")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
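      For reference, struct resource stores an inclusive end address, so a
      region spanning 0x1000..0x1fff is 0x1000 bytes long, not 0xfff. The
      userspace stand-in below mirrors the arithmetic of the kernel's
      resource_size() helper; the struct definition is simplified for
      illustration.

```c
#include <stdint.h>

/* Simplified userspace stand-in for struct resource: both ends inclusive. */
struct resource {
	uint64_t start;
	uint64_t end;
};

/* Same arithmetic as the kernel's resource_size(): end - start + 1. */
static inline uint64_t resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}
```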
    • powerpc/powernv: Allow memory that has been hot-removed to be hot-added · d3da701d
      Rashmica Gupta authored
      This patch allows the memory removed by memtrace to be re-added to the
      kernel. So now you don't have to reboot your system to add the memory
      back to the kernel or to have a different amount of memory removed.
      Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
      Tested-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 08 Aug, 2018 1 commit
    • selftests/powerpc: Kill child processes on SIGINT · 7c27a26e
      Breno Leitao authored
      Some powerpc selftests, such as tm/tm-unavailable, run for a long
      period (>120 seconds), and if they are interrupted, for example by
      pressing Ctrl-C (SIGINT), the foreground process (harness) dies but the
      child process and threads continue to execute (with PPID = 1 now) in the
      background.
      
      In this case, you'd think the whole test exited, but there are remaining
      threads and processes being executed in the background. Sometimes these
      orphaned processes do annoying things, such as consuming the whole CPU or
      dumping things to STDOUT.
      
      This patch fixes this problem by attaching an empty signal handler to
      SIGINT in the harness process. This handler will interrupt (EINTR) the
      parent process's waitpid() call, letting the code follow the normal
      flow, which kills all the processes in the child process group.
      
      This patch also fixes a typo.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
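      A userspace sketch of the mechanism, under stated assumptions: the
      function names are hypothetical, the test child is assumed to lead its
      own process group, and SIGKILL is used for brevity (the real harness may
      use a different signal). The key detail is installing the empty handler
      without SA_RESTART, so that waitpid() fails with EINTR instead of being
      transparently restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: its only job is to make a blocked waitpid() return EINTR. */
static void sig_handler(int signum)
{
	(void)signum;
}

/* Install the handler WITHOUT SA_RESTART so blocking syscalls are interrupted. */
static int install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sig_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* deliberately no SA_RESTART */
	return sigaction(SIGINT, &sa, NULL);
}

/*
 * Hypothetical harness helper: wait for the test child, which leads its own
 * process group. Returns 0 if the child exited on its own, 1 if the wait was
 * interrupted and the whole group was killed, -1 on any other error.
 */
static int wait_for_test(pid_t pid)
{
	int status;
	pid_t rc = waitpid(pid, &status, 0);

	if (rc == pid)
		return 0;
	if (rc == -1 && errno == EINTR) {
		/* A negative pid signals every process in the child's group. */
		kill(-pid, SIGKILL);
		waitpid(pid, &status, 0);	/* reap the group leader */
		return 1;
	}
	return -1;
}
```

      Because the child group is killed explicitly, nothing is left running
      with PPID 1 after the harness is interrupted.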
  3. 07 Aug, 2018 33 commits
    • powerpc/powernv/opal: Use standard interrupts property when available · 77b5f703
      Benjamin Herrenschmidt authored
      For (bad) historical reasons, OPAL used to create a non-standard pair
      of properties "opal-interrupts" and "opal-interrupts-names" for
      representing the list of interrupts it wants Linux to request on its
      behalf.
      
      Among other issues, the opal-interrupts property has no way to carry
      the type of interrupts, and they were all assumed to be level
      sensitive.
      
      This is wrong on some recent systems, where some of them are edge
      sensitive, causing warnings in the XIVE code and possible misbehaviours
      if they need to be retriggered (typically the NPU2 TCE error
      interrupts).
      
      This makes Linux switch to using the standard "interrupts" and
      "interrupt-names" properties instead, when they are available, using
      the standard of_irq helpers, which can carry all the desired type
      information.
      
      Newer versions of OPAL will generate those properties in addition to
      the legacy ones.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
       of start = 0 in opal_event_shutdown() to avoid double free warnings]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Allow CPU selection of e300core variants · d6690b1a
      Christophe Leroy authored
      GCC supports -mcpu=e300c2 and -mcpu=e300c3.
      
      This patch makes it possible to tune the kernel for one of
      those two types.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Allow CPU selection also on PPC32 · 0e00a8c9
      Christophe Leroy authored
      This patch extends to PPC32 the capability to select the exact
      CPU type.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Make CPU selection logic generic in Makefile · cc62d20c
      Christophe Leroy authored
      Currently, when adding a new CPU for selection, both
      Kconfig.cputype and the Makefile have to be modified.
      
      This patch moves into Kconfig.cputype the name of the CPU to be
      passed to the -mcpu= argument.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Rename the option to TARGET_CPU to echo the gcc documentation]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/Makefiles: Convert ifeq to ifdef where possible · badf436f
      Rodrigo R. Galvao authored
      In Makefiles, if we're testing a CONFIG_FOO symbol for equality with 'y',
      we can instead just use ifdef. The latter reads more easily, so convert
      to it where possible.
      Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
      Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Copy as much as possible in __copy_tofrom_user · f8db2007
      Paul Mackerras authored
      In __copy_tofrom_user, if we encounter an exception on a store, we
      stop copying and return the number of bytes not copied.  However,
      if the store is wider than one byte and is to an unaligned address,
      it is possible that the store operand overlaps a page boundary
      and the exception occurred on the latter part of the store operand,
      meaning that it would be possible to copy a few more bytes.  Since
      copy_to_user is generally expected to copy as much as possible,
      it would be better to copy those extra few bytes.  This adds code
      to do that.  Since this edge case is not performance-critical,
      the code has been written to be compact rather than as fast as
      possible.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
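      The real change is in assembler, but the idea can be sketched in C with
      a simulated fault: writes at or beyond a hypothetical fault_at offset
      are treated as faulting, standing in for an inaccessible page. When a
      wide store would fault, the sketch retries byte by byte so that every
      byte before the fault is still copied, then returns the number of bytes
      not copied, as copy_to_user() does.

```c
#include <stddef.h>
#include <string.h>

/* Simulated store check: succeeds only if [off, off + width) < fault_at. */
static int store_ok(size_t off, size_t width, size_t fault_at)
{
	return off + width <= fault_at;
}

/*
 * Copy n bytes in 8-byte chunks. When a wide store would fault, fall back
 * to single-byte stores so that the bytes before the faulting address are
 * still copied. Returns the number of bytes NOT copied.
 */
static size_t copy_max(unsigned char *dst, const unsigned char *src,
		       size_t n, size_t fault_at)
{
	size_t off = 0;

	/* Fast path: whole 8-byte stores while they succeed. */
	while (n - off >= 8 && store_ok(off, 8, fault_at)) {
		memcpy(dst + off, src + off, 8);
		off += 8;
	}
	/* Tail, or a wide store that would fault: one byte at a time. */
	while (off < n && store_ok(off, 1, fault_at)) {
		dst[off] = src[off];
		off++;
	}
	return n - off;
}
```

      With n = 32 and fault_at = 13, the 8-byte store at offset 8 would fault;
      the byte loop still copies bytes 8..12, so 19 bytes are reported as not
      copied instead of 24.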
    • selftests/powerpc/64: Test exception cases in copy_tofrom_user · 2679f63f
      Michael Ellerman authored
      This adds a set of test cases to test the behaviour of
      copy_tofrom_user when exceptions are encountered accessing the
      source or destination.  Currently, copy_tofrom_user does not always
      copy as many bytes as possible when an exception occurs on a store
      to the destination, and that is reflected in failures in these tests.
      
      Based on a test program from Anton Blanchard.
      
      [paulus@ozlabs.org - test all three paths, wrote commit description,
       made EX_TABLE create an exception table.]
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc/64: Test all paths through copy routines · 98c45f51
      Paul Mackerras authored
      The hand-coded assembler 64-bit copy routines include feature sections
      that select one code path or another depending on which CPU we are
      executing on.  The self-tests for these copy routines end up testing
      just one path.  This adds a mechanism for selecting any desired code
      path at compile time, and makes 2 or 3 versions of each test, each
      using a different code path, so as to cover all the possible paths.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      [mpe: Add -mcpu=power4 to CFLAGS for older compilers]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Make exception table clearer in __copy_tofrom_user_base · a7c81ce3
      Paul Mackerras authored
      This aims to make the generation of exception table entries for the
      loads and stores in __copy_tofrom_user_base clearer and easier to
      verify.  Instead of having a series of local labels on the loads and
      stores, with a series of corresponding labels later for the exception
      handlers, we now use macros to generate exception table entries at the
      point of each load and store that could potentially trap.  We do this
      with the macros lex (load exception) and stex (store exception).
      These macros are used right before the load or store to which they
      apply.
      
      Some complexity is introduced by the fact that we have some more work
      to do after hitting an exception, because we need to calculate and
      return the number of bytes not copied.  The code uses r3 as the
      current pointer into the destination buffer, that is, the address of
      the first byte of the destination that has not been modified.
      However, at various points in the copy loops, r3 can be 4, 8, 16 or 24
      bytes behind that point.
      
      To express this offset in an understandable way, we define a symbol
      r3_offset which is updated at various points so that it is equal to the
      difference between the address of the first unmodified byte of the
      destination and the value in r3.  (In fact it only needs to be
      accurate at the point of each lex or stex macro invocation.)
      
      The rules for updating r3_offset are as follows:
      
      * It starts out at 0
      * An addi r3,r3,N instruction decreases r3_offset by N
      * A store instruction (stb, sth, stw, std) to N(r3)
        increases r3_offset by the width of the store (1, 2, 4, 8)
      * A store with update instruction (stbu, sthu, stwu, stdu) to N(r3)
        sets r3_offset to the width of the store.
      
      There is some trickiness to the way that the lex and stex macros and
      the associated exception handlers work.  I would have liked to use
      the current value of r3_offset in the name of the symbol used as
      the exception handler, as in ".Lld_exc_$(r3_offset)" and then
      have symbols .Lld_exc_0, .Lld_exc_8, .Lld_exc_16 etc. corresponding
      to the offsets that needed to be added to r3.  However, I couldn't
      see a way to do that with gas.
      
      Instead, the exception handler address is .Lld_exc - r3_offset or
      .Lst_exc - r3_offset, that is, the distance ahead of .Lld_exc/.Lst_exc
      that we start executing is equal to the amount that we need to add to
      r3.  This works because r3_offset is always a small multiple of 4,
      and our instructions are 4 bytes long.  This means that before
      .Lld_exc and .Lst_exc, we have a sequence of instructions that
      increments r3 by 4, 8, 16 or 24 depending on where we start.  The
      sequence increments r3 by 4 per instruction (on average).
      
      We also replace the exception table for the 4k copy loop by a
      macro per load or store.  These loads and stores all use exactly
      the same exception handler, which simply resets the argument registers
      r3, r4 and r5 to their original values and redoes the whole copy
      using the slower loop.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
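      The r3_offset bookkeeping rules can be modelled in a few lines of C.
      This only illustrates the rules as the message states them; the enum,
      struct and function names are hypothetical, and the real implementation
      uses the lex/stex assembler macros and gas symbols. fixup_r3() models
      entering the fixup sequence r3_offset bytes before the exception label,
      simplified to one "addi r3,r3,4" per 4-byte instruction (the message
      says 4 per instruction on average).

```c
#include <stddef.h>

/* Instruction forms the r3_offset rules care about (hypothetical encoding). */
enum insn_kind {
	INSN_ADDI_R3,		/* addi r3,r3,N                  */
	INSN_STORE_R3,		/* stb/sth/stw/std to N(r3)      */
	INSN_STORE_UPDATE_R3,	/* stbu/sthu/stwu/stdu to N(r3)  */
	INSN_OTHER,		/* loads etc.: no effect         */
};

struct insn {
	enum insn_kind kind;
	long imm;	/* N for addi */
	long width;	/* 1, 2, 4 or 8 for stores */
};

/* Apply the rules from the commit message to the first n instructions. */
static long r3_offset_after(const struct insn *seq, size_t n)
{
	long r3_offset = 0;	/* it starts out at 0 */

	for (size_t i = 0; i < n; i++) {
		switch (seq[i].kind) {
		case INSN_ADDI_R3:
			r3_offset -= seq[i].imm;
			break;
		case INSN_STORE_R3:
			r3_offset += seq[i].width;
			break;
		case INSN_STORE_UPDATE_R3:
			r3_offset = seq[i].width;
			break;
		default:
			break;
		}
	}
	return r3_offset;
}

/* Simplified fixup: start r3_offset bytes (r3_offset / 4 insns) early. */
static unsigned long fixup_r3(unsigned long r3, long r3_offset)
{
	for (long pc = -r3_offset; pc < 0; pc += 4)
		r3 += 4;	/* each 4-byte instruction adds 4 to r3 */
	return r3;
}
```

      For a loop of four std stores followed by addi r3,r3,32, r3_offset is
      24 just before the fourth store and back to 0 after the addi.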
    • powerpc/powermac: of_node_put() is not needed after iterator · 81d7b08b
      zhong jiang authored
      for_each_node_by_name() iterators only exit normally when the loop
      cursor is NULL, so there is no need to call of_node_put().
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • crypto/nx: Initialize 842 high and normal RxFIFO control registers · 656ecc16
      Haren Myneni authored
      NX increments readOffset by the FIFO size in the receive FIFO control
      register when a CRB is read. But the index in the RxFIFO has to match
      the corresponding entry in the FIFO maintained by VAS in the kernel.
      Otherwise NX may process incorrect CRBs, which can cause CRB timeouts.
      
      The VAS FIFO offset is 0 when the receive window is opened during
      initialization. When the module is reloaded, or in a kexec boot,
      readOffset in the FIFO control register may not match the VAS entry.
      This patch adds an nx_coproc_init OPAL call to reset readOffset and the
      queued entries in the FIFO control register for both the high and
      normal FIFOs.
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      [mpe: Fixup uninitialized variable warning]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Export opal_check_token symbol · 6e708000
      Haren Myneni authored
      Export opal_check_token symbol for modules to check the availability
      of OPAL calls before using them.
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning · f5daf77a
      Randy Dunlap authored
      Fix build errors and warnings in t1042rdb_diu.c by adding header files
      and MODULE_LICENSE().
      
      ../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
       early_initcall(t1042rdb_diu_init);
      ../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
      ../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration
      
      and
      WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Scott Wood <oss@buserror.net>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Remove sched_task function defined for thread-imc · 7ccc4fe5
      Anju T Sudhakar authored
      Call trace observed while running perf-fuzzer:
      
        CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225
        task: c000003f776ac900 task.stack: c000003f77728000
        NIP: c000000000299b70 LR: c0000000002a4534 CTR: c00000000029bb80
        REGS: c000003f7772b760 TRAP: 0700   Not tainted  (4.13.0-32-generic)
        MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
          CR: 24008822  XER: 00000000
        CFAR: c000000000299a70 SOFTE: 0
        GPR00: c0000000002a4534 c000003f7772b9e0 c000000001606200 c000003fef858908
        GPR04: c000003f776ac900 0000000000000001 ffffffffffffffff 0000003fee730000
        GPR08: 0000000000000000 0000000000000000 c0000000011220d8 0000000000000002
        GPR12: c00000000029bb80 c000000007a3d900 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 c000003f776ad090 c000000000c71354
        GPR24: c000003fef716780 0000003fee730000 c000003fe69d4200 c000003f776ad330
        GPR28: c0000000011220d8 0000000000000001 c0000000014c6108 c000003fef858900
        NIP [c000000000299b70] perf_pmu_sched_task+0x170/0x180
        LR [c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230
        Call Trace:
          perf_iterate_sb+0x158/0x2a0 (unreliable)
          __perf_event_task_sched_in+0xc4/0x230
          finish_task_switch+0x21c/0x310
          __schedule+0x304/0xb80
          schedule+0x40/0xc0
          do_wait+0x254/0x2e0
          kernel_wait4+0xa0/0x1a0
          SyS_wait4+0x64/0xc0
          system_call+0x58/0x6c
        Instruction dump:
        3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050
        eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe00000> 4bffffbc 60000000 60420000
        ---[ end trace 8c46856d314c1811 ]---
      
      The context switch callbacks for thread-imc are defined in the
      sched_task function. So when thread-imc events are grouped with software
      PMU events, perf_pmu_sched_task hits the WARN_ON_ONCE condition, since
      software PMUs are assumed not to have a sched_task defined.
      
      This patch moves the thread_imc enable/disable OPAL calls from
      sched_task to the event_[add/del] functions.
      
      Fixes: f74c89bd ("powerpc/perf: Add thread IMC PMU support")
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Tested-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Fix page table fragment refcount race vs speculative references · 4231aba0
      Nicholas Piggin authored
      The page table fragment allocator uses the main page refcount racily
      with respect to speculative references. A customer observed a BUG due
      to page table page refcount underflow in the fragment allocator. This
      can be caused by the fragment allocator set_page_count stomping on a
      speculative reference, and then the speculative failure handler
      decrements the new reference, and the underflow eventually pops when
      the page tables are freed.
      
      Fix this by using a dedicated field in the struct page for the page
      table fragment allocator.
      
      Fixes: 5c1f6ee9 ("powerpc: Reduce PTE table memory wastage")
      Cc: stable@vger.kernel.org # v3.10+
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
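      A single-threaded replay of the interleaving described above, sketched
      with C11 atomics. The struct, its field names and SKETCH_FRAG_NR are
      hypothetical stand-ins (the actual fix adds a dedicated field to struct
      page). It shows that storing the fragment count into the shared
      refcount discards a concurrent speculative increment, so the later
      speculative decrement plus one decrement per fragment drives the count
      below zero, while a dedicated counter leaves the main refcount intact.

```c
#include <stdatomic.h>

#define SKETCH_FRAG_NR 16	/* hypothetical number of fragments per page */

/* Hypothetical stand-in for the struct page fields involved. */
struct fake_page {
	atomic_int refcount;	/* main refcount, also touched speculatively */
	atomic_int frag_count;	/* dedicated fragment counter (the fix) */
};

/* Replay the race; returns the main refcount after all fragments are freed. */
static int replay(int use_dedicated)
{
	struct fake_page pg;

	atomic_init(&pg.refcount, 1);		/* freshly allocated page */
	atomic_init(&pg.frag_count, 0);

	atomic_fetch_add(&pg.refcount, 1);	/* speculative reference taken */

	if (use_dedicated)
		atomic_store(&pg.frag_count, SKETCH_FRAG_NR);
	else	/* set_page_count()-style store stomps on the speculative ref */
		atomic_store(&pg.refcount, SKETCH_FRAG_NR);

	atomic_fetch_sub(&pg.refcount, 1);	/* speculative path drops its ref */

	for (int i = 0; i < SKETCH_FRAG_NR; i++) {	/* free every fragment */
		if (!use_dedicated)
			atomic_fetch_sub(&pg.refcount, 1);
		else if (atomic_fetch_sub(&pg.frag_count, 1) == 1)
			atomic_fetch_sub(&pg.refcount, 1);	/* last fragment frees page */
	}
	return atomic_load(&pg.refcount);
}
```

      replay(0) ends at -1, the underflow the message describes, while
      replay(1) ends at exactly 0.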
    • misc: cxl: changed asterisk position · a0ac3687
      Parth Y Shah authored
      Resolve the checkpatch error "foo* bar" should be "foo *bar".
      Signed-off-by: Parth Y Shah <sparth1292@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pasemi: Use pr_err/pr_warn... for kernel messages · e13606d7
      Darren Stevens authored
      The pasemi code still uses printk(KERN_ERR ...)/printk(KERN_WARNING ...);
      change these to pr_err()/pr_warn() to match other powerpc arch code.
      
      No functional changes.
      Signed-off-by: Darren Stevens <darren@stevens-zone.net>
      [mpe: Unsplit some strings while we're at it]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/traps: Show instructions on exceptions · a99b9c5e
      Murilo Opsfelder Araujo authored
      Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump
      the instructions at the faulty location, which is useful for debugging.
      
      Before this patch, an unhandled signal message looked like:
      
        pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
      
      After this patch, it looks like:
      
        pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
        pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
        pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add show_user_instructions() · 88b0fe17
      Murilo Opsfelder Araujo authored
      show_user_instructions() is a slightly modified version of
      show_instructions() that allows userspace instruction dump.
      
      This will be useful within show_signal_msg() to dump userspace
      instructions of the faulty location.
      
      Here is a sample of what show_user_instructions() outputs:
      
        pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
        pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
      
      The current->comm and current->pid printed serve as glue that links
      the instruction dump to its originator, even when messages are
      interleaved in the logs.
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
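      A userspace sketch that reproduces the dump format of the sample above:
      sixteen 32-bit words, eight per line, each line prefixed with comm[pid],
      and the word at the faulting nip wrapped in <>. The function name and
      its array-based interface are hypothetical; the kernel version reads the
      words from user memory. In the sample, the faulting word is at index 12,
      so twelve instructions precede nip.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NR_WORDS 16	/* words dumped, as in the sample output */
#define PER_LINE 8	/* words per output line */

/*
 * Format words[0..NR_WORDS) into buf, marking words[fault_idx] with <>.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_insn_dump(char *buf, size_t len, const char *comm, int pid,
			    const unsigned int *words, int fault_idx)
{
	size_t pos = 0;

	for (int i = 0; i < NR_WORDS; i++) {
		int n;

		if (i % PER_LINE == 0) {
			/* Start a new line with the comm[pid] prefix. */
			n = snprintf(buf + pos, len - pos, "%s%s[%d]: code:",
				     i ? "\n" : "", comm, pid);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
		n = snprintf(buf + pos, len - pos,
			     i == fault_idx ? " <%08x>" : " %08x", words[i]);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

      Feeding it the sixteen words from the sample with fault_idx 12 yields
      the two "pandafault[10850]: code:" lines shown above.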
    • powerpc/traps: Print VMA for unhandled signals · 0f642d61
      Murilo Opsfelder Araujo authored
      This adds the VMA address to the message printed for unhandled signals,
      similar to what other architectures, like x86, print.
      
      Before this patch, a page fault looked like:
      
        pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2
      
      After this patch, a page fault looks like:
      
        pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000]
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/traps: Use %lx format in show_signal_msg() · 49d8f201
      Murilo Opsfelder Araujo authored
      Use %lx format to print registers.  This avoids having two different
      formats and avoids checking for MSR_64BIT, improving readability of the
      function.
      
      Even though we could have used %px, which is functionally equivalent to %lx
      as per Documentation/core-api/printk-formats.rst, it is not semantically
      correct because the data printed are not pointers.  And using %px requires
      casting data to (void *).
      
      Besides that, %lx matches the format used in show_regs().
      
      Before this patch:
      
        pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2
      
      After this patch:
      
        pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
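      The difference between the two formats is easy to check with
      snprintf(). The helpers below are hypothetical userspace mirrors of the
      "after" message and of the old zero-padded 64-bit style (they are not
      the kernel's show_signal_msg()), and they assume a 64-bit unsigned long
      so that the lr value fits.

```c
#include <stdio.h>
#include <string.h>

/* New style: plain %lx, one format for 32- and 64-bit, no casts needed. */
static int format_signal_msg(char *buf, size_t len, const char *comm, int pid,
			     int signr, unsigned long addr, unsigned long nip,
			     unsigned long lr, unsigned int code)
{
	return snprintf(buf, len,
			"%s[%d]: unhandled signal %d at %lx nip %lx lr %lx code %x",
			comm, pid, signr, addr, nip, lr, code);
}

/* Old 64-bit style, for comparison: zero-padded to 16 hex digits. */
static int format_reg_old64(char *buf, size_t len, unsigned long val)
{
	return snprintf(buf, len, "%016lx", val);
}
```

      With the values from the "after" example, format_signal_msg() produces
      exactly that line, while format_reg_old64(0x10000718) gives the padded
      0000000010000718 seen in the "before" example.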
    • powerpc/traps: Use an explicit ratelimit state for show_signal_msg() · 35a52a10
      Murilo Opsfelder Araujo authored
      Replace printk_ratelimited() with printk() and a default rate limit
      burst to limit the display of unhandled signal messages.
      
      This will allow us to call print_vma_addr() in a future patch, which
      does not work with printk_ratelimited().
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
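      A minimal userspace analogue of an explicit ratelimit state, with
      hypothetical names and a caller-supplied clock. Because the caller
      checks the state once and then prints, a single decision can cover a
      message emitted in several pieces, which is presumably why
      print_vma_addr() becomes usable once printk_ratelimited() is gone. The
      kernel's own helpers are DEFINE_RATELIMIT_STATE() and __ratelimit();
      this sketch does not reproduce them exactly.

```c
#include <stdbool.h>

/* Hypothetical ratelimit state: at most `burst` messages per `interval`. */
struct rl_state {
	unsigned long interval;	/* window length, in caller-defined ticks */
	int burst;		/* messages allowed per window */
	unsigned long begin;	/* start of the current window */
	int printed;		/* messages allowed in the current window */
	int missed;		/* messages suppressed so far */
	bool started;
};

/* Return true if a message may be printed at time `now`. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

      A caller would do `if (!rl_allow(&rs, now)) return;` and then emit every
      part of the message, so either all of it appears or none of it does.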
    • powerpc/traps: Print unhandled signals in a separate function · 658b0f92
      Murilo Opsfelder Araujo authored
      Isolate the logic of printing unhandled signals out of _exception_pkey().
      No functional change, only code rearrangement.
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Add more version checks to alignment_handler test · 8e4bdc69
      Michael Ellerman authored
      The alignment_handler test is documented to only work on Power8/Power9, but
      we can make it run on older CPUs by guarding more of the tests with
      feature checks.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
    • selftests/powerpc: Skip earlier in alignment_handler test · edba42cd
      Michael Ellerman authored
      Currently the alignment_handler test prints "Can't open /dev/fb0"
      about 80 times per run, which is a little annoying.
      
      Refactor it to check earlier whether it can open /dev/fb0 and skip if
      not. Each test then prints something like:
      
        test: test_alignment_handler_vsx_206
        tags: git_version:v4.18-rc3-134-gfb21a48904aa
        [SKIP] Test skipped on line 291
        skip: test_alignment_handler_vsx_206
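      A minimal sketch of the "check once, skip early" shape, assuming a plain open() probe. can_open(), run_alignment_tests() and the TEST_PASS/TEST_SKIP codes are hypothetical; the real test reports skips through the powerpc selftest harness, which is what produces the [SKIP] lines above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

#define TEST_PASS 0	/* hypothetical result codes */
#define TEST_SKIP 2

/* Probe the device once; close it again straight away. */
static bool can_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return false;
	close(fd);
	return true;
}

/* Check the prerequisite up front instead of inside every sub-test. */
static int run_alignment_tests(const char *fb_path)
{
	if (!can_open(fb_path))
		return TEST_SKIP;	/* one skip, no per-test error spam */

	/* The individual alignment tests would run here. */
	return TEST_PASS;
}
```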
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
    • powerpc/64s: Make rfi_flush_fallback a little more robust · 78ee9946
      Michael Ellerman authored
      Because rfi_flush_fallback runs immediately before the return to
      userspace it currently runs with the user r1 (stack pointer). This
      means if we oops in there we will report a bad kernel stack pointer in
      the exception entry path, e.g.:
      
        Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4
        Oops: Bad kernel stack pointer, sig: 6 [#1]
        LE SMP NR_CPUS=32 NUMA PowerNV
        Modules linked in:
        CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
        NIP:  c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040
        REGS: c0000000fffe7d40 TRAP: 4100   Not tainted  (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
        MSR:  9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE>  CR: 44000442  XER: 20000000
        CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80
        GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020
        ...
        NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80
        LR [0000000010053e00] 0x10053e00
      
      Although the NIP tells us where we were, and the TRAP number tells us
      what happened, it would still be nicer if we could report the actual
      exception rather than barfing about the stack pointer.
      
      We can do that fairly simply by loading the kernel stack pointer on
      entry and restoring the user value before returning. That way we see a
      regular oops such as:
      
        Unrecoverable exception 4100 at c00000000000239c
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE SMP NR_CPUS=32 NUMA PowerNV
        Modules linked in:
        CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
        NIP:  c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040
        REGS: c0000000f1e17bb0 TRAP: 4100   Not tainted  (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
        MSR:  9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE>  CR: 44000442  XER: 20000000
        CFAR: c00000000000bac8 IRQMASK: 0
        ...
        NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80
        LR [0000000010053e00] 0x10053e00
        Call Trace:
        [c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable)
      
      Note that this shouldn't make the kernel stack pointer vulnerable to a
      Meltdown attack, because it should be flushed from the cache before we
      return to userspace. The user r1 value will be in the cache, because
      we load it in the return path, but that is harmless.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
    • powerpc/powernv: Query firmware for count cache flush settings · 99d54754
      Michael Ellerman authored
      Look for fw-features properties to determine the appropriate settings
      for the count cache flush, and then call the generic powerpc code to
      set it up based on the security feature flags.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Query hypervisor for count cache flush settings · ba72dc17
      Michael Ellerman authored
      Use the existing hypercall to determine the appropriate settings for
      the count cache flush, and then call the generic powerpc code to set
      it up based on the security feature flags.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Add support for software count cache flush · ee13cb24
      Michael Ellerman authored
      Some CPU revisions support a mode where the count cache needs to be
      flushed by software on context switch. Additionally some revisions may
      have a hardware accelerated flush, in which case the software flush
      sequence can be shortened.
      
      If we detect the appropriate flag from firmware we patch a branch
      into _switch() which takes us to a count cache flush sequence.
      
      That sequence in turn may be patched to return early if we detect that
      the CPU supports accelerating the flush sequence in hardware.
      
      Add debugfs support for reporting the state of the flush, and for
      disabling it at runtime.
      
      And modify the spectre_v2 sysfs file to report the state of the
      software flush.
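      The decision described above can be sketched as a small C function. The flag names and bit values, enum flush_type and choose_flush() are hypothetical stand-ins for the security feature flags; in the kernel the result is applied by patching a branch into _switch() and, for the hardware-assisted case, patching the flush sequence to return early, not by returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; only their meanings come from the commit text. */
#define FTR_FLUSH_COUNT_CACHE	(UINT64_C(1) << 0)	/* software flush needed */
#define FTR_BCCTR_FLUSH_ASSIST	(UINT64_C(1) << 1)	/* hardware-assisted flush */

enum flush_type {
	FLUSH_NONE,		/* leave _switch() unpatched */
	FLUSH_SW,		/* branch to the full software sequence */
	FLUSH_HW_ASSISTED,	/* branch in, but the sequence returns early */
};

/* 'disabled' models the runtime (debugfs) switch. */
static enum flush_type choose_flush(uint64_t features, bool disabled)
{
	if (disabled || !(features & FTR_FLUSH_COUNT_CACHE))
		return FLUSH_NONE;
	if (features & FTR_BCCTR_FLUSH_ASSIST)
		return FLUSH_HW_ASSISTED;
	return FLUSH_SW;
}
```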
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Add new security feature flags for count cache flush · dc8c6cce
      Michael Ellerman authored
      Add security feature flags to indicate the need for software to flush
      the count cache on context switch, and for the presence of a hardware
      assisted count cache flush.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/asm: Add a patch_site macro & helpers for patching instructions · 06d0bbc6
      Michael Ellerman authored
      Add a macro and some helper C functions for patching single asm
      instructions.
      
      The gas macro means we can do something like:
      
        1:	nop
        	patch_site 1b, patch__foo
      
      This is less visually distracting than defining a GLOBAL symbol at 1,
      and also doesn't pollute the symbol table, which can confuse tools such as perf.
      
      These are obviously similar to our existing feature sections, but are
      not automatically patched based on CPU/MMU features, rather they are
      designed to be manually patched by C code at some arbitrary point.
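      To show how C code might resolve and patch such a site, here is a user-space sketch. It assumes each site stores a signed 32-bit offset from the site itself to the target instruction (what an entry like `.4byte 1b - .` would hold); struct image, site_addr() and patch_site_insn() are hypothetical. Real kernel text patching must go through a helper that can write to read-only text and flush the instruction cache, not a plain store.

```c
#include <assert.h>
#include <stdint.h>

/* Resolve a site: the target is the site's own address plus its stored offset. */
static uint32_t *site_addr(int32_t *site)
{
	return (uint32_t *)((intptr_t)site + *site);
}

/* Overwrite the instruction a site points at (plain store in this model). */
static void patch_site_insn(int32_t *site, uint32_t insn)
{
	*site_addr(site) = insn;
}

/* Toy "image": three instruction words followed by one patch-site entry. */
struct image {
	uint32_t text[3];
	int32_t site;
};

static void image_init(struct image *img)
{
	img->text[0] = 0x60000000;	/* nop */
	img->text[1] = 0x60000000;	/* nop, the "1:" label being patched */
	img->text[2] = 0x4e800020;	/* blr */
	/* Store "1b - .": target address minus the site's own address. */
	img->site = (int32_t)((intptr_t)&img->text[1] - (intptr_t)&img->site);
	assert(site_addr(&img->site) == &img->text[1]);
}
```

      Storing a relative offset rather than an absolute address keeps each entry at 4 bytes and independent of where the image is loaded.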
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Documentation: Add nospectre_v1 parameter · 26cb1f36
      Diana Craciun authored
      Currently only supported on powerpc.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms · c28218d4
      Diana Craciun authored
      Use barrier_nospec to sanitize the syscall table.
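      The general pattern is to bounds-check the syscall number, place a speculation barrier, and only then load from the table. Below is a C model of that ordering only, assuming GCC/Clang inline asm. The table, dispatch() and barrier_nospec_placeholder() are hypothetical, and the placeholder is merely a compiler barrier, so unlike the kernel's barrier_nospec it gives no protection against Spectre v1.

```c
#include <assert.h>
#include <errno.h>

typedef long (*syscall_fn)(long);

/*
 * Placeholder only: a compiler barrier. It does NOT stop CPU speculation;
 * it just marks where the real speculation barrier would sit.
 */
#define barrier_nospec_placeholder() __asm__ __volatile__("" ::: "memory")

static long sys_negate(long a) { return -a; }
static long sys_echo(long a) { return a; }

/* Hypothetical two-entry table. */
static const syscall_fn table[] = { sys_negate, sys_echo };
#define NR_TABLE (sizeof(table) / sizeof(table[0]))

static long dispatch(unsigned long nr, long arg)
{
	if (nr >= NR_TABLE)
		return -ENOSYS;
	/* Barrier between the bounds check and the table load. */
	barrier_nospec_placeholder();
	return table[nr](arg);
}
```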
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>