1. 10 Aug, 2017 11 commits
    • powerpc/perf: Factor out PPMU_ONLY_COUNT_RUN check code from power8 · 70a7e720
      Madhavan Srinivasan authored
      There are some hardware events on Power systems which only count when
      the processor is not idle, and there are some fixed-function counters
      which count such events. For example, the "run cycles" event counts
      cycles when the processor is not idle. If the user asks to count
      cycles, we can use "run cycles" if this is a per-task event, since the
      processor is running when the task is running, by definition. We can't
      use "run cycles" if the user asks for "cycles" on a system-wide
      counter.
      
      Currently on power8 this check is done using the PPMU_ONLY_COUNT_RUN flag
      in the power8_get_alternatives() function. Based on the flag, events are
      switched if needed. This functionality should also be enabled on power9,
      so factor out the code into isa207_get_alternatives().
      
      Fixes: efe881af ('powerpc/perf: Factor out event_alternative function')
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
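The event switch described in 70a7e720 can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel code: `sketch_get_alternatives()` is a hypothetical name, the flag value and the event codes are placeholders chosen for the example, and mapping a "run" event back to its plain counterpart is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration; the real definitions live in the kernel. */
#define PPMU_ONLY_COUNT_RUN 0x1u
#define EV_CYC              0x0001eULL
#define EV_RUN_CYC          0x600f4ULL
#define EV_INST_CMPL        0x00002ULL
#define EV_RUN_INST_CMPL    0x500faULL

/*
 * Fill alt[] with the requested event and, when the caller only needs
 * counting while the CPU is running, its run/non-run counterpart.
 * Returns the number of entries written.
 */
static size_t sketch_get_alternatives(uint64_t event, unsigned int flags,
                                      uint64_t alt[], size_t max)
{
    size_t n = 0;

    assert(max >= 2);   /* room for the event plus one alternative */
    alt[n++] = event;

    if (flags & PPMU_ONLY_COUNT_RUN) {
        switch (event) {
        case EV_CYC:           alt[n++] = EV_RUN_CYC;       break;
        case EV_RUN_CYC:       alt[n++] = EV_CYC;           break;
        case EV_INST_CMPL:     alt[n++] = EV_RUN_INST_CMPL; break;
        case EV_RUN_INST_CMPL: alt[n++] = EV_INST_CMPL;     break;
        default:               break;
        }
    }
    return n;
}
```

With the flag set (the per-task case), a request for `EV_CYC` yields two candidates, `EV_CYC` and `EV_RUN_CYC`. Without it (the system-wide case), only the original event is returned.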
    • powerpc/perf: Update default sdar_mode value for power9 · 7aa345d8
      Madhavan Srinivasan authored
      Commit 20dd4c62 ('powerpc/perf: Fix SDAR_MODE value for continous
      sampling on Power9') set the default sdar_mode value in MMCRA[SDAR_MODE]
      to 0b01 (Update on TLB miss). This value is used if the sdar_mode from
      the event is zero, or if we are in continuous sampling mode on power9 dd1.
      
      But it is preferred to have the sdar_mode value for power9 as
      0b10 (Update on dcache miss) for better sampling updates instead
      of 0b01 (Update on TLB miss).
      
      From Anton:
      
      Using a bandwidth test case with a 1MB footprint, I profiled cycles and
      chose TLB updates of the SDAR:
      
        $ perf record -d -e r000400000000001E:u ./bw2001 1M
                              ^
                              SDAR TLB
      
        $ perf report -D | grep PERF_RECORD_SAMPLE | sed 's/.*addr: //' | sort -u | wc -l
        4
      
        I get 4 unique addresses. If I ran with dcache misses:
      
        $ perf record -d -e r000800000000001E:u ./bw2001 1M
                              ^
                              SDAR dcache miss
      
        $ perf report -D|grep PERF_RECORD_SAMPLE| sed 's/.*addr: //'|sort -u | wc -l
        5217
      
      I get 5217 unique addresses. No surprises here, but it does show why
      TLB misses is the wrong event to default to - we get very little useful
      information out of it.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Acked-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
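The two raw event codes in Anton's test differ only in one 2-bit field, which the following C sketch decodes. The shift of 50 is inferred from those two codes (0x0004... and 0x0008...) rather than taken from the kernel headers, and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position, inferred from the raw event codes in the log. */
#define SKETCH_SDAR_MODE_SHIFT 50
#define SKETCH_SDAR_MODE_MASK  0x3ULL

enum sketch_sdar_mode {
    SKETCH_SDAR_NONE        = 0, /* event does not specify a mode */
    SKETCH_SDAR_TLB         = 1, /* 0b01: update on TLB miss */
    SKETCH_SDAR_DCACHE_MISS = 2  /* 0b10: update on dcache miss */
};

/* Extract the sdar_mode field from a raw event code. */
static unsigned int sketch_sdar_mode(uint64_t raw_event)
{
    return (unsigned int)((raw_event >> SKETCH_SDAR_MODE_SHIFT) &
                          SKETCH_SDAR_MODE_MASK);
}

/* Fall back to the new default (dcache miss) when the event leaves it zero. */
static unsigned int sketch_effective_sdar_mode(uint64_t raw_event)
{
    unsigned int mode = sketch_sdar_mode(raw_event);

    assert(mode <= SKETCH_SDAR_MODE_MASK);
    return mode ? mode : SKETCH_SDAR_DCACHE_MISS;
}
```

Under these assumptions, `r000400000000001E` decodes to mode 1 (TLB miss), `r000800000000001E` to mode 2 (dcache miss), and a plain `0x1E` falls back to the dcache-miss default.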
    • powerpc/44x/fsp2: Enable eMMC arasan for fsp2 platform · 754f0309
      Ivan Mikhaylov authored
      Add mmc0 changes for enabling arasan emmc and change
      defconfig appropriately.
      Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Properly invalidate when setting process table base · 7cd2a869
      Suraj Jitindar Singh authored
      The host process table base is stored in the partition table by calling
      the function native_register_process_table(). Currently this just sets
      the entry in memory and is missing a subsequent cache invalidation
      instruction. Any update to the partition table should be followed by a
      cache invalidation instruction specifying invalidation of the caching of
      any partition table entries (RIC = 2, PRS = 0).
      
      We already have a function to update the partition table with the
      required cache invalidation instructions - mmu_partition_table_set_entry().
      Update the native_register_process_table() function to call
      mmu_partition_table_set_entry(), which ensures that all appropriate
      invalidation is performed.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Use a local for patb0 to clean it up slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xive: Ensure active irqd when setting affinity · cffb717c
      Benjamin Herrenschmidt authored
      Ensure irqd is active before attempting to set affinity. This should
      make the set affinity code more robust. For instance, this prevents
      these messages seen on a 4.12 based kernel when taking cpus offline:
      
         [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
         [   77.885859] xive: Error -6 reconfiguring irq 17
         [   77.885862] IRQ17: set affinity failed(-6).
      
      That particular case has been fixed in 4.13-rc1 by commit
      91f26cb4 ("genirq/cpuhotplug: Do not migrated shutdown irqs").
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add irq accounting for watchdog interrupts · 04019bf8
      Nicholas Piggin authored
      This adds an irq counter for the watchdog soft-NMI. This interrupt
      only fires when interrupts are soft-disabled, so it will not
      increment much even when the watchdog is running. However it's
      useful for debugging and sanity checking.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix powerpc-specific watchdog build configuration · 75eb767e
      Nicholas Piggin authored
      The powerpc kernel/watchdog.o should be built when HARDLOCKUP_DETECTOR
      and HAVE_HARDLOCKUP_DETECTOR_ARCH are both selected. If only the former
      is selected, then the generic perf watchdog has been selected.
      
      To simplify this check, introduce a new Kconfig symbol PPC_WATCHDOG that
      depends on both. This Kconfig option means the powerpc specific
      watchdog is enabled.
      
      Without this patch, Book3E will attempt to build the powerpc watchdog.
      
      Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Fix mce accounting for powernv · f886f0f6
      Nicholas Piggin authored
      On 64-bit Book3s, when we're in HV mode, we have already counted the
      machine check exception in machine_check_early().
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Use IS_ENABLED() rather than an #ifdef]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Check memory device state before onlining/offlining · 1a367063
      Nathan Fontenot authored
      When DLPAR adding or removing memory we need to check the device
      offline status before trying to online/offline the memory. This is
      needed because calls to device_online() and device_offline() will
      return non-zero for memory that is already online and offline
      respectively.
      
      This update resolves two scenarios. First, for a kernel built with
      auto-online memory enabled (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y),
      memory will be onlined as part of calls to add_memory(). After adding
      the memory the pseries DLPAR code tries to online it and fails since
      the memory is already online. The DLPAR code then tries to remove the
      memory which produces the oops message below because the memory is not
      offline.
      
      The second scenario occurs when removing memory that is already
      offline, i.e. marking memory offline (via sysfs) and then trying to
      remove that memory. This doesn't work because offlining the already
      offline memory does not succeed and the DLPAR code then fails the
      DLPAR remove operation.
      
      The fix for both scenarios is to check the device.offline status
      before making the calls to device_online() or device_offline().
      
        kernel BUG at mm/memory_hotplug.c:1936!
        ...
        NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
        LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
        Call Trace:
          .remove_memory+0x5c/0xc0 (unreliable)
          .dlpar_add_lmb+0x384/0x400
          .dlpar_memory+0x5dc/0xca0
          .handle_dlpar_errorlog+0x74/0xe0
          .pseries_hp_work_fn+0x2c/0x90
          .process_one_work+0x17c/0x460
          .worker_thread+0x88/0x500
          .kthread+0x15c/0x1a0
          .ret_from_kernel_thread+0x58/0xc0
      
      Fixes: 943db62c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      [mpe: Use bool, add explicit rc=0 case, change log typos & formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix invalid use of register expressions · 8a583c0a
      Andreas Schwab authored
      binutils >= 2.26 now warns about misuse of register expressions in
      assembler operands that are actually literals, for example:
      
        arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression
      
      In practice these are almost all uses of r0 that should just be a
      literal 0.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      [mpe: Mention r0 is almost always the culprit, fold in purgatory change]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 08 Aug, 2017 6 commits
    • powerpc/mm/hash64: Make vmalloc 56T on hash · 21a0e8c1
      Michael Ellerman authored
      On 64-bit book3s, with the hash MMU, we currently define the kernel
      virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a
      leftover from pre v3.7 when our user VM was also 16T.
      
      Of that 16T we split it 50/50, with half used for PCI IO and ioremap
      and the other 8T for vmalloc.
      
      We never bothered to make it any bigger because 8T of vmalloc ought to
      be enough for anybody. But it turns out that's not true: the per-cpu
      allocator wants large amounts of vmalloc space, not to make large
      allocations, but to allow a large stride between allocations, because
      we use pcpu_embed_first_chunk().
      
      With a bit of juggling we can increase the entire kernel virtual space
      to 64T. The only real complication is the check of the address in the
      SLB miss handler, see the comment in the code.
      
      Although we could continue to split virtual space 50/50 as we do now,
      no one seems to be running out of PCI IO or ioremap space. So instead
      keep that as 8T, and use the remaining 56T for vmalloc.
      
      In future we should be able to increase the kernel virtual space to
      512T. The code already supports that; it just needs testing on older
      hardware.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
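The size arithmetic in 21a0e8c1 can be checked with a few lines of C. This is only a sketch of the two splitting policies the message describes; the struct and function names are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define TB (1ULL << 40)

/* Sizes of the two regions carved out of the kernel virtual space. */
struct sketch_layout {
    uint64_t vmalloc_size;
    uint64_t io_size;
};

/* Old policy: split the kernel virtual space 50/50. */
static struct sketch_layout sketch_split_even(uint64_t virt_size)
{
    struct sketch_layout l = { virt_size / 2, virt_size / 2 };
    return l;
}

/* New policy: keep the IO region at a fixed size, give the rest to vmalloc. */
static struct sketch_layout sketch_split_fixed_io(uint64_t virt_size,
                                                  uint64_t io_size)
{
    struct sketch_layout l;

    assert(io_size <= virt_size);
    l.io_size = io_size;
    l.vmalloc_size = virt_size - io_size;
    return l;
}
```

For the 16T space the even split gives 8T of vmalloc. For the 64T space with IO kept at 8T, `sketch_split_fixed_io(64 * TB, 8 * TB)` gives 56T of vmalloc, matching the commit title.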
    • powerpc/mm/slb: Move comment next to the code it's referring to · b5048de0
      Michael Ellerman authored
      There is a comment in slb_allocate() referring to the load of
      paca->vmalloc_sllp, but it's several lines prior in the assembly.
      We're about to change this code, and we want to add another comment,
      so move the comment immediately prior to the instruction it's talking
      about.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/book3s64: Make KERN_IO_START a variable · 63ee9b2f
      Michael Ellerman authored
      Currently KERN_IO_START is defined as:
      
       #define KERN_IO_START  (KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
      
      Although it looks like a constant, both the components are actually
      variables, to allow us to have a different value between Radix and
      Hash with a single kernel.
      
      However that still requires both Radix and Hash to place the kernel IO
      region at the same location relative to the start and end of the
      kernel virtual region (namely 1/2 way through it), and we'd like to
      change that.
      
      So split KERN_IO_START out into its own variable, and initialise it
      for Radix and Hash. In the medium term we should be able to
      reconsolidate this, by doing a more involved rearrangement of the
      location of the regions.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
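The difference between deriving the IO start from a macro and storing it in its own variable can be sketched as follows. All `sketch_*` names are hypothetical, and the start address and offsets used in the usage note are illustrative values, not ones taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define TB (1ULL << 40)

/* Stand-ins for the per-MMU variables set up at boot. */
static uint64_t sketch_kern_virt_start;
static uint64_t sketch_kern_virt_size;
static uint64_t sketch_kern_io_start;

/* Old form: always halfway through the kernel virtual region. */
#define SKETCH_OLD_KERN_IO_START \
    (sketch_kern_virt_start + (sketch_kern_virt_size >> 1))

/* New form: each MMU type chooses where its IO region begins. */
static void sketch_init_layout(uint64_t virt_start, uint64_t virt_size,
                               uint64_t io_offset)
{
    assert(io_offset < virt_size);
    sketch_kern_virt_start = virt_start;
    sketch_kern_virt_size = virt_size;
    sketch_kern_io_start = virt_start + io_offset;
}
```

After `sketch_init_layout(0xD000000000000000ULL, 64 * TB, 56 * TB)`, the old macro would still place the IO region 32T into the space, while the variable places it 56T in. A second MMU type can call the same function with a different offset.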
    • powerpc/powernv: Use darn instruction for get_random_seed() on Power9 · e66ca3db
      Matt Brown authored
      This adds powernv_get_random_darn() which utilises the darn instruction,
      introduced in ISA v3.0/POWER9.
      
      The darn instruction can potentially return an error, which is supported
      by the get_random_seed() API. In normal usage, if we see an error we just
      return that to the caller.
      
      However when detecting whether darn is functional at boot we try up to
      10 times, before deciding that darn doesn't work and failing the
      registration of get_random_seed(). That way an intermittent failure
      at boot doesn't deprive the system of randomness until the next reboot.
      Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
      [mpe: Move init into a function, tweak change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
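The two behaviours described in e66ca3db (report an error immediately in normal use, retry up to 10 times at boot) can be sketched in user-space C. The darn instruction itself is replaced by a function pointer, so every `sketch_*` name here is hypothetical, including the test double. The all-ones error value is what this sketch assumes darn returns on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DARN_ERR      0xFFFFFFFFFFFFFFFFULL /* assumed error value */
#define SKETCH_BOOT_ATTEMPTS 10

typedef uint64_t (*sketch_darn_fn)(void);

/* Normal usage: one attempt, and an error is passed straight to the caller. */
static bool sketch_get_random(sketch_darn_fn darn, uint64_t *out)
{
    uint64_t v;

    assert(darn != NULL && out != NULL);
    v = darn();
    if (v == SKETCH_DARN_ERR)
        return false;
    *out = v;
    return true;
}

/* Boot-time probe: tolerate intermittent failures before giving up. */
static bool sketch_darn_works(sketch_darn_fn darn)
{
    uint64_t v;

    for (int i = 0; i < SKETCH_BOOT_ATTEMPTS; i++)
        if (sketch_get_random(darn, &v))
            return true;
    return false;
}

/* Test double: fails a configurable number of times, then returns 0x1234. */
static int sketch_failures_left;

static uint64_t sketch_fake_darn(void)
{
    if (sketch_failures_left > 0) {
        sketch_failures_left--;
        return SKETCH_DARN_ERR;
    }
    return 0x1234ULL;
}
```

Setting `sketch_failures_left = 3` before calling `sketch_darn_works(sketch_fake_darn)` models an intermittent failure at boot that the probe survives; setting it to 10 models a source that the probe gives up on.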
    • powerpc/32: Fix boot failure on non 6xx platforms · 64d0a506
      Christophe Leroy authored
      Commit d300627c ("powerpc/6xx: Handle DABR match before calling
      do_page_fault") breaks non 6xx platforms.
      
        Failed to execute /init (error -14)
        Starting init: /bin/sh exists but couldn't execute it (error -14)
        Kernel panic - not syncing: No working init found.  Try passing init= ...
        CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
        Call Trace:
          panic+0x108/0x250 (unreliable)
          rootfs_mount+0x0/0x58
          ret_from_kernel_thread+0x5c/0x64
        Rebooting in 180 seconds..
      
      This is because in handle_page_fault(), the call to do_page_fault() has been
      mistakenly enclosed inside an #ifdef CONFIG_6xx.
      
      Fixes: d300627c ("powerpc/6xx: Handle DABR match before calling do_page_fault")
      Brown-paper-bag-to-be-worn-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Enable PCI peer-to-peer · 25529100
      Frederic Barrat authored
      P9 has support for PCI peer-to-peer, enabling a device to write in the
      MMIO space of another device directly, without interrupting the CPU.
      
      This patch adds support for it on powernv, by adding a new API to be
      called by drivers. The pnv_pci_set_p2p(...) call configures an
      'initiator', i.e. the device which will issue the MMIO operation, and a
      'target', i.e. the device on the receiving side.
      
      P9 really only supports MMIO stores for the time being but that's
      expected to change in the future, so the API allows both load and
      store operations to be defined.
      
        /* PCI p2p descriptor */
        #define OPAL_PCI_P2P_ENABLE           0x1
        #define OPAL_PCI_P2P_LOAD             0x2
        #define OPAL_PCI_P2P_STORE            0x4
      
        int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                            u64 desc)
      
      It uses a new OPAL call, as the configuration magic is done on the
      PHBs by skiboot.
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
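A driver would presumably build the `desc` argument of pnv_pci_set_p2p() by OR-ing the flags listed in 25529100. The helper below is a hypothetical user-space sketch of that composition; only the three flag values come from the commit message, and requiring at least one operation type is a choice made for this sketch, not a documented rule of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in the commit message. */
#define OPAL_PCI_P2P_ENABLE 0x1
#define OPAL_PCI_P2P_LOAD   0x2
#define OPAL_PCI_P2P_STORE  0x4

/* Compose a p2p descriptor from the requested enable state and operations. */
static uint64_t sketch_p2p_desc(bool enable, bool load, bool store)
{
    uint64_t desc = 0;

    assert(load || store); /* this sketch requires at least one operation */

    if (enable)
        desc |= OPAL_PCI_P2P_ENABLE;
    if (load)
        desc |= OPAL_PCI_P2P_LOAD;
    if (store)
        desc |= OPAL_PCI_P2P_STORE;
    return desc;
}
```

Since P9 only supports MMIO stores for now, the typical value under these assumptions would be `sketch_p2p_desc(true, false, true)`, that is `OPAL_PCI_P2P_ENABLE | OPAL_PCI_P2P_STORE`.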
  3. 03 Aug, 2017 22 commits
  4. 02 Aug, 2017 1 commit