1. 22 Feb, 2016 2 commits
    • powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23
      Aneesh Kumar K.V authored
      We can get a hash pte fault with 4k base page size and find the pte
      already inserted with 64K base page size. In that case we need to clear
      the existing slot information from the old pte. Fix this correctly.

      With THP, we also clear the slot information for all the 64K hash
      ptes mapping that 16MB page. They are all invalid
      now. This makes sure we don't find the slot valid when we fault with
      4k base page size. Finding the slot valid should not result in any wrong
      behavior because we do check again in hash page table for the validity.
      But we can avoid that check completely.
      
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Fix partial hotplug criterion · f6bf0fa1
      Gavin Shan authored
      During error recovery, the device could be removed as part of the
      partial hotplug. The criterion used to decide on partial hotplug
      is: if the device driver provides error_detected(), slot_reset()
      and resume() callbacks, it's immune from hotplug. Otherwise,
      it's going to experience partial hotplug during EEH recovery. But
      the criterion isn't accurate enough: the mlx4_core driver for Mellanox
      adapters provides error_detected() and slot_reset() callbacks, but
      resume() isn't there. Those Mellanox adapters should not be involved
      in the partial hotplug.

      This fixes the criterion to a practical one: an adapter with a driver
      that provides error_detected() and slot_reset() will be immune from
      partial hotplug. resume() isn't mandatory.
      
      Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion")
      Cc: stable@vger.kernel.org #v4.4+
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
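The criterion change described in f6bf0fa1 can be illustrated with a small standalone sketch. This is not the kernel code: `struct err_handlers`, `immune_old()`, `immune_new()` and `no_resume` are hypothetical names invented here, and the struct is only a trimmed-down stand-in for a driver's error handler table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a driver's error handler table. */
struct err_handlers {
    int (*error_detected)(void);
    int (*slot_reset)(void);
    void (*resume)(void);
};

/* Old criterion: all three callbacks are required to avoid partial hotplug. */
static bool immune_old(const struct err_handlers *h)
{
    return h && h->error_detected && h->slot_reset && h->resume;
}

/* Fixed criterion: error_detected() and slot_reset() suffice; resume() is optional. */
static bool immune_new(const struct err_handlers *h)
{
    return h && h->error_detected && h->slot_reset;
}

static int dummy_cb(void) { return 0; }

/* A driver shaped like the mlx4_core case in the commit message: no resume(). */
static const struct err_handlers no_resume = {
    .error_detected = dummy_cb,
    .slot_reset     = dummy_cb,
    .resume         = NULL,
};
```

With these definitions, `immune_old(&no_resume)` is false while `immune_new(&no_resume)` is true, which is the behavioural difference the commit message describes.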
  2. 17 Feb, 2016 1 commit
  3. 15 Feb, 2016 4 commits
    • powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8
      Aneesh Kumar K.V authored
      With ppc64 we use the deposited pgtable_t to store the hash pte slot
      information. We should not withdraw the deposited pgtable_t without
      marking the pmd none. This ensures that low level hash fault handling
      will skip this huge pte and we will handle it at upper levels.
      
      Recent change to pmd splitting changed the above in order to handle the
      race between pmd split and exit_mmap. The race is explained below.
      
      Consider following race:
      
      		CPU0				CPU1
      shrink_page_list()
        add_to_swap()
          split_huge_page_to_list()
            __split_huge_pmd_locked()
              pmdp_huge_clear_flush_notify()
      	// pmd_none() == true
      					exit_mmap()
      					  unmap_vmas()
      					    zap_pmd_range()
      					      // no action on pmd since pmd_none() == true
      	pmd_populate()
      
      As a result the THP will not be freed. The leak is detected by check_mm():
      
      	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
      
      The above required us to not mark pmd none during a pmd split.
      
      The fix for ppc is to clear _PAGE_USER from the huge pte, so that the low
      level fault handling code skips this pte. At the higher level we do take
      the ptl lock. That should serialize us against the pmd split. Once the
      lock is acquired we check the pmd again using pmd_same. That should always
      return false for us and hence we should retry the access. We do the
      pmd_same check in all cases after taking the ptl with
      THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
      huge_pmd_set_accessed).
      
      Also make sure we wait for irq disable sections in other cpus to finish
      before replacing a huge pte entry with a regular pmd entry. Code paths
      like find_linux_pte_or_hugepte depend on irq disable to get
      a stable pte_t pointer. A parallel thp split needs to make sure we
      don't convert a pmd pte to a regular pmd entry without waiting for the
      irq disable section to finish.
      
      Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
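The "take the ptl, then re-check with pmd_same" step from c777e2a8 follows a common lock-and-revalidate pattern. The sketch below models it in user space under loud assumptions: `struct pmd_slot`, `handle_huge_fault()` and the use of a pthread mutex in place of the page table lock are all hypothetical, and a plain integer comparison stands in for pmd_same().

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: one pmd entry protected by a page table lock. */
struct pmd_slot {
    pthread_mutex_t ptl;   /* stands in for the page table lock */
    uint64_t pmd;          /* stands in for the pmd entry value */
};

enum fault_result { FAULT_HANDLED, FAULT_RETRY };

/*
 * orig_pmd is the value the fault path read before taking the lock.
 * If the entry changed in the meantime (for example because a split
 * rewrote it), bail out so the access can be retried.
 */
static enum fault_result handle_huge_fault(struct pmd_slot *slot,
                                           uint64_t orig_pmd)
{
    enum fault_result ret = FAULT_HANDLED;

    pthread_mutex_lock(&slot->ptl);
    if (slot->pmd != orig_pmd)      /* stands in for !pmd_same() */
        ret = FAULT_RETRY;
    /* ... otherwise the fault would be serviced here, under the lock ... */
    pthread_mutex_unlock(&slot->ptl);

    return ret;
}
```

The point of the pattern is that the value observed without the lock is treated only as a hint; the decision is made on the value seen while holding the lock.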
    • powerpc/powernv: Fix stale PE primary bus · 1bc74f1c
      Gavin Shan authored
      When a PCI bus is unplugged during full hotplug for EEH recovery,
      the platform PE instance (struct pnv_ioda_pe) isn't released and
      it dereferences the stale PCI bus that has been released. It leads
      to a kernel crash when referring to the stale PCI bus.

      This fixes the issue by correcting the PE's primary bus when it's
      online at plugging time, in pnv_pci_dma_bus_setup() which is to
      be called by pcibios_fixup_bus().
      
      Cc: stable@vger.kernel.org # v4.1+
      Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Fix stale cached primary bus · 05ba75f8
      Gavin Shan authored
      When a PE is created, its primary bus is cached to pe->bus. At a later
      point, the cached primary bus is returned from eeh_pe_bus_get().
      However, we could get a stale cached primary bus and run into a kernel
      crash in one case: full hotplug as part of fenced PHB error recovery
      releases all PCI busses under the PHB at unplugging time and recreates
      them at plugging time. pe->bus still points to the PCI bus
      that was released.
      
      This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
      of pe->bus. pe->bus is updated when its first child EEH device is
      online and the flag is set. Before unplugging in full hotplug for
      error recovery, the flag is cleared.
      
      Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE")
      Cc: stable@vger.kernel.org #v3.11+
      Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
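The validity-flag idea in 05ba75f8 (a cached pointer is only trusted while a flag says so) can be sketched as follows. Everything here is a simplified assumption: `PE_PRI_BUS`, `struct pe`, `struct bus` and the three helpers are hypothetical names, the bit value is arbitrary, and returning NULL for an invalid cache is a simplification chosen for this sketch rather than a description of what the kernel does in that case.

```c
#include <stddef.h>

#define PE_PRI_BUS (1u << 0)   /* hypothetical bit value for this sketch */

struct bus { int number; };

struct pe {
    unsigned int flags;
    struct bus *bus;           /* cached primary bus, trusted only if PE_PRI_BUS is set */
};

/* Called when the first child device comes online: refresh the cache. */
static void pe_bus_online(struct pe *pe, struct bus *bus)
{
    pe->bus = bus;
    pe->flags |= PE_PRI_BUS;
}

/* Called before unplugging during full hotplug: invalidate the cache. */
static void pe_before_unplug(struct pe *pe)
{
    pe->flags &= ~PE_PRI_BUS;
}

/* Only hand out the cached bus while the flag marks it valid. */
static struct bus *pe_bus_get(const struct pe *pe)
{
    return (pe->flags & PE_PRI_BUS) ? pe->bus : NULL;
}
```

Clearing the flag rather than the pointer keeps the invalidation cheap and makes the "is the cache valid" question explicit at every read.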
    • powerpc/pseries: Don't trace hcalls on offline CPUs · 126df08c
      Denis Kirjanov authored
      If a cpu is hotplugged while the hcall trace points are active, it's
      possible to hit a warning from RCU due to the trace points calling into
      RCU from an offline cpu, eg:
      
        RCU used illegally from offline CPU!
        rcu_scheduler_active = 1, debug_locks = 1
      
      Make the hypervisor tracepoints conditional by using
      TRACE_EVENT_FN_COND.
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
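The effect of making the tracepoints in 126df08c conditional can be modelled in plain C. This is only a user-space sketch of the idea "evaluate a condition first and skip the trace body when it is false"; it does not use the kernel's tracing macros, and `cpu_is_online[]`, `record_hcall_event()` and `trace_hcall_entry_cond()` are hypothetical names.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the set of online CPUs. */
static bool cpu_is_online[NR_CPUS];

/* Counts how many events the (modelled) tracing backend accepted. */
static unsigned long events_recorded;

/* Stands in for the part of a tracepoint that touches RCU-protected state. */
static void record_hcall_event(unsigned long opcode)
{
    (void)opcode;
    events_recorded++;
}

/* Conditional tracepoint: the body runs only when the CPU is online. */
static void trace_hcall_entry_cond(int cpu, unsigned long opcode)
{
    if (!cpu_is_online[cpu])
        return;                 /* offline CPU: never reach the RCU user */
    record_hcall_event(opcode);
}
```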
  4. 08 Feb, 2016 1 commit
    • powerpc: Fix dedotify for binutils >= 2.26 · f15838e9
      Andreas Schwab authored
      Since binutils 2.26 BFD is doing suffix merging on STRTAB sections.  But
      dedotify modifies the symbol names in place, which can also modify
      unrelated symbols with a name that matches a suffix of a dotted name.  To
      remove the leading dot of a symbol name we can just increment the pointer
      into the STRTAB section instead.
      
      Backport to all stables to avoid breakage when people update their
      binutils - mpe.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
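The problem and fix described in f15838e9 can be reproduced on a toy string table. In the sketch below, `struct sym` is a hypothetical one-field stand-in for an ELF symbol, and the two functions model the in-place approach and the offset-increment approach; the table layout (a dotted name whose suffix is shared with another symbol) is an assumed example of suffix merging.

```c
#include <string.h>

/* Hypothetical one-field stand-in for an ELF symbol: offset into the string table. */
struct sym { unsigned int st_name; };

static const char *sym_name(const char *strtab, const struct sym *s)
{
    return strtab + s->st_name;
}

/* In-place approach: shift the name left by one byte inside the string table. */
static void dedotify_in_place(char *strtab, const struct sym *s)
{
    char *name = strtab + s->st_name;

    if (name[0] == '.')
        memmove(name, name + 1, strlen(name)); /* also moves the terminating NUL */
}

/* Offset approach: leave the string table alone, skip the dot via the offset. */
static void dedotify_by_offset(const char *strtab, struct sym *s)
{
    if (strtab[s->st_name] == '.')
        s->st_name++;
}
```

With a table holding `".foo"` at offset 1 and a second symbol `"foo"` sharing its tail at offset 2, the in-place version turns the second symbol's name into `"oo"`, while the offset version leaves it as `"foo"`.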
  5. 31 Jan, 2016 1 commit
    • powerpc/book3s_32: Fix build error with checkpoint restart · 19f97c98
      Aneesh Kumar K.V authored
      In file included from mm/vmscan.c:54:0:
      include/linux/swapops.h: In function ‘pte_to_swp_entry’:
      include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration]
        if (pte_swp_soft_dirty(pte))
        ^
      include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration]
         pte = pte_swp_clear_soft_dirty(pte);
      
      We support soft dirty tracking only with book3s 64 for now.
      So change the Kconfig dependency accordingly. Also, the CHECKPOINT_RESTORE
      feature is not really dependent on SOFT_DIRTY. We track the dependency
      between MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers.
      
      Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 28 Jan, 2016 2 commits
    • powerpc/mm: Fixup _HPAGE_CHG_MASK · 2d19fc63
      Aneesh Kumar K.V authored
      This was wrongly updated by commit 7aa9a23c ("powerpc, thp: remove
      infrastructure for handling splitting PMDs") during the last merge
      window. Fix it up.
      
      This could lead to incorrect behaviour in THP and/or mprotect(), at a
      minimum.
      
      Fixes: 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 · 370f06c8
      Madhavan Srinivasan authored
      Commit 7a786832 ("powerpc/perf: Add an explict flag indicating
      presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove
      the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not
      set.
      
      That commit's changelog also mentions that Power8 does not support
      MMCRA[SLOT]. However, when the Power8 PMU support was merged, it
      erroneously included the PPMU_HAS_SSLOT flag.
      
      So remove PPMU_HAS_SSLOT from the Power8 flags.
      
      mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39
      (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with
      the high bits of the Threshold Event Counter Mantissa. I am not aware of
      any published events which use the threshold counting mechanism, which
      would cause the mantissa bits to be set. So in practice this bug is
      unlikely to trigger.
      
      Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
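The "bits 37:39 (IBM numbering)" remark in 370f06c8 is easier to follow with the arithmetic written out. In IBM numbering, bit 0 is the most significant bit of the 64-bit register, so a field ending at bit 39 sits 63 - 39 = 24 bits above the least significant bit. The helper names below (`ibm_shift`, `ibm_field_mask`, `ibm_field_get`) are hypothetical and only illustrate that conversion; the sketch assumes fields narrower than 64 bits.

```c
#include <stdint.h>

/* IBM numbering: bit 0 is the most significant bit of a 64-bit register. */
static unsigned int ibm_shift(unsigned int last_bit)
{
    return 63u - last_bit;
}

/* Mask for the field first_bit:last_bit (inclusive); assumes width < 64. */
static uint64_t ibm_field_mask(unsigned int first_bit, unsigned int last_bit)
{
    unsigned int width = last_bit - first_bit + 1;

    return ((UINT64_C(1) << width) - 1) << ibm_shift(last_bit);
}

/* Extract the field first_bit:last_bit from reg as a right-aligned value. */
static uint64_t ibm_field_get(uint64_t reg, unsigned int first_bit,
                              unsigned int last_bit)
{
    return (reg & ibm_field_mask(first_bit, last_bit)) >> ibm_shift(last_bit);
}
```

For the field discussed above, `ibm_field_mask(37, 39)` evaluates to `0x07000000` with a shift of 24, so any Power8 value that sets bits 38:39 would be misread as a non-zero SLOT value by code that assumes the field exists.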
  7. 27 Jan, 2016 1 commit
  8. 25 Jan, 2016 1 commit
  9. 21 Jan, 2016 3 commits
    • powerpc: Remove newly added extra definition of pmd_dirty · 0e2bce74
      Stephen Rothwell authored
      Commit d5d6a443 ("arch/powerpc/include/asm/pgtable-ppc64.h:
      add pmd_[dirty|mkclean] for THP") added a new identical definition
      of pmd_dirty(). Remove it again.
      
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Simplify module TOC handling · c153693d
      Alan Modra authored
      PowerPC64 uses the symbol .TOC. much as other targets use
      _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in
      powerpc parlance, the TOC pointer). Global offset tables are generally
      local to an executable or shared library, or in the kernel, module. Thus
      it does not make sense for a module to resolve a relocation against
      .TOC. to the kernel's .TOC. value. A module has its own .TOC., and
      indeed the powerpc64 module relocation processing ignores the kernel
      value of .TOC. and instead calculates a module-local value.
      
      This patch removes code involved in exporting the kernel .TOC., tweaks
      modpost to ignore an undefined .TOC., and the module loader to twiddle
      the section symbol so that .TOC. isn't seen as undefined.
      
      Note that if the kernel was compiled with -msingle-pic-base then ELFv2
      would not have function global entry code setting up r2. In that case
      the module call stubs would need to be modified to set up r2 using the
      kernel .TOC. value, requiring some of this code to be reinstated.
      
      mpe: Furthermore a change in binutils master (not yet released) causes
      the current way we handle the TOC to no longer work when building with
      MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be
      loaded due to there being no version found for TOC.
      
      Cc: stable@vger.kernel.org # 3.16+
      Signed-off-by: Alan Modra <amodra@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Wire up copy_file_range() syscall · d7f9ee60
      Chandan Rajendra authored
      Test runs on a ppc64 BE guest succeeded using modified fstests.
      
      Also tested on ppc64 LE using a home made test - mpe.
      Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 20 Jan, 2016 24 commits