  04 Apr, 2018 8 commits
    • cxl: Fix possible deadlock when processing page faults from cxllib · ad7b4e80
      Frederic Barrat authored
      cxllib_handle_fault() is called by an external driver when it needs to
      have the host resolve page faults for a buffer. The buffer can cover
      several pages and VMAs. The function iterates over all the pages used
      by the buffer, based on the page size of the VMA.
      
      To ensure some stability while processing the faults, the thread T1
      grabs the mm->mmap_sem semaphore with read access (R1). However, when
      processing a page fault for a single page, one of the underlying
      functions, copro_handle_mm_fault(), also grabs the same semaphore with
      read access (R2). So the thread T1 takes the semaphore twice.
      
      If another thread T2 tries to access the semaphore in write mode W1
      (say, because it wants to allocate memory and calls 'brk'), then that
      thread T2 will have to wait because there's a reader (R1). If the
      thread T1 is processing a new page at that time, it won't get an
      automatic grant at R2, because there's now a writer thread
      waiting (T2). And we have a deadlock.
      
      The timeline is:
      1. thread T1 owns the semaphore with read access R1
      2. thread T2 requests write access W1 and waits
      3. thread T1 requests read access R2 and waits
      
      The fix is for thread T1 to release the semaphore R1 once it has the
      information it needs from the current VMA. The address space/VMAs
      could evolve while T1 iterates over the full buffer, but in the
      unlikely case where T1 misses a page, the external driver will raise a
      new page fault when retrying the memory access.
      
      Fixes: 3ced8d73 ("cxl: Export library to support IBM XSL")
      Cc: stable@vger.kernel.org # 4.13+
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it · 5d6a03eb
      Naveen N. Rao authored
      We get the below warning if we try to use kexec on P9:
         kexec_core: Starting new kernel
         WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140
         [snip]
         NIP __set_breakpoint+0xb4/0x140
         LR  kexec_prepare_cpus_wait+0x58/0x150
         Call Trace:
           0xc0000000ee70fb20 (unreliable)
           0xc0000000ee70fb20
           default_machine_kexec+0x234/0x2c0
           machine_kexec+0x84/0x90
           kernel_kexec+0xd8/0xe0
           SyS_reboot+0x214/0x2c0
           system_call+0x58/0x6c
      
      This happens since we are trying to clear hw breakpoint on POWER9,
      though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint()
      within hw_breakpoint_disable() with ppc_breakpoint_available() to
      address this.
      
      Fixes: 96541531 ("powerpc: Disable DAWR in the base POWER9 CPU features")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Update command line parsing for disable_radix · 7a22d632
      Aneesh Kumar K.V authored
      The kernel parameter disable_radix takes different forms:
      disable_radix=yes|no|1|0, or just disable_radix.
      
      The prom_init parsing does not support these forms.
      
      Fixes: 1fd6c022 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Parse disable_radix commandline correctly. · cec4e9b2
      Aneesh Kumar K.V authored
      The kernel parameter disable_radix takes different forms:
      disable_radix=yes|no|1|0, or just disable_radix. When using the latter
      form we get the error below.
      
       `Malformed early option 'disable_radix'`
      
      Fixes: 1fd6c022 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb · 6fa50483
      Aneesh Kumar K.V authored
      With a 64k page size, we have hugetlb pte entries at the pmd and pud
      levels for book3s64, so we don't need to create a separate page table
      cache for them. With 4k we need to make sure the hugepd page table
      cache for 16M is placed at the PUD level and the one for 16G at the
      PGD level.
      
      Simplify all this by not using HUGEPD_PD_SHIFT, which is confusing for
      book3s64.
      
      Without this patch, with a 64k page size we create pagetable caches
      with shift values 10 and 7 which are not used at all.
      
      Fixes: 419df06e ("powerpc: Reduce the PTE_INDEX_SIZE")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Aneesh Kumar K.V authored
      With the split PTL (page table lock) config, we allocate the level 4
      (leaf) page table using the pte fragment framework instead of a slab
      cache like the other levels. This was done to enable a split page
      table lock at level 4 of the page table. We use the page->ptl of the
      page backing all the level 4 pte fragments for the lock.
      
      Currently with radix, we use only 16 fragments out of the allocated
      page. In radix each fragment is 256 bytes, which means we use only 4K
      of the allocated 64K page, wasting 60K of the allocated memory. This
      was done earlier to keep it closer to hash.
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing the memory usage. Performance tests show a
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl, and hence the impact should
      be lower still.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/keys: Update documentation and remove unnecessary check · f2ed480f
      Aneesh Kumar K.V authored
      Add more code comments. We also remove an unnecessary pkey check that
      followed the pkey error check.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead · b9ee31e1
      Nicholas Piggin authored
      When stop is executed with EC=ESL=0, it appears to execute like a
      normal instruction (resuming from NIP when woken by interrupt). So all
      the save/restore handling can be avoided completely. In particular NV
      GPRs do not have to be saved, and MSR does not have to be switched
      back to kernel MSR.
      
      So move the test for EC=ESL=0 sleep states out to power9_idle_stop,
      and return directly to the caller after stop in that case.
      
      This improves performance for ping-pong benchmark with the stop0_lite
      idle state by 2.54% for 2 threads in the same core, and 2.57% for
      different cores. Performance increase with HV_POSSIBLE defined will be
      improved further by avoiding the hwsync.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>