1. 18 Aug, 2015 1 commit
    • Michael Ellerman's avatar
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 74b5037b
      Michael Ellerman authored
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      74b5037b
  2. 17 Aug, 2015 3 commits
  3. 14 Aug, 2015 12 commits
    • Daniel Axtens's avatar
      cxl: EEH support · 9e8df8a2
      Daniel Axtens authored
      EEH (Enhanced Error Handling) allows a driver to recover from the
      temporary failure of an attached PCI card. Enable basic CXL support
      for EEH.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9e8df8a2
    • Daniel Axtens's avatar
      cxl: Allow the kernel to trust that an image won't change on PERST. · 13e68d8b
      Daniel Axtens authored
      Provide a kernel API and a sysfs entry which allow a user to specify
      that when a card is PERSTed, it's image will stay the same, allowing
      it to participate in EEH.
      
      cxl_reset is used to reflash the card. In that case, we cannot safely
      assert that the image will not change. Therefore, disallow cxl_reset
      if the flag is set.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      13e68d8b
    • Daniel Axtens's avatar
      cxl: Don't remove AFUs/vPHBs in cxl_reset · 4e1efb40
      Daniel Axtens authored
      If the driver doesn't participate in EEH, the AFUs will be removed
      by cxl_remove, which will be invoked by EEH.
      
      If the driver does particpate in EEH, the vPHB needs to stick around
      so that the it can particpate.
      
      In both cases, we shouldn't remove the AFU/vPHB.
      Reviewed-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      4e1efb40
    • Daniel Axtens's avatar
      cxl: Refactor AFU init/teardown · d76427b0
      Daniel Axtens authored
      As with an adapter, some aspects of initialisation are done only once
      in the lifetime of an AFU: for example, allocating memory, or setting
      up sysfs/debugfs files.
      
      However, we may want to be able to do some parts of the initialisation
      multiple times: for example, in error recovery we want to be able to
      tear down and then re-map IO memory and IRQs.
      
      Therefore, refactor AFU init/teardown as follows.
      
       - Create two new functions: 'cxl_configure_afu', and its pair
         'cxl_deconfigure_afu'. As with the adapter functions,
         these (de)configure resources that do not need to last the entire
         lifetime of the AFU.
      
       - Allocating and releasing memory remain the task of 'cxl_alloc_afu'
         and 'cxl_release_afu'.
      
       - Once-only functions that do not involve allocating/releasing memory
         stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
         However, the task of picking an AFU mode and activating it has been
         broken out.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d76427b0
    • Daniel Axtens's avatar
      cxl: Refactor adaptor init/teardown · c044c415
      Daniel Axtens authored
      Some aspects of initialisation are done only once in the lifetime of
      an adapter: for example, allocating memory for the adapter,
      allocating the adapter number, or setting up sysfs/debugfs files.
      
      However, we may want to be able to do some parts of the
      initialisation multiple times: for example, in error recovery we
      want to be able to tear down and then re-map IO memory and IRQs.
      
      Therefore, refactor CXL init/teardown as follows.
      
       - Keep the overarching functions 'cxl_init_adapter' and its pair,
         'cxl_remove_adapter'.
      
       - Move all 'once only' allocation/freeing steps to the existing
         'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
         (This involves moving allocation of the adapter number out of
         cxl_init_adapter.)
      
       - Create two new functions: 'cxl_configure_adapter', and its pair
         'cxl_deconfigure_adapter'. These two functions 'wire up' the
         hardware --- they (de)configure resources that do not need to
         last the entire lifetime of the adapter
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c044c415
    • Daniel Axtens's avatar
      cxl: Clean up adapter MMIO unmap path. · 575e6986
      Daniel Axtens authored
      - MMIO pointer unmapping is guarded by a null pointer check.
         However, iounmap doesn't null the pointer, just invalidate it.
         Therefore, explicitly null the pointer after unmapping.
      
       - afu_desc_mmio also needs to be unmapped.
      
       - PCI regions are allocated in cxl_map_adapter_regs.
         Therefore they should be released in unmap, not elsewhere.
      Acked-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      575e6986
    • Daniel Axtens's avatar
      cxl: Make IRQ release idempotent · e640d2fc
      Daniel Axtens authored
      Check if an IRQ is mapped before releasing it.
      
      This will simplify future EEH code by allowing unconditional unmapping
      of IRQs.
      Acked-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e640d2fc
    • Daniel Axtens's avatar
      cxl: Allocate and release the SPA with the AFU · 05155772
      Daniel Axtens authored
      Previously the SPA was allocated and freed upon entering and leaving
      AFU-directed mode. This causes some issues for error recovery - contexts
      hold a pointer inside the SPA, and they may persist after the AFU has
      been detached.
      
      We would ideally like to allocate the SPA when the AFU is allocated, and
      release it until the AFU is released. However, we don't know how big the
      SPA needs to be until we read the AFU descriptor.
      
      Therefore, restructure the code:
      
       - Allocate the SPA only once, on the first attach.
      
       - Release the SPA only when the entire AFU is being released (not
         detached). Guard the release with a NULL check, so we don't free
         if it was never allocated (e.g. dedicated mode)
      Acked-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      05155772
    • Daniel Axtens's avatar
      cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75
      Daniel Axtens authored
      If the PCI channel has gone down, don't attempt to poke the hardware.
      
      We need to guard every time cxl_whatever_(read|write) is called. This
      is because a call to those functions will dereference an offset into an
      mmio register, and the mmio mappings get invalidated in the EEH
      teardown.
      
      Check in the read/write functions in the header.
      We give them the same semantics as usual PCI operations:
       - a write to a channel that is down is ignored.
       - a read from a channel that is down returns all fs.
      
      Also, we try to access the MMIO space of a vPHB device as part of the
      PCI disable path. Because that's a read that bypasses most of our usual
      checks, we handle it explicitly.
      
      As far as user visible warnings go:
       - Check link state in file ops, return -EIO if down.
       - Be reasonably quiet if there's an error in a teardown path,
         or when we already know the hardware is going down.
       - Throw a big WARN if someone tries to start a CXL operation
         while the card is down. This gives a useful stacktrace for
         debugging whatever is doing that.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0b3f9c75
    • Daniel Axtens's avatar
      cxl: Convert MMIO read/write macros to inline functions · 588b34be
      Daniel Axtens authored
      We're about to make these more complex, so make them functions
      first.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      588b34be
    • Daniel Axtens's avatar
      powerpc/eeh: Probe after unbalanced kref check · e642d11b
      Daniel Axtens authored
      In the complete hotplug case, EEH PEs are supposed to be released
      and set to NULL. Normally, this is done by eeh_remove_device(),
      which is called from pcibios_release_device().
      
      However, if something is holding a kref to the device, it will not
      be released, and the PE will remain. eeh_add_device_late() has
      a check for this which will explictly destroy the PE in this case.
      
      This check in eeh_add_device_late() occurs after a call to
      eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
      which will exit without probing if there is an existing PE.
      
      This means that on PowerNV, devices with outstanding krefs will not
      be rediscovered by EEH correctly after a complete hotplug. This is
      affecting CXL (CAPI) devices in the field.
      
      Put the probe after the kref check so that the PE is destroyed
      and affected devices are correctly rediscovered by EEH.
      
      Fixes: d91dafc0 ("powerpc/eeh: Delay probing EEH device during hotplug")
      Cc: stable@vger.kernel.org
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Acked-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e642d11b
    • Gautham R. Shenoy's avatar
      powerpc: Add an inline function to update POWER8 HID0 · e63dbd16
      Gautham R. Shenoy authored
      Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
      prescribes that updates to HID0 be preceded by a SYNC instruction and
      followed by an ISYNC instruction (Page 91).
      
      Create an inline function name update_power8_hid0() which follows this
      recipe and invoke it from the static split core path.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      Tested-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e63dbd16
  4. 12 Aug, 2015 8 commits
  5. 10 Aug, 2015 1 commit
  6. 06 Aug, 2015 12 commits
  7. 30 Jul, 2015 3 commits
    • Michael Ellerman's avatar
      selftests/seccomp: Add powerpc support · 5d83c2b3
      Michael Ellerman authored
      Wire up the syscall number and regs so the tests work on powerpc.
      
      With the powerpc kernel support just merged, all tests pass on ppc64,
      ppc64 (compat), ppc64le, ppc, ppc64e and ppc64e (compat).
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      5d83c2b3
    • Michael Ellerman's avatar
      selftests/seccomp: Make seccomp tests work on big endian · c385d0db
      Michael Ellerman authored
      The seccomp_bpf test uses BPF_LD|BPF_W|BPF_ABS to load 32-bit values
      from seccomp_data->args. On big endian machines this will load the high
      word of the argument, which is not what the test wants.
      
      Borrow a hack from samples/seccomp/bpf-helper.h which changes the offset
      on big endian to account for this.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      c385d0db
    • Michael Ellerman's avatar
      powerpc/kernel: Enable seccomp filter · 2449acc5
      Michael Ellerman authored
      This commit enables seccomp filter on powerpc, now that we have all the
      necessary pieces in place.
      
      To support seccomp's desire to modify the syscall return value under
      some circumstances, we use a different ABI to the ptrace ABI. That is we
      use r3 as the syscall return value, and orig_gpr3 is the first syscall
      parameter.
      
      This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will
      see -ENOSYS preloaded in r3. This is identical to the behaviour on x86,
      and allows seccomp or the ptracer to either leave the -ENOSYS or change
      it to something else, as well as rejecting or not the syscall by
      modifying r0.
      
      If seccomp does not reject the syscall, we restore the register state to
      match what ptrace and audit expect, ie. r3 is the first syscall
      parameter again. We do this restore using orig_gpr3, which may have been
      modified by seccomp, which allows seccomp to modify the first syscall
      paramater and allow the syscall to proceed.
      
      We need to #ifdef the the additional handling of r3 for seccomp, so move
      it all out of line.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      2449acc5