Commits · 66b43081c0bde3171208a7cb52f5807dce4a79e4 · Kirill Smelkov / linux

08 Oct, 2014 14 commits

cxl: Add userspace header file · 66b43081

Ian Munsie authored Oct 08, 2014

This adds a header file for use by userspace programs wanting to interact with
the kernel cxl driver.  It defines structs and magic numbers required for
userspace to interact with devices in /dev/cxl/afuM.N.

Further documentation on this interface is added in a subsequent patch in
Documentation/powerpc/cxl.txt.

It also adds this new userspace header file to Kbuild so it's exported when
doing "make headers_installs".
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

66b43081

cxl: Driver code for powernv PCIe based cards for userspace access · f204e0b8

Ian Munsie authored Oct 08, 2014

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (ie POWER8
bare metal). It allows access to cxl accelerators by userspace using the
/dev/cxl/afuM.N char devices.

The kernel driver has no knowledge of the function implemented by the
accelerator. It provides services to userspace via the /dev/cxl/afuM.N
devices. When a program opens this device and runs the start work IOCTL, the
accelerator will have coherent access to that processes memory using the same
virtual addresses. That process may mmap the device to access any MMIO space
the accelerator provides.  Also, reads on the device will allow interrupts to
be received. These services are further documented in a later patch in
Documentation/powerpc/cxl.txt.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f204e0b8

cxl: Add base builtin support · 10542ca0

Ian Munsie authored Oct 08, 2014

This adds the base cxl support that cannot be built as a module. Specifically
it adds the cxl callbacks that are called from the core powerpc mm code which
must always exist irrespective of if the cxl module is loaded or not. This is
similar to how cell works with CONFIG_SPU_BASE.

This adds a cxl_slbia() call (similar to spu_flush_all_slbs()) which checks if
the cxl module is loaded and in use, returning immediately if it is not. If it
is in use it calls into the cxl SLB invalidation code.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

10542ca0

powerpc/mm: Add hooks for cxl · 4c6d9acc

Ian Munsie authored Oct 08, 2014

This adds hooks into the core powerpc mm code for cxl.

The core powerpc code sometimes uses local tlbie. Unfortunately this won't
work with the current cxl driver as it relies on snooping tlbie broadcasts.

The cxl hardware can have TLB entries invalidated via MMIO but this is not
currently supported by the driver. In future we can make local tlbie smarter so
that it invalidates cxl contexts via MMIO when it needs to but for now we have
this workaround.

This workaround checks for any active cxl contexts and if so, disables local
tlbie.

This also adds a hook for when SLBs are invalidated. This ensures any
corresponding SLBs in cxl are also invalidated at the same time. This is
required for segment demotion.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4c6d9acc

powerpc/opal: Add PHB to cxl mode call · 09521736

Ian Munsie authored Oct 08, 2014

This adds the OPAL call to change a PHB into cxl mode.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

09521736

powerpc/mm: Add new hash_page_mm() · a1dca346

Ian Munsie authored Oct 08, 2014

This adds a new function hash_page_mm() based on the existing hash_page().
This version allows any struct mm to be passed in, rather than assuming
current. This is useful for servicing co-processor faults which are not in the
context of the current running process.

We need to be careful here as the current hash_page() assumes current in a few
places.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a1dca346

powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts · 80c49c7e

Ian Munsie authored Oct 08, 2014

This adds a number of functions for allocating IRQs under powernv PCIe for cxl.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

80c49c7e

cxl: Add new header for call backs and structs · 1cd258d7

Ian Munsie authored Oct 08, 2014

This new header adds callbacks and structs needed by the rest of the kernel to
hook into the cxl infrastructure.

This adds the cxl_ctx_in_use() function for use in the mm code to see if any
cxl contexts are currently in use. This is used by the tlbie() to determine if
it can do local TLB invalidations or not. This also adds get/put calls for the
cxl driver module to refcount the active cxl contexts.

cxl_ctx_get/put/in_use are static inlined here as they are called in tlbie
which we want to be fast (mpe's suggestion).

Empty functions are provided when CONFIG_CXL_BASE is not enabled.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1cd258d7

powerpc/powernv: Split out set MSI IRQ chip code · fd9a1c26

Ian Munsie authored Oct 08, 2014

Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.

This will be used by some of the cxl PCIe code later.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fd9a1c26

powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize · 8ca7a82f

Ian Munsie authored Oct 08, 2014

Export mmu_kernel_ssize and mmu_linear_psize.  These are needed by the cxl
driver which has it's own MMU.  To setup the MMU cxl needs access to these.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8ca7a82f

powerpc/msi: Improve IRQ bitmap allocator · b0345bbc

Ian Munsie authored Oct 08, 2014

Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
to the nearest power of 2. eg. ask for 5 IRQs and you'll get 8. This wastes a
lot of IRQs which can be a scarce resource.

For cxl we may require multiple IRQs for every context that is attached to the
accelerator. There may be 1000s of contexts attached, hence we can easily run
out of IRQs, especially if we are needlessly wasting them.

This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
of IRQs, hence avoiding this wastage. It keeps the natural alignment
requirement though.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b0345bbc

powerpc/cell: Make spu_flush_all_slbs() generic · be3ebfe8

Ian Munsie authored Oct 08, 2014

This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs().

This will be useful when we add cxl which also needs a similar SLB flush call.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

be3ebfe8

powerpc/cell: Move data segment faulting code out of cell platform · 73d16a6e

Ian Munsie authored Oct 08, 2014

__spu_trap_data_seg() currently contains code to determine the VSID and ESID
required for a particular EA and mm struct.

This code is generically useful for other co-processors. This moves the code of
the cell platform so it can be used by other powerpc code. It also adds 1TB
segment handling which Cell didn't support.  The new function is called
copro_calculate_slb().

This also moves the internal struct spu_slb to a generic struct copro_slb which
is now used in the Cell and copro code.  We use this new struct instead of
passing around esid and vsid parameters.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

73d16a6e

powerpc/cell: Move spu_handle_mm_fault() out of cell platform · e83d0169

Ian Munsie authored Oct 08, 2014

Currently spu_handle_mm_fault() is in the cell platform.

This code is generically useful for other non-cell co-processors on powerpc.

This patch moves this function out of the cell platform into arch/powerpc/mm so
that others may use it.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e83d0169

07 Oct, 2014 5 commits

powerpc/pseries: Use new defines when calling H_SET_MODE · 60666de2

Michael Neuling authored May 29, 2014

Now that we define these in the KVM code, use these defines when we call
H_SET_MODE. No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

60666de2

powerpc: Update contact info in Documentation files · dfdac393

sukadev@linux.vnet.ibm.com authored Sep 30, 2014

Cody's email address has changed. Update the contact information for
the 24x7 and GPCI counters to the PowerPC developers mailing list.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

dfdac393

powerpc/perf/hv-24x7: Simplify catalog_read() · 56f12bee

sukadev@linux.vnet.ibm.com authored Sep 30, 2014

catalog_read() implements the read interface for the sysfs file

	/sys/bus/event_source/devices/hv_24x7/interface/catalog

It essentially takes a buffer, an offset and count as parameters
to the read() call.  It makes a hypervisor call to read a specific
page from the catalog and copy the required bytes into the given
buffer. Each call to catalog_read() returns at most one 4K page.

Given these requirements, we should be able to simplify the
catalog_read().
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

56f12bee

powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations · 48bee8a6

Cody P Schafer authored Sep 30, 2014

Ian pointed out the use of __aligned(4096) caused rather large stack
consumption in single_24x7_request(), so use the kmem_cache
hv_page_cache (which we've already got set up for other allocations)
insead of allocating locally.

CC: Haren Myneni <hbabu@us.ibm.com>
Reported-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Cody P Schafer <dev@codyps.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

48bee8a6

powerpc/powernv: Fix endian bug in LPC bus debugfs accessors · bf7588a0

Benjamin Herrenschmidt authored Oct 03, 2014

When reading from the LPC, the OPAL FW calls return the value via pointer
to a uint32_t which is always returned big endian. Our internal inb/outb
implementation byteswaps that fine but our debugfs code is still broken.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bf7588a0

03 Oct, 2014 6 commits

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git · 75d43b2d

Michael Ellerman authored Oct 04, 2014

Freescale updates from Scott (27 commits):

  "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
   FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
   support, and PrPMC PCI enumeration."

75d43b2d

powerpc: Enable CONFIG_CRASH_DUMP=y for ppc64_defconfig · d0b7abb2

Michael Ellerman authored Sep 24, 2014

It pulls in more code, including causing us to build a relocatable
kernel, which is good for testing.

The resulting kernel is still usable as a non-crash dump kernel.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d0b7abb2

powerpc/kdump: crash_dump.c needs to include io.h · edcee77f
Michael Ellerman authored Sep 24, 2014
```
For __ioremap().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
```
edcee77f

powerpc: Don't build powernv for other platform defconfigs · d3b94e4b

Michael Ellerman authored Sep 24, 2014

Because powernv arrived after these other platforms, the defconfigs
didn't have PPC_POWERNV disabled, and being default y it gets turned on.

If we're going to bother having defconfigs for the specific platforms
then they should only build the code required for those platforms.

The grab bag of everything config is ppc64_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d3b94e4b

powerpc/pci: remove duplicate declaration of pci_bus_find_capability · 8abf29f8

Wei Yang authored Sep 19, 2014

pci_bus_find_capability() is decleared in pci.h, so it is not necessary to do
it again.

This patch removes it.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8abf29f8

powerpc/iommu/ddw: Fix endianness · 9410e018

Alexey Kardashevskiy authored Sep 25, 2014

rtas_call() accepts and returns values in CPU endianness.
The ddw_query_response and ddw_create_response structs members are
defined and treated as BE but as they are passed to rtas_call() as
(u32 *) and they get byteswapped automatically, the data is CPU-endian.
This fixes ddw_query_response and ddw_create_response definitions and use.

of_read_number() is designed to work with device tree cells - it assumes
the input is big-endian and returns data in CPU-endian. However due
to the ddw_create_response struct fix, create.addr_hi/lo are already
CPU-endian so do not byteswap them.

ddw_avail is a pointer to the "ibm,ddw-applicable" property which contains
3 cells which are big-endian as it is a device tree. rtas_call() accepts
a RTAS token in CPU-endian. This makes use of of_property_read_u32_array
to byte swap and avoid the need for a number of be32_to_cpu calls.

Cc: stable@vger.kernel.org # v3.13+
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: folded Anton's patch with of_property_read_u32_array]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9410e018

02 Oct, 2014 8 commits

powerpc: Add printk levels to powerpc code · a7696b36

Anton Blanchard authored Sep 17, 2014

Add printk levels to some places in the powerpc port.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a7696b36

powerpc: Add printk levels to powernv platform code · 9a4f5cd0

Anton Blanchard authored Sep 17, 2014

Add printk levels to powernv platform code, and convert to
pr_err() etc while here.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9a4f5cd0

powerpc: Remove powerpc specific cmd_line · 3e47d147

Anton Blanchard authored Sep 17, 2014

There is no need for yet another copy of the command line, just
use boot_command_line like everyone else.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

3e47d147

powerpc: Use pr_fmt in module loader code · c7d1f6af

Anton Blanchard authored Sep 17, 2014

Use pr_fmt to give some context to the error messages in the
module code, and convert open coded debug printk to pr_debug.

Use pr_err for error messages.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c7d1f6af

powerpc: Fill in si_addr_lsb siginfo field · 9d57472f

Anton Blanchard authored Sep 24, 2014

Fill in the si_addr_lsb siginfo field so the hwpoison code can
pass to userspace the length of memory that has been corrupted.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9d57472f

powerpc: Add VM_FAULT_HWPOISON handling to powerpc page fault handler · 3913fdd7

Anton Blanchard authored Sep 24, 2014

do_page_fault was missing knowledge of HWPOISON, and we would oops
if userspace tried to access a poisoned page:

kernel BUG at arch/powerpc/mm/fault.c:180!
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

3913fdd7

powerpc: Simplify do_sigbus · 63af5262

Anton Blanchard authored Sep 24, 2014

Exit out early for a kernel fault, avoiding indenting of
most of the function.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

63af5262

powerpc: Speed up clear_page by unrolling it · e35735b9

Anton Blanchard authored Oct 02, 2014

Unroll clear_page 8 times. A simple microbenchmark which
allocates and frees a zeroed page:

for (i = 0; i < iterations; i++) {
	unsigned long p = __get_free_page(GFP_KERNEL | __GFP_ZERO);
	free_page(p);
}

improves 20% on POWER8.

This assumes cacheline sizes won't grow beyond 512 bytes or
page sizes wont drop below 1kB, which is unlikely, but we could
add a runtime check during early init if it makes people nervous.

Michael found that some versions of gcc produce quite bad code
(all multiplies), so we give gcc a hand by using shifts and adds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e35735b9

01 Oct, 2014 1 commit

powerpc/eeh: Show hex prefix for PE state sysfs · 2013add4

Gavin Shan authored Oct 01, 2014

As Michael suggested, the hex prefix for the output of EEH PE
state sysfs entry (/sys/bus/pci/devices/xxx/eeh_pe_state) is
always informative to users.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2013add4

30 Sep, 2014 6 commits

powerpc/powernv: Override dma_get_required_mask() · fe7e85c6

Gavin Shan authored Sep 30, 2014

The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of
memory. This is a bit of a strange semantic when we have to choose
between IOMMU translation or bypass, but essentially what it means
is "what DMA mask will give best performances".

Currently, our IOMMU backend always returns a 32-bit mask here, we
don't do anything special to it when we have bypass available. This
causes some drivers to choose a 32-bit mask, thus losing the ability
to use the bypass window, thinking this is more efficient. The problem
was reported from the driver of following device:

0004:03:00.0 0107: 1000:0087 (rev 05)
0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
             Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

This patch adds an override of that function in order to, instead,
return a 64-bit mask whenever a bypass window is available in order
for drivers to prefer this configuration.
Reported-by: Murali N. Iyer <mniyer@us.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fe7e85c6

powerpc/powernv: Fetch frozen PE on top level · 372fb80d

Gavin Shan authored Sep 30, 2014

It should have been part of commit 1ad7a72c ("powerpc/eeh: Report
frozen parent PE prior to child PE"). There are 2 ways to report
EEH errors: proactively polling because of 0xFF's returned from
PCI config or IO read, or interrupt driven event. We missed to
report and handle parent frozen PE prior to child frozen PE for
the later case on PowerNV platform.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

372fb80d

powerpc/eeh: Dump PCI config space for all child devices · f2e0be5e

Gavin Shan authored Sep 30, 2014

The PEs can be organized as nested. Current implementation doesn't
dump PCI config space for subordinate devices of child PEs. However,
the frozen PE could be caused by those subordinate devices of its
child PEs.

The patch dumps PCI config space for all subordinate devices of the
problematic PE.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f2e0be5e

powerpc/eeh: Emulate EEH recovery for VFIO devices · 5cfb20b9

Gavin Shan authored Sep 30, 2014

When enabling EEH functionality on passed through devices (PE)
with VFIO, the devices in the PE would be removed permanently
from guest side. In that case, the PE remains frozen state.
When returning PE to host, or restarting the guest again, we
had mechanism unfreezing the PE by clearing PESTA/B frozen
bits. However, that's not enough for some adapters, which are
indicated as following "lspci" shows. Those adapters require
hot reset on the parent bus to bring their firmware back to
workable state. Otherwise, those adaptrs won't be operative
and the host (for returning case) or the guest will fail to
load the drivers for those adapters without exception.

0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \
             10Gb NIC (be3) (rev 02)
0000:01:00.0 0200: 19a2:0710 (rev 02)
0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \
             NIC (Lancer) (rev 10)
0001:03:00.0 0200: 10df:e220 (rev 10)

The patch adds mechanism to emulate EEH recovery (for hot reset
on parent PCI bus) on 3 gates to fix the issue: open/release one
adapter of the PE, enable EEH functionality on one adapter of the
PE.
Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

5cfb20b9

powerpc/eeh: Tag reset state for user owned PE · 93e8b36d

Gavin Shan authored Sep 30, 2014

PE would be owned by userland, which probably request PE reset
done in host side. During the reset, we should drop the PCI
config accesses to the PE with help of flag EEH_PE_RESET.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

93e8b36d

powerpc/powernv: Sync OpalPciResetScope with firmware · d1a85eee

Gavin Shan authored Sep 30, 2014

The names of PCI reset scopes aren't sychronized with firmware.
The patch fixes it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d1a85eee