Commits · 31cdd0c39c7544ced79da53aa0b7e989f3a39582 · Kirill Smelkov / linux

11 May, 2016 40 commits

powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs · 31cdd0c3

Paul Mackerras authored Apr 13, 2016

xmon has commands for reading and writing SPRs, but they don't work
currently for several reasons. They attempt to synthesize a small
function containing an mfspr or mtspr instruction and call it. However,
the instructions are on the stack, which is usually not executable.
Also, for 64-bit we set up a procedure descriptor, which is fine for the
big-endian ABIv1, but not correct for ABIv2. Finally, the code uses the
infrastructure for catching memory errors, but that only catches data
storage interrupts and machine check interrupts, but a failed
mfspr/mtspr can generate a program interrupt or a hypervisor emulation
assist interrupt, or be a no-op.

Instead of trying to synthesize a function on the fly, this adds two new
functions, xmon_mfspr() and xmon_mtspr(), which take an SPR number as an
argument and read or write the SPR. Because there is no Power ISA
instruction which takes an SPR number in a register, we have to generate
one of each possible mfspr and mtspr instruction, for all 1024 possible
SPRs. Thus we get just over 8k bytes of code for each of xmon_mfspr()
and xmon_mtspr(). However, this 16kB of code pales in comparison to the
> 130kB of PPC opcode tables used by the xmon disassembler.

To catch interrupts caused by the mfspr/mtspr instructions, we add a new
'catch_spr_faults' flag. If an interrupt occurs while it is set, we come
back into xmon() via program_check_interrupt(), _exception() and die(),
see that catch_spr_faults is set and do a longjmp to bus_error_jmp, back
into read_spr() or write_spr().

This adds a couple of other nice features: first, a "Sa" command that
attempts to read and print out the value of all 1024 SPRs. If any mfspr
instruction acts as a no-op, then the SPR is not implemented and not
printed.

Secondly, the Sr and Sw commands detect when an SPR is not
implemented (i.e. mfspr is a no-op) and print a message to that effect
rather than printing a bogus value.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

31cdd0c3

perf tools: Fix perf regs mask generation · f4782207

Naveen N. Rao authored Apr 28, 2016

On some architectures (powerpc in particular), the number of registers
exceeds what can be represented in an integer bitmask. Ensure we
generate the proper bitmask on such platforms.

Fixes: 71ad0f5e ("perf tools: Support for DWARF CFI unwinding on post processing")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f4782207

perf/powerpc: Add support for unwinding perf-stackdump · c4522469

Chandan Kumar authored Apr 28, 2016

Adds support for unwinding user stack dump by linking with libunwind.
Signed-off-by: Chandan Kumar <chandan.kumar@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c4522469

powerpc: Add HAVE_PERF_USER_STACK_DUMP support · 17ed7c38

Chandan Kumar authored Apr 28, 2016

With perf regs support enabled for powerpc, in commit ed4a4ef8
("powerpc/perf: Add support for sampling interrupt register state"),
the support for obtaining perf user stack dump is already enabled. This
patch declares the support for same and also updates documentation to
mark the support for perf-regs and perf-stackdump.
Signed-off-by: Chandan Kumar <chandan.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

17ed7c38

powerpc/mm/hash64: Fix subpage protection with 4K HPTE config · aac55d75

Michael Ellerman authored May 06, 2016

With Linux page size of 64K and hardware only supporting 4K HPTE, if we
use subpage protection, we always fail for the subpage 0 as shown
below (using the selftest subpage_prot test):

  520175565:  (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
  4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
  4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
  4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !

This is because hash preload wrongly inserts the HPTE entry for subpage
0 without looking at the subpage protection information.

Fix it by teaching should_hash_preload() not to preload if we have
subpage protection configured for that range.

It appears this has been broken since it was introduced in 2008.

Fixes: fa28237c ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

aac55d75

powerpc/mm/hash64: Factor out hash preload psize check · 8bbc9b7b

Michael Ellerman authored May 06, 2016

Currently we have a check in hash_preload() against the psize, which is
only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
this check in a subsequent patch, so factor it out to allow that. As a
bonus it removes the #ifdef in the C code.

Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
block because it would require a forward declaration.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8bbc9b7b

powerpc: Update of_remove_property() call sites to remove null checking · 925e2d1d

Suraj Jitindar Singh authored Apr 28, 2016

After obtaining a property from of_find_property() and before calling
of_remove_property() most code checks to ensure that the property
returned from of_find_property() is not null. The previous patch moved
this check to the start of the function of_remove_property() in order to
avoid the case where this check isn't done and a null value is passed.
This ensures the check is always conducted before taking locks and
attempting to remove the property. Thus it is no longer necessary to
perform a check for null values before invoking of_remove_property().

Update of_remove_property() call sites in order to remove redundant
checking for null property value as check is now performed within the
of_remove_property function().
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[mpe: Unbreak some lines which are just >80 chars for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

925e2d1d

drivers/of: Add check for null property in of_remove_property() · 201b3fe5

Suraj Jitindar Singh authored Apr 28, 2016

The validity of the property input argument to of_remove_property() is
never checked within the function and thus it is possible to pass a null
value. It happens that this will be picked up in __of_remove_property()
as no matching property of the device node will be found and thus an
error will be returned, however once again there is no explicit check
for a null value. By the time this is detected 2 locks have already been
acquired which is completely unnecessary if the property to remove is
null.

Add an explicit check in the function of_remove_property() for a null
property value and return -ENODEV in this case, this is consistent with
what the previous return value would have been when the null value was
not detected and passed to __of_remove_property().

By moving an explicit check for the property paramenter into the
of_remove_property() function, this will remove the need to perform this
check in calling code before invocation of the of_remove_property()
function.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

201b3fe5

powerpc/pseries: Add null property check to pseries_discover_pic() · b2ed0596

Suraj Jitindar Singh authored Apr 28, 2016

The return value of of_get_property() isn't checked before it is passed
to the strstr() function, if it happens that the return value is null
then this will result in a null pointer being dereferenced.

Add a check to see if the return value of of_get_property() is null and
if it is continue straight on to the next node.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Chris Smart <chris@distroguy.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b2ed0596

powerpc/powernv/pci: Fix cfg_dbg() & replace with pr_devel() · 9e447547

Alexey Kardashevskiy authored May 02, 2016

When cfg_dbg() is enabled (i.e. mapped to printk()), gcc produces
errors as the __func__ parameter is missing (pnv_pci_cfg_read() has one);
this adds the missing parameter.

cfg_dbg() is just an inferior version of pr_devel() so use the latter
instead.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9e447547

selftests/powerpc: Fix subpage_prot test to return !0 on failure · 2f67798c

Michael Ellerman authored May 02, 2016

It's helpful for automated testing if the test returns error codes back
to the calling program.

While we're here fix all the usages of %p to remove the double 0x, ie.
%p already includes 0x.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

2f67798c

selftests/powerpc: Test cp_abort during context switch · 438517ec

Chris Smart authored May 02, 2016

Test that performing a copy paste sequence in userspace on P9 does not
result in a leak of the copy into the paste of another process.

This is based on Anton Blanchard's context_switch benchmarking code. It
sets up two processes tied to the same CPU, one which copies and one
which pastes.

The paste should never succeed and the test fails if it does.

This is a test for commit, "8a649045 powerpc: Add support for userspace
P9 copy paste."

Patch created with much assistance from Michael Neuling
<mikey@neuling.org>
Signed-off-by: Chris Smart <chris@distroguy.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

438517ec

powerpc: Remove unnecessary CONFIG_SMP #ifdefs · e44c1b15

Chris Smart authored May 02, 2016

The code in machine_restart/power_off/halt() includes #ifdefs around
calls to smp_send_stop(), however these are not required as
include/linux/smp.h includes an empty version of this function for
CONFIG_SMP=n builds.
Signed-off-by: Chris Smart <chris@distroguy.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e44c1b15

powerpc: Remove unused remnants from A2 cpu · c415c9cb

Rashmica Gupta authored Apr 12, 2016

Support for the A2 cpu was removed in commit fb5a5157 ("powerpc:
Remove platforms/wsp and associated pieces"), and the externs:
__setup_cpu_a2 and __restore_cpu_a2 are still around and unused, so
remove them.
Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c415c9cb

powerpc/mm/slice: Remove slice_mm_new_context() · 62ccf5bf

Aneesh Kumar K.V authored May 02, 2016

The usage in mm mmu_context_nohash.c is bogus, because we set the
context.id value to MMU_NO_CONTEXT 4 lines previously in the same
function, meaning slice_mm_new_context() will always be true.

The book3s 64 usage was removed in the previous commit. So remove it as
unused.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

62ccf5bf

powerpc/mm/subpage: Initialise user psize correctly · 2d566537

Aneesh Kumar K.V authored May 02, 2016

As part of the radix support we switched Book3s64 to use a value of ~0
for MMU_NO_CONTEXT. That is because id 0 is special on radix.

However that broke the logic in init_new_context(). The code there needs
to differentiate between a newly allocated context and one inherited via
fork. Previously it worked because a newly allocated context has an id
of zero (because it was just memset() to zero), which used to match
MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right
thing.

Instead check against a context.id value of zero instead of using
slice_mm_new_context().

Without this patch we never call slice_set_user_psize(), and end up with
a slice psize value of zero and we always end up using 4K HPTE.

Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2d566537

powerpc/mm/radix: Fix CONFIG_PPC_MMU_STD_64 typo · bb03efe2

Valentin Rothberg authored May 03, 2016

It's CONFIG_PPC_STD_MMU_64 not ...
     CONFIG_PPC_MMU_STD_64.

Fixes: 11ffc1cfa4c2 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bb03efe2

powerpc/mm/radix: Document software bits for radix · 69dfbaeb

Aneesh Kumar K.V authored Apr 29, 2016

Add #defines for Power ISA 3.0 software defined bits.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

69dfbaeb

powerpc/mm/radix: Use firmware feature to enable Radix MMU · 17a3dd2f

Aneesh Kumar K.V authored Apr 29, 2016

We use the existing "ibm,pa-features" device-tree property to enable
Radix MMU mode. This means we default to hash mode unless firmware tells
us it's OK to start using Radix mode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

17a3dd2f

powerpc/mm/radix: Add THP support for 4K linux page size · ab624762

Aneesh Kumar K.V authored Apr 29, 2016

This adds THP support for 4K Linux page size config with radix. We still
don't do THP with 4K Linux page size and hash page table. Hash page
table needs a 16MB hugepage and we can't do THP with 16MM hugepage and
4K Linux page size.

We add missing functions to 4K hash config to get it to build and
hash__has_transparent_hugepage() makes sure we don't enable THP for 4K
hash config. To catch wrong usage of THP related with 4K config, we add
BUG() in those dummy functions we added to get it compile.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ab624762

powerpc/mm/radix: Add radix THP callbacks · bde3eb62

Aneesh Kumar K.V authored Apr 29, 2016

The deposited pgtable_t is a pte fragment hence we cannot use page->lru
for linking then together. We use the first two 64 bits for pte fragment
as list_head type to link all deposited fragments together. On withdraw
we properly zero then out.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bde3eb62

powerpc/mm/thp: Abstraction for THP functions · 3df33f12

Aneesh Kumar K.V authored Apr 29, 2016

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

3df33f12

powerpc/mm: THP is only available on hash64 as of now · 6a1ea362

Aneesh Kumar K.V authored Apr 29, 2016

Only code movement in this patch. No functionality change.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6a1ea362

powerpc/mm/radix: Add hugetlb support 4K page size · c0a6c719

Aneesh Kumar K.V authored Apr 29, 2016

We have hugepage at the pmd level with 4K radix config. Hence we don't
need to use hugepd format with radix.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c0a6c719

powerpc/mm/radix: Make sure swapper pgdir is properly aligned · 43a5c684

Aneesh Kumar K.V authored Apr 29, 2016

With 4K page size radix config our level 1 page table size is 64K and it
should be naturally aligned.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

43a5c684

powerpc/mm: Add radix support for hugetlb · 48483760

Aneesh Kumar K.V authored Apr 29, 2016

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

48483760

powerpc/mm: Fix vma_mmu_pagesize() for radix · 2f5f0dfd

Aneesh Kumar K.V authored Apr 29, 2016

Radix doesn't use the slice framework to find the page size. Hence use
vma to find the page size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2f5f0dfd

powerpc/mm: pte_frag abstraction · 5ed7ecd0

Aneesh Kumar K.V authored Apr 29, 2016

In this patch we make the number of pte fragments per level 4 page table
page a variable. Radix level 4 table size is 256 bytes and hence we can
have 256 fragments per level 4 page. We don't update the fragment count
in this patch. We need to do performance measurements to find the right
value for fragment count.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

5ed7ecd0

powerpc/radix: Update MMU cache · a3dece6d

Aneesh Kumar K.V authored Apr 29, 2016

With radix there is no MMU cache. Hence we don't need to do anything in
update_mmu_cache().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a3dece6d

powerpc/mm: vmalloc abstraction in preparation for radix · d6a9996e

Aneesh Kumar K.V authored Apr 29, 2016

The vmalloc range differs between hash and radix config. Hence make
VMALLOC_START and related constants a variable which will be runtime
initialized depending on whether hash or radix mode is active.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d6a9996e

powerpc/mm: Update pte filter for radix · 4dfb88ca

Aneesh Kumar K.V authored Apr 29, 2016

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4dfb88ca

powerpc/mm: Add radix pgalloc details · a2f41eb9

Aneesh Kumar K.V authored Apr 29, 2016

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a2f41eb9

powerpc/mm: Make 4K and 64K use pte_t for pgtable_t · 934828ed

Aneesh Kumar K.V authored Apr 29, 2016

This patch switches 4K Linux page size config to use pte_t * type
instead of struct page * for pgtable_t. This simplifies the code a lot
and helps in consolidating both 64K and 4K page allocator routines. The
changes should not have any impact, because we already store physical
address in the upper level page table tree and that implies we already
do struct page * to physical address conversion.

One change to note here is we move the pgtable_page_dtor() call for
nohash to pte_fragment_free_mm(). The nohash related change is due to
the related changes in pgtable_64.c.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

934828ed

powerpc/mm: Rename function to indicate we are allocating fragments · 74701d59

Aneesh Kumar K.V authored Apr 29, 2016

Only code cleanup. No functionality change.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

74701d59

powerpc/mm: Simplify the code dropping 4-level table #ifdef · bcbe7f77

Aneesh Kumar K.V authored Apr 29, 2016

Simplify the code by dropping 4-level page table #ifdef. We are always
4-level now.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bcbe7f77

powerpc/mm: Revert changes made to nohash pgalloc-64.h · 27209206

Aneesh Kumar K.V authored Apr 29, 2016

This reverts pgalloc related changes WRT implementing 4-level page
table for 64K Linux page size and storing of physical address in higher
level page tables since they are only applicable to book3s64 variant
and we now have a separate copy for book3s64. This helps to keep these
headers simpler.

Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

27209206

powerpc/mm: Copy pgalloc (part 2) · 75a9b8a6

Aneesh Kumar K.V authored Apr 29, 2016

This moves the nohash variant of pgalloc headers to nohash/ directory
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

75a9b8a6

powerpc/mm: Make a copy of pgalloc.h for 32 and 64 book3s · 101ad5c6

Aneesh Kumar K.V authored Apr 29, 2016

This patch start to make a book3s variant for pgalloc headers. We have
multiple book3s specific changes such as:
  * 4 level page table
  * store physical address in higher level table
  * use pte_t * for pgtable_t

Having a book3s64 specific variant helps to keep code simpler and remove
lots of #ifdef around code.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

101ad5c6

powerpc/mm/radix: Update PTCR on secondary CPUs · b5dcc609

Aneesh Kumar K.V authored Apr 29, 2016

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b5dcc609

powerpc/mm/radix: Pick the address layout for radix config · 7a0eedee

Aneesh Kumar K.V authored Apr 29, 2016

Hash needs special get_unmapped_area() handling because of limitations
around base page size, so we have to set HAVE_ARCH_UNMAPPED_AREA.

With radix we don't have such restrictions, so we could use the generic
code. But because we've set HAVE_ARCH_UNMAPPED_AREA (for hash), we have
to re-implement the same logic as the generic code.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7a0eedee