Commits · 3084cdb7cd6a1609d0a4480291f5e4da80765d03 · nexedi / linux

11 Mar, 2016 6 commits

powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together · 3084cdb7

Christophe Leroy authored Feb 09, 2016

x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, and are never defined at the same time.
So rename them x_block_mapped() and define them in the relevant
places.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

3084cdb7

powerpc32: Fix pte_offset_kernel() to return NULL for bad pages · be00ed72

Christophe Leroy authored Feb 09, 2016

The fixmap-related functions try to map kernel pages that are
already mapped through Large TLBs. pte_offset_kernel() has to
return NULL for LTLBs, otherwise the caller will try to access
the level 2 table, which doesn't exist.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

be00ed72

powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c · 516d9189

Christophe Leroy authored Feb 09, 2016

Now we have an 8xx-specific .c file for that, so put it in there
as other powerpc variants do.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

516d9189

powerpc: Update documentation for noltlbs kernel parameter · f15eea66

Christophe Leroy authored Feb 09, 2016

Now the noltlbs kernel parameter is also applicable to PPC8xx
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

f15eea66

powerpc/8xx: Map linear kernel RAM with 8M pages · a372acfa

Christophe Leroy authored Feb 09, 2016

On a live running system (VoIP gateway for Air Traffic Control), over
a 10-minute period (with 277s idle), we get 87 million DTLB misses
and approximately 35 seconds are spent in the DTLB handler.
This represents 5.8% of the overall time and even 10.8% of the
non-idle time.
Among those 87 million DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

MPC8xx has no BATs but it has an 8MB page size. This patch implements
mapping of kernel RAM using 8MB pages, on the same model as what is
done on the 40x.

In 4k pages mode, each PGD entry maps a 4MB area: we map every two
entries to the same 8MB physical page. In every second entry, we add
4MB to the physical page address to make life easier for the FixupDAR
routine. The hardware simply ignores this offset.

In 16k pages mode, each PGD entry maps a 64MB area: each PGD entry
will point to the first page of the area. The DTLB handler adds
the 3 bits from EPN to map the correct page.
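
The two layouts described above can be illustrated with a small userspace
model. This is only a sketch of the address arithmetic, not the kernel code:
the function names are hypothetical, "off" is an offset into the linear
mapping, and RAM is assumed to start at physical address 0.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4M  (UINT32_C(4) << 20)
#define SZ_8M  (UINT32_C(8) << 20)
#define SZ_64M (UINT32_C(64) << 20)

/* 4k mode: address stored in the PGD entry covering linear offset `off`.
 * Both entries of a pair refer to the same 8MB page; the second entry of
 * the pair additionally carries +4MB. */
static uint32_t pgd4k_entry_phys(uint32_t off)
{
	uint32_t base8m = off & ~(SZ_8M - 1);
	uint32_t second = off & SZ_4M;	/* non-zero for every second entry */

	return base8m + second;
}

/* What an 8MB mapping actually uses: the low 23 bits are ignored. */
static uint32_t hw_8m_base(uint32_t entry_phys)
{
	return entry_phys & ~(SZ_8M - 1);
}

/* 16k mode: the PGD entry points to the first 8MB page of a 64MB area. */
static uint32_t pgd16k_entry_phys(uint32_t off)
{
	return off & ~(SZ_64M - 1);
}

/* 16k mode: the handler selects one of the eight 8MB pages of the area
 * using 3 bits of the effective page number. */
static uint32_t dtlb16k_page_phys(uint32_t off)
{
	uint32_t epn3 = (off >> 23) & 7;

	assert(SZ_64M / SZ_8M == 8);	/* eight pages, hence 3 bits */
	return pgd16k_entry_phys(off) + epn3 * SZ_8M;
}
```

For example, in this model pgd4k_entry_phys(0x00400000) is 0x00400000 while
hw_8m_base() of that value is 0, which is the "ignored by HW" behaviour the
message refers to.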

With this patch applied, we now get only 13 million TLB misses
during the 10-minute period. The idle time has increased to 313s
and the overall time spent in the DTLB miss handler is 6.3s, which
represents 1% of the overall time and 2.2% of non-idle time.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

a372acfa
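
The percentages quoted in a372acfa follow directly from the measured times.
A minimal sketch of that arithmetic, with hypothetical helper names, taking
the 10-minute period as 600 seconds:

```c
#include <assert.h>

/* Share of the whole period spent in the handler, in percent. */
static double share_of_total(double handler_s, double period_s)
{
	assert(period_s > 0.0);
	return 100.0 * handler_s / period_s;
}

/* Share of the non-idle time spent in the handler, in percent. */
static double share_of_busy(double handler_s, double period_s, double idle_s)
{
	assert(period_s > idle_s);
	return 100.0 * handler_s / (period_s - idle_s);
}
```

Before the patch, share_of_total(35, 600) is about 5.8 and
share_of_busy(35, 600, 277) is about 10.8. After the patch,
share_of_total(6.3, 600) is about 1.05 and share_of_busy(6.3, 600, 313)
is about 2.2.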

powerpc/8xx: Save r3 all the time in DTLB miss handler · 913a6b3d

Christophe Leroy authored Feb 09, 2016

We are spending between 40 and 160 cycles with a mean of 65 cycles in
the DTLB handling routine (measured with mftbl), so make it
simpler, although it adds one instruction.
With this modification, we get three registers available at all times,
which will help with the following patch.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

913a6b3d

09 Mar, 2016 12 commits

powerpc/p5040: Add device node for RAID Engine · 3b5eb41b

Xuelin Shi authored Feb 25, 2016

Add the missing RAID Engine device node for p5040.
Otherwise, the device cannot be detected.
Signed-off-by: Xuelin Shi <xuelin.shi@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>

3b5eb41b

powerpc: optimise csum_partial() call when len is constant · 7e393220

Christophe Leroy authored Mar 07, 2016

csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small constant lengths that are a multiple of 16 bits
* uses ip_fast_csum() for other constant lengths that are a multiple of 32 bits
* uses __csum_partial() in all other cases
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

7e393220
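
The dispatch described in 7e393220 can be modelled in portable userspace C.
This is a sketch under assumptions, not the kernel implementation: all names
ending in _model are hypothetical, generic_csum_partial() stands in for the
out-of-line __csum_partial(), and the ip_fast_csum() branch is left out for
brevity. __builtin_constant_p is a GCC/Clang builtin; when it evaluates to 0
(for example without optimisation) the wrapper simply takes the generic path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement 32-bit add with end-around carry. */
static inline uint32_t csum_add_model(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < b);
}

/* Fold a 32-bit partial sum to 16 bits so results can be compared. */
static inline uint16_t csum_fold_model(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Stand-in for the out-of-line routine: sums native 16-bit words. */
static uint32_t generic_csum_partial(const void *buff, int len, uint32_t sum)
{
	const unsigned char *p = buff;
	uint16_t w;

	assert(len >= 0);
	for (; len >= 2; len -= 2, p += 2) {
		memcpy(&w, p, 2);
		sum = csum_add_model(sum, w);
	}
	if (len > 0) {		/* trailing odd byte, zero padded */
		w = 0;
		memcpy(&w, p, 1);
		sum = csum_add_model(sum, w);
	}
	return sum;
}

/* Inline wrapper: open-code short constant even lengths, else call out. */
static inline uint32_t csum_partial_model(const void *buff, int len,
					  uint32_t sum)
{
	if (__builtin_constant_p(len) && len > 0 && len <= 16 &&
	    (len & 1) == 0) {
		const unsigned char *p = buff;
		uint32_t w32;
		uint16_t w16;
		int off = 0;

		for (; off + 4 <= len; off += 4) {
			memcpy(&w32, p + off, 4);
			sum = csum_add_model(sum, w32);
		}
		if (off < len) {	/* one remaining 16-bit word */
			memcpy(&w16, p + off, 2);
			sum = csum_add_model(sum, w16);
		}
		return sum;
	}
	return generic_csum_partial(buff, len, sum);
}
```

Both paths give the same result once folded to 16 bits, so the only effect
of the wrapper is to let the compiler replace a function call with a few
additions when the length is known at build time.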

powerpc/fsl-lbc: Modify suspend/resume entry sequence · ac6082dd

Raghav Dogra authored Feb 09, 2016

Modify platform driver suspend/resume to syscore
suspend/resume. This is because p1022ds needs to use
localbus when entering the PCIE resume.
Signed-off-by: Raghav Dogra <raghav.dogra@nxp.com>
[scottwood: dropped makefile churn]
Signed-off-by: Scott Wood <oss@buserror.net>

ac6082dd

powerpc/8xx: CONFIG_DEBUG_PAGEALLOC requires ITLBmiss for kernel addresses · 921fff35

Christophe Leroy authored Feb 03, 2016

When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets
flushed to track accesses to wrong areas. Therefore, kernel addresses
will also generate ITLB misses.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

921fff35

powerpc/885: set SDCR to 0x40 · 501ea766

Christophe Leroy authored Feb 04, 2016

The MPC885 reference manual says that SDCR shall have value 0x40, but
most examples set SDCR to 0x1.
With 0x1 in SDCR, we observe TX underruns on SCC when using it in
QMC mode.
According to NXP technical support, this is a copy/paste error from
the MPC860 reference manual, 0x40 being the only value supported
by the MPC885 HW.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

501ea766

powerpc/86xx: disable IDE subsystem in mpc8610_hpcd_defconfig · b278268b

Bartlomiej Zolnierkiewicz authored Feb 03, 2016

This patch disables deprecated IDE subsystem in mpc8610_hpcd_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

b278268b

powerpc/85xx: disable IDE subsystem in stx_gp3_defconfig · 2fa1d230

Bartlomiej Zolnierkiewicz authored Feb 03, 2016

This patch disables deprecated IDE subsystem in stx_gp3_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

2fa1d230

powerpc/85xx: disable IDE subsystem in ksi8560_defconfig · 451bc2e9

Bartlomiej Zolnierkiewicz authored Feb 03, 2016

This patch disables deprecated IDE subsystem in ksi8560_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

451bc2e9

powerpc/83xx: disable IDE subsystem in mpc834x_itx_defconfig · ba1353ee

Bartlomiej Zolnierkiewicz authored Feb 03, 2016

This patch disables deprecated IDE subsystem in mpc834x_itx_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

ba1353ee

qe: Use GFP_ATOMIC while spin_lock_irqsave is held · 66923a60

Saurabh Sengar authored Jan 24, 2016

cpm_muram_alloc_common is called twice, and both times
spin_lock_irqsave is held.
Using GFP_KERNEL can sleep in spin_lock_irqsave context and cause
a deadlock.
Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>

66923a60

qe: Make cpm_muram_alloc_common static · 713df30b

Saurabh Sengar authored Jan 26, 2016

As cpm_muram_alloc_common is used only in this file,
make it static.
Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>

713df30b

qe/ic: fix a buffer overflow error and add check elsewhere · c9ee69c5

Zhao Qiang authored Jan 21, 2016

127 is the theoretical upper bound of the QEIC number;
in fact there are only 44 qe_ic_info entries now.
Add a check against overflowing qe_ic_info.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>

c9ee69c5
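
The kind of check c9ee69c5 describes can be sketched as a bounds test before
indexing a table that is shorter than the theoretical source range. The
names below (info_table, lookup_info, struct src_info) are hypothetical and
do not reproduce the driver's data structures; only the sizes 44 and 127
come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_SOURCE 127	/* theoretical upper bound on the source number */

struct src_info {
	unsigned int mask;
};

/* Only 44 entries are populated, far fewer than MAX_SOURCE + 1. */
static const struct src_info info_table[44];

/* Return the entry for `src`, or NULL if it lies outside the table. */
static const struct src_info *lookup_info(unsigned int src)
{
	if (src >= ARRAY_SIZE(info_table))
		return NULL;
	return &info_table[src];
}
```

A caller would test the result for NULL and bail out instead of reading or
writing past the end of the array.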

05 Mar, 2016 16 commits

powerpc/fsl: Update fman dt binding with pcs-phy and tbi-phy · ea6370d2

Igal Liberman authored Dec 24, 2015

The FMan contains internal PHY devices used for SGMII connections
to external PHYs. When these PHYs are in use a reference is needed
for both the external PHY and the internal one. For the external
PHY phy-handle provides the reference. For the internal PHY a new
handle is required.
In dTSEC, the internal PHY is a TBI (Ten Bit Interface) PHY,
the handle used will be tbi-handle.
In mEMAC, the internal PHY is a PCS (Physical Coding Sublayer) PHY,
the handle used will be pcsphy-handle.
Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

ea6370d2

powerpc/mpc85xx: Add CPU hotplug support for E6500 · 6becef7e

chenhui zhao authored Nov 20, 2015

Support Freescale E6500 core-based platforms, like t4240.
Support disabling/enabling individual CPU thread dynamically.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>

6becef7e

powerpc/mpc85xx: Add hotplug support on E5500 and E500MC cores · 2f4f1f81

chenhui zhao authored Nov 20, 2015

Freescale E500MC and E5500 core-based platforms, like P4080, T1040,
support disabling/enabling CPU dynamically.
This patch adds this feature on those platforms.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com>
[scottwood: removed unused pr_fmt]
Signed-off-by: Scott Wood <oss@buserror.net>

2f4f1f81

powerpc/mpc85xx: refactor the PM operations · 56f1ba28

chenhui zhao authored Nov 20, 2015

Freescale CoreNet-based and Non-CoreNet-based platforms require
different PM operations. This patch extracted existing PM operations
on Non-CoreNet-based platforms to a new file which can accommodate
both platforms. In this way, PM operation codes are clearer structurally.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

56f1ba28

powerpc/rcpm: add RCPM driver · d17799f9

chenhui zhao authored Nov 20, 2015

There is a RCPM (Run Control/Power Management) in Freescale QorIQ
series processors. The device performs tasks associated with device
run control and power management.

The driver implements some features: mask/unmask irq, enter/exit low
power states, freeze time base, etc.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
[scottwood: remove __KERNEL__ ifdef]
Signed-off-by: Scott Wood <oss@buserror.net>

d17799f9

powerpc/cache: add cache flush operation for various e500 · e7affb1d

chenhui zhao authored Nov 20, 2015

Various e500 cores have different cache architectures, so they
need different cache flush operations. Therefore, add a callback
function cpu_flush_caches to the struct cpu_spec. The cache flush
operation for the specific kind of e500 is selected at init time.
The callback function will flush all caches inside the current cpu.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

e7affb1d
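
The callback approach from e7affb1d can be illustrated with a function
pointer stored in a per-CPU-type descriptor and selected once at init time.
This is a simplified sketch: the struct, the flush routines and the
name-based lookup are hypothetical stand-ins (the kernel's struct cpu_spec
has many more fields and is not matched by name here), and the flush
routines only record which variant ran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cpu_spec_sketch {
	const char *cpu_name;
	void (*cpu_flush_caches)(void);	/* per-core-type flush routine */
};

static int last_flush;	/* records which routine ran, for illustration */

static void flush_caches_e500v2(void) { last_flush = 1; }
static void flush_caches_e500mc(void) { last_flush = 2; }

static const struct cpu_spec_sketch cpu_specs[] = {
	{ "e500v2", flush_caches_e500v2 },
	{ "e500mc", flush_caches_e500mc },
};

static const struct cpu_spec_sketch *cur_cpu_spec;

/* Init-time selection: pick the descriptor matching the running core. */
static int identify_cpu_sketch(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(cpu_specs) / sizeof(cpu_specs[0]); i++) {
		if (strcmp(cpu_specs[i].cpu_name, name) == 0) {
			cur_cpu_spec = &cpu_specs[i];
			return 0;
		}
	}
	return -1;
}

/* Callers flush through the callback without knowing the core type. */
static void flush_all_caches(void)
{
	if (cur_cpu_spec && cur_cpu_spec->cpu_flush_caches)
		cur_cpu_spec->cpu_flush_caches();
}
```

The benefit of this shape is that power-management code can call one entry
point while each core type supplies its own flush sequence.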

powerpc/mm: any thread in one core can be the first to setup TLB1 · ebb9d30a

chenhui zhao authored Dec 24, 2015

On e6500, in the case of cpu hotplug, either thread in one core
may be the first thread initializing the TLB1. The subsequent threads
must not set it up again.

The code is derived from the comment of Scott Wood.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

ebb9d30a

Documentation: dt: binding: fsl: add devicetree binding for describing RCPM · d64716ca

Wang Dongsheng authored Oct 26, 2015

RCPM is the Run Control and Power Management module, which performs all
device-level tasks associated with device run control and power
management.

Add this for freescale powerpc platform and layerscape platform.
Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
[scottwood: s/pointer/phandle and "disabled" status from example]
Signed-off-by: Scott Wood <oss@buserror.net>

d64716ca

powerpc: simplify csum_add(a, b) in case a or b is constant 0 · 5a8847c8

Christophe Leroy authored Sep 22, 2015

Simplify csum_add(a, b) in case a or b is constant 0
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

5a8847c8
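
The simplification named in 5a8847c8 can be sketched as follows. This is a
portable model with a hypothetical name: the constant-zero shortcuts use the
GCC/Clang builtin __builtin_constant_p, and the carry handling is written in
plain C rather than with the PowerPC add-with-carry instructions.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	/* Adding a compile-time constant 0 leaves the other operand as is. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* Ones' complement addition: wrap the carry back into the sum. */
	csum += addend;
	return csum + (csum < addend);
}
```

When neither operand is a known zero the general path runs, so the result
is the same with or without the shortcut; only the generated code differs.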

powerpc32: optimise csum_partial() loop · f867d556

Christophe Leroy authored Sep 22, 2015

On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

f867d556

powerpc32: optimise a few instructions in csum_partial() · 48821a34

Christophe Leroy authored Sep 22, 2015

r5 does contain the value to be updated, so let's use r5 all the way
for that. It makes the code more readable.

To avoid confusion, it is better to use adde instead of addc.

The first addition is useless. Its only purpose is to clear carry.
As r4 is a signed int that is always positive, this can be done by
using srawi instead of srwi.

Let's also remove the comment about bdnz having no overhead as it
is not correct on all powerpc, at least on MPC8xx.

In the last part, in our situation, the remaining quantity of bytes
to be processed is between 0 and 3. Therefore, we can base that part
on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
then proceeding with comparisons and subtractions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

48821a34

powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user() · 7aef4136

Christophe Leroy authored Sep 22, 2015

csum_partial_copy_generic() does the same as copy_tofrom_user() and also
calculates the checksum during the copy. Unlike copy_tofrom_user(),
the existing version of csum_partial_copy_generic() doesn't take
advantage of the cache.

This patch is a rewrite of csum_partial_copy_generic() based on
copy_tofrom_user().
The previous version of csum_partial_copy_generic() was handling
errors. Now we have the checksum wrapper functions to handle the error
case like in powerpc64 so we can make the error case simple:
just return -EFAULT.
copy_tofrom_user() only has r12 available, so we use it for the
checksum. r7 and r8, which contain pointers to error feedback, are used,
so we stack them.

On a TCP benchmark using socklib on the loopback interface on which
checksum offload and scatter/gather have been deactivated, we get
about 20% performance increase.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

7aef4136

powerpc: inline ip_fast_csum() · 37e08cad

Christophe Leroy authored Sep 22, 2015

In several architectures, ip_fast_csum() is inlined.
There are functions like ip_send_check() which do
little more than call ip_fast_csum().
Inlining ip_fast_csum() allows the compiler to optimise better.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood <oss@buserror.net>

37e08cad

powerpc32: checksum_wrappers_64 becomes checksum_wrappers · 03bc8b0f

Christophe Leroy authored Sep 22, 2015

The powerpc64 checksum wrapper functions add csum_and_copy_to_user(),
which otherwise is implemented in include/net/checksum.h by using
csum_partial() then copy_to_user().

Those two wrapper functions are also applicable to powerpc32 as they are
based on the use of csum_partial_copy_generic(), which also
exists on powerpc32.

This patch renames arch/powerpc/lib/checksum_wrappers_64.c to
arch/powerpc/lib/checksum_wrappers.c and
makes it non-conditional to CONFIG_WORD_SIZE
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

03bc8b0f

powerpc: mark xer clobbered in csum_add() · 11dfbf58

Christophe Leroy authored Sep 22, 2015

addc uses carry so xer is clobbered in csum_add()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

11dfbf58

powerpc: unexport csum_tcpudp_magic · e0f82bdf

Christophe Leroy authored Sep 22, 2015

csum_tcpudp_magic is now an inline function, so there is
nothing to export
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

e0f82bdf

03 Mar, 2016 6 commits

powerpc/mm: Move hash64 tlbflush code into a new header · ee3b93eb

Aneesh Kumar K.V authored Mar 01, 2016

No code changes.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ee3b93eb

powerpc/mm: Move hash related mmu-*.h headers to book3s/ · f64e8084

Aneesh Kumar K.V authored Mar 01, 2016

No code changes.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f64e8084

powerpc/mm: add _PAGE_HASHPTE similar to 4K hash · c367a441

Aneesh Kumar K.V authored Mar 01, 2016

We don't need to update linux page table entry with _PAGE_HASHPTE early
in hash pte fault. A parallel pte update will loop via _PAGE_BUSY
and look at _PAGE_HASHPTE for a required hpte flush only if
_PAGE_BUSY is cleared. That ensures a pte update will wait for a
parallel hpte insert to finish before looking at _PAGE_HASHPTE bit.

To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K.

commit 41743a4e ("powerpc: Free a PTE bit on ppc64 with 64K pages")
did a similar change for the 64K config.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c367a441

powerp/mm: Update code comments · e9a68147

Aneesh Kumar K.V authored Mar 01, 2016

We are updating pte in those functions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e9a68147

mm: Some arch may want to use HPAGE_PMD related values as variables · ff20c2e0

Kirill A. Shutemov authored Mar 01, 2016

With the next generation Power processor, we are getting a new MMU model
[1] that requires us to maintain a different Linux page table format.

In order to support both current and future ppc64 systems with a single
kernel, we need to make sure the kernel can select between different page
table formats at runtime. With the new MMU (radix MMU) added, we will
have two different PMD hugepage sizes: 16MB for the hash model and 2MB for
the radix model. Hence make the HPAGE_PMD related values variables.

Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ff20c2e0

powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table · 368ced78

Aneesh Kumar K.V authored Mar 01, 2016

This is needed so that we can support both hash and radix page tables
using a single kernel. The radix kernel uses a 4 level table.

We now use physical address in upper page table tree levels. Even though
they are aligned to their size, for the masked bits we use the
bit positions as per PowerISA 3.0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

368ced78