- 09 Mar, 2016 11 commits
-
-
Christophe Leroy authored
csum_partial is often called for small fixed length packets for which it is suboptimal to use the generic csum_partial() function. For instance, in my configuration, I got: * One place calling it with constant len 4 * Seven places calling it with constant len 8 * Three places calling it with constant len 14 * One place calling it with constant len 20 * One place calling it with constant len 24 * One place calling it with constant len 32 This patch renames csum_partial() to __csum_partial() and implements csum_partial() as a wrapper inline function which * uses csum_add() for small 16bits multiple constant length * uses ip_fast_csum() for other 32bits multiple constant * uses __csum_partial() in all other cases Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Raghav Dogra authored
Modify platform driver suspend/resume to syscore suspend/resume. This is because p1022ds needs to use localbus when entering the PCIE resume. Signed-off-by: Raghav Dogra <raghav.dogra@nxp.com> [scottwood: dropped makefile churn] Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets flushed to track accesses to wrong areas. Therefore, kernel addresses will also generate ITLB misses. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
The MPC885 reference manual says that SDCR shall have value 0x40, but most exemples set SDCR to 0x1 With 0x1 in SDCR, we observe TX underruns on SCC when using it in QMC mode. According the NXP technical support, this is a copy/paste error from MPC860 reference manual, 0x40 being the only value supported by the MPC885 HW. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Bartlomiej Zolnierkiewicz authored
This patch disables deprecated IDE subsystem in mpc8610_hpcd_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Bartlomiej Zolnierkiewicz authored
This patch disables deprecated IDE subsystem in stx_gp3_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Bartlomiej Zolnierkiewicz authored
This patch disables deprecated IDE subsystem in ksi8560_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Bartlomiej Zolnierkiewicz authored
This patch disables deprecated IDE subsystem in mpc834x_itx_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Saurabh Sengar authored
cpm_muram_alloc_common is called twice and both the times spin_lock_irqsave is held. Using GFP_KERNEL can sleep in spin_lock_irqsave context and cause deadlock Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Saurabh Sengar authored
as cpm_muram_alloc_common is used only in this file, making it static Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Zhao Qiang authored
127 is the theoretical up boundary of QEIC number, in fact there only be 44 qe_ic_info now. add check to overflow for qe_ic_info Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
- 05 Mar, 2016 16 commits
-
-
Igal Liberman authored
The FMan contains internal PHY devices used for SGMII connections to external PHYs. When these PHYs are in use a reference is needed for both the external PHY and the internal one. For the external PHY phy-handle provides the reference. For the internal PHY a new handle is required. In dTSEC, the internal PHY is a TBI (Ten Bit Interface) PHY, the handle used will be tbi-handle. In mEMAC, the internal PHY is a PCS (Physical Coding Sublayer) PHY, the handle used will be pcsphy-handle. Signed-off-by: Igal Liberman <igal.liberman@freescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
chenhui zhao authored
Support Freescale E6500 core-based platforms, like t4240. Support disabling/enabling individual CPU thread dynamically. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
-
chenhui zhao authored
Freescale E500MC and E5500 core-based platforms, like P4080, T1040, support disabling/enabling CPU dynamically. This patch adds this feature on those platforms. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> [scottwood: removed unused pr_fmt] Signed-off-by: Scott Wood <oss@buserror.net>
-
chenhui zhao authored
Freescale CoreNet-based and Non-CoreNet-based platforms require different PM operations. This patch extracted existing PM operations on Non-CoreNet-based platforms to a new file which can accommodate both platforms. In this way, PM operation codes are clearer structurally. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
chenhui zhao authored
There is a RCPM (Run Control/Power Management) in Freescale QorIQ series processors. The device performs tasks associated with device run control and power management. The driver implements some features: mask/unmask irq, enter/exit low power states, freeze time base, etc. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com> [scottwood: remove __KERNEL__ ifdef] Signed-off-by: Scott Wood <oss@buserror.net>
-
chenhui zhao authored
Various e500 core have different cache architecture, so they need different cache flush operations. Therefore, add a callback function cpu_flush_caches to the struct cpu_spec. The cache flush operation for the specific kind of e500 is selected at init time. The callback function will flush all caches inside the current cpu. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
chenhui zhao authored
On e6500, in the case of cpu hotplug, either thread in one core may be the first thread initilzing the TLB1. The subsequent threads must not setup it again. The code is derived from the comment of Scott Wood. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
-
Wang Dongsheng authored
RCPM is the Run Control and Power Management module performs all device-level tasks associated with device run control and power management. Add this for freescale powerpc platform and layerscape platform. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com> [scottwood: s/pointer/phandle and "disabled" status from example] Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
Simplify csum_add(a, b) in case a or b is constant 0 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop. This patch improves csum_partial() speed by around 10% on both: * 8xx (single issue processor with parallel execution) * 83xx (superscalar 6xx processor with dual instruction fetch and parallel execution) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
r5 does contain the value to be updated, so lets use r5 all way long for that. It makes the code more readable. To avoid confusion, it is better to use adde instead of addc The first addition is useless. Its only purpose is to clear carry. As r4 is a signed int that is always positive, this can be done by using srawi instead of srwi Let's also remove the comment about bdnz having no overhead as it is not correct on all powerpc, at least on MPC8xx In the last part, in our situation, the remaining quantity of bytes to be proceeded is between 0 and 3. Therefore, we can base that part on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3 then proceding on comparisons and substractions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
csum_partial_copy_generic() does the same as copy_tofrom_user and also calculates the checksum during the copy. Unlike copy_tofrom_user(), the existing version of csum_partial_copy_generic() doesn't take benefit of the cache. This patch is a rewrite of csum_partial_copy_generic() based on copy_tofrom_user(). The previous version of csum_partial_copy_generic() was handling errors. Now we have the checksum wrapper functions to handle the error case like in powerpc64 so we can make the error case simple: just return -EFAULT. copy_tofrom_user() only has r12 available => we use it for the checksum r7 and r8 which contains pointers to error feedback are used, so we stack them. On a TCP benchmark using socklib on the loopback interface on which checksum offload and scatter/gather have been deactivated, we get about 20% performance increase. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
In several architectures, ip_fast_csum() is inlined There are functions like ip_send_check() which do nothing much more than calling ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: whitespace and cast fixes] Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
The powerpc64 checksum wrapper functions adds csum_and_copy_to_user() which otherwise is implemented in include/net/checksum.h by using csum_partial() then copy_to_user() Those two wrapper fonctions are also applicable to powerpc32 as it is based on the use of csum_partial_copy_generic() which also exists on powerpc32 This patch renames arch/powerpc/lib/checksum_wrappers_64.c to arch/powerpc/lib/checksum_wrappers.c and makes it non-conditional to CONFIG_WORD_SIZE Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
addc uses carry so xer is clobbered in csum_add() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
Christophe Leroy authored
csum_tcpudp_magic is now an inline function, so there is nothing to export Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
-
- 03 Mar, 2016 8 commits
-
-
Aneesh Kumar K.V authored
No code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
No code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We don't need to update linux page table entry with _PAGE_HASHPTE early in hash pte fault. A parallel pte update will loop via _PAGE_BUSY and look at _PAGE_HASHPTE for a required hpte flush only if _PAGE_BUSY is cleared. That ensures a pte update will wait for a parallel hpte insert to finish before looking at _PAGE_HASHPTE bit. To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K. commit 41743a4e ("powerpc: Free a PTE bit on ppc64 with 64K pages") did similar change for 64K config Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We are updating pte in those functions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Kirill A. Shutemov authored
With next generation power processor, we are having a new mmu model [1] that require us to maintain a different linux page table format. Inorder to support both current and future ppc64 systems with a single kernel we need to make sure kernel can select between different page table format at runtime. With the new MMU (radix MMU) added, we will have two different pmd hugepage size 16MB for hash model and 2MB for Radix model. Hence make HPAGE_PMD related values as a variable. Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a followup patch. [1] http://ibm.biz/power-isa3 (Needs registration). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
This is needed so that we can support both hash and radix page table using single kernel. Radix kernel uses a 4 level table. We now use physical address in upper page table tree levels. Even though they are aligned to their size, for the masked bits we use the bit positions as per PowerISA 3.0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We remove real_pte_t out of STRICT_MM_TYPESCHECK. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We move the page table accessors into a separate header. We will later add a big endian variant of the table which is needed for radix. No functionality change only code movement. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 Mar, 2016 5 commits
-
-
Cyril Bur authored
This patch adds the ability to be able to save the VSX registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU and VEC registers in the thread copy path to avoid a possibly pointless reload of VSX state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Cyril Bur authored
This patch adds the ability to be able to save the VEC registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU registers in the thread copy path to avoid a possibly pointless reload of VEC state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Cyril Bur authored
This patch adds the ability to be able to save the FPU registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch optimises the thread copy path (as a result of a fork() or clone()) so that the parent thread can return to userspace with hot registers avoiding a possibly pointless reload of FPU register state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Cyril Bur authored
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and marking {fpu,altivec,vsx} as being unused by a thread. Currently giveup_{fpu,altivec,vsx}() does both however optimisations to task switching can be made if these two operations are decoupled. save_all() will permit the saving of registers to thread structs and leave threads MSR with bits enabled. This patch introduces no functional change. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Cyril Bur authored
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities. Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall since the kernel uses the facilities sometimes even in syscall fast-path making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three. The obvious overcompensation to this problem is to simply always load all the facilities on every exit to userspace. Loading up all FPU, VEC and VSX registers every time can be expensive and if a workload does avoid using them, it should not be forced to incur this penalty. An 8bit counter is used to detect if the registers have been used in the past and the registers are always loaded until the value wraps to back to zero. Several versions of the assembly in entry_64.S were tested: 1. Always calling C. 2. Performing a common case check and then calling C. 3. A complex check in asm. After some benchmarking it was determined that avoiding C in the common case is a performance benefit (option 2). The full check in asm (option 3) greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Move load_vec in the struct to fill an existing hole, reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> fixup
-