- 19 Jan, 2018 11 commits
-
-
Madhavan Srinivasan authored
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
"paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Minor cleanup to use helper function for manipulating paca->soft_enabled variable. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Add a new wrapper function, soft_enabled_set_return(), added to do the paca->soft_enabled updates requiring a set-return. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Add a new wrapper function, soft_enabled_return(), added to return paca->soft_enabled value. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Move set_soft_enabled() from powerpc/kernel/irq.c to asm/hw_irq.c, to encourage updates to paca->soft_enabled done via these access function. Add "memory" clobber to hint compiler since paca->soft_enabled memory is the target here. Renaming it as soft_enabled_set() will make namespaces works better as prefix than a postfix when new soft_enabled manipulation functions are introduced. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
In powerpc/64, the arch_local_irq_disable() function returns unsigned long, which is not consistent with other architectures. Move that set-return asm implementation into arch_local_irq_save(), and make arch_local_irq_disable() return void, simplifying the assembly. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
arch_local_irq_disable is implemented strangely, with a temporary output register being set to the desired soft_enabled value via an immediate input, which is then used to store to memory. This is not required, the immediate can be specified directly as a register input. For simple cases at least, assembly is unchanged except register mapping. Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
We have always had softe in pt_regs, and accessible via PT_SOFTE, even though it is not userspace state. The value userspace sees should always be 1, because we should never be in userspace with interrupts soft disabled. In a subsequent patch we will be changing the semantics of the kernel softe value, so hard wire the value to 1 to retain the existing semantics. As far as we know nothing ever looks at it, but better safe than sorry. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Split out of larger patch, write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
The recent changes to TLB handling broke the PS3 build: arch/powerpc/include/asm/book3s/64/tlbflush.h:30: undefined reference to `.hash__tlbiel_all' Fix it by adding an fallback version of tlbiel_all() for non-native builds. It should never be called, due to checks in callers so it calls BUG(). We should probably clean it up further but this will suffice for now. Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 Jan, 2018 6 commits
-
-
Nicholas Piggin authored
For shared processor guests (e.g., KVM), add an idle polling mode rather than immediately returning to the hypervisor when the guest CPU goes idle. Test setup is a 2 socket POWER9 with 4 guests running, each with vCPUs equal to 1/2 of real of CPUs. Saturated each guest with tbench. Using polling idle gives about 1.4x throughput. Kernel compile speed was not changed significantly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Since e1689795 ("cpuidle: Add common time keeping and irq enabling"), cpuidle drivers are expected to return from ->enter with irqs disabled. Update the cpuidle-powernv snooze and cede loops to disable irqs before returning. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Since e1689795 ("cpuidle: Add common time keeping and irq enabling"), cpuidle drivers are expected to return from ->enter with irqs disabled. Update the cpuidle-powernv snooze loop to disable irqs before returning. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
powerpc calls irq_exit() with local irqs disabled, therefore it can define __ARCH_IRQ_EXIT_IRQS_DISABLED. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The powerpc NMI IPIs may not be recoverable if they are taken in some sections of code, and also there have been and still are issues with taking NMIs (in KVM guest code, in firmware, etc) which makes them a bit dangerous to use. Generic code like softlockup detector and rcu stall detectors really hammer on trigger_*_backtrace, which has lead to further problems because we've implemented it with the NMI. So stop providing NMI backtraces for now. Importantly, the powerpc code uses NMI IPIs in crash/debug, and the SMP hardlockup watchdog. So if the softlockup and rcu hang detection traces are not being printed because the CPU is stuck with interrupts off, then the hard lockup watchdog should get it with the NMI IPI. Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Book3S PACA memory allocation is restricted by the RMA limit and also must not take SLB faults when accessed in virtual mode. Currently a fixed 256MB limit is used for this, which is imprecise and sub-optimal. Update the paca allocation limits to use use the ppc64_rma_size for RMA limit, and share the safe_stack_limit() that is currently used for stack allocations that must not take virtual mode faults. The safe_stack_limit() name is changed to ppc64_bolted_size() to match ppc64_rma_size and some comments are updated. We also need to use early_mmu_has_feature() because we are now calling this function prior to the jump label patching that enables mmu_has_feature(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change mmu_has_feature() to early_mmu_has_feature()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 17 Jan, 2018 7 commits
-
-
Nicholas Piggin authored
With the previous patch to switch to 64-bit mode after returning from RTAS and before doing any memory accesses, the RMA limit need not be clamped to 1GB to avoid RTAS bugs. Keep the 1GB limit for older firmware (although this is more of a kernel concern than RTAS), and remove it starting with POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
With the previous patch to switch to 64-bit mode after returning from RTAS and before doing any memory accesses, the RMA limit need not be clamped to 1GB to avoid RTAS bugs. Keep the 1GB limit for older firmware (although this is more of a kernel concern than RTAS), and remove it starting with POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Commit 177ba7c6 ("powerpc/mm/radix: Limit paca allocation in radix") limited the paca allocation address to 1G on pSeries because RTAS return accesses the paca in 32-bit mode: On return from RTAS we access the paca variables and we have 64 bit disabled. This requires us to limit paca in 32 bit range. Fix this by setting ppc64_rma_size to first_memblock_size/1G range. Avoid this limit by switching to 64-bit mode before accessing any memory. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The radix guest is not subject to the paravirtualized HPT VRMA limit, so remove that from ppc64_rma_size calculation for that platform. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
This removes the RMA limit on powernv platform, which constrains early allocations such as PACAs and stacks. There are still other restrictions that must be followed, such as bolted SLB limits, but real mode addressing has no constraints. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
There are several cases outside the normal address space management where a CPU's entire local TLB is to be flushed: 1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec). 2. Machine check, to clean corrupted TLB entries. One other place where the TLB is flushed, is waking from deep idle states. The flush is a side-effect of calling ->cpu_restore with the intention of re-setting various SPRs. The flush itself is unnecessary because in the first case, the TLB should not acquire new corrupted TLB entries as part of sleep/wake (though they may be lost). This type of TLB flush is coded inflexibly, several times for each CPU type, and they have a number of problems with ISA v3.0B: - The current radix mode of the MMU is not taken into account, it is always done as a hash flushn For IS=2 (LPID-matching flush from host) and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if the R field does not match the current radix mode. - ISA v3.0B hash must flush the partition and process table caches as well. - ISA v3.0B radix must flush partition and process scoped translations, partition and process table caches, and also the page walk cache. So consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash, and use the radix flush in radix environment. Provide a way for IS=2 (LPID flush) to specify the radix mode of the partition. Have KVM pass in the radix mode of the guest. Take out the flushes from early cputable/dt_cpu_ftrs detection hooks, and move it later in the boot process after, the MMU registers are set up and before relocation is first turned on. The TLB flush is no longer called when restoring from deep idle states. This was not be done as a separate step because booting secondaries uses the same cpu_restore as idle restore, which needs the TLB flush. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The die() oops path contains a serializing lock to prevent oops messages from being interleaved. In the case of a system reset initiated oops (e.g., qemu nmi command), __die was being called which lacks that synchronisation and oops reports could be interleaved across CPUs. A recent patch 4388c9b3 ("powerpc: Do not send system reset request through the oops path") changed this to __die to avoid the debugger() call, but there is no real harm to calling it twice if the first time fell through. So go back to using die() here. This was observed to fix the problem. Fixes: 4388c9b3 ("powerpc: Do not send system reset request through the oops path") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 16 Jan, 2018 16 commits
-
-
Benjamin Herrenschmidt authored
Trap numbers can have extra bits at the bottom that need to be filtered out. There are a few cases where we don't do that. It's possible that we got lucky but better safe than sorry. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
The only difference between EXC_COMMON_HV and EXC_COMMON is that the former adds "2" to the trap number which is supposed to represent the fact that this is an "HV" interrupt which uses HSRR0/1. However KVM is the only one who cares and it has its own separate macros. In fact, we only have one user of EXC_COMMON_HV and it's for an unknown interrupt case. All the other ones already using EXC_COMMON. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
WORD2 if the TIMA isn't byte accessible and isn't that useful to know about, take out the pr_devel statement. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We used to not put the newline between the CPU part and the summary part on UP kernels. This is a rather pointless ifdef so take it out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
These adapters can be found in a number of our systems, so let's enable the corresponding drivers by default. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When CONFIG_SWAP is set, the TLB miss handlers have to also take into account _PAGE_ACCESSED flag. At the moment it is done by anding _PAGE_ACCESSED into _PAGE_PRESENT using 3 instructions. This patch uses APG for handling _PAGE_ACCESSED, allowing to just copy _PAGE_ACCESSED bit into APG field, hence reducing the action to a single instruction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
As Linux kernel separates KERNEL and USER address spaces, there is therefore no need to flag USER access at page level. Today, the 8xx TLB handlers already handle user access in the L1 entry through Access Protection Groups, it is then natural to move the user access handling at PMD level once _PAGE_NA allows to handle PAGE_NONE protection without _PAGE_USER In the mean time, as we free up one bit in the PTE, we can use it to include SPS (page size flag) in the PTE and avoid handling it at every TLB miss hence removing special handling based on compiled page size. For _PAGE_EXEC, we rework it to use PP PTE bits, avoiding the copy of _PAGE_EXEC bit into the L1 entry. Unfortunatly we are not able to put it at the correct location as it conflicts with NA/RO/RW bits for data entries. Upper bits of APG in L1 entry overlap with PMD base address. In order to avoid having to filter that out, we set up all groups so that upper bits can have any value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Today, PAGE_NONE is defined as a page not having _PAGE_USER. In some circunstances, when the CPU supports it, it might be better to be able to flag a page with NO ACCESS. In a following patch, the 8xx will switch user access being flagged in the PMD, therefore it will not be possible anymore to use _PAGE_USER as a way to flag a page with no access. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64 This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing to have either _PAGE_PRIVILEGED or _PAGE_USER or both. PPC_8xx has a _PAGE_SHARED flag which is set for and only for all non user pages. Lets rename it _PAGE_PRIVILEGED to remove confusion as it has nothing to do with Linux shared pages. On BookE, there's a _PAGE_BAP_SR which has to be set for kernel pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make this generic Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
_PAGE_WRITETHRU is only used in: * AMIGA_Z2RAM block driver which is never activated on powerPC * Video/FB driver which is for PPC_PMAC Therefore, no need to spend time in 8xx TLB miss handlers for handling it. And by removing it, we free up bit 20 which then avoids having to clear it on each TLB miss. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
In TLB miss handlers, updating the perf counter is only useful when performing a perf analysis. As it has a noticeable overhead, let's only do it when needed. In order to do so, the exit of the miss handlers will be patched when starting/stopping 'perf': the first register restore instruction of each exit point will be replaced by a jump to the counting code. Once this is done, CONFIG_PPC_8xx_PERF_EVENT becomes useless as this feature doesn't add any overhead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
EXCEPTION_PROLOG_0 and EXCEPTION_EPILOG_0 were added some time ago in order to regroup the two mtspr/mfspr to SCRATCH0 and SCRATCH1 and the mfcr/mtcr in order to ease entry and exit of function not using the full EXCEPTION_PROLOG. Since then, the mfcr/mtcr has been taken out, hence just leaving the two mtspr/mfspr in the macro. In order to improve readability of the exception functions, we remove those two macros and copy back the two mtspr/mfspr instead. As r10 and r11 are used for SCRATCH0 and SCRATCH1, lets also use r12 for SCRATCH2. It will also improve the readability/maintenance. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
CPU6 ERRATA affects only MPC860 revisions prior to C.0. Manufacturing of those revisiosn was stopped in 1999-2000. Therefore, it has been almost 20 years since this ERRATA has been fixed in the silicon. This patch removes the workaround for that ERRATA. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Since commit 0e6e01ff ("CPM/QE: use genalloc to manage CPM/QE muram"), rheap is not used anymore. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Balbir Singh authored
Certain HMI's such as malfunction error propagate through all threads/core on the system. If a thread was offline prior to us crashing the system and jumping to the kdump kernel, bad things happen when it wakes up due to an HMI in the kdump kernel. There are several possible ways to solve this problem 1. Put the offline cores in a state such that they are not woken up for machine check and HMI errors. This does not work, since we might need to wake up offline threads to handle TB errors 2. Ignore HMI errors, setup HMEER to mask HMI errors, but this still leads the window open for any MCEs and masking them for the duration of the dump might be a concern 3. Wake up offline CPUs, as in send them to crash_ipi_callback (not wake them up as in mark them online as seen by the hotplug). kexec does a wake_online_cpus() call, this patch does something similar, but instead sends an IPI and forces them to crash_ipi_callback() This patch takes approach #3. Care is taken to enable this only for powenv platforms via crash_wake_offline (a global value set at setup time). The crash code sends out IPI's to all CPU's which then move to crash_ipi_callback and kexec_smp_wait(). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-