- 04 Jul, 2019 19 commits
-
-
Aneesh Kumar K.V authored
If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when looking up associativity details of resources. Without this, we hit crash like below BUG: Unable to handle kernel data access at 0x40000000008 Faulting instruction address: 0xc000000008f31704 cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320] pc: c000000008f31704: _raw_spin_lock+0x14/0x100 lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290 sp: c00000000b9bb5b0 msr: 800000010280b033 dar: 40000000008 current = 0xc00000000b9a2700 paca = 0xc00000000a740c00 irqmask: 0x03 irq_happened: 0x01 pid = 1, comm = swapper/27 Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019 enter ? for help [link register ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290 [c00000000b9bb5b0] 0000000000000dc0 (unreliable) [c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360 [c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0 [c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130 [c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170 [c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180 [c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260 [c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500 [c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0 [c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100 [c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0 [c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130 [c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50 [c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300 [c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0 [c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100 [c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48 [c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0 [c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c [c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160 [c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68 Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
If we fail to parse the associativity array we should default to NUMA_NO_NODE instead of NODE 0. Rest of the code fallback to the right default if we find the numa node value NUMA_NO_NODE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We use mmu_vmemmap_psize to find the page size for mapping the vmmemap area. With radix translation, we are suboptimally setting this value to PAGE_SIZE. We do check for 2M page size support and update mmu_vmemap_psize to use hugepage size but we suboptimally reset the value to PAGE_SIZE in radix__early_init_mmu(). This resulted in always mapping vmemmap area with 64K page size. Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
With hash translation and 4K PAGE_SIZE config, we need to make sure we don't use 64K page size for vmemmap. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Since commit 0034d395 ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") __kernel_virt_size is not used anymore. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Add a document describing the fields provided by /proc/powerpc/vcpudispatch_stats. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
When enabling or disabling the vcpu dispatch statistics, we do a lot of work including allocating/deallocating memory across all possible cpus for the DTL buffer. In order to guard against hogging the cpu for too long, track the time we're taking and yield the processor if necessary. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
For Shared Processor LPARs, the POWER Hypervisor maintains a relatively static mapping of the LPAR processors (vcpus) to physical processor chips (representing the "home" node) and tries to always dispatch vcpus on their associated physical processor chip. However, under certain scenarios, vcpus may be dispatched on a different processor chip (away from its home node). The actual physical processor number on which a certain vcpu is dispatched is available to the guest in the 'processor_id' field of each DTL entry. The guest can discover the home node of each vcpu through the H_HOME_NODE_ASSOCIATIVITY(flags=1) hcall. The guest can also discover the associativity of physical processors, as represented in the DTL entry, through the H_HOME_NODE_ASSOCIATIVITY(flags=2) hcall. These can then be compared to determine if the vcpu was dispatched on its home node or not. If the vcpu was not dispatched on the home node, it is possible to determine if the vcpu was dispatched in a different chip, socket or drawer. Introduce a procfs file /proc/powerpc/vcpudispatch_stats that can be used to obtain these statistics. Writing '1' to this file enables collecting the statistics, while writing '0' disables the statistics. The statistics themselves are available by reading the procfs file. By default, the DTLB log for each vcpu is processed 50 times a second so as not to miss any entries. This processing frequency can be changed through /proc/powerpc/vcpudispatch_stats_freq. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
hcall_vphn() is specific to pseries and will be used in a subsequent patch. So, move it to a more appropriate place under arch/powerpc/platforms/pseries. Also merge vphn.h into lppaca.h and update vphn selftest to use the new files. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
H_HOME_NODE_ASSOCIATIVITY hcall can take two different flags and return different associativity information in each case. Generalize the existing hcall_vphn() function to take flags as an argument and to return the result. Update the only existing user to pass the proper arguments. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Since we would be introducing a new user of the DTL buffer in a subsequent patch, we need a way to gatekeep use of the DTL buffer. The current debugfs interface for DTL allows registering and opening cpu-specific DTL buffers. Cpu specific files are exposed under debugfs 'powerpc/dtl/' node, and changing 'dtl_event_mask' in the same directory enables controlling the event mask used when registering DTL buffer for a particular cpu. Subsequently, we will be introducing a user of the DTL buffers that registers access to the DTL buffers across all cpus with the same event mask. To ensure these two users do not step on each other, we introduce a rwlock to gatekeep DTL buffer access. This fits the requirement of the current debugfs interface wanting to allow multiple independent cpu-specific users (read lock), and the subsequent user wanting exclusive access (write lock). Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Introduce new helpers for DTL buffer allocation and registration and have the existing code use those. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Don't split error messages across lines, for grepability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places where the mask is changed. As such, when reading the DTL log buffer through debugfs, there is no need to save and restore the previous mask value. We don't need to save and restore the earlier mask value if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled. So, remove the field from the structure as well. Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Introduce macros to encode the DTL enable mask fields and use those instead of hardcoding numbers. Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Satheesh Rajendran authored
Enable CONFIG_IPV6 in ppc64_defconfig to enable certain network functionalities required for tests. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Geliang Tang authored
In spufs_cntl_fops, since we use nonseekable_open() to open, we should use no_llseek() to seek, not generic_file_llseek(). Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Geliang Tang authored
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anton Blanchard authored
latencytop adds almost 4kB to each and every task struct and as such it doesn't deserve to be in our defconfigs. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Enrico Weigelt, metux IT consult authored
Formatting of Kconfig files doesn't look so pretty, so let the Great White Handkerchief come around and clean it up. Also convert "---help---" as requested. Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 Jul, 2019 21 commits
-
-
Masahiro Yamada authored
With CONFIG_OPTIMIZE_INLINING enabled, Laura Abbott reported error with gcc 9.1.1: arch/powerpc/mm/book3s64/radix_tlb.c: In function '_tlbiel_pid': arch/powerpc/mm/book3s64/radix_tlb.c:104:2: warning: asm operand 3 probably doesn't match constraints 104 | asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) | ^~~ arch/powerpc/mm/book3s64/radix_tlb.c:104:2: error: impossible constraint in 'asm' Fixing _tlbiel_pid() is enough to address the warning above, but I inlined more functions to fix all potential issues. To meet the "i" (immediate) constraint for the asm operands, functions propagating "ric" must be always inlined. Fixes: 9012d011 ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nishad Kamdar authored
This patch corrects the SPDX License Identifier style in the powerpc Hardware Architecture related files. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Stewart Smith authored
If the previous comment made sense, continue debugging or call your doctor immediately. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Krzysztof Kozlowski authored
Remove the CONFIG_UEVENT_HELPER_PATH because: 1. It is disabled since commit 1be01d4a ("driver: base: Disable CONFIG_UEVENT_HELPER by default") as its dependency (UEVENT_HELPER) was made default to 'n', 2. It is not recommended (help message: "This should not be used today [...] creates a high system load") and was kept only for ancient userland, 3. Certain userland specifically requests it to be disabled (systemd README: "Legacy hotplug slows down the system and confuses udev"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christian Lamparter authored
When testing out gpio-keys with a button, a spurious interrupt (and therefore a key press or release event) gets triggered as soon as the driver enables the irq line for the first time. This patch clears any potential bogus generated interrupt that was caused by the switching of the associated irq's type and polarity. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Geert Uytterhoeven authored
"git diff" says: \ No newline at end of file after modifying the file. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> [mpe: Rebase since addition of another test] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Suraj Jitindar Singh authored
Commit ddf35cf3 ("powerpc: Use barrier_nospec in copy_from_user()") Added barrier_nospec before loading from user-controlled pointers. The intention was to order the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar. In order to achieve the same result, add a barrier_nospec to the raw_copy_in_user() function before loading from such a user-controlled pointer. Fixes: ddf35cf3 ("powerpc: Use barrier_nospec in copy_from_user()") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The code currently sets: CR0 <- 00 || MSR[TS] but according to the ISA it should be: CR0 <- 0 || MSR[TS] || 0 This fixes the bit shift to put the bits in the correct location. This is a data integrity issue as CR0 is corrupted. Fixes: 4bb3c7a0 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Cc: stable@vger.kernel.org # v4.17+ Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
The powernv platform uses @dma_iommu_ops for non-bypass DMA. These ops need an iommu_table pointer which is stored in dev->archdata.iommu_table_base. It is initialized during pcibios_setup_device() which handles boot time devices. However when a device is taken from the system in order to pass it through, the default IOMMU table is destroyed but the pointer in a device is not updated; also when a device is returned back to the system, a new table pointer is not stored in dev->archdata.iommu_table_base either. So when a just returned device tries using IOMMU, it crashes on accessing stale iommu_table or its members. This calls set_iommu_table_base() when the default window is created. Note it used to be there before but was wrongly removed (see "fixes"). It did not appear before as these days most devices simply use bypass. This adds set_iommu_table_base(NULL) when a device is taken from the system to make it clear that IOMMU DMA cannot be used past that point. Fixes: c4e9d3c1 ("powerpc/powernv/pseries: Rework device adding to IOMMU groups") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing which reads "assigned-addresses" of every PCI device and initializes the device resources. However if the property is missing or zero sized, then there is no fallback of any kind and the PCI resources remain undiscovered, i.e. pdev->resource[] array remains empty. This adds a fallback which parses the "reg" property in pretty much same way except it marks resources as "unset" which later make Linux assign those resources proper addresses. This has an effect when: 1. a hypervisor failed to assign any resource for a device; 2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try assigning a resource. Neither is likely to happen under PowerVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
So far the pseries platforms has always been using IOMMU making SWIOTLB unnecessary. Now we want secure guests which means devices can only access certain areas of guest physical memory; we are going to use SWIOTLB for this purpose. This allows SWIOTLB for pseries. By default there is no change in behavior. This enables SWIOTLB when the "swiotlb" kernel parameter is set to "force". With the SWIOTLB enabled, the kernel creates a directly mapped DMA window (using the usual DDW mechanism) and implements SWIOTLB on top of that. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
The commit 8617a5c5 ("powerpc/dma: handle iommu bypass in dma_iommu_ops") merged direct DMA ops into the IOMMU DMA ops allowing SWIOTLB as well but only for mapping; the unmapping and bouncing parts were left unmodified. This adds missing direct unmapping calls to .unmap_page() and .unmap_sg(). This adds missing sync callbacks and directs them to the direct DMA hooks. Fixes: 8617a5c5 ("powerpc/dma: handle iommu bypass in dma_iommu_ops") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Use the dma_get_mask() helper from dma-mapping.h instead, as they are functionally identical. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Minor formatting in set_dawr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mathieu Malaterre authored
In commit c1fe190c ("powerpc: Add force enable of DAWR on P9 option") the following piece of code was added: smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0); Since GCC 8 this triggers the following warning about incompatible function types: arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type] Since the warning is there for a reason, and should not be hidden behind a cast, provide an intermediate callback function to avoid the warning. Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option") Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
ISA v3.0 radix modes provide SLBIA variants which can invalidate ERAT for effPID!=0 or for effLPID!=0, which allows user and guest invalidations to retain kernel/host ERAT entries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
This makes it clear to the caller that it can only be used on POWER9 and later CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use "ISA_3_0" rather than "ARCH_300"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Branch to the relocated 0xc000 address early (still in real mode), to simplify subsequent branches. Have the virt mode handler avoid just 'windup' and redo the exception from scratch, rather than branching back to the trampoline. Rearrange the stack setup instruction location to match the system reset handler (e.g., right before EXCEPTION_PROLOG_COMMON). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Follow convention and move tramp ahead of common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The idle wake up code in the system reset interrupt is not very optimal. There are two requirements: perform idle wake up quickly; and save everything including CFAR for non-idle interrupts, with no performance requirement. The problem with placing the idle test in the middle of the handler and using the normal handler code to save CFAR, is that it's quite costly (e.g., mfcfar is serialising, speculative workarounds get applied, SRR1 has to be reloaded, etc). It also prevents the standard interrupt handler boilerplate being used. This pain can be avoided by using a dedicated idle interrupt handler at the start of the interrupt handler, which restores all registers back to the way they were in case it was not an idle wake up. CFAR is preserved without saving it before the non-idle case by making that the fall-through, and idle is a taken branch. Performance seems to be in the noise, but possibly around 0.5% faster, the executed instructions certainly look better. The bigger benefit is being able to drop in standard interrupt handlers after the idle code, which helps with subsequent cleanup and consolidation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup BE by using DOTSYM for idle_return_gpr_loss call] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-