- 13 Sep, 2019 36 commits
-
-
Hari Bathini authored
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures that crash data from the previously crashed kernel is preserved. This helps in cases where FADump is not enabled but the subsequent memory-preserving kernel boot is likely to process this crash data. One typical use case for this config option is the petitboot kernel. As OPAL allows registering an address with it in the first kernel and retrieving it after MPIPL, use this to store the top of boot memory. A kernel that intends to preserve crash data retrieves it and avoids using memory beyond this address. Move the arch_reserved_kernel_pages() function, as it is needed for both the FA_DUMP and PRESERVE_FA_DUMP configurations. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
The size parameter to the fadump_reserve_crash_area() function is not needed, as all the memory above boot memory size must be preserved anyway. Update the function by dropping this redundant parameter. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821374440.5656.2945512543806951766.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Commit 0962e800 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse the 'reserved-ranges' DT node to reserve kernel memory falling in these ranges for firmware purposes. Along with the preserved area memory, ensure that memory in reserved ranges does not overlap with the memory released by the capture kernel after saving vmcore. Also, fix the off-by-one error in the fadump_release_reserved_area() function while releasing memory. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821371358.5656.6061214942558818661.stgit@hbathini.in.ibm.com
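To make the boundary handling concrete, here is a small userspace sketch of carving reserved ranges out of a region that is about to be released. It is not the kernel's fadump_release_reserved_area(); the struct range type and the ranges_to_release() helper are hypothetical, and all intervals are assumed to be half-open ([start, end)), which is one common way to avoid off-by-one mistakes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range: [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the parts of [begin, end) that are not covered by any reserved
 * range. reserved[] must be sorted by start and non-overlapping. At most
 * max_out ranges are written; the number written is returned.
 */
static size_t ranges_to_release(uint64_t begin, uint64_t end,
				const struct range *reserved, size_t nres,
				struct range *out, size_t max_out)
{
	size_t n = 0;
	uint64_t cur = begin; /* lowest address not yet handled */

	for (size_t i = 0; i < nres && cur < end; i++) {
		const struct range *r = &reserved[i];

		if (r->end <= cur || r->start >= end)
			continue; /* reserved range does not touch [cur, end) */
		if (r->start > cur && n < max_out)
			out[n++] = (struct range){ cur, r->start }; /* gap before it */
		cur = r->end; /* skip over the reserved range */
	}
	if (cur < end && n < max_out)
		out[n++] = (struct range){ cur, end }; /* tail after the last one */
	return n;
}
```

With reserved ranges [0x2000, 0x3000) and [0x5000, 0x6000), releasing [0x1000, 0x8000) yields three pieces: [0x1000, 0x2000), [0x3000, 0x5000) and [0x6000, 0x8000).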
-
Hari Bathini authored
Make the allocate_crash_memory_ranges() and free_crash_memory_ranges() functions generic so they can be reused for managing all types of dynamic memory range arrays. This change helps with managing the reserved ranges array, which will be added later. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821369863.5656.4375667005352155892.stgit@hbathini.in.ibm.com
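A generic helper of this kind might look roughly like the following userspace sketch. The mem_range_array type and the add_range()/free_range_array() names are hypothetical, and realloc() with capacity doubling stands in for whatever allocation strategy the kernel code actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memory range: [base, base + size). */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical growable array usable for any kind of range list. */
struct mem_range_array {
	struct mem_range *ranges; /* heap-allocated storage */
	size_t cnt;               /* entries in use */
	size_t max;               /* entries allocated */
};

/* Double the capacity (starting at 16); returns 0 on success, -1 on failure. */
static int grow_range_array(struct mem_range_array *arr)
{
	size_t new_max = arr->max ? arr->max * 2 : 16;
	struct mem_range *p = realloc(arr->ranges, new_max * sizeof(*p));

	if (!p)
		return -1; /* the old storage is still valid on failure */
	arr->ranges = p;
	arr->max = new_max;
	return 0;
}

/* Append one range, growing the storage on demand. */
static int add_range(struct mem_range_array *arr, uint64_t base, uint64_t size)
{
	if (arr->cnt == arr->max && grow_range_array(arr))
		return -1;
	arr->ranges[arr->cnt].base = base;
	arr->ranges[arr->cnt].size = size;
	arr->cnt++;
	return 0;
}

/* Release the storage and reset the bookkeeping so the array can be reused. */
static void free_range_array(struct mem_range_array *arr)
{
	free(arr->ranges);
	arr->ranges = NULL;
	arr->cnt = 0;
	arr->max = 0;
}
```

Because the array carries its own count and capacity, the same two helpers can serve both a crash memory range list and a reserved range list.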
-
Hari Bathini authored
Firmware provides architected register state data at the time of crash. Process this data and build CPU notes to append to the ELF core. In case this data is missing or in an unsupported format, at least append the crashing CPU's register data, to have something to work with in the vmcore file. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821367702.5656.5546683836236508389.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Earlier, memblock_find_in_range() was not used to find the memory to be reserved for FADump, as bottom-up allocation mode was not supported. But since commit 79442ed1 ("mm/memblock.c: introduce bottom-up allocation mode"), memblock supports bottom-up allocation mode. So, use it to find the memory to be reserved for FADump. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821364211.5656.14336025460336135194.stgit@hbathini.in.ibm.com
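A bottom-up search simply returns the lowest suitable address. The sketch below is a simplified, hypothetical model, not memblock's implementation: it walks a sorted array of free regions and returns the first aligned base in [start, end) that fits, using 0 to signal failure in the same way memblock_find_in_range() does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-region descriptor: [base, base + size). */
struct free_region {
	uint64_t base;
	uint64_t size;
};

/* Round x up to align, which must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up first fit: scan free regions in ascending address order and
 * return the lowest aligned base in [start, end) that can hold size bytes,
 * or 0 if nothing fits. Regions must be sorted by base.
 */
static uint64_t find_bottom_up(const struct free_region *r, size_t n,
			       uint64_t start, uint64_t end,
			       uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t lo = r[i].base > start ? r[i].base : start;
		uint64_t hi = r[i].base + r[i].size;

		if (hi > end)
			hi = end; /* clip the region to the search window */
		lo = align_up(lo, align);
		if (lo < hi && hi - lo >= size)
			return lo; /* first (lowest) fit wins */
	}
	return 0;
}
```

A top-down search would walk the same regions in descending order and align the candidate end downwards instead; only the scan direction differs.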
-
Hari Bathini authored
With FADump support now available on both pseries and OPAL platforms, update FADump documentation with these details. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821361692.5656.11377757995827253404.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Make an OPAL call to indicate that the dump is processed and the metadata area in OPAL can be cleared/released. Also, set up/initialize FADump for re-registration. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821356046.5656.12270927048195494911.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
If not all kernel boot memory regions were registered for MPIPL before the system crashed, try processing the partial crashdump, but warn the user before proceeding. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821352793.5656.1734051341024721407.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Add support in the kernel to process the crashed kernel's memory preserved during MPIPL and export it as the /proc/vmcore file for userland scripts to filter and analyze later. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821351482.5656.6255805804744333073.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Firmware uses a 32-bit field for size while copying/backing up memory during MPIPL. So, the maximum copy size for a region is the largest PAGE_SIZE-aligned value that fits in a 32-bit field, but the FADump capture kernel usually needs more memory than that to be preserved to avoid running into out-of-memory errors. So, request firmware to copy multiple kernel boot memory regions instead of just one (which worked fine for pseries, as a 64-bit field was used for size there). Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821350193.5656.3664853158523582019.stgit@hbathini.in.ibm.com
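The arithmetic behind this limit can be sketched as follows. With 64K pages, the largest PAGE_SIZE-aligned value that fits in 32 bits is 0xFFFF0000 bytes (just under 4 GiB), so a larger boot memory area has to be described as several regions. The helper names below are hypothetical, and the splitting is a simplified illustration rather than the actual FADump registration code.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest page_size-aligned size that fits in a 32-bit field (page_size is a power of two). */
static uint64_t max_region_size(uint64_t page_size)
{
	return UINT32_MAX & ~(page_size - 1);
}

/*
 * Split [base, base + total) into consecutive regions no larger than
 * max_region_size(page_size). Up to max_out (start, size) pairs are stored;
 * the total number of regions needed is returned.
 */
static size_t split_boot_memory(uint64_t base, uint64_t total, uint64_t page_size,
				uint64_t *starts, uint64_t *sizes, size_t max_out)
{
	uint64_t max = max_region_size(page_size);
	size_t n = 0;

	while (total) {
		uint64_t chunk = total < max ? total : max;

		if (n < max_out) {
			starts[n] = base;
			sizes[n] = chunk;
		}
		base += chunk;
		total -= chunk;
		n++;
	}
	return n;
}
```

For example, 5 GiB of boot memory with 64K pages needs two regions: 0xFFFF0000 bytes followed by 0x40010000 bytes.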
-
Hari Bathini authored
Make OPAL calls to register and un-register with firmware for MPIPL. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821348482.5656.13646250851483648241.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
During kexec boot, the metadata address needs to be reset to avoid errors from interpreting a stale metadata address, in case the kexec'ed kernel crashes before the metadata address can be set up again. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821346629.5656.10783321582005237813.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
OPAL allows registering an address with it in the first kernel and retrieving it after MPIPL. Set up kernel metadata and register its address with OPAL to use it for processing the crash dump. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821345011.5656.13567765019032928471.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Some code clean-up, such as using minimal assignments and updating printk messages. Also, add an 'error_out' label to handle error cleanup in one place. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821343485.5656.10202857091553646948.stgit@hbathini.in.ibm.com
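For reference, the single-exit cleanup idiom that an 'error_out' label usually implements looks like this. The dump_bufs structure and setup_bufs() are hypothetical; the point is that every failure path jumps to one label that undoes whatever was done so far.

```c
#include <stdlib.h>

/* Hypothetical pair of buffers that must be set up together or not at all. */
struct dump_bufs {
	char *meta;
	char *notes;
};

/*
 * Allocate both buffers. Any failure jumps to error_out, which releases
 * whatever was already allocated (free(NULL) is a no-op, so no extra checks).
 */
static int setup_bufs(struct dump_bufs *b, size_t meta_sz, size_t notes_sz)
{
	b->meta = NULL;
	b->notes = NULL;

	b->meta = calloc(1, meta_sz);
	if (!b->meta)
		goto error_out;

	b->notes = calloc(1, notes_sz);
	if (!b->notes)
		goto error_out;

	return 0;

error_out:
	free(b->meta);
	free(b->notes);
	b->meta = NULL;
	b->notes = NULL;
	return -1;
}
```

Keeping the cleanup in one place means a new setup step only needs one more `goto error_out`, instead of another copy of the unwinding code.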
-
Hari Bathini authored
Add basic callback functions for FADump on PowerNV platform. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821342072.5656.4346362203141486452.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
MPIPL (Memory Preserving IPL) is supported from POWER9 onwards. This enables the kernel to reset the system with memory 'preserved'. It also supports copying memory from a source address to a destination address during MPIPL boot. Add MPIPL interface definitions here to leverage these f/w features in adding FADump support for the PowerNV platform. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821340710.5656.10071829040515662624.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
fadump is pronounced f-a-dump. Update documentation accordingly. Also, update how the fadump_region contents look with recent changes. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821339317.5656.15852294223821732082.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Move code that supports processing the crashed kernel's memory preserved by firmware to platform-specific callback functions. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821337690.5656.13050665924800177744.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Except for the Reserved dump area (see Documentation/powerpc/firmware-assisted-dump.rst), which is permanently reserved, all memory above boot memory size is released when the dump is invalidated. Here, boot memory size is the memory required for the kernel to boot successfully when booted with restricted memory (memory for the capture kernel). Make this a bit more explicit in the code. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821336092.5656.1079046285366041687.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Improve how fadump_region contents are displayed by adding source information of memory regions that are to be dumped by f/w. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821334740.5656.5897097884010195405.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Move platform-specific register/un-register code, the RTAS calls, to register/un-register callback functions. This also means moving code that initializes and prints the platform-specific FADump data. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821332856.5656.16380417702046411631.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Introduce callback functions for platform-specific operations like register, unregister, invalidate & such. Also, define place-holders for the same on the pSeries platform. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821330286.5656.15538934400074110770.stgit@hbathini.in.ibm.com
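Such callbacks are commonly expressed in C as a table of function pointers. The sketch below is a hypothetical illustration of that pattern (the structure and function names are not taken from the kernel), with pSeries place-holders that merely return success.

```c
/* Hypothetical per-platform FADump operations; names are illustrative only. */
struct fadump_platform_ops {
	const char *name;
	int (*register_dump)(void);
	int (*unregister_dump)(void);
	int (*invalidate_dump)(void);
};

/* pSeries place-holders: a real backend would issue RTAS calls here. */
static int pseries_register(void)   { return 0; }
static int pseries_unregister(void) { return 0; }
static int pseries_invalidate(void) { return 0; }

static const struct fadump_platform_ops pseries_ops = {
	.name            = "pseries",
	.register_dump   = pseries_register,
	.unregister_dump = pseries_unregister,
	.invalidate_dump = pseries_invalidate,
};

/* Generic code only sees this pointer, never the platform details. */
static const struct fadump_platform_ops *ops;

static int fadump_register_generic(void)
{
	if (!ops || !ops->register_dump)
		return -1; /* no platform backend installed */
	return ops->register_dump();
}
```

Platform setup code would point `ops` at the matching table (for example, a PowerNV table using OPAL calls), while the generic code stays unchanged.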
-
Hari Bathini authored
Currently, FADump is only supported on pSeries, but that is going to change soon with FADump support being added on the PowerNV platform. So, move RTAS-specific definitions to platform code to allow FADump to support multiple platforms. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821328494.5656.16219929140866195511.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Use helper functions to simplify memory allocation, pinning down and freeing the memory used for CPU notes buffer. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821323555.5656.2486038022572739622.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
The figures depicting FADump's (Firmware-Assisted Dump) memory layout are missing some finer details like different memory regions and what they represent. Improve the documentation by updating those details. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821322070.5656.8194734198500730487.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Declare helper functions, that can be reused by multiple platforms, in the internal header file. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821320487.5656.2660730464212209984.stgit@hbathini.in.ibm.com
-
Hari Bathini authored
Add helper functions to set up & free the CPU notes buffer and to find if a given memory area is contiguous. Also, use boolean as the return type for the function that finds if the boot memory area is contiguous. While at it, save the virtual address of the CPU notes buffer instead of the physical address, as the virtual address is used often. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821318971.5656.9281936950510635858.stgit@hbathini.in.ibm.com
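One way to decide whether an area is contiguous is to sweep a cursor across sorted memory regions and fail on the first hole. This is a hypothetical userspace sketch; the region type and the is_area_contiguous() name are illustrative and do not reflect the kernel helper's actual signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory region: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) is fully covered by the regions with no holes.
 * Regions must be sorted by base and non-overlapping.
 */
static bool is_area_contiguous(const struct region *r, size_t n,
			       uint64_t start, uint64_t end)
{
	uint64_t cur = start; /* lowest address not yet known to be covered */

	for (size_t i = 0; i < n && cur < end; i++) {
		uint64_t rstart = r[i].base;
		uint64_t rend = r[i].base + r[i].size;

		if (rend <= cur)
			continue;      /* region lies entirely below the cursor */
		if (rstart > cur)
			return false;  /* hole between cur and this region */
		cur = rend;            /* region covers cur; advance past it */
	}
	return cur >= end;
}
```

Returning `bool` (rather than an int status) matches the change described above: the caller only needs a yes/no answer.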
-
Hari Bathini authored
Though asm/fadump.h is meant to be used by other components dealing with FADump, it also has macros/definitions internal to FADump code. Move them to a new header file used within FADump code. This also makes way for refactoring platform specific FADump code. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821313134.5656.6597770626574392140.stgit@hbathini.in.ibm.com
-
Masahiro Yamada authored
This slightly improves the prom_init_check rule.
[1] Avoid needless check
Currently, prom_init_check.sh is invoked every time you run 'make', even if you have changed nothing in prom_init.c. With this commit, the script is re-run only when prom_init.o is recompiled.
[2] Beautify the build log
Currently, the O= build shows the absolute path to the script:
  CALL    /abs/path/to/source/of/linux/arch/powerpc/kernel/prom_init_check.sh
With this commit, it is always a relative path to the timestamp file:
  PROMCHK arch/powerpc/kernel/prom_init_check
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912074037.13813-1-yamada.masahiro@socionext.com
-
Michael Ellerman authored
Some of the templates used for KVM patching are only used on certain platforms, but currently they are always built in. Fix that. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911115746.12433-4-mpe@ellerman.id.au
-
Michael Ellerman authored
All the code in kvm.c can be marked __init. Most of it is already inlined into the initcall, but not all. So instead of relying on the inlining, mark it all as __init. This saves ~280 bytes of text for my configuration. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911115746.12433-3-mpe@ellerman.id.au
-
Michael Ellerman authored
kvm_tmp is now in .text and so doesn't need a special overlap check. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911115746.12433-2-mpe@ellerman.id.au
-
Michael Ellerman authored
In some configurations of KVM, guests binary patch themselves to avoid/reduce trapping into the hypervisor. For some instructions this requires replacing one instruction with a sequence of instructions. For those cases we need to write the sequence of instructions somewhere and then patch the location of the original instruction to branch to the sequence. That requires that the location of the sequence be within 32MB of the original instruction. The current solution for this is that we create a 1MB array in BSS, write sequences into there, and then free the remainder of the array. This has a few problems:
- it confuses kmemleak.
- it confuses lockdep.
- it requires mapping kvm_tmp executable, which can cause adjacent areas to also be mapped executable if we're using 16M pages for the linear mapping.
- the 32MB limit can be exceeded if the kernel is big enough, especially with STRICT_KERNEL_RWX enabled, which then prevents the patching from working at all.
We can fix all those problems by making kvm_tmp just a region of regular .text. However currently it's 1MB in size, and we don't want to waste 1MB of text. In practice however I only see ~30KB of kvm_tmp being used even for an allyes_config. So shrink kvm_tmp to 64K, which ought to be enough for everyone, and move it into .text.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911115746.12433-1-mpe@ellerman.id.au
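To illustrate where the 32MB limit comes from: a PowerPC relative branch (`b`) holds a 24-bit LI field that is shifted left by 2, giving a signed 26-bit byte offset. The helpers below are a hypothetical sketch of a range check and encoder; the kernel has its own code-patching helpers, which are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A relative "b" encodes a signed 26-bit, word-aligned byte offset, so the
 * target must lie in [-32 MiB, +32 MiB - 4] relative to the branch itself.
 */
static bool branch_offset_ok(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1FFFFFC && !(offset & 3);
}

/*
 * Encode "b target" located at address "from" (primary opcode 18, AA = 0,
 * LK = 0). Returns 0 when the target is out of range; 0 is not a valid branch.
 */
static uint32_t make_rel_branch(uint64_t from, uint64_t target)
{
	/* Wrap-around subtraction, reinterpreted as a signed displacement. */
	int64_t offset = (int64_t)(target - from);

	if (!branch_offset_ok(offset))
		return 0;
	return 0x48000000u | ((uint32_t)offset & 0x03FFFFFCu);
}
```

For example, a branch 16 bytes backwards encodes as 0x4BFFFFF0, while a target exactly +32 MiB away is rejected. That is why a patch area placed too far from the patched instruction makes the patching fail.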
-
Michael Ellerman authored
The build breaks when IOMMU_API=n, e.g. skiroot_defconfig:
  arch/powerpc/platforms/powernv/npu-dma.c:96:28: error: 'get_gpu_pci_dev_and_pe' defined but not used
  arch/powerpc/platforms/powernv/npu-dma.c:126:13: error: 'pnv_npu_set_window' defined but not used
Fixes: b4d37a7b ("powerpc/powernv: Remove unused pnv_npu_try_dma_set_bypass() function") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
The build breaks when STACKTRACE=n, e.g. skiroot_defconfig:
  arch/powerpc/kernel/eeh_event.c:124:23: error: implicit declaration of function 'stack_trace_save'
Fix it with some ifdefs for now.
Fixes: 25baf3d8 ("powerpc/eeh: Defer printing stack trace") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 11 Sep, 2019 4 commits
-
-
Greg Kurz authored
There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call to return the 32-bit value 0xffffffff when OPAL has run out of IRQs. Unfortunately, OPAL return values are signed 64-bit entities and errors are supposed to be negative. If that happens, the Linux code confusingly treats 0xffffffff as a valid IRQ number and panics at some point. A fix was recently merged in skiboot: e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()"), but we need a workaround anyway to support older skiboots already in the field. Internally convert 0xffffffff to OPAL_RESOURCE, which is the usual error returned upon resource exhaustion. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan
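The workaround amounts to a one-line mapping of the bogus return value. In this sketch, fixup_allocate_irq_rc() is a hypothetical name, and OPAL_RESOURCE is defined locally as -10, which is assumed to match the value in skiboot's opal-api.h.

```c
#include <stdint.h>

/* Assumed to match skiboot's opal-api.h; defined locally for this sketch. */
#define OPAL_RESOURCE (-10)

/*
 * Older skiboot returns the 32-bit value 0xffffffff instead of a negative
 * error when it runs out of IRQs. Map it to OPAL_RESOURCE so callers see an
 * ordinary error instead of a bogus IRQ number; pass everything else through.
 */
static int64_t fixup_allocate_irq_rc(int64_t rc)
{
	return rc == 0xffffffff ? OPAL_RESOURCE : rc;
}
```

Since the raw value arrives in a signed 64-bit return, 0xffffffff is a positive number rather than -1, which is exactly why it slipped past the usual `rc < 0` error checks.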
-
Nathan Lynch authored
prep_irq_for_idle() is intended to be called before entering H_CEDE (and it is used by the pseries cpuidle driver). However, the default pseries idle routine does not call it, leading to mismanaged lazy irq state when the cpuidle driver isn't in use. Manifestations of this include:
* Dropped IPIs in the time immediately after a cpu comes online (before it has installed the cpuidle handler), making the online operation block indefinitely waiting for the new cpu to respond.
* Hitting this WARN_ON in arch_local_irq_restore():
    /*
     * We should already be hard disabled here. We had bugs
     * where that wasn't the case so let's dbl check it and
     * warn if we are wrong. Only do that when IRQ tracing
     * is enabled as mfmsr() can be costly.
     */
    if (WARN_ON_ONCE(mfmsr() & MSR_EE))
        __hard_irq_disable();
Call prep_irq_for_idle() from pseries_lpar_idle() and honor its result.
Fixes: 363edbe2 ("powerpc: Default arch idle could cede processor on pseries") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
-
Ravi Bangoria authored
If a watchpoint exception is generated by larx/stcx instructions, the reservation created by larx gets lost while handling the exception, and thus the stcx instruction always fails. Generally these instructions are used in a while(1) loop, for example in spinlocks. And because stcx never succeeds, it loops forever and ultimately hangs the system. Note that ptrace works in one-shot mode anyway, and thus for ptrace we don't change the behaviour. It's up to the ptrace user to take care of this. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910131513.30499-1-ravi.bangoria@linux.ibm.com
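As a userspace analogy for the retry loop described above, here is a minimal spinlock built on C11 atomics. On POWER, compilers typically implement compare-and-swap with a larx/stcx. sequence, and the weak form may fail whenever the reservation is lost, which is why it sits in a loop. If something cleared the reservation on every attempt, as the watchpoint exception did, the loop would never exit. The demo_spinlock type is hypothetical and is not the kernel's spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock: 0 = free, 1 = held. */
typedef struct {
	atomic_int locked;
} demo_spinlock;

/* Spin until the 0 -> 1 transition succeeds; weak CAS may fail spuriously. */
static void spin_lock(demo_spinlock *l)
{
	int expected;

	do {
		expected = 0; /* a failed CAS overwrites expected, so reset it */
	} while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
							memory_order_acquire,
							memory_order_relaxed));
}

/* Single attempt; the strong form only fails if the lock is really held. */
static bool spin_trylock(demo_spinlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void spin_unlock(demo_spinlock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Nothing in the loop itself can detect that every failure is caused by an external event rather than contention, so the fix has to be made where the reservation is lost, not in the lock code.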
-
Vasant Hegde authored
We have the OPAL_MSG_PRD message type to pass PRD-related messages from OPAL to `opal-prd`. It can handle messages of up to 64 bytes. We have a requirement to send more than 64 bytes of data from OPAL to `opal-prd`. Let's add a new message type (OPAL_MSG_PRD2) to pass bigger data. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Make the error string clear that it's the PRD2 event that failed] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190826065701.8853-2-hegdevasant@linux.vnet.ibm.com
-