- 13 Oct, 2018 30 commits
-
-
Joel Stanley authored
Ever since commit 15a3204d ("powerpc/64s: Set assembler machine type to POWER4") we force -mpower4 to be passed to the assembler irrespective of the CFLAGS used (for Book3s 64). When building a powerpc64 kernel with clang, clang will not add -many to the assembler flags, so any instructions that the compiler has generated that are not available on power4 will cause an error: /usr/bin/as -a64 -mppc64 -mlittle-endian -mpower8 \ -I ./arch/powerpc/include -I ./arch/powerpc/include/generated \ -I ./include -I ./arch/powerpc/include/uapi \ -I ./arch/powerpc/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc \ -maltivec -mpower4 -o init/do_mounts.o /tmp/do_mounts-3b0a3d.s /tmp/do_mounts-51ce54.s:748: Error: unrecognized opcode: `isel' GCC does include -many, so the GCC driven gas call will succeed: as -v -I ./arch/powerpc/include -I ./arch/powerpc/include/generated -I ./include -I ./arch/powerpc/include/uapi -I ./arch/powerpc/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc -a64 -mpower8 -many -mlittle -maltivec -mpower4 -o init/do_mounts.o Note that isel is power7 and above for IBM CPUs. GCC only generates it for Power9 and above, but the above test was run against the clang generated assembly. Peter Bergner explains: When using -many -mpower4, gas will first try and find a matching power4 mnemonic and failing that, it will then allow any valid mnemonic that gas knows about. GCC's use of -many predates me though. IIRC, Alan looked at trying to remove it, but I forget why he didn't. Could be either a gcc or gas issue at the time. I'm not sure whether issue still exists or not. He and I have modified how gas works internally a fair amount since he tried removing gcc use of -many. I will also note that when using -many, gas will choose the first mnemonic that matches in the mnemonic table and we have (mostly) sorted the table so that server mnemonics show up earlier in the table than other mnemonics, so they'll be seen/chosen first. By explicitly setting -many we can build with Clang and GCC while retaining the -mpower4 option. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
YueHaibing authored
The variable 'aa_index' is defined as an unsigned value in update_lmb_associativity_index(), but find_aa_index() may return -1 when dlpar_clone_property() fails. So change find_aa_index() to return a bool, which indicates whether 'aa_index' was found or not. Fixes: c05a5a40 ("powerpc/pseries: Dynamic add entires to associativity lookup array") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com> [mpe: Tweak changelog, rename is_found to just found] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Rather than mixing "if (state)" blocks and gotos, convert entirely to "if (state)" blocks to make the state machine behaviour clearer. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
The wait_state member of eeh_ops does not need to be platform dependent; it's just logic around eeh_ops.get_state(). Therefore, merge the two (slightly different!) platform versions into a new function, eeh_wait_state() and remove the eeh_ops member. While doing this, also correct: * The wait logic, so that it never waits longer than max_wait. * The wait logic, so that it never waits less than EEH_STATE_MIN_WAIT_TIME. * One call site where the result is treated like a bit field before it's checked for negative error values. * In pseries_eeh_get_state(), rename the "state" parameter to "delay" because that's what it is. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Currently, eeh_pe_state_mark() marks a PE (and it's children) with a state and then performs additional processing if that state included EEH_PE_ISOLATED. The state parameter is always a constant at the call site, so rearrange eeh_pe_state_mark() into two functions and just call the appropriate one at each site. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
The function eeh_pe_state_mark_with_cfg() just performs the work of eeh_pe_state_mark() and then, conditionally, the work of eeh_pe_state_clear(). However it is only ever called with a constant state such that the condition is always true, so replace it by direct calls. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Move the call to eeh_dev_to_pe() up, so that later it's clear that "pe" isn't NULL. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Change the name of the fields in eeh_rmv_data to clarify their usage. Change "edev_list" to "removed_vf_list" because it does not contain generic edevs, but rather only edevs that contain virtual functions (which need to be removed during recovery). Similarly, change "removed" to "removed_dev_count" because it is a count of any removed devices, not just those in the above list. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Instances of struct eeh_pe are placed in a tree structure using the fields "child_list" and "child", so place these next to each other in the definition. The field "child" is a list entry, so remove the unnecessary and misleading use of the list initializer, LIST_HEAD(), on it. The eeh_dev struct contains two list entry fields, called "list" and "rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop initializing them with LIST_HEAD(). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Remove the unnecessary cast through void * on the first parameter and remove the unused second parameter (always NULL). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
The 'bus' member of struct eeh_dev is assigned to once but never used, so remove it. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect message "EEH: No capable adapters found" from being displayed during the boot of powernv systems. It is necessary because, on powernv, the call to eeh_probe_devices() made from eeh_init() is too early and EEH can't yet be enabled. A second call is made later from eeh_pnv_post_init(), which succeeds. (On pseries, the first call succeeds because PCI devices are set up early enough and no second call is made.) This can be simplified by moving the early call to eeh_probe_devices() from eeh_init() (where it's seen by both platforms) to pSeries_final_fixup(), so that each platform only calls eeh_probe_devices() once, at a point where it can succeed. This is slightly later in the boot sequence, but but still early enough and it is now in the same place in the sequence for both platforms (the pcibios_fixup hook). The display of the message can be cleaned up as well, by moving it into eeh_probe_devices(). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it incorrectly removes it from pe->type, instead of pe->state. However, rather than clearing it from the correct field, remove it. Inspection of the code shows that it can't ever have had any effect (even if it had been cleared from the correct field), because the field is never tested after it is cleared by the statement in question. The clear statement was added by commit 807a827d ("powerpc/eeh: Keep PE during hotplug"), but it didn't explain why it was necessary. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
If a device is removed during EEH processing (either by a driver's handler or as part of recovery), it can lead to a null dereference in eeh_pe_report_edev(). To handle this, skip devices that have been removed. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
If an error occurs during an unplug operation, it's possible for eeh_dump_dev_log() to be called when edev->pdn is null, which currently leads to dereferencing a null pointer. Handle this by skipping the error log for those devices. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
The boot wrapper is currently built with -Os. By building with O2 we can meaningfully reduce the time decompressing the kernel. I tested by comparing 10 runs of each option in Qemu and on hardware. The kernel is compressed with KERNEL_XZ built with GCC 8.2.0-7ubuntu1. The values are counts of the timebase. Qemu TCG powernv Power8: Os O2 O3 median 10221123889 6201518438 6568186825 stddev 1361267211 429090641 657930076 improvement 39.33% 35.74% Palmetto Power8: Os O2 O3 median 50279 50599 35790 stddev 992144533 627130655 623721078 improvement 36.79% 37.13% Romulus Power9: Os O2 O3 median 670312391 454733720 448881398 stddev 157569 107276 108760 improvement 32.16% 33.03% TCG was quite noisy, with every few runs producing an outlier. Even so, O2 is faster than O3. On hardware the numbers were less noisy and O3 is slightly faster than O2. The wrapper size increases when moving from Os. Comparing zImage.epapr to the existing Os build using bloat-o-meter: Before=43401, After=56837 (13KB), chg +30.96% Before=43401, After=64305 (20KB), chg +48.16% I chose O2 for a balance between Qemu and hardware speed up. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
This will avoid auto-vectorisation when building with higher optimisation levels. We don't know if the machine can support VSX and even if it's present it's probably not going to be enabled at this point in boot. These flag were both added prior to GCC 4.6 which is the minimum compiler version supported by upstream, thanks to Segher for the details. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
As of commit 10c77dba ("powerpc/boot: Fix build failure in 32-bit boot wrapper") the opal code is hidden behind CONFIG_PPC64_BOOT_WRAPPER, but the boot wrapper avoids include/linux, so it does not get the normal Kconfig flags. We can drop the guard entirely as in commit f8e8e69c ("powerpc/boot: Only build OPAL code when necessary") the makefile only includes opal.c in the build if CONFIG_PPC64_BOOT_WRAPPER is set. Fixes: 10c77dba ("powerpc/boot: Fix build failure in 32-bit boot wrapper") Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
Currently the wrapper is built without including anything in $(src)/include/, which means there are no CONFIG_ symbols defined. This means the platform specific serial drivers were never enabled. We now copy the definitions into the boot directory, so any C file can now include autoconf.h to depend on configuration options. Fixes: 866bfc75 ("powerpc: conditionally compile platform-specific serial drivers") Signed-off-by: Joel Stanley <joel@jms.id.au> [mpe: Fix to use $(objtree) to find autoconf.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Bartlomiej Zolnierkiewicz authored
'default n' is the default value for any bool or tristate Kconfig setting so there is no need to write it explicitly. Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO is not set' for visible symbols") the Kconfig behavior is the same regardless of 'default n' being present or not: ... One side effect of (and the main motivation for) this change is making the following two definitions behave exactly the same: config FOO bool config FOO bool default n With this change, neither of these will generate a '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied). That might make it clearer to people that a bare 'default n' is redundant. ... Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
Currently when we get an unknown RTAS event it prints the type as "Unknown" and no other useful information. Add the raw type code to the log message so that we have something to work off. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
The powerpc kernel uses setjmp which causes a warning when building with clang: In file included from arch/powerpc/xmon/xmon.c:51: ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of built-in function 'setjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern long setjmp(long *); ^ ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of built-in function 'longjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern void longjmp(long *, long); ^ This *is* the header and we're not using the built-in setjump but rather the one in arch/powerpc/kernel/misc.S. As the compiler warning does not make sense, it for the files where setjmp is used. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Dan Carpenter authored
The "count < sizeof(struct os_area_db)" comparison is type promoted to size_t so negative values of "count" are treated as very high values and we accidentally return success instead of a negative error code. This doesn't really change runtime much but it fixes a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Joel Stanley authored
On a Power9 box we get a few screens full of these on boot. Drop them to pr_debug. [ 5.993645] nest_centaur6_imc performance monitor hardware support registered [ 5.993728] nest_centaur7_imc performance monitor hardware support registered [ 5.996510] core_imc performance monitor hardware support registered [ 5.996569] nest_mba0_imc performance monitor hardware support registered [ 5.996631] nest_mba1_imc performance monitor hardware support registered [ 5.996685] nest_mba2_imc performance monitor hardware support registered Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
instructions_to_print var is assigned value 16 and there is no way to change it. This patch replaces it by a constant. Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When two processes crash at the same time, we sometimes encounter interleaving in the middle of a line: init[1]: segfault (11) at 0 nip 0 lr 0 code 1 init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 1 in sh[10000000+14000] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be00000 7d3e4b78 3bbd0c20 3b600000 init[74]: code: 3b9d0040 7c7fe02e 2f830000 419e0028 <89230000> 2f890000 41be001c 4b7f6e79 This patch fixes it by preparing complete lines in a buffer and printing it at once. Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Use seq_buf_printf() not seq_buf_puts() which doesn't NULL terminate] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
As spotted by sparse: arch/powerpc/kernel/process.c:1302:6: warning: symbol 'show_user_instructions' was not declared. Should it be static? Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch fixes the following warnings, which are leftovers from when __get_user() was replaced by probe_kernel_address(). arch/powerpc/kernel/process.c:1287:22: warning: incorrect type in argument 2 (different address spaces) arch/powerpc/kernel/process.c:1287:22: expected void const *src arch/powerpc/kernel/process.c:1287:22: got unsigned int [noderef] <asn:1>*<noident> arch/powerpc/kernel/process.c:1319:21: warning: incorrect type in argument 2 (different address spaces) arch/powerpc/kernel/process.c:1319:21: expected void const *src arch/powerpc/kernel/process.c:1319:21: got unsigned int [noderef] <asn:1>*<noident> Fixes: 7b051f66 ("powerpc: Use probe_kernel_address in show_instructions") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
commit 06ec27ae ("powerpc/64: add stack protector support") doesn't initialise the stack canary on SMP secondary CPU's paca, leading to the following false positive report from the stack protector. smp: Bringing up secondary CPUs ... Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __schedule+0x978/0xa80 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc7-next-20181010-autotest-autotest #1 Call Trace: [c000001fed5b3bf0] [c000000000a0ef3c] dump_stack+0xb0/0xf4 (unreliable) [c000001fed5b3c30] [c0000000000f9d68] panic+0x140/0x308 [c000001fed5b3cc0] [c0000000000f9844] __stack_chk_fail+0x24/0x30 [c000001fed5b3d20] [c000000000a2c3a8] __schedule+0x978/0xa80 [c000001fed5b3e00] [c000000000a2c9b4] schedule_idle+0x34/0x60 [c000001fed5b3e30] [c00000000013d344] do_idle+0x224/0x3d0 [c000001fed5b3ec0] [c00000000013d6e0] cpu_startup_entry+0x30/0x50 [c000001fed5b3ef0] [c000000000047f34] start_secondary+0x4d4/0x520 [c000001fed5b3f90] [c00000000000b370] start_secondary_prolog+0x10/0x14 This patch properly initialises the stack_canary of the secondary idle tasks. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Fixes: 06ec27ae ("powerpc/64: add stack protector support") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 09 Oct, 2018 1 commit
-
-
Michael Ellerman authored
Merge our fixes branch. It has a few important fixes that are needed for futher testing and also some commits that will conflict with content in next.
-
- 08 Oct, 2018 7 commits
-
-
Finn Thain authored
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
Add missing severity level to log messages. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
Modifying the request queue or changing the current state requires mutual exclusion. Use local_irq_disable() consistently for this rather than disabling the ADB interrupt. This simplifies the locking scheme and brings via-macii into line with the other ADB drivers. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
The BUG_ON assertions I added to the via-macii driver over a decade ago haven't fired AFAIK. Some can never fire (by inspection). One assertion checks for a NULL pointer, but that would merely substitute a BUG crash for an Oops crash. Remove the pointless BUG_ON assertions and replace the others with a WARN_ON and an array bounds check. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
Make the reset operation synchronous, like the other ADB drivers. The reset request is static data but callers may not know that. This way the struct is not in use when the reset method returns. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
Avoid the KERN_CONT problem by avoiding message fragments. The problem arises during async ADB bus probing, when ADB messages may get mixed up with other messages. See also, commit 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines"). Remove a number of printk() continuation lines by logging handler changes in adb_try_handler_change() instead. This patch addresses the problematic use of "\n" at the beginning of pr_cont() messages, which got overlooked in commit f2be6295 ("macintosh/adb: Properly mark continued kernel messages"). That commit also changed printk(KERN_DEBUG ...) to pr_debug(...), which hinders work on low-level ADB driver bugs. Revert that change. Cc: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Finn Thain authored
Now that the 68k Mac port has adopted the via-pmu driver, the same RTC code can be shared between m68k and powerpc. Replace duplicated code in arch/powerpc and arch/m68k with common RTC accessors for Cuda and PMU. Drop the problematic WARN_ON which was introduced in commit 22db552b ("powerpc/powermac: Fix rtc read/write functions"). Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 05 Oct, 2018 2 commits
-
-
Srikar Dronamraju authored
With commit 2ea62630 ("powerpc/topology: Get topology for shared processors at boot"), kdump kernel on shared LPAR may crash. The necessary conditions are - Shared LPAR with at least 2 nodes having memory and CPUs. - Memory requirement for kdump kernel must be met by the first N-1 nodes where there are at least N nodes with memory and CPUs. Example numactl of such a machine. $ numactl -H available: 5 nodes (0,2,5-7) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: node 2 size: 255 MB node 2 free: 189 MB node 5 cpus: 24 25 26 27 28 29 30 31 node 5 size: 4095 MB node 5 free: 4024 MB node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 node 6 size: 6353 MB node 6 free: 5998 MB node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 node 7 size: 7640 MB node 7 free: 7164 MB node distances: node 0 2 5 6 7 0: 10 40 40 40 40 2: 40 10 40 40 40 5: 40 40 10 40 40 6: 40 40 40 10 20 7: 40 40 40 20 10 Steps to reproduce. 1. Load / start kdump service. 2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger) When booting a kdump kernel with 2048M: kexec: Starting switchover sequence. I'm in purgatory Using 1TB segments hash-mmu: Initializing hash mmu with SLB Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018 Found initrd at 0xc000000009e70000:0xc00000000ae554b4 Using pSeries machine description ----------------------------------------------------- ppc64_pft_size = 0x1e phys_mem_size = 0x88000000 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x000000ff8f5d91a7 possible = 0x0000fbffcf5fb1a7 always = 0x0000006f8b5c91a1 cpu_user_features = 0xdc0065c2 0xef000000 mmu_features = 0x7c006001 firmware_features = 0x00000007c45bfc57 htab_hash_mask = 0x7fffff physical_start = 0x8000000 ----------------------------------------------------- numa: NODE_DATA [mem 0x87d5e300-0x87d67fff] numa: NODE_DATA(0) on node 6 numa: NODE_DATA [mem 0x87d54600-0x87d5e2ff] Top of RAM: 0x88000000, Total RAM: 0x88000000 Memory hole size: 0MB Zone ranges: DMA [mem 0x0000000000000000-0x0000000087ffffff] DMA32 empty Normal empty Movable zone start for each node Early memory node ranges node 6: [mem 0x0000000000000000-0x0000000087ffffff] Could not find start_pfn for node 0 Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000] On node 0 totalpages: 0 Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff] On node 6 totalpages: 34816 Unable to handle kernel paging request for data at address 0x00000060 Faulting instruction address: 0xc000000008703a54 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1 NIP: c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000 REGS: c00000000b673440 TRAP: 0380 Not tainted (4.19.0-rc5-master+) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24022022 XER: 20000002 CFAR: c0000000086fc238 IRQMASK: 0 GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000 GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220 GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008 GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000 GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008 GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78 GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400 NIP [c000000008703a54] bus_add_device+0x84/0x1e0 LR [c000000008703a38] bus_add_device+0x68/0x1e0 Call Trace: [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable) [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0 [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240 [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180 [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90 [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190 [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580 [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108 [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450 [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160 [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80 Instruction dump: 60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378 ---[ end trace 593577668c2daa65 ]--- However a regular kernel with 4096M (2048 gets reserved for crash kernel) boots properly. Unlike regular kernels, which mark all available nodes as online, kdump kernel only marks just enough nodes as online and marks the rest as offline at boot. However kdump kernel boots with all available CPUs. With Commit 2ea62630 ("powerpc/topology: Get topology for shared processors at boot"), all CPUs are onlined on their respective nodes at boot time. try_online_node() tries to online the offline nodes but fails as all needed subsystems are not yet initialized. As part of fix, detect and skip early onlining of a offline node. Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot") Reported-by: Pavithra Prakash <pavrampu@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04a ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-