- 18 May, 2020 40 commits
-
-
Jordan Niethe authored
Instead of using memcpy() and flush_icache_range() use patch_instruction() which not only accomplishes both of these steps but will also make it easier to add support for prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-16-jniethe5@gmail.com
-
Jordan Niethe authored
Introduce a probe_kernel_read_inst() function to use in cases where probe_kernel_read() is used for getting an instruction. This will be more useful for prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> [mpe: Don't write to *inst on error] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-15-jniethe5@gmail.com
-
Jordan Niethe authored
Introduce a probe_user_read_inst() function to use in cases where probe_user_read() is used for getting an instruction. This will be more useful for prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> [mpe: Don't write to *inst on error, fold in __user annotations] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-14-jniethe5@gmail.com
-
Jordan Niethe authored
Prefixed instructions will mean there are instructions of different length. As a result dereferencing a pointer to an instruction will not necessarily give the desired result. Introduce a function for reading instructions from memory into the instruction data type. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-13-jniethe5@gmail.com
-
Jordan Niethe authored
Currently unsigned ints are used to represent instructions on powerpc. This has worked well as instructions have always been 4 byte words. However, ISA v3.1 introduces some changes to instructions that mean this scheme will no longer work as well. This change is Prefixed Instructions. A prefixed instruction is made up of a word prefix followed by a word suffix to make an 8 byte double word instruction. No matter the endianness of the system the prefix always comes first. Prefixed instructions are only planned for powerpc64. Introduce a ppc_inst type to represent both prefixed and word instructions on powerpc64 while keeping it possible to exclusively have word instructions on powerpc32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Fix compile error in emulate_spe()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com
-
Jordan Niethe authored
In preparation for an instruction data type that can not be directly used with the '==' operator use functions for checking equality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balamuruhan S <bala24@linux.ibm.com> Link: https://lore.kernel.org/r/20200506034050.24806-11-jniethe5@gmail.com
-
Jordan Niethe authored
Use a function for byte swapping instructions in preparation of a more complicated instruction type. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balamuruhan S <bala24@linux.ibm.com> Link: https://lore.kernel.org/r/20200506034050.24806-10-jniethe5@gmail.com
-
Jordan Niethe authored
In preparation for using a data type for instructions that can not be directly used with the '>>' operator use a function for getting the op code of an instruction. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-9-jniethe5@gmail.com
-
Jordan Niethe authored
In preparation for introducing a more complicated instruction type to accommodate prefixed instructions use an accessor for getting an instruction as a u32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-8-jniethe5@gmail.com
-
Jordan Niethe authored
In preparation for instructions having a more complex data type start using a macro, ppc_inst(), for making an instruction out of a u32. A macro is used so that instructions can be used as initializer elements. Currently this does nothing, but it will allow for creating a data type that can represent prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Change include guard to _ASM_POWERPC_INST_H] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-7-jniethe5@gmail.com
-
Jordan Niethe authored
create_branch(), create_cond_branch() and translate_branch() return the instruction that they create, or return 0 to signal an error. Separate these concerns in preparation for an instruction type that is not just an unsigned int. Fill the created instruction to a pointer passed as the first parameter to the function and use a non-zero return value to signify an error. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-6-jniethe5@gmail.com
-
Jordan Niethe authored
A modulo operation is used for calculating the current offset from a breakpoint within the breakpoint table. As instruction lengths are always a power of 2, this can be replaced with a bitwise 'and'. The current check for word alignment can be replaced with checking that the lower 2 bits are not set. Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-5-jniethe5@gmail.com
-
Jordan Niethe authored
The instructions for xmon's breakpoint are stored bpt_table[] which is in the data section. This is problematic as the data section may be marked as no execute. Move bpt_table[] to the text section. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-4-jniethe5@gmail.com
-
Jordan Niethe authored
To execute an instruction out of line after a breakpoint, the NIP is set to the address of struct bpt::instr. Here a copy of the instruction that was replaced with a breakpoint is kept, along with a trap so normal flow can be resumed after XOLing. The struct bpt's are located within the data section. This is problematic as the data section may be marked as no execute. Instead of each struct bpt holding the instructions to be XOL'd, make a new array, bpt_table[], with enough space to hold instructions for the number of supported breakpoints. A later patch will move this to the text section. Make struct bpt::instr a pointer to the instructions in bpt_table[] associated with that breakpoint. This association is a simple mapping: bpts[n] -> bpt_table[n * words per breakpoint]. Currently we only need the copied instruction followed by a trap, so 2 words per breakpoint. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-3-jniethe5@gmail.com
-
Jordan Niethe authored
For modifying instructions in xmon, patch_instruction() can serve the same role that store_inst() is performing with the advantage of not being specific to xmon. In some places patch_instruction() is already being using followed by store_inst(). In these cases just remove the store_inst(). Otherwise replace store_inst() with patch_instruction(). Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200506034050.24806-2-jniethe5@gmail.com
-
Geoff Levand authored
The ps3_mm_region_destroy() and ps3_mm_vas_destroy() routines are called very late in the shutdown via kexec's mmu_cleanup_all routine. By the time mmu_cleanup_all runs it is too late to use udbg_printf, and calling it will cause PS3 systems to hang. Remove all debugging statements from ps3_mm_region_destroy() and ps3_mm_vas_destroy() and replace any error reporting with calls to lv1_panic. With this change builds with 'DEBUG' defined will not cause kexec reboots to hang, and builds with 'DEBUG' defined or not will end in lv1_panic if an error is encountered. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7325c4af2b4c989c19d6a26b90b1fec9c0615ddf.1589049250.git.geoff@infradead.org
-
Emmanuel Nicolet authored
Since commit dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count"), the kernel will bug_on on the PS3 because bio_split() is called with sectors == 0: kernel BUG at block/bio.c:1853! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3 Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \ ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1 NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000 REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4) MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 44008240 XER: 20000000 IRQMASK: 0 GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300 GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001 GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004 GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080 GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0 GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000 NIP [c00000000027d0d0] .bio_split+0x28/0xac LR [c00000000027d0b0] .bio_split+0x8/0xac Call Trace: [c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable) [c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c [c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4 [c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294 [c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174 [c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c [c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184 [c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38 [c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8 [c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184 [c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8 [c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830 [c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c [c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0 [c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124 [c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8 [c00000000135be20] [c00000000000a524] system_call+0x5c/0x70 Instruction dump: 7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0 7c7d1b78 79290fe0 7cc53378 69290001 <0b090000> 81230028 7bca0020 7929ba62 [ end trace 313fec760f30aa1f ]--- The problem originates from setting the segment boundary of the request queue to -1UL. This makes get_max_segment_size() return zero when offset is zero, whatever the max segment size. The test with BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows to zero in the return statement. Not setting the segment boundary and using the default value (BLK_SEG_BOUNDARY_MASK) fixes the problem. Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org
-
Markus Elfring authored
Remove an extra message for a memory allocation failure in function gelic_descr_prepare_rx(). Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ba4bea4da97308c804fd3a0fae3773dde27b20ce.1589049250.git.geoff@infradead.org
-
Markus Elfring authored
Remove duplicate memory allocation failure error messages. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c763425d8e6f680d3180b3246c9e77727df179d0.1589049250.git.geoff@infradead.org
-
Geoff Levand authored
Remove the '-m4' option to grep to allow grep to process all of nm's output. This avoids the nm warning: nm terminated with signal 13 [Broken pipe] Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/872b6c84a4250ff140e476c62cabe9e56a02b6c2.1589049250.git.geoff@infradead.org
-
Geoff Levand authored
To aid debugging wrapper troubles, output a linker map file 'wrapper.map' when the build is verbose. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fb477f5e91c6b74a1dec98df3cc0a1c91632d94d.1589049250.git.geoff@infradead.org
-
Geoff Levand authored
To aid debugging build problems turn on shell tracing for the head_check script when the build is verbose. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1ae1aed811ba6760af2e46d331285dd6a4de5b80.1589049250.git.geoff@infradead.org
-
Nicholas Piggin authored
System Reset and Machine Check interrupts that are not recoverable due to being nested or interrupting when RI=0 currently panic. This is not necessary, and can often just kill the current context and recover. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
-
Nicholas Piggin authored
Similarly to the previous patch, do not trace system reset. This code is used when there is a crash or hang, and tracing disturbs the system more and has been known to crash in the crash handling path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-15-npiggin@gmail.com
-
Nicholas Piggin authored
Rather than notrace annotations throughout a significant part of the machine check code across kernel/ pseries/ and powernv/ which can easily be broken and is infrequently tested, use paca->ftrace_enabled to blanket-disable tracing of the real-mode non-maskable handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-14-npiggin@gmail.com
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200508043408.886394-13-npiggin@gmail.com
-
Nicholas Piggin authored
machine_check_early() is taken as an NMI, so nmi_enter() is used there. machine_check_exception() is no longer taken as an NMI (it's invoked via irq_work in the case a machine check hits in kernel mode), so remove the nmi_enter() from that case. In NMI context, hash faults don't try to refill the hash table, which can lead to crashes accessing non-pinned kernel pages. System reset still has this potential problem. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop change in show_regs() which breaks Book3E] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com
-
Nicholas Piggin authored
With the previous patch, machine checks can use rtas_call_unlocked() which avoids the RTAS spinlock which would deadlock if a machine check hits while making an RTAS call. This also avoids the complex RTAS error logging which has more RTAS calls and includes kmalloc (which can return memory beyond RMA, which would also crash). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-11-npiggin@gmail.com
-
Nicholas Piggin authored
This allows rtas_args to be put on the machine check stack, which avoids a lot of complications with re-entrancy deadlocks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-10-npiggin@gmail.com
-
Nicholas Piggin authored
PAPR does not specify that fwnmi sreset should be interlocked, and PowerVM (and therefore now QEMU) do not require it. These "ibm,nmi-interlock" calls are ignored by firmware, but there is a possibility that the sreset could have interrupted a machine check and release the machine check's interlock too early, corrupting it if another machine check came in. This is an extremely rare case, but it should be fixed for clarity and reducing the code executed in the sreset path. Firmware also does not provide error information for the sreset case to look at, so remove that comment. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use __be64 to silence some sparse warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-9-npiggin@gmail.com
-
Nicholas Piggin authored
If there is some error with the fwnmi save area, r3 has already been modified which doesn't help with debugging. Only update r3 when to restore the saved value. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-8-npiggin@gmail.com
-
Nicholas Piggin authored
This was discovered developing qemu fwnmi sreset support. This off-by-one bug means the last 16 bytes of the rtas area can not be used for a 16 byte save area. It's not a serious bug, and QEMU implementation has to retain a workaround for old kernels, but it's good to tighten it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-7-npiggin@gmail.com
-
Nicholas Piggin authored
In the interest of reducing code and possible failures in the machine check and system reset paths, grab the "ibm,nmi-interlock" token at init time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-6-npiggin@gmail.com
-
Nicholas Piggin authored
pseries fwnmi machine check code pops the soft-irq checks in rtas_call (after the next patch to remove rtas_token from this call path). Rather than play whack a mole with these and forever having fragile code, it seems better to have the early machine check handler perform the same kind of reconcile as the other NMI interrupts. WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 493 Comm: a Tainted: G W NIP: c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000 REGS: c0000001fffd38b0 TRAP: 0700 Tainted: G W MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000 GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000042c40] unlock_rtas+0x30/0x90 Call Trace: [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable) [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280 [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70 [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540 [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70 [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0 --- interrupt: 200 at 0x13f1307c8 LR = 0x7fff888b8528 Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-5-npiggin@gmail.com
-
Nicholas Piggin authored
A spare interrupt stack slot is needed to save irq state when reconciling NMIs (sreset and decrementer soft-nmi). _DAR is used for this, but we want to reconcile machine checks as well, which do use _DAR. Switch to using RESULT instead, as it's used by system calls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-4-npiggin@gmail.com
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-3-npiggin@gmail.com
-
Nicholas Piggin authored
The architecture allows for machine check exceptions to cause idle wakeups which resume at the 0x200 address which has to return via the idle wakeup code, but the early machine check handler is run first. The case of a no state-loss sleep is broken because the early handler uses non-volatile register r1 , which is needed for the wakeup protocol, but it is not restored. Fix this by loading r1 from the MCE exception frame before returning to the idle wakeup code. Also update the comment which has become stale since the idle rewrite in C. This crash was found and fix confirmed with a machine check injection test in qemu powernv model (which is not upstream in qemu yet). Fixes: 10d91611 ("powerpc/64s: Reimplement book3s idle code in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-2-npiggin@gmail.com
-
Sam Bobroff authored
EEH device state is currently removed (by eeh_remove_device()) during the device release handler, which is invoked as the device's reference count drops to zero. This may take some time, or forever, as other threads may hold references. However, the PCI device state is released synchronously by pci_stop_and_remove_bus_device(). This mismatch causes problems, for example the device may be re-discovered as a new device before the release handler has been called, leaving the PCI and EEH state mismatched. So instead, call eeh_remove_device() from the bus device removal handlers, which are called synchronously in the removal path. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0a1f5105d3a33b1c090bba31de63eb0cdd25de7b.1588045502.git.sbobroff@linux.ibm.com
-
Sam Bobroff authored
If a device is hot unplgged during EEH recovery, it's possible for the RTAS call to ibm,configure-pe in pseries_eeh_configure() to return parameter error (-3), however negative return values are not checked for and this leads to an infinite loop. Fix this by correctly bailing out on negative values. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Link: https://lore.kernel.org/r/1b0a6010a647dc915816e44845b64d72066676a7.1588045502.git.sbobroff@linux.ibm.com
-
Michael Ellerman authored
Currently we don't report anything useful in /proc/<pid>/status: $ grep Speculation_Store_Bypass /proc/self/status Speculation_Store_Bypass: unknown Our mitigation is currently always a barrier instruction, which doesn't map that well onto the existing possibilities for the PR_SPEC values. However even if we added a "barrier" type PR_SPEC value, userspace would still need to consult some other source to work out which type of barrier to use. So reporting "vulnerable" seems sufficient, as userspace can see that and then consult its source to determine what barrier to use. Signed-off-by: Gustavo Walbon <gwalbon@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402124929.3574166-1-mpe@ellerman.id.au
-