- 23 Feb, 2019 40 commits
-
-
Christophe Leroy authored
thread_info is not anymore in the stack, so the entire stack can now be used. There is also no risk anymore of corrupting task_cpu(p) with a stack overflow so the patch removes the test. When doing this, an explicit test for NULL stack pointer is needed in validate_sp() as it is not anymore implicitely covered by the sizeof(thread_info) gap. In the meantime, with the previous patch all pointers to the stacks are not anymore pointers to thread_info so this patch changes them to void* Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info doesn't have anymore the 'task' field. This patch: - Removes all recopy of thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol names duplication between ASM constants and C constants. - Modifies klp_init_thread_info() to take a task_struct pointer argument. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add task_stack.h to livepatch.h to fix build fails] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Make sure CURRENT_THREAD_INFO() is used with r1 which is the virtual address of the stack, in order to ease the switch to r2 when we enable THREAD_INFO_IN_TASK, as we have no register having the phys address of current. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Change current_pt_regs() to use task_stack_page() rather than current_thread_info() so that it keeps working once we enable THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of large patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When we enable THREAD_INFO_IN_TASK we will remove our definition of current_thread_info(). Instead it will come from linux/thread_info.h So switch processor.h to include the latter, so that it can continue to find current_thread_info(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol won't exist when we enable THREAD_INFO_IN_TASK. So just use the sizeof the type which is the same value but will continue to work. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Rather than using the thread info use task_stack_page() to initialise paca->kstack, that way it will work with THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Update a few comments that talk about current_thread_info() in preparation for THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
We have a few places that use current_thread_info()->task to access current. This won't work with THREAD_INFO_IN_TASK so fix them now. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
A few places use CURRENT_THREAD_INFO, or the C version, to find the stack. This will no longer work with THREAD_INFO_IN_TASK so change them to find the stack in other ways. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point the new stack. Currently that's the same thing as the thread_info, but won't be with THREAD_INFO_IN_TASK. So change the parameter to void* and rename it 'sp'. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch renames THREAD_INFO to TASK_STACK, because it is in fact the offset of the pointer to the stack in task_struct so this pointer will not be impacted by the move of THREAD_INFO. Also make it available on 64-bit, as we'll need it there when we activate THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make available on 64-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
[text copied from commit 9bbd4c56 ("arm64: prep stack walkers for THREAD_INFO_IN_TASK")] When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed before a task is destroyed. To account for this, the stacks are refcounted, and when manipulating the stack of another task, it is necessary to get/put the stack to ensure it isn't freed and/or re-used while we do so. This patch reworks the powerpc stack walking code to account for this. When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no refcounting, and this should only be a structural change that does not affect behaviour. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move try_get_task_stack() below tsk == NULL check in show_stack()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and only defined when CONFIG_SMP is set. This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that task_struct 'cpu' field is not used directly out of SMP code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h. In order to do that, this patch moves into a new header called asm/task_size_64/32.h all the TASK_SIZE related constants, which can then be included in mmu-hash.h directly. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out all the TASK_SIZE constants not just 64-bit ones] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Since only the virtual address of allocated blocks is used, lets use functions returning directly virtual address. Those functions have the advantage of also zeroing the block. Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
In __secondary_start() we load the thread_info of the idle task of the secondary CPU from current_set[cpu], and then convert it into a stack pointer before storing that back to paca->kstack. As pointed out in commit f761622e ("powerpc: Initialise paca->kstack before early_setup_secondary") it's important that we initialise paca->kstack before calling the MMU setup code, in particular slb_initialize(), because it will bolt the SLB entry for the kstack into the SLB. However we have already setup paca->kstack in cpu_idle_thread_init(), since commit 3b575064 ("[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus") (May 2008). It's also in cpu_idle_thread_init() that we initialise current_set[cpu] with the thread_info pointer, so there is no issue of the timing being different between the two. Therefore the initialisation of paca->kstack in __setup_secondary() is completely redundant, so remove it. This has the added benefit of removing code that runs in real mode, and is therefore restricted by the RMO, and so opens the way for us to enable THREAD_INFO_IN_TASK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Currently in system_call_exit() we have an optimisation where we disable MSR_RI (recoverable interrupt) and MSR_EE (external interrupt enable) in a single mtmsrd instruction. Unfortunately this will no longer work with THREAD_INFO_IN_TASK, because then the load of TI_FLAGS might fault and faulting with MSR_RI clear is treated as an unrecoverable exception which leads to a panic(). So change the code to only clear MSR_EE prior to loading TI_FLAGS, leaving the clear of MSR_RI until later. We have some latitude in where do the clear of MSR_RI. A bit of experimentation has shown that this location gives the least slow down. This still causes a noticeable slow down in our null_syscall performance. On a Power9 DD2.2: Before After Delta Delta % 955 cycles 999 cycles -44 -4.6% On the plus side this does simplify the code somewhat, because we don't have to reenable MSR_RI on the restore_math() or syscall_exit_work() paths which was necessitated previously by the optimisation. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Andrew Donnellan authored
kcov provides kernel coverage data that's useful for fuzzing tools like syzkaller. Wire up kcov support on powerpc. Disable kcov instrumentation on the same files where we currently disable gcov and UBSan instrumentation, plus some additional exclusions which appear necessary to boot on book3e machines. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Daniel Axtens <dja@axtens.net> # e6500 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
On 8xx, large pages (512kb or 8M) are used to map kernel linear memory. Aligning to 8M reduces TLB misses as only 8M pages are used in that case. We make 8M the default for data. This patchs allows the user to do it via Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch implements handling of STRICT_KERNEL_RWX with large TLBs directly in the TLB miss handlers. To do so, etext and sinittext are aligned on 512kB boundaries and the miss handlers use 512kB pages instead of 8Mb pages for addresses close to the boundaries. It sets RO PP flags for addresses under sinittext. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Depending on the number of available BATs for mapping the different kernel areas, it might be needed to increase the alignment of _etext and/or of data areas. This patchs allows the user to do it via Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages. On Book3s 32, it has three consequences: - Using pages instead of BAT for mapping kernel linear memory severely impacts performance. - Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables) - Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY) On the 603+, we have: - Independent IBAT and DBAT allowing limitation of exec parts. - NX bit can be set in segment registers to forbit execution on memory mapped by pages. - RO mode on DBATs even for kernel-only blocks. On the 601, there is nothing much we can do other than warn the user about it, because: - BATs are common to instructions and data. - BAT do not provide RO mode for kernel-only blocks. - segment registers don't have the NX bit. In order to use IBAT for exec protection, this patch: - Aligns _etext to BAT block sizes (128kb) - Set NX bit in kernel segment register (Except on vmalloc area when CONFIG_MODULES is selected) - Maps kernel text with IBATs. In order to use DBAT for exec protection, this patch: - Aligns RW DATA to BAT block sizes (4M) - Maps kernel RO area with write prohibited DBATs - Maps remaining memory with remaining DBATs Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX: Symbols: c0000000 T _stext c0680000 R __start_rodata c0680000 R _etext c0800000 T __init_begin c0800000 T _sinittext ~# cat /sys/kernel/debug/block_address_translation ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent 1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent 2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent 3: - 4: - 5: - 6: - 7: - ---[ Data Block Address Translation ]--- 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent 2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent 3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent 4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent 5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent 6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent 7: - ~# cat /sys/kernel/debug/segment_registers ---[ User Segments ]--- 0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0 0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1 0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2 0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903 0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14 0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25 0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36 0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47 0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58 0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69 0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a 0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b ---[ Kernel Segments ]--- 0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc 0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd 0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee 0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff Aligning _etext to 128kb allows to map up to 32Mb text with 8 IBATs: 16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb (A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block) Aligning data to 4M allows to map up to 512Mb data with 8 DBATs: 16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow to modify _etext and data alignment. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
setibat() and clearibat() allows to manipulate IBATs independently of DBATs. update_bats() allows to update bats after init. This is done with MMU off. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
CONFIG_STRICT_KERNEL_RWX requires a special alignment for DATA for some subarches. Today it is just defined as an #ifdef in vmlinux.lds.S In order to get more flexibility, this patch moves the definition of this alignment in Kconfig On some subarches, CONFIG_STRICT_KERNEL_RWX will require a special alignment of _etext. This patch also adds a configuration item for it in Kconfig Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch defined CONFIG_PPC_PAGE_SHIFT in order to be able to use PAGE_SHIFT value inside Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Add a helper to know whether STRICT_KERNEL_RWX is enabled. This is based on rodata_enabled flag which is defined only when CONFIG_STRICT_KERNEL_RWX is selected. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch add an helper which wraps 'mtsrin' instruction to write into segment registers. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Do not set IBAT when setbat() is called without _PAGE_EXEC Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
wii_mmu_mapin_mem2() is not used anymore, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When CONFIG_BDI_SWITCH is set, the page tables have to be populated allthough large TLBs are used, because the BDI switch knows nothing about those large TLBs which are handled directly in TLB miss logic. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Now that mmu_mapin_ram() is able to handle other blocks than the one starting at 0, the WII can use it for all its blocks. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch reworks mmu_mapin_ram() to be more generic and map as much blocks as possible. It now supports blocks not starting at address 0. It scans DBATs array to find free ones instead of forcing the use of BAT2 and BAT3. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
At the time being, mmu_mapin_ram() always maps RAM from the beginning. But some platforms like the WII have to map a second block of RAM. This patch adds to mmu_mapin_ram() the base address of the block. At the moment, only base address 0 is supported. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC deny the use of BATS for mapping memory. This patch makes sure that the specific wii RAM mapping function takes it into account as well. Fixes: de32400d ("wii: use both mem1 and mem2 as ram") Cc: stable@vger.kernel.org Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
At the time being, initial MMU setup allows 24 Mbytes of DATA and 8 Mbytes of code. Some debug setup like CONFIG_KASAN generate huge kernels with text size over the 8M limit and data over the 24 Mbytes limit. Here is an 8xx kernel compiled with CONFIG_KASAN_INLINE for one of my boards: [root@po16846vm linux-powerpc]# size -x vmlinux text data bss dec hex filename 0x111019c 0x41b0d4 0x490de0 26984528 19bc050 vmlinux This patch maps up to 32 Mbytes code based on _einittext symbol and allows 32 Mbytes of memory instead of 24. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This patch replaces most #ifdef mess by IS_ENABLED() in 8xx_mmu.c This has the advantage of allowing syntax verification at compile time regardless of selected options. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sandipan Das authored
This adds test cases for the addc[.] instruction. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sandipan Das authored
This adds test cases for the add[.] instruction. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sandipan Das authored
This enhances the current selftest framework for validating the in-kernel instruction emulation infrastructure by adding support for compute type instructions i.e. integer ALU-based instructions. Originally, this framework was limited to only testing load and store instructions. While most of the GPRs can be validated, support for SPRs is limited to LR, CR and XER for now. When writing the test cases, one must ensure that the Stack Pointer (GPR1) or the Thread Pointer (GPR13) are not touched by any means as these are vital non-volatile registers. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> [mpe: Use patch_site for the code patching] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-