- 07 Feb, 2022 1 commit
-
-
Sourabh Jain authored
On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources. The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources. Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual. This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.htmlReported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
-
- 03 Feb, 2022 12 commits
-
-
Christophe Leroy authored
On 603 core, TLB miss handler don't do any change to the page tables so pte_update() doesn't need to be atomic. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cc89d3c11fc9c742d0df3454a657a3a00be24046.1643538554.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
arch/powerpc/include/asm/nohash/{32/64}/pgtable.h has #define __HAVE_ARCH_PTE_SAME #define pte_same(A,B) ((pte_val(A) ^ pte_val(B)) == 0) include/linux/pgtable.h has #ifndef __HAVE_ARCH_PTE_SAME static inline int pte_same(pte_t pte_a, pte_t pte_b) { return pte_val(pte_a) == pte_val(pte_b); } #endif Remove the powerpc version which is similar to the generic one. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/83c97bd58a3596ef1b0ff28b1e41fd492d005520.1643616989.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
On book3s/32 MMU, PP bits don't offer kernel RO protection, kernel pages are always RW. However, on the 603 a page fault is always generated when the C bit (change bit = dirty bit) is not set. Enforce kernel RO protection by clearing C bit in TLB miss handler when the page doesn't have _PAGE_RW flag. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Since commit 84de6ab0 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") page table is not updated anymore by TLB miss handlers. Remove the comment. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu
-
Chen Jingwen authored
The shadow's page table is not updated when PTE_RPN_SHIFT is 24 and PAGE_SHIFT is 12. It not only causes false positives but also false negative as shown the following text. Fix it by bringing the logic of kasan_early_shadow_page_entry here. 1. False Positive: ================================================================== BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50 Write of size 16 at addr f57f3be0 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-12267-gdebe436e #1 Call Trace: [c80d1c20] [c07fe7b8] dump_stack_lvl+0x4c/0x6c (unreliable) [c80d1c40] [c02ff668] print_address_description.constprop.0+0x88/0x300 [c80d1c70] [c02ff45c] kasan_report+0x1ec/0x200 [c80d1cb0] [c0300b20] kasan_check_range+0x160/0x2f0 [c80d1cc0] [c03018a4] memset+0x34/0x90 [c80d1ce0] [c0280108] pcpu_alloc+0x508/0xa50 [c80d1d40] [c02fd7bc] __kmem_cache_create+0xfc/0x570 [c80d1d70] [c0283d64] kmem_cache_create_usercopy+0x274/0x3e0 [c80d1db0] [c2036580] init_sd+0xc4/0x1d0 [c80d1de0] [c00044a0] do_one_initcall+0xc0/0x33c [c80d1eb0] [c2001624] kernel_init_freeable+0x2c8/0x384 [c80d1ef0] [c0004b14] kernel_init+0x24/0x170 [c80d1f10] [c001b26c] ret_from_kernel_thread+0x5c/0x64 Memory state around the buggy address: f57f3a80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f57f3b00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 >f57f3b80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ f57f3c00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f57f3c80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ================================================================== 2. False Negative (with KASAN tests): ================================================================== Before fix: ok 45 - kmalloc_double_kzfree # vmalloc_oob: EXPECTATION FAILED at lib/test_kasan.c:1039 KASAN failure expected in "((volatile char *)area)[3100]", but none occurred not ok 46 - vmalloc_oob not ok 1 - kasan ================================================================== After fix: ok 1 - kasan Fixes: cbd18991 ("powerpc/mm: Fix an Oops in kasan_mmu_init()") Cc: stable@vger.kernel.org # 5.4.x Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211229035226.59159-1-chenjingwen6@huawei.com
-
Christophe JAILLET authored
'xive_irq_bitmap_add()' can return -ENOMEM. In this case, we should free the memory already allocated and return 'false' to the caller. Also add an error path which undoes the 'tima = ioremap(...)' Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/564998101804886b151235c8a9f93020923bfd2c.1643718324.git.christophe.jaillet@wanadoo.fr
-
Athira Rajeev authored
Trace IMC (In-Memory collection counters) in powerpc is useful for application level profiling. For trace_imc, presently task context (task_ctx_nr) is set to perf_hw_context. But perf_hw_context should only be used for CPU PMU. See commit 26657848 ("perf/core: Verify we have a single perf_hw_context PMU"). So for trace_imc, even though it is per thread PMU, it is preferred to use sw_context in order to be able to do application level monitoring. Hence change the task_ctx_nr to use perf_sw_context. Fixes: 012ae244 ("powerpc/perf: Trace imc PMU functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Update subject & incorporate notes into change log, reflow comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com
-
Wedson Almeida Filho authored
Without this patch, module init sections are disabled by patching their names in arch-specific code when they're loaded (which prevents code in layout_sections from finding init sections). This patch uses the new arch-specific module_init_section instead. This allows modules that have .init_array sections to have the initialisers properly called (on load, before init). Without this patch, the initialisers are not called because .init_array is renamed to _init_array, and thus isn't found by code in find_module_sections(). Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com
-
Mamatha Inamdar authored
This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules (descriptions taken from Kconfig file). Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200924051343.16052.9571.stgit@localhost.localdomain
-
Julia Lawall authored
Other uses of &gang->aff_list_head, eg in spufs_assert_affinity, indicate that the list elements have type spu_context, not spu as used here. Change the type of tmp accordingly. This has no impact on the execution, because tmp is not used in the body of the loop. Fixes: c5fc8d2a ("[CELL] cell: add placement computation for scheduling of affinity contexts") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr
-
Bhaskar Chowdhury authored
s/parmeters/parameters/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210320213932.22697-1-unixbhaskar@gmail.com
-
Fabiano Rosas authored
When figuring out the number of threads, the debug message prints "1 thread" for the first iteration of the loop, instead of the actual number of threads calculated from the length of the "ibm,ppc-interrupt-server#s" property. * /cpus/PowerPC,POWER8@20... ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG thread 0 -> cpu 0 (hard id 32) thread 1 -> cpu 1 (hard id 33) thread 2 -> cpu 2 (hard id 34) thread 3 -> cpu 3 (hard id 35) thread 4 -> cpu 4 (hard id 36) thread 5 -> cpu 5 (hard id 37) thread 6 -> cpu 6 (hard id 38) thread 7 -> cpu 7 (hard id 39) * /cpus/PowerPC,POWER8@28... ibm,ppc-interrupt-server#s -> 8 threads thread 0 -> cpu 8 (hard id 40) thread 1 -> cpu 9 (hard id 41) thread 2 -> cpu 10 (hard id 42) thread 3 -> cpu 11 (hard id 43) thread 4 -> cpu 12 (hard id 44) thread 5 -> cpu 13 (hard id 45) thread 6 -> cpu 14 (hard id 46) thread 7 -> cpu 15 (hard id 47) (...) Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
-
- 02 Feb, 2022 5 commits
-
-
Michael Ellerman authored
As reported by sparse: arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:267:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:267:36: got restricted __be64 [usertype] v arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:268:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:268:36: got restricted __be64 [usertype] r The values returned by plpar_pte_read_4() are CPU endian, not __be64, so assigning them to struct hash_pte confuses sparse. As a minimal fix open code a struct to hold the values with CPU endian types. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au
-
Corentin Labbe authored
pci_driver name is const char pointer, so the cast it not necessary. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125135421.4081740-1-clabbe@baylibre.com
-
Michael Ellerman authored
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size() and paca allocation. The first is that on a Radix capable machine but with "disable_radix" on the command line, there is a window during early boot where early_radix_enabled() is true, even though it will later become false. early_init_devtree: <- early_radix_enabled() = false early_init_dt_scan_cpus: <- early_radix_enabled() = false ... check_cpu_pa_features: <- early_radix_enabled() = false ... ^ <- early_radix_enabled() = TRUE allocate_paca: | <- early_radix_enabled() = TRUE ... | ppc64_bolted_size: | <- early_radix_enabled() = TRUE if (early_radix_enabled())| <- early_radix_enabled() = TRUE return ULONG_MAX; | ... | ... | <- early_radix_enabled() = TRUE ... | <- early_radix_enabled() = TRUE mmu_early_init_devtree() V ... <- early_radix_enabled() = false This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's paca allocation, even though later it will return a different value. This is not currently a bug because the paca allocation is also limited by the RMA size, but that is very fragile. The second issue is that when using the Hash MMU, when we call ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet detected whether 1T segments are available. That causes ppc64_bolted_size() to return 256MB, even if the machine can actually support up to 1T. This is usually OK, we generally have space below 256MB for one paca, but for a kdump kernel placed above 256MB it causes the boot to fail. At boot we cannot discover all the features of the machine instantaneously, so there will always be some periods where we have incomplete knowledge of the system. However both the above problems stem from the fact that we allocate the boot CPU's paca (and paca pointers array) before we decide which MMU we are using, or discover its exact features. Moving the paca allocation slightly later still can solve both the issues described above, and means for a normal boot we don't do any permanent allocations until after we've discovered the MMU. Note that although we move the boot CPU's paca allocation later, we still have a temporary paca (boot_paca) accessible via r13, so code that does read only access to paca fields is safe. The only risk is that some code writes to the boot_paca, and that write will then be lost when we switch away from the boot_paca later in early_setup(). The additional code that runs before the paca allocation is primarily mmu_early_init_devtree(), which is scanning the device tree and populating globals and cur_cpu_spec with MMU related flags. I do not see any additional code that writes to paca fields. [1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com [2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.comSigned-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
-
Maxim Kiselev authored
On board rev A, the network interface labels for the switch ports written on the front panel are different than on rev B and later. This patch fixes network interface names for the switch ports according to labels that are written on the front panel of the board rev B. They start from ETH3 and end at ETH10. This patch also introduces a separate device tree for rev A. The main device tree is supposed to cover rev B and later. Fixes: e69eb082 ("powerpc: dts: t1040rdb: add ports for Seville Ethernet switch") Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121091447.3412907-1-bigunclemax@gmail.com
-
Laurent Dufour authored
The LPAR name may be changed after the LPAR has been started in the HMC. In that case lparstat command is not reporting the updated value because it reads it from the device tree which is read at boot time. However this value could be read from RTAS. Adding this value in the /proc/powerpc/lparcfg output allows to read the updated value. However the hypervisor, like Qemu/KVM, may not support this RTAS parameter. In that case the value reported in lparcfg is read from the device tree and so is not updated accordingly. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> [mpe: Drop doc-comment syntax, change RTAS/DT to lower case, use of_root to fix missing of_node_put(), use of_property_read_string()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220106161339.74656-1-ldufour@linux.ibm.com
-
- 31 Jan, 2022 5 commits
-
-
Thierry Reding authored
The unit-address for the Maxim MAX1237 ADCs on XPedite5200 boards don't match the value in the "reg" property and cause a DTC warning. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211220134036.683309-1-thierry.reding@gmail.com
-
Maxim Kiselev authored
T1040RDB has two RTL8211E-VB phys which requires setting of internal delays for correct work. Changing the phy-connection-type property to `rgmii-id` will fix this issue. Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com
-
Tobias Waldekranz authored
This means an idle guest won't needlessly consume an entire core on the host, waiting for work to show up. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220112112459.1033754-1-troglobit@gmail.com
-
Michal Suchanek authored
The link stack flush status is not visible in debugfs. It can be enabled even when count cache flush is disabled. Add separate file for its status. Signed-off-by: Michal Suchanek <msuchanek@suse.de> [mpe: Update for change to link_stack_flush_type] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
-
Sachin Sant authored
Cédric pointed out that XIVE IPI information exported via sysfs (debug/powerpc/xive) display empty lines for processors which are not online. Switch to using for_each_online_cpu() so that information is displayed for online-only processors. Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchset@MacBook-Pro.local
-
- 30 Jan, 2022 17 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Drop an unused private data field in the AIC driver - Various fixes to the realtek-rtl driver - Make the GICv3 ITS driver compile again in !SMP configurations - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec - Yet another kfree/bitmap_free conversion - Various DT updates (Renesas, SiFive) * tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support irqchip/gic-v3-its: Reset each ITS's BASERn register before probe irqchip/gic-v3-its: Fix build for !SMP irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap irqchip/realtek-rtl: Service all pending interrupts irqchip/realtek-rtl: Fix off-by-one in routing irqchip/realtek-rtl: Map control data to virq irqchip/apple-aic: Drop unused ipi_hwirq field
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Borislav Petkov: - Prevent accesses to the per-CPU cgroup context list from another CPU except the one it belongs to, to avoid list corruption - Make sure parent events are always woken up to avoid indefinite hangs in the traced workload * tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix cgroup event list management perf: Always wake the parent event
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Borislav Petkov: "Make sure the membarrier-rseq fence commands are part of the reported set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY" * tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Add another Intel CPU model to the list of CPUs supporting the processor inventory unique number - Allow writing to MCE thresholding sysfs files again - a previous change had accidentally disabled it and no one noticed. Goes to show how much is this stuff used * tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN x86/MCE/AMD: Allow thresholding interface updates after init
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "12 patches. Subsystems affected by this patch series: sysctl, binfmt, ia64, mm (memory-failure, folios, kasan, and psi), selftests, and ocfs2" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2: fix a deadlock when commit trans jbd2: export jbd2_journal_[grab|put]_journal_head psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n mm, kasan: use compare-exchange operation to set KASAN page tag kasan: test: fix compatibility with FORTIFY_SOURCE tools/testing/scatterlist: add missing defines mm: page->mapping folio->mapping should have the same offset memory-failure: fetch compound_head after pgmap_pfn_valid() ia64: make IA64_MCA_RECOVERY bool instead of tristate binfmt_misc: fix crash when load/unload module include/linux/sysctl.h: fix register_sysctl_mount_point() return type
-
Joseph Qi authored
commit 6f1b2285 introduces a regression which can deadlock as follows: Task1: Task2: jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head __jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock) jbd2_journal_put_journal_head jbd_lock_bh_journal_head Task1 and Task2 lock bh->b_state and jh->b_state_lock in different order, which finally result in a deadlock. So use jbd2_journal_[grab|put]_journal_head instead in ocfs2_test_bg_bit_allocatable() to fix it. Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com Fixes: 6f1b2285 ("ocfs2: fix race between searching chunks and release journal_head from buffer_head") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
Patch series "ocfs2: fix a deadlock case". This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols jbd2_journal_[grab|put]_journal_head as preparation and later use them in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the deadlock. This patch (of 2): This exports symbols jbd2_journal_[grab|put]_journal_head, which will be used outside modules, e.g. ocfs2. Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.comSigned-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Suren Baghdasaryan authored
When CONFIG_PROC_FS is disabled psi code generates the following warnings: kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=] 1364 | static const struct proc_ops psi_cpu_proc_ops = { | ^~~~~~~~~~~~~~~~ kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=] 1355 | static const struct proc_ops psi_memory_proc_ops = { | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=] 1346 | static const struct proc_ops psi_io_proc_ops = { | ^~~~~~~~~~~~~~~ Make definitions of these structures and related functions conditional on CONFIG_PROC_FS config. Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com Fixes: 0e94682b ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Suren Baghdasaryan authored
When CONFIG_CGROUPS is disabled psi code generates the following warnings: kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes] 1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group, | ^~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes] 1182 | void psi_trigger_destroy(struct psi_trigger *t) | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes] 1249 | __poll_t psi_trigger_poll(void **trigger_ptr, | ^~~~~~~~~~~~~~~~ Change the declarations of these functions in the header to provide the prototypes even when they are unused. Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com Fixes: 0e94682b ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Collingbourne authored
It has been reported that the tag setting operation on newly-allocated pages can cause the page flags to be corrupted when performed concurrently with other flag updates as a result of the use of non-atomic operations. Fix the problem by using a compare-exchange loop to update the tag. Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0cec63101 Fixes: 2813b9c0 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Marco Elver authored
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform dynamic checks using __builtin_object_size(ptr), which when failed will panic the kernel. Because the KASAN test deliberately performs out-of-bounds operations, the kernel panics with FORTIFY_SOURCE, for example: | kernel BUG at lib/string_helpers.c:910! | invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI | CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G B 5.16.0-rc3+ #3 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 | RIP: 0010:fortify_panic+0x19/0x1b | ... | Call Trace: | kmalloc_oob_in_memset.cold+0x16/0x16 | ... Fix it by also hiding `ptr` from the optimizer, which will ensure that __builtin_object_size() does not return a valid size, preventing fortified string functions from panicking. Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.comSigned-off-by: Marco Elver <elver@google.com> Reported-by: Nico Pache <npache@redhat.com> Reviewed-by: Nico Pache <npache@redhat.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Maor Gottlieb authored
The cited commits replaced preemptible with pagefault_disabled and flush_kernel_dcache_page with flush_dcache_page respectively, hence need to update the corresponding defines in the test. scatterlist.c: In function ‘sg_miter_stop’: scatterlist.c:919:4: warning: implicit declaration of function ‘flush_dcache_page’ [-Wimplicit-function-declaration] flush_dcache_page(miter->page); ^~~~~~~~~~~~~~~~~ In file included from linux/scatterlist.h:8:0, from scatterlist.c:9: scatterlist.c:922:18: warning: implicit declaration of function ‘pagefault_disabled’ [-Wimplicit-function-declaration] WARN_ON_ONCE(!pagefault_disabled()); ^ linux/mm.h:23:25: note: in definition of macro ‘WARN_ON_ONCE’ int __ret_warn_on = !!(condition); \ ^~~~~~~~~ Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com Fixes: 723aca20 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()") Fixes: 0e84f5db ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
As with the other members of folio, the offset of page->mapping and folio->mapping must be the same. The compile-time check was inadvertently removed during development. Add it back. [willy@infradead.org: changelog redo] Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joao Martins authored
memory_failure_dev_pagemap() at the moment assumes base pages (e.g. dax_lock_page()). For devmap with compound pages fetch the compound_head in case a tail page memory failure is being handled. Currently this is a nop, but in the advent of compound pages in dev_pagemap it allows memory_failure_dev_pagemap() to keep working. Without this fix memory-failure handling (i.e. MCEs on pmem) with device-dax configured namespaces will regress (and crash). Link: https://lkml.kernel.org/r/20211202204422.26777-2-joao.m.martins@oracle.comReported-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
In linux-next, IA64_MCA_RECOVERY uses the (new) function make_task_dead(), which is not exported for use by modules. Instead of exporting it for one user, convert IA64_MCA_RECOVERY to be a bool Kconfig symbol. In a config file from "kernel test robot <lkp@intel.com>" for a different problem, this linker error was exposed when CONFIG_IA64_MCA_RECOVERY=m. Fixes this build error: ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined! Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org Fixes: 0e25498f ("exit: Add and use make_task_dead.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tong Zhang authored
We should unregister the table upon module unload otherwise something horrible will happen when we load binfmt_misc module again. Also note that we should keep value returned by register_sysctl_mount_point() and release it later, otherwise it will leak. Also, per Christian's comment, to fully restore the old behavior that won't break userspace the check(binfmt_misc_header) should be eliminated. To reproduce: modprobe binfmt_misc modprobe -r binfmt_misc modprobe binfmt_misc modprobe -r binfmt_misc modprobe binfmt_misc resulting in modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory and an unhappy kernel: binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point BUG: unable to handle page fault for address: fffffbfff8004802 Call Trace: init_misc_binfmt+0x2d/0x1000 [binfmt_misc] Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com Fixes: 3ba442d5 ("fs: move binfmt_misc sysctl to its own file") Signed-off-by: Tong Zhang <ztong0001@gmail.com> Co-developed-by: Christian Brauner<brauner@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-