- 30 Aug, 2019 19 commits
-
-
Yong Wu authored
This patch only move the clk_prepare_enable and config_port into the runtime suspend/resume callback. It doesn't change the code content and sequence. This is a preparing patch for adjusting SMI_BUS_SEL for mt8183. (SMI_BUS_SEL need to be restored after smi-common resume every time.) Also it gives a chance to get rid of mtk_smi_larb_get/put which could be a next topic. CC: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Normally the M4U HW connect EMI with smi. the diagram is like below: EMI | M4U | smi-common | ----------------- | | | | ... larb0 larb1 larb2 larb3 Actually there are 2 mmu cells in the M4U HW, like this diagram: EMI --------- | | mmu0 mmu1 <- M4U | | --------- | smi-common | ----------------- | | | | ... larb0 larb1 larb2 larb3 This patch add support for mmu1. In order to get better performance, we could adjust some larbs go to mmu1 while the others still go to mmu0. This is controlled by a SMI COMMON register SMI_BUS_SEL(0x220). mt2712, mt8173 and mt8183 M4U HW all have 2 mmu cells. the default value of that register is 0 which means all the larbs go to mmu0 defaultly. This is a preparing patch for adjusting SMI_BUS_SEL for mt8183. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
The M4U IP blocks in mt8183 is MediaTek's generation2 M4U which use the ARM Short-descriptor like mt8173, and most of the HW registers are the same. Here list main differences between mt8183 and mt8173/mt2712: 1) mt8183 has only one M4U HW like mt8173 while mt2712 has two. 2) mt8183 don't have the "bclk" clock, it use the EMI clock instead. 3) mt8183 can support the dram over 4GB, but it doesn't call this "4GB mode". 4) mt8183 pgtable base register(0x0) extend bit[1:0] which represent the bit[33:32] in the physical address of the pgtable base, But the standard ttbr0[1] means the S bit which is enabled defaultly, Hence, we add a mask. 5) mt8183 HW has a GALS modules, SMI should enable "has_gals" support. 6) mt8183 need reset_axi like mt8173. 7) the larb-id in smi-common is remapped. M4U should add its larbid_remap. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
In some SoCs like mt8183, SMI add GALS(Global Async Local Sync) module which can help synchronize for the modules in different clock frequency. It can be seen as a "asynchronous fifo". This is a example diagram: M4U | ---------- | | gals0-rx gals1-rx | | | | gals0-tx gals1-tx | | ------------ SMI Common ------------ | +-----+--------+-----+- ... | | | | | gals-rx gals-rx | | | | | | | | | | gals-tx gals-tx | | | | | larb1 larb2 larb3 larb4 GALS only help transfer the command/data while it doesn't have the configuring register, thus it has the special "smi" clock and doesn't have the "apb" clock. From the diagram above, we add "gals0" and "gals1" clocks for smi-common and add a "gals" clock for smi-larb. This patch adds gals clock supporting in the SMI. Note that some larbs may still don't have the "gals" clock like larb1 and larb4 above. This is also a preparing patch for mt8183 which has GALS. CC: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Both mt8173 and mt8183 don't have this vld_pa_rng(valid physical address range) register while mt2712 have. Move it into the plat_data. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
In mt8173 and mt8183, 0x48 is REG_MMU_STANDARD_AXI_MODE while it is REG_MMU_CTRL in the other SoCs, and the bits meaning is completely different with the REG_MMU_STANDARD_AXI_MODE. This patch moves this property to plat_data, it's also a preparing patch for mt8183. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Nicolas Boichat <drinkcat@chromium.org> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
The protect memory setting is a little different in the different SoCs. In the register REG_MMU_CTRL_REG(0x110), the TF_PROT(translation fault protect) shift bit is normally 4 while it shift 5 bits only in the mt8173. This patch delete the complex MACRO and use a common if-else instead. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
The larb-id may be remapped in the smi-common, this means the larb-id reported in the mtk_iommu_isr isn't the real larb-id, Take mt8183 as a example: M4U | --------------------------------------------- | SMI common | -0-----7-----5-----6-----1-----2------3-----4- <- Id remapped | | | | | | | | larb0 larb1 IPU0 IPU1 larb4 larb5 larb6 CCU disp vdec img cam venc img cam As above, larb0 connects with the id 0 in smi-common. larb1 connects with the id 7 in smi-common. ... If the larb-id reported in the isr is 7, actually it's larb1(vdec). In order to output the right larb-id in the isr, we add a larb-id remapping relationship in this patch. If there is no this larb-id remapping in some SoCs, use the linear mapping array instead. This also is a preparing patch for mt8183. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Nicolas Boichat <drinkcat@chromium.org> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
In some SoCs, M4U doesn't have its "bclk", it will use the EMI clock instead which has always been enabled when entering kernel. Currently mt2712 and mt8173 have this bclk while mt8183 doesn't. This also is a preparing patch for mt8183. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Evan Green <evgreen@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
After extending the v7s support PA[33:32] for MediaTek, we have to adjust the PA ourself for the 4GB mode. In the 4GB Mode, the PA will remap like this: CPU PA -> M4U output PA 0x4000_0000 0x1_4000_0000 (Add bit32) 0x8000_0000 0x1_8000_0000 ... 0xc000_0000 0x1_c000_0000 ... 0x1_0000_0000 0x1_0000_0000 (No change) 1) Always add bit32 for CPU PA in ->map. 2) Discard the bit32 in iova_to_phys if PA > 0x1_4000_0000 since the iommu consumer always use the CPU PA. Besides, the "oas" always is set to 34 since v7s has already supported our case. Both mt2712 and mt8173 support this "4GB mode" while the mt8183 don't. The PA in mt8183 won't remap. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
MediaTek extend the arm v7s descriptor to support up to 34 bits PA where the bit32 and bit33 are encoded in the bit9 and bit4 of the PTE respectively. Meanwhile the iova still is 32bits. Regarding whether the pagetable address could be over 4GB, the mt8183 support it while the previous mt8173 don't, thus keep it as is. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
In previous mt2712/mt8173, MediaTek extend the v7s to support 4GB dram. But in the latest mt8183, We extend it to support the PA up to 34bit. Then the "MTK_4GB" name is not so fit, This patch only change the quirk name to "MTK_EXT". Signed-off-by: Yong Wu <yong.wu@mediatek.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Use ias/oas to check the valid iova/pa. Synchronize this checking with io-pgtable-arm.c. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Add two helper functions: paddr_to_iopte and iopte_to_paddr. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
In M4U 4GB mode, the physical address is remapped as below: CPU Physical address: ==================== 0 1G 2G 3G 4G 5G |---A---|---B---|---C---|---D---|---E---| +--I/O--+------------Memory-------------+ IOMMU output physical address: ============================= 4G 5G 6G 7G 8G |---E---|---B---|---C---|---D---| +------------Memory-------------+ The Region 'A'(I/O) can not be mapped by M4U; For Region 'B'/'C'/'D', the bit32 of the CPU physical address always is needed to set, and for Region 'E', the CPU physical address keep as is. something looks like this: CPU PA -> M4U OUTPUT PA 0x4000_0000 0x1_4000_0000 (Add bit32) 0x8000_0000 0x1_8000_0000 ... 0xc000_0000 0x1_c000_0000 ... 0x1_0000_0000 0x1_0000_0000 (No change) Additionally, the iommu consumers always use the CPU phyiscal address. The PA in the iova_to_phys that is got from v7s always is u32, But from the CPU point of view, PA only need add BIT(32) when PA < 0x4000_0000. Fixes: 30e2fccf ("iommu/mediatek: Enlarge the validate PA range for 4GB mode") Signed-off-by: Yong Wu <yong.wu@mediatek.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Use a struct as the platform special data instead of the enumeration. Also there is a minor change that moving the position of "enum mtk_smi_gen" definition, this is because we expect define "struct mtk_smi_common_plat" before it is referred. This is a preparing patch for mt8183. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
The config_port of mt2712 and mt8183 are the same. Use a general config_port interface instead. In addition, in mt2712, larb8 and larb9 are the bdpsys larbs which are not the normal larb, their register space are different from the normal one. thus, we can not call the general config_port. In mt8183, IPU0/1 and CCU connect with smi-common directly, they also are not the normal larb. Hence, we add a "larb_direct_to_common_mask" for these larbs which connect to smi-commmon directly. This is also a preparing patch for adding mt8183 SMI support. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
Use a struct as the platform special data instead of the enumeration. This is a prepare patch for adding mt8183 iommu support. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Yong Wu authored
This patch adds decriptions for mt8183 IOMMU and SMI. mt8183 has only one M4U like mt8173 and is also MTK IOMMU gen2 which uses ARM Short-Descriptor translation table format. The mt8183 M4U-SMI HW diagram is as below: EMI | M4U | ---------- | | gals0-rx gals1-rx | | | | gals0-tx gals1-tx | | ------------ SMI Common ------------ | +-----+-----+--------+-----+-----+-------+-------+ | | | | | | | | | | gals-rx gals-rx | gals-rx gals-rx gals-rx | | | | | | | | | | | | | | | | | | gals-tx gals-tx | gals-tx gals-tx gals-tx | | | | | | | | larb0 larb1 IPU0 IPU1 larb4 larb5 larb6 CCU disp vdec img cam venc img cam All the connections are HW fixed, SW can NOT adjust it. Compared with mt8173, we add a GALS(Global Async Local Sync) module between SMI-common and M4U, and additional GALS between larb2/3/5/6 and SMI-common. GALS can help synchronize for the modules in different clock frequency, it can be seen as a "asynchronous fifo". GALS can only help transfer the command/data while it doesn't have the configuring register, thus it has the special "smi" clock and it doesn't have the "apb" clock. From the diagram above, we add "gals0" and "gals1" clocks for smi-common and add a "gals" clock for smi-larb. >From the diagram above, IPU0/IPU1(Image Processor Unit) and CCU(Camera Control Unit) is connected with smi-common directly, we can take them as "larb2", "larb3" and "larb7", and their register spaces are different with the normal larb. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Evan Green <evgreen@chromium.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
- 25 Aug, 2019 21 commits
-
-
Linus Torvalds authored
-
git://github.com/ojeda/linuxLinus Torvalds authored
Pull auxdisplay cleanup from Miguel Ojeda: "Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)" * tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux: auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant
-
git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds authored
Pull UML fix from Richard Weinberger: "Fix time travel mode" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: fix time travel mode
-
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifsLinus Torvalds authored
Pull UBIFS and JFFS2 fixes from Richard Weinberger: "UBIFS: - Don't block too long in writeback_inodes_sb() - Fix for a possible overrun of the log head - Fix double unlock in orphan_delete() JFFS2: - Remove C++ style from UAPI header and unbreak picky toolchains" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Limit the number of pages in shrink_liability ubifs: Correctly initialize c->min_log_bytes ubifs: Fix double unlock around orphan_delete() jffs2: Remove C++ style comments from uapi header
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Fix a boot regression caused by the recent bootparam sanitizing change, which escaped the attention of all people who reviewed that code. - Address a boot problem on machines with broken E820 tables caused by an underflow which ended up placing the trampoline start at physical address 0. - Handle machines which do not advertise a legacy timer of any form, but need calibration of the local APIC timer gracefully by making the calibration routine independent from the tick interrupt. Marked for stable as well as there seems to be quite some new laptops rolled out which expose this. - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are affected by broken firmware which does not initialize RDRAND correctly after resume. Add a command line parameter to override this for machine which either do not use suspend/resume or have a fixed BIOS. Unfortunately there is no way to detect this on boot, so the only safe decision is to turn it off by default. - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which caused fast KVM instruction emulation to break. - Explain the Intel CPU model naming convention so that the repeating discussions come to an end" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 x86/boot: Fix boot regression caused by bootparam sanitizing x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h x86/boot/compressed/64: Fix boot on machines with broken E820 table x86/apic: Handle missing global clockevent gracefully x86/cpu: Explain Intel model naming convention
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timekeeping fix from Thomas Gleixner: "A single fix for a regression caused by the generic VDSO implementation where a math overflow causes CLOCK_BOOTTIME to become a random number generator" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Thomas Gleixner: "Handle the worker management in situations where a task is scheduled out on a PI lock contention correctly and schedule a new worker if possible" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Schedule new worker even if PI-blocked
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Thomas Gleixner: "Two small fixes for kprobes and perf: - Prevent a deadlock in kprobe_optimizer() causes by reverse lock ordering - Fix a comment typo" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Fix potential deadlock in kprobe_optimizer() perf/x86: Fix typo in comment
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fix from Thomas Gleixner: "A single fix for a imbalanced kobject operation in the irq decriptor code which was unearthed by the new warnings in the kobject code" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Properly pair kobject_del() with kobject_add()
-
Linus Torvalds authored
Mergr misc fixes from Andrew Morton: "11 fixes" Mostly VM fixes, one psi polling fix, and one parisc build fix. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y mm/zsmalloc.c: fix race condition in zs_destroy_pool mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely mm, page_owner: handle THP splits correctly userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx psi: get poll_work to run when calling poll syscall next time mm: memcontrol: flush percpu vmevents before releasing memcg mm: memcontrol: flush percpu vmstats before releasing memcg parisc: fix compilation errrors mm, page_alloc: move_freepages should not examine struct page of reserved memory mm/z3fold.c: fix race between migration and destruction
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fixes from Christoph Hellwig: "Two fixes for regressions in this merge window: - select the Kconfig symbols for the noncoherent dma arch helpers on arm if swiotlb is selected, not just for LPAE to not break then Xen build, that uses swiotlb indirectly through swiotlb-xen - fix the page allocator fallback in dma_alloc_contiguous if the CMA allocation fails" * tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: fix zone selection after an unaddressable CMA allocation arm: select the dma-noncoherent symbols for all swiotlb builds
-
Andrey Ryabinin authored
The code like this: ptr = kmalloc(size, GFP_KERNEL); page = virt_to_page(ptr); offset = offset_in_page(ptr); kfree(page_address(page) + offset); may produce false-positive invalid-free reports on the kernel with CONFIG_KASAN_SW_TAGS=y. In the example above we lose the original tag assigned to 'ptr', so kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag is different from the tag in shadow hence print false report. Instead of just comparing tags, do the following: 1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's double-free and it doesn't matter what tag the pointer have. 2) If pointer tag is different from 0xFF, make sure that tag in the shadow is the same as in the pointer. Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com Fixes: 7f94ffbc ("kasan: add hooks implementation for tag-based mode") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Henry Burns authored
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we have no guarantee that migration isn't happening in the background at that time. Since migration can't directly free pages, it relies on free_work being scheduled to free the pages. But there's nothing preventing an in-progress migrate from queuing the work *after* zs_unregister_migration() has called flush_work(). Which would mean pages still pointing at the inode when we free it. Since we know at destroy time all objects should be free, no new migrations can come in (since zs_page_isolate() fails for fully-free zspages). This means it is sufficient to track a "# isolated zspages" count by class, and have the destroy logic ensure all such pages have drained before proceeding. Keeping that state under the class spinlock keeps the logic straightforward. In this case a memory leak could lead to an eventual crash if compaction hits the leaked page. This crash would only occur if people are changing their zswap backend at runtime (which eventually starts destruction). Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com Fixes: 48b4800a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Henry Burns authored
In zs_page_migrate() we call putback_zspage() after we have finished migrating all pages in this zspage. However, the return value is ignored. If a zs_free() races in between zs_page_isolate() and zs_page_migrate(), freeing the last object in the zspage, putback_zspage() will leave the page in ZS_EMPTY for potentially an unbounded amount of time. To fix this, we need to do the same thing as zs_page_putback() does: schedule free_work to occur. To avoid duplicated code, move the sequence to a new putback_zspage_deferred() function which both zs_page_migrate() and zs_page_putback() call. Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com Fixes: 48b4800a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
THP splitting path is missing the split_page_owner() call that split_page() has. As a result, split THP pages are wrongly reported in the page_owner file as order-9 pages. Furthermore when the former head page is freed, the remaining former tail pages are not listed in the page_owner file at all. This patch fixes that by adding the split_page_owner() call into __split_huge_page(). Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz Fixes: a9627bc5 ("mm/page_owner: introduce split_page_owner and replace manual handling") Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL. Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx. Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jason Xing authored
Only when calling the poll syscall the first time can user receive POLLPRI correctly. After that, user always fails to acquire the event signal. Reproduce case: 1. Get the monitor code in Documentation/accounting/psi.txt 2. Run it, and wait for the event triggered. 3. Kill and restart the process. The question is why we can end up with poll_scheduled = 1 but the work not running (which would reset it to 0). And the answer is because the scheduling side sees group->poll_kworker under RCU protection and then schedules it, but here we cancel the work and destroy the worker. The cancel needs to pair with resetting the poll_scheduled flag. Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.comSigned-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
Similar to vmstats, percpu caching of local vmevents leads to an accumulation of errors on non-leaf levels. This happens because some leftovers may remain in percpu caches, so that they are never propagated up by the cgroup tree and just disappear into nonexistence with on releasing of the memory cgroup. To fix this issue let's accumulate and propagate percpu vmevents values before releasing the memory cgroup similar to what we're doing with vmstats. Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate only over online cpus. Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com Fixes: 42a30035 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
Percpu caching of local vmstats with the conditional propagation by the cgroup tree leads to an accumulation of errors on non-leaf levels. Let's imagine two nested memory cgroups A and A/B. Say, a process belonging to A/B allocates 100 pagecache pages on the CPU 0. The percpu cache will spill 3 times, so that 32*3=96 pages will be accounted to A/B and A atomic vmstat counters, 4 pages will remain in the percpu cache. Imagine A/B is nearby memory.max, so that every following allocation triggers a direct reclaim on the local CPU. Say, each such attempt will free 16 pages on a new cpu. That means every percpu cache will have -16 pages, except the first one, which will have 4 - 16 = -12. A/B and A atomic counters will not be touched at all. Now a user removes A/B. All percpu caches are freed and corresponding vmstat numbers are forgotten. A has 96 pages more than expected. As memory cgroups are created and destroyed, errors do accumulate. Even 1-2 pages differences can accumulate into large numbers. To fix this issue let's accumulate and propagate percpu vmstat values before releasing the memory cgroup. At this point these numbers are stable and cannot be changed. Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate only over online cpus. Link: http://lkml.kernel.org/r/20190819202338.363363-2-guro@fb.com Fixes: 42a30035 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qian Cai authored
Commit 0cfaee2a ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used") converted a few functions from macros to static inline, which causes parisc to complain, In file included from include/asm-generic/4level-fixup.h:38:0, from arch/parisc/include/asm/pgtable.h:5, from arch/parisc/include/asm/io.h:6, from include/linux/io.h:13, from sound/core/memory.c:9: include/asm-generic/5level-fixup.h:14:18: error: unknown type name 'pgd_t'; did you mean 'pid_t'? #define p4d_t pgd_t ^ include/asm-generic/5level-fixup.h:24:28: note: in expansion of macro 'p4d_t' static inline int p4d_none(p4d_t p4d) ^~~~~ It is because "4level-fixup.h" is included before "asm/page.h" where "pgd_t" is defined. Link: http://lkml.kernel.org/r/20190815205305.1382-1-cai@lca.pw Fixes: 0cfaee2a ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used") Signed-off-by: Qian Cai <cai@lca.pw> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
After commit 907ec5fc ("mm: zero remaining unavailable struct pages"), struct page of reserved memory is zeroed. This causes page->flags to be 0 and fixes issues related to reading /proc/kpageflags, for example, of reserved memory. The VM_BUG_ON() in move_freepages_block(), however, assumes that page_zone() is meaningful even for reserved memory. That assumption is no longer true after the aforementioned commit. There's no reason why move_freepages_block() should be testing the legitimacy of page_zone() for reserved memory; its scope is limited only to pages on the zone's freelist. Note that pfn_valid() can be true for reserved memory: there is a backing struct page. The check for page_to_nid(page) is also buggy but reserved memory normally only appears on node 0 so the zeroing doesn't affect this. Move the debug checks to after verifying PageBuddy is true. This isolates the scope of the checks to only be for buddy pages which are on the zone's freelist which move_freepages_block() is operating on. In this case, an incorrect node or zone is a bug worthy of being warned about (and the examination of struct page is acceptable bcause this memory is not reserved). Why does move_freepages_block() gets called on reserved memory? It's simply math after finding a valid free page from the per-zone free area to use as fallback. We find the beginning and end of the pageblock of the valid page and that can bring us into memory that was reserved per the e820. pfn_valid() is still true (it's backed by a struct page), but since it's zero'd we shouldn't make any inferences here about comparing its node or zone. The current node check just happens to succeed most of the time by luck because reserved memory typically appears on node 0. The fix here is to validate that we actually have buddy pages before testing if there's any type of zone or node strangeness going on. We noticed it almost immediately after bringing 907ec5fc in on CONFIG_DEBUG_VM builds. It depends on finding specific free pages in the per-zone free area where the math in move_freepages() will bring the start or end pfn into reserved memory and wanting to claim that entire pageblock as a new migratetype. So the path will be rare, require CONFIG_DEBUG_VM, and require fallback to a different migratetype. Some struct pages were already zeroed from reserve pages before 907ec5fca3c so it theoretically could trigger before this commit. I think it's rare enough under a config option that most people don't run that others may not have noticed. I wouldn't argue against a stable tag and the backport should be easy enough, but probably wouldn't single out a commit that this is fixing. Mel said: : The overhead of the debugging check is higher with this patch although : it'll only affect debug builds and the path is not particularly hot. : If this was a concern, I think it would be reasonable to simply remove : the debugging check as the zone boundaries are checked in : move_freepages_block and we never expect a zone/node to be smaller than : a pageblock and stuck in the middle of another zone. Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.comSigned-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-