- 02 Apr, 2020 40 commits
-
-
Sebastian Andrzej Siewior authored
The proc file `compact_unevictable_allowed' should allow 0 and 1 only, the `extra*' attribues have been set properly but without proc_dointvec_minmax() as the `proc_handler' the limit will not be enforced. Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid specified range. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/20200303202054.gsosv7fsx2ma3cic@linutronix.deSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Dan reports: The patch 5e1f0f09: "mm, compaction: capture a page under direct compaction" from Mar 5, 2019, leads to the following Smatch complaint: mm/compaction.c:2321 compact_zone_order() error: we previously assumed 'capture' could be null (see line 2313) mm/compaction.c 2288 static enum compact_result compact_zone_order(struct zone *zone, int order, 2289 gfp_t gfp_mask, enum compact_priority prio, 2290 unsigned int alloc_flags, int classzone_idx, 2291 struct page **capture) ^^^^^^^ 2313 if (capture) ^^^^^^^ Check for NULL 2314 current->capture_control = &capc; 2315 2316 ret = compact_zone(&cc, &capc); 2317 2318 VM_BUG_ON(!list_empty(&cc.freepages)); 2319 VM_BUG_ON(!list_empty(&cc.migratepages)); 2320 2321 *capture = capc.page; ^^^^^^^^ Unchecked dereference. 2322 current->capture_control = NULL; 2323 In practice this is not an issue, as the only caller path passes non-NULL capture: __alloc_pages_direct_compact() struct page *page = NULL; try_to_compact_pages(capture = &page); compact_zone_order(capture = capture); So let's remove the unnecessary check, which should also make Smatch happy. Fixes: 5e1f0f09 ("mm, compaction: capture a page under direct compaction") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/18b0df3c-0589-d96c-23fa-040798fee187@suse.czSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rik van Riel authored
The code to implement THP migrations already exists, and the code for CMA to clear out a region of memory already exists. Only a few small tweaks are needed to allow CMA to move THP memory when attempting an allocation from alloc_contig_range. With these changes, migrating THPs from a CMA area works when allocating a 1GB hugepage from CMA memory. [riel@surriel.com: fix hugetlbfs pages per Mike, cleanup per Vlastimil] Link: http://lkml.kernel.org/r/20200228104700.0af2f18d@imladris.surriel.comSigned-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joonsoo Kim <js1304@gmail.com> Link: http://lkml.kernel.org/r/20200227213238.1298752-2-riel@surriel.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rik van Riel authored
Patch series "fix THP migration for CMA allocations", v2. Transparent huge pages are allocated with __GFP_MOVABLE, and can end up in CMA memory blocks. Transparent huge pages also have most of the infrastructure in place to allow migration. However, a few pieces were missing, causing THP migration to fail when attempting to use CMA to allocate 1GB hugepages. With these patches in place, THP migration from CMA blocks seems to work, both for anonymous THPs and for tmpfs/shmem THPs. This patch (of 2): Add information to struct compact_control to indicate that the allocator would really like to clear out this specific part of memory, used by for example CMA. Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Joonsoo Kim <js1304@gmail.com> Link: http://lkml.kernel.org/r/20200227213238.1298752-1-riel@surriel.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
It was noticed that mlock2 tests are failing after 9c4e6b1a ("mm, mlock, vmscan: no more skipping pagevecs") because the patch has changed the timing on when the page is added to the unevictable LRU list and thus gains the unevictable page flag. The test was just too dependent on the implementation details which were true at the time when it was introduced. Page flags and the timing when they are set is something no userspace should ever depend on. The test should be testing only for the user observable contract of the tested syscalls. Those are defined pretty well for the mlock and there are other means for testing them. In fact this is already done and testing for page flags can be safely dropped to achieve the aimed purpose. Present bits can be checked by /proc/<pid>/smaps RSS field and the locking state by VmFlags although I would argue that Locked: field would be more appropriate. Drop all the page flag machinery and considerably simplify the test. This should be more robust for future kernel changes while checking the promised contract is still valid. Fixes: 9c4e6b1a ("mm, mlock, vmscan: no more skipping pagevecs") Reported-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.czSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mateusz Nosek authored
sc->memcg_low_skipped resets skipped_deactivate to 0 but this is not needed as this code path is never reachable with skipped_deactivate != 0 due to previous sc->skipped_deactivate branch. [mhocko@kernel.org: rewrite changelog] Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200319165938.23354-1-mateusznosek0@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill Tkhai authored
This gives some size improvement: $size mm/vmscan.o (before) text data bss dec hex filename 53670 24123 12 77805 12fed mm/vmscan.o $size mm/vmscan.o (after) text data bss dec hex filename 53648 24123 12 77783 12fd7 mm/vmscan.o Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/Message-ID: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mateusz Nosek authored
Previously 0 was assigned to variable 'lruvec_size', but the variable was never read later. So the assignment can be removed. Fixes: f87bccde ("mm/vmscan: remove unused lru_pages argument") Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200229214022.11853-1-mateusznosek0@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qian Cai authored
pgdat->kswapd_classzone_idx could be accessed concurrently in wakeup_kswapd(). Plain writes and reads without any lock protection result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as well as saving a branch (compilers might well optimize the original code in an unintentional way anyway). While at it, also take care of pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim(). The data races were reported by KCSAN, BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13: wakeup_kswapd+0xf1/0x400 wakeup_kswapd at mm/vmscan.c:3967 wake_all_kswapds+0x59/0xc0 wake_all_kswapds at mm/page_alloc.c:4241 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_slowpath at mm/page_alloc.c:4512 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 1 lock held by mtest01/7454: #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9 do_user_addr_fault at arch/x86/mm/fault.c:1405 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539 irq event stamp: 6944085 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x34c/0x57c irq_exit+0xa2/0xc0 read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38: wakeup_kswapd+0xc8/0x400 wake_all_kswapds+0x59/0xc0 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 1 lock held by mtest01/7472: #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9 irq event stamp: 6793561 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x34c/0x57c irq_exit+0xa2/0xc0 BUG: KCSAN: data-race in kswapd / wakeup_kswapd write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6: kswapd+0x27c/0x8d0 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0: wakeup_kswapd+0xf3/0x450 wake_all_kswapds+0x59/0xc0 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/1582749472-5171-1-git-send-email-cai@lca.pwSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
kswapd kernel thread starts either with a CPU affinity set to the full cpu mask of its target node or without any affinity at all if the node is CPUless. There is a cpu hotplug callback (kswapd_cpu_online) that implements an elaborate way to update this mask when a cpu is onlined. It is not really clear whether there is any actual benefit from this scheme. Completely CPU-less NUMA nodes rarely gain a new CPU during runtime. Drop the code for that reason. If there is a real usecase then we can resurrect and simplify the code. [mhocko@suse.com rewrite changelog] Suggested-by: Michal Hocko <mhocko@suse.org> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/20200218224422.3407-1-richardw.yang@linux.intel.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
The commit 98fa15f3 ("mm: replace all open encodings for NUMA_NO_NODE") did the replacement across the kernel tree, but we got some more in vmscan.c since then. Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1581568298-45317-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
Use mem_cgroup_is_root() API to check if memcg is root memcg instead of open coding. Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1581398649-125989-2-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
When kstrndup fails, no memory was allocated and we can exit directly. [david@redhat.com: reword changelog] Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1581398649-125989-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
chenqiwu authored
Simplify page_is_buddy() to reduce the redundant code for better code readability. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/1583853751-5525-1-git-send-email-qiwuchen55@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mateusz Nosek authored
Previously if branch condition was false, the assignment was not executed. The assignment can be safely executed even when the condition is false and it is not incorrect as it assigns the value of 'nodemask' to 'ac.nodemask' which already has the same value. So as the assignment can be executed unconditionally, the branch can be removed. Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: http://lkml.kernel.org/r/20200307225335.31300-1-mateusznosek0@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
chenqiwu authored
Use free_area_empty() API to replace list_empty() for better code readability. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: http://lkml.kernel.org/r/1583674354-7713-1-git-send-email-qiwuchen55@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mateusz Nosek authored
This patch makes ALLOC_KSWAPD equal to __GFP_KSWAPD_RECLAIM (cast to int). Thanks to that code like: if (gfp_mask & __GFP_KSWAPD_RECLAIM) alloc_flags |= ALLOC_KSWAPD; can be changed to: alloc_flags |= (__force int) (gfp_mask &__GFP_KSWAPD_RECLAIM); Thanks to this one branch less is generated in the assembly. In case of ALLOC_KSWAPD flag two branches are saved, first one in code that always executes in the beginning of page allocation and the second one in loop in page allocator slowpath. Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/20200304162118.14784-1-mateusznosek0@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joel Savitz authored
Currently, the vm.min_free_kbytes sysctl value is capped at a hardcoded 64M in init_per_zone_wmark_min (unless it is overridden by khugepaged initialization). This value has not been modified since 2005, and enterprise-grade systems now frequently have hundreds of GB of RAM and multiple 10, 40, or even 100 GB NICs. We have seen page allocation failures on heavily loaded systems related to NIC drivers. These issues were resolved by an increase to vm.min_free_kbytes. This patch increases the hardcoded value by a factor of 4 as a temporary solution. Further work to make the calculation of vm.min_free_kbytes more consistent throughout the kernel would be desirable. As an example of the inconsistency of the current method, this value is recalculated by init_per_zone_wmark_min() in the case of memory hotplug which will override the value set by set_recommended_min_free_kbytes() called during khugepaged initialization even if khugepaged remains enabled, however an on/off toggle of khugepaged will then recalculate and set the value via set_recommended_min_free_kbytes(). Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael Aquini <aquini@redhat.com> Link: http://lkml.kernel.org/r/20200220150103.5183-1-jsavitz@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Walter Wu authored
Test negative size in memmove in order to verify whether it correctly get KASAN report. Casting negative numbers to size_t would indeed turn up as a large size_t, so it will have out-of-bounds bug and be detected by KASAN. [walter-zh.wu@mediatek.com: fix -Wstringop-overflow warning] Link: http://lkml.kernel.org/r/20200311134244.13016-1-walter-zh.wu@mediatek.comSigned-off-by: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: kernel test robot <lkp@intel.com> Link: http://lkml.kernel.org/r/20191112065313.7060-1-walter-zh.wu@mediatek.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Walter Wu authored
Patch series "fix the missing underflow in memory operation function", v4. The patchset helps to produce a KASAN report when size is negative in memory operation functions. It is helpful for programmer to solve an undefined behavior issue. Patch 1 based on Dmitry's review and suggestion, patch 2 is a test in order to verify the patch 1. [1]https://bugzilla.kernel.org/show_bug.cgi?id=199341 [2]https://lore.kernel.org/linux-arm-kernel/20190927034338.15813-1-walter-zh.wu@mediatek.com/ This patch (of 2): KASAN missed detecting size is a negative number in memset(), memcpy(), and memmove(), it will cause out-of-bounds bug. So needs to be detected by KASAN. If size is a negative number, then it has a reason to be defined as out-of-bounds bug type. Casting negative numbers to size_t would indeed turn up as a large size_t and its value will be larger than ULONG_MAX/2, so that this can qualify as out-of-bounds. KASAN report is shown below: BUG: KASAN: out-of-bounds in kmalloc_memmove_invalid_size+0x70/0xa0 Read of size 18446744073709551608 at addr ffffff8069660904 by task cat/72 CPU: 2 PID: 72 Comm: cat Not tainted 5.4.0-rc1-next-20191004ajb-00001-gdb8af2f372b2-dirty #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x288 show_stack+0x14/0x20 dump_stack+0x10c/0x164 print_address_description.isra.9+0x68/0x378 __kasan_report+0x164/0x1a0 kasan_report+0xc/0x18 check_memory_region+0x174/0x1d0 memmove+0x34/0x88 kmalloc_memmove_invalid_size+0x70/0xa0 [1] https://bugzilla.kernel.org/show_bug.cgi?id=199341 [cai@lca.pw: fix -Wdeclaration-after-statement warn] Link: http://lkml.kernel.org/r/1583509030-27939-1-git-send-email-cai@lca.pw [peterz@infradead.org: fix objtool warning] Link: http://lkml.kernel.org/r/20200305095436.GV2596@hirez.programming.kicks-ass.netReported-by: kernel test robot <lkp@intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Link: http://lkml.kernel.org/r/20191112065302.7015-1-walter-zh.wu@mediatek.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Baoquan He authored
When allocating memmap for hot added memory with the classic sparse, the specified 'nid' is ignored in populate_section_memmap(). While in allocating memmap for the classic sparse during boot, the node given by 'nid' is preferred. And VMEMMAP prefers the node of 'nid' in both boot stage and memory hot adding. So seems no reason to not respect the node of 'nid' for the classic sparse when hot adding memory. Use kvmalloc_node instead to use the passed in 'nid'. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/20200316125625.GH3486@MiWiFi-R3L-srvSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Baoquan He authored
This change makes populate_section_memmap()/depopulate_section_memmap much simpler. Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200316125450.GG3486@MiWiFi-R3L-srvSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pingfan Liu authored
After introducing mem sub section concept, pfn_present() loses its literal meaning, and will not be necessary a truth on partial populated mem section. Since all of the callers use it to judge an absent section, it is better to rename pfn_present() as pfn_in_present_section(). Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Leonardo Bras <leonardo@linux.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Link: http://lkml.kernel.org/r/1581919110-29575-1-git-send-email-kernelfans@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
memmap should be the address to page struct instead of address to pfn. As mentioned by David, if system memory and devmem sit within a section, the mismatch address would lead kdump to dump unexpected memory. Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is valid to get the page struct address at this point. Fixes: ba72b4c8 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Baoquan He <bhe@redhat.com> Link: http://lkml.kernel.org/r/20200210005048.10437-1-richardw.yang@linux.intel.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Brian Geffon authored
Add a few simple self tests for the new flag MREMAP_DONTUNMAP, they are simple smoke tests which also demonstrate the behavior. [akpm@linux-foundation.org: convert eight-spaces to hard tabs] [bgeffon@google.com: v7] Link: http://lkml.kernel.org/r/20200221174248.244748-2-bgeffon@google.com [akpm@linux-foundation.org: coding style fixes] Signed-off-by: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Deacon <will@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Link: http://lkml.kernel.org/r/20200218173221.237674-2-bgeffon@google.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Brian Geffon authored
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. The remap operation will be performed as it would have been normally by moving over the page tables to the new mapping. The old vma will have any locked flags cleared, have no pagetables, and any userfaultfds that were watching that range will continue watching it. For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail. Because MREMAP_DONTUNMAP always results in moving a VMA you MUST use the MREMAP_MAYMOVE flag, it's not possible to resize a VMA while also moving with MREMAP_DONTUNMAP so old_len must always be equal to the new_len otherwise it will return -EINVAL. We hope to use this in Chrome OS where with userfaultfd we could write an anonymous mapping to disk without having to STOP the process or worry about VMA permission changes. This feature also has a use case in Android, Lokesh Gidra has said that "As part of using userfaultfd for GC, We'll have to move the physical pages of the java heap to a separate location. For this purpose mremap will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java heap, its virtual mapping will be removed as well. Therefore, we'll require performing mmap immediately after. This is not only time consuming but also opens a time window where a native thread may call mmap and reserve the java heap's address range for its own usage. This flag solves the problem." [bgeffon@google.com: v6] Link: http://lkml.kernel.org/r/20200218173221.237674-1-bgeffon@google.com [bgeffon@google.com: v7] Link: http://lkml.kernel.org/r/20200221174248.244748-1-bgeffon@google.comSigned-off-by: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Deacon <will@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Link: http://lkml.kernel.org/r/20200207201856.46070-1-bgeffon@google.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jaewon Kim authored
Even on 64 bit kernel, the mmap failure can happen for a 32 bit task. Virtual memory space shortage of a task on mmap is reported to userspace as -ENOMEM. It can be confused as physical memory shortage of overall system. The vm_unmapped_area can be called to by some drivers or other kernel core system like filesystem. In my platform, GPU driver calls to vm_unmapped_area and the driver returns -ENOMEM even in GPU side shortage. It can be hard to distinguish which code layer returns the -ENOMEM. Create mmap trace file and add trace point of vm_unmapped_area. i.e.) 277.156599: vm_unmapped_area: addr=77e0d03000 err=0 total_vm=0x17014b flags=0x1 len=0x400000 lo=0x8000 hi=0x7878c27000 mask=0x0 ofs=0x1 342.838740: vm_unmapped_area: addr=0 err=-12 total_vm=0xffb08 flags=0x0 len=0x100000 lo=0x40000000 hi=0xfffff000 mask=0x0 ofs=0x22 [akpm@linux-foundation.org: prefix address printk with 0x, per Matthew] Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200320055823.27089-3-jaewon31.kim@samsung.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jaewon Kim authored
Patch series "mm: mmap: add mmap trace point", v3. Create mmap trace file and add trace point of vm_unmapped_area(). This patch (of 2): In preparation for next patch remove inline of vm_unmapped_area and move code to mmap.c. There is no logical change. Also remove unmapped_area[_topdown] out of mm.h, there is no code calling to them. Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20200320055823.27089-2-jaewon31.kim@samsung.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wang Wenhu authored
The param "start" actually referes to the physical memory start, which is to be mapped into virtual area vma. And it is the field vma->vm_start which stands for the start of the area. Most of the time, we do not read through whole implementation of a function but only the definition and essential comments. Accurate comments are definitely the base stone. Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200318052206.105104-1-wenhu.wang@vivo.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
WANG Wenhu authored
It really made me scratch my head. Replace the comment with an accurate and consistent description. The parameter pfn actually refers to the page frame number which is right-shifted by PAGE_SHIFT from the physical address. Signed-off-by: WANG Wenhu <wenhu.wang@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200310073955.43415-1-wenhu.wang@vivo.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Userfaultfd fault path was by default killable even if the caller does not have FAULT_FLAG_KILLABLE. That makes sense before in that when with gup we don't have FAULT_FLAG_KILLABLE properly set before. Now after previous patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleaned the code a little bit by introducing some helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
The existing gup code does not react to the fatal signals in many code paths. For example, in one retry path of gup we're still using down_read() rather than down_read_killable(). Also, when doing page faults we don't pass in FAULT_FLAG_KILLABLE as well, which means that within the faulting process we'll wait in non-killable way as well. These were spotted by Linus during the code review of some other patches. Let's allow the gup code to react to fatal signals to improve the responsiveness of threads when during gup and being killed. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160256.9887-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
This is the gup counterpart of the change that allows the VM_FAULT_RETRY to happen for more than once. One thing to mention is that we must check the fatal signal here before retry because the GUP can be interrupted by that, otherwise we can loop forever. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195357.16371-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
handle_userfaultfd() is currently the only one place in the kernel page fault procedures that can respond to non-fatal userspace signals. It was trying to detect such an allowance by checking against USER & KILLABLE flags, which was "un-official". In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show that the fault handler allows the fault procedure to respond even to non-fatal signals. Meanwhile, add this new flag to the default fault flags so that all the page fault handlers can benefit from the new flag. With that, replacing the userfault check to this one. Since the line is getting even longer, clean up the fault flags a bit too to ease TTY users. Although we've got a new flag and applied it, we shouldn't have any functional change with this patch so far. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
This patch removes the risk path in handle_userfault() then we will be sure that the callers of handle_mm_fault() will know that the VMAs might have changed. Meanwhile with previous patch we don't lose responsiveness as well since the core mm code now can handle the nonfatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
The idea comes from the upstream discussion between Linus and Andrea: https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ A summary to the issue: there was a special path in handle_userfault() in the past that we'll return a VM_FAULT_NOPAGE when we detected non-fatal signals when waiting for userfault handling. We did that by reacquiring the mmap_sem before returning. However that brings a risk in that the vmas might have changed when we retake the mmap_sem and even we could be holding an invalid vma structure. This patch is a preparation of removing that special path by allowing the page fault to return even faster if we were interrupted by a non-fatal signal during a user-mode page fault handling routine. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160230.9598-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Let SH to use the new fault_signal_pending() helper. Here we'll need to move the up_read() out because that's actually needed as long as !RETRY cases. At the meantime we can drop all the rest of up_read()s now (which seems to be cleaner). Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160226.9550-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Let powerpc code to use the new helper, by moving the signal handling earlier before the retry logic. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160222.9422-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-