- 14 Jun, 2019 14 commits
-
-
Dan Williams authored
The p2pdma facility enables a provider to publish a pool of dma addresses for a consumer to allocate. A genpool is used internally by p2pdma to collect dma resources, 'chunks', to be handed out to consumers. Whenever a consumer allocates a resource it needs to pin the 'struct dev_pagemap' instance that backs the chunk selected by pci_alloc_p2pmem(). Currently that reference is taken globally on the entire provider device. That sets up a lifetime mismatch whereby the p2pdma core needs to maintain hacks to make sure the percpu_ref is not released twice. This lifetime mismatch also stands in the way of a fix to devm_memremap_pages() whereby devm_memremap_pages_release() must wait for the percpu_ref ->release() callback to complete before it can proceed to teardown pages. So, towards fixing this situation, introduce the ability to store a 'chunk owner' at gen_pool_add() time, and a facility to retrieve the owner at gen_pool_{alloc,free}() time. For p2pdma this will be used to store and recall individual dev_pagemap reference counter instances per-chunk. Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Williams authored
The pci_p2pdma_add_resource() implementation immediately frees the pgmap if gen_pool_add_virt() fails. However, that means that when @dev triggers a devres release devm_memremap_pages_release() will crash trying to access the freed @pgmap. Use the new devm_memunmap_pages() to manually free the mapping in the error path. Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com> Fixes: 52916982 ("PCI/P2PDMA: Support peer-to-peer memory") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Williams authored
Use the new devm_release_action() facility to allow devm_memremap_pages_release() to be manually triggered. Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Williams authored
Patch series "mm/devm_memremap_pages: Fix page release race", v2. Logan audited the devm_memremap_pages() shutdown path and noticed that it was possible to proceed to arch_remove_memory() before all potential page references have been reaped. Introduce a new ->cleanup() callback to do the work of waiting for any straggling page references and then perform the percpu_ref_exit() in devm_memremap_pages_release() context. For p2pdma this involves some deeper reworks to reference count resources on a per-instance basis rather than a per pci-device basis. A modified genalloc api is introduced to convey a driver-private pointer through gen_pool_{alloc,free}() interfaces. Also, a devm_memunmap_pages() api is introduced since p2pdma does not auto-release resources on a setup failure. The dax and pmem changes pass the nvdimm unit tests, and the p2pdma changes should now pass testing with the pci_p2pdma_release() fix. Jrme, how does this look for HMM? This patch (of 6): The devm_add_action() facility allows a resource allocation routine to add custom devm semantics. One such user is devm_memremap_pages(). There is now a need to manually trigger devm_memremap_pages_release(). Introduce devm_release_action() so the release action can be triggered via a new devm_memunmap_pages() api in a follow-on change. Link: http://lkml.kernel.org/r/155727336530.292046.2926860263201336366.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
There was the below bug report from Wu Fangsuo. On the CMA allocation path, isolate_migratepages_range() could isolate unevictable LRU pages and reclaim_clean_page_from_list() can try to reclaim them if they are clean file-backed pages. page:ffffffbf02f33b40 count:86 mapcount:84 mapping:ffffffc08fa7a810 index:0x24 flags: 0x19040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked) raw: 000000000019040c ffffffc08fa7a810 0000000000000024 0000005600000053 raw: ffffffc009b05b20 ffffffc009b05b20 0000000000000000 ffffffc09bf3ee80 page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page)) page->mem_cgroup:ffffffc09bf3ee80 ------------[ cut here ]------------ kernel BUG at /home/build/farmland/adroid9.0/kernel/linux/mm/vmscan.c:1350! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 7125 Comm: syz-executor Tainted: G S 4.14.81 #3 Hardware name: ASR AQUILAC EVB (DT) task: ffffffc00a54cd00 task.stack: ffffffc009b00000 PC is at shrink_page_list+0x1998/0x3240 LR is at shrink_page_list+0x1998/0x3240 pc : [<ffffff90083a2158>] lr : [<ffffff90083a2158>] pstate: 60400045 sp : ffffffc009b05940 .. shrink_page_list+0x1998/0x3240 reclaim_clean_pages_from_list+0x3c0/0x4f0 alloc_contig_range+0x3bc/0x650 cma_alloc+0x214/0x668 ion_cma_allocate+0x98/0x1d8 ion_alloc+0x200/0x7e0 ion_ioctl+0x18c/0x378 do_vfs_ioctl+0x17c/0x1780 SyS_ioctl+0xac/0xc0 Wu found it's due to commit ad6b6704 ("mm: remove SWAP_MLOCK in ttu"). Before that, unevictable pages go to cull_mlocked so that we can't reach the VM_BUG_ON_PAGE line. To fix the issue, this patch filters out unevictable LRU pages from the reclaim_clean_pages_from_list in CMA. Link: http://lkml.kernel.org/r/20190524071114.74202-1-minchan@kernel.org Fixes: ad6b6704 ("mm: remove SWAP_MLOCK in ttu") Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Wu Fangsuo <fangsuowu@asrmicro.com> Debugged-by: Wu Fangsuo <fangsuowu@asrmicro.com> Tested-by: Wu Fangsuo <fangsuowu@asrmicro.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Pankaj Suryawanshi <pankaj.suryawanshi@einfochips.com> Cc: <stable@vger.kernel.org> [4.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
When fixing the race conditions between the coredump and the mmap_sem holders outside the context of the process, we focused on mmget_not_zero()/get_task_mm() callers in 04f5866e ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping"), but those aren't the only cases where the mmap_sem can be taken outside of the context of the process as Michal Hocko noticed while backporting that commit to older -stable kernels. If mmgrab() is called in the context of the process, but then the mm_count reference is transferred outside the context of the process, that can also be a problem if the mmap_sem has to be taken for writing through that mm_count reference. khugepaged registration calls mmgrab() in the context of the process, but the mmap_sem for writing is taken later in the context of the khugepaged kernel thread. collapse_huge_page() after taking the mmap_sem for writing doesn't modify any vma, so it's not obvious that it could cause a problem to the coredump, but it happens to modify the pmd in a way that breaks an invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page() needs the mmap_sem for writing just to block concurrent page faults that call pmd_trans_huge_lock(). Specifically the invariant that "!pmd_trans_huge()" cannot become a "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs. The coredump will call __get_user_pages() without mmap_sem for reading, which eventually can invoke a lockless page fault which will need a functional pmd_trans_huge_lock(). So collapse_huge_page() needs to use mmget_still_valid() to check it's not running concurrently with the coredump... as long as the coredump can invoke page faults without holding the mmap_sem for reading. This has "Fixes: khugepaged" to facilitate backporting, but in my view it's more a bug in the coredump code that will eventually have to be rewritten to stop invoking page faults without the mmap_sem for reading. So the long term plan is still to drop all mmget_still_valid(). Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com Fixes: ba76149f ("thp: khugepaged") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
swkhack authored
On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result will be wrong. So change the local variable and return value to unsigned long to fix the problem. Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com Fixes: 0cf2f6f6 ("mm: mlock: check against vma for actual mlock() size") Signed-off-by: swkhack <swkhack@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
A few new fields were added to mmu_gather to make TLB flush smarter for huge page by telling what level of page table is changed. __tlb_reset_range() is used to reset all these page table state to unchanged, which is called by TLB flush for parallel mapping changes for the same range under non-exclusive lock (i.e. read mmap_sem). Before commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update PTEs in parallel don't remove page tables. But, the forementioned commit may do munmap() under read mmap_sem and free page tables. This may result in program hang on aarch64 reported by Jan Stancek. The problem could be reproduced by his test program with slightly modified below. ---8<--- static int map_size = 4096; static int num_iter = 500; static long threads_total; static void *distant_area; void *map_write_unmap(void *ptr) { int *fd = ptr; unsigned char *map_address; int i, j = 0; for (i = 0; i < num_iter; i++) { map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (map_address == MAP_FAILED) { perror("mmap"); exit(1); } for (j = 0; j < map_size; j++) map_address[j] = 'b'; if (munmap(map_address, map_size) == -1) { perror("munmap"); exit(1); } } return NULL; } void *dummy(void *ptr) { return NULL; } int main(void) { pthread_t thid[2]; /* hint for mmap in map_write_unmap() */ distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); munmap(distant_area, (size_t)DISTANT_MMAP_SIZE); distant_area += DISTANT_MMAP_SIZE / 2; while (1) { pthread_create(&thid[0], NULL, map_write_unmap, NULL); pthread_create(&thid[1], NULL, dummy, NULL); pthread_join(thid[0], NULL); pthread_join(thid[1], NULL); } } ---8<--- The program may bring in parallel execution like below: t1 t2 munmap(map_address) downgrade_write(&mm->mmap_sem); unmap_region() tlb_gather_mmu() inc_tlb_flush_pending(tlb->mm); free_pgtables() tlb->freed_tables = 1 tlb->cleared_pmds = 1 pthread_exit() madvise(thread_stack, 8M, MADV_DONTNEED) zap_page_range() tlb_gather_mmu() inc_tlb_flush_pending(tlb->mm); tlb_finish_mmu() if (mm_tlb_flush_nested(tlb->mm)) __tlb_reset_range() __tlb_reset_range() would reset freed_tables and cleared_* bits, but this may cause inconsistency for munmap() which do free page tables. Then it may result in some architectures, e.g. aarch64, may not flush TLB completely as expected to have stale TLB entries remained. Use fullmm flush since it yields much better performance on aarch64 and non-fullmm doesn't yields significant difference on x86. The original proposed fix came from Jan Stancek who mainly debugged this issue, I just wrapped up everything together. Jan's testing results: v5.2-rc2-24-gbec7550c -------------------------- mean stddev real 37.382 2.780 user 1.420 0.078 sys 54.658 1.855 v5.2-rc2-24-gbec7550c + "mm: mmu_gather: remove __tlb_reset_range() for force flush" ---------------------------------------------------------------------------------------_ mean stddev real 37.119 2.105 user 1.548 0.087 sys 55.698 1.357 [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Suggested-by: Will Deacon <will.deacon@arm.com> Tested-by: Will Deacon <will.deacon@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <npiggin@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nadav Amit <namit@vmware.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wengang Wang authored
ocfs2_dentry_attach_lock() can be executed in parallel threads against the same dentry. Make that race safe. The race is like this: thread A thread B (A1) enter ocfs2_dentry_attach_lock, seeing dentry->d_fsdata is NULL, and no alias found by ocfs2_find_local_alias, so kmalloc a new ocfs2_dentry_lock structure to local variable "dl", dl1 ..... (B1) enter ocfs2_dentry_attach_lock, seeing dentry->d_fsdata is NULL, and no alias found by ocfs2_find_local_alias so kmalloc a new ocfs2_dentry_lock structure to local variable "dl", dl2. ...... (A2) set dentry->d_fsdata with dl1, call ocfs2_dentry_lock() and increase dl1->dl_lockres.l_ro_holders to 1 on success. ...... (B2) set dentry->d_fsdata with dl2 call ocfs2_dentry_lock() and increase dl2->dl_lockres.l_ro_holders to 1 on success. ...... (A3) call ocfs2_dentry_unlock() and decrease dl2->dl_lockres.l_ro_holders to 0 on success. .... (B3) call ocfs2_dentry_unlock(), decreasing dl2->dl_lockres.l_ro_holders, but see it's zero now, panic Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.comSigned-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reported-by: Daniel Sobe <daniel.sobe@nxp.com> Tested-by: Daniel Sobe <daniel.sobe@nxp.com> Reviewed-by: Changwei Ge <gechangwei@live.cn> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill Tkhai authored
Johannes pointed out that after commit 886cf190 ("mm: move recent_rotated pages calculation to shrink_inactive_list()") we lost all zone_reclaim_stat::recent_rotated history. This fixes it. Link: http://lkml.kernel.org/r/155905972210.26456.11178359431724024112.stgit@localhost.localdomain Fixes: 886cf190 ("mm: move recent_rotated pages calculation to shrink_inactive_list()") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Potyra, Stefan authored
If mlockall() is called with only MCL_ONFAULT as flag, it removes any previously applied lockings and does nothing else. This behavior is counter-intuitive and doesn't match the Linux man page. For mlockall(): EINVAL Unknown flags were specified or MCL_ONFAULT was specified without either MCL_FUTURE or MCL_CURRENT. Consequently, return the error EINVAL, if only MCL_ONFAULT is passed. That way, applications will at least detect that they are calling mlockall() incorrectly. Link: http://lkml.kernel.org/r/20190527075333.GA6339@er01809n.ebgroup.elektrobit.com Fixes: b0f205c2 ("mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage") Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manuel Traut authored
At least for ARM64 kernels compiled with the crosstoolchain from Debian/stretch or with the toolchain from kernel.org the line number is not decoded correctly by 'decode_stacktrace.sh': $ echo "[ 136.513051] f1+0x0/0xc [kcrash]" | \ CROSS_COMPILE=/opt/gcc-8.1.0-nolibc/aarch64-linux/bin/aarch64-linux- \ ./scripts/decode_stacktrace.sh /scratch/linux-arm64/vmlinux \ /scratch/linux-arm64 \ /nfs/debian/lib/modules/4.20.0-devel [ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:68) kcrash If addr2line from the toolchain is used the decoded line number is correct: [ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:57) kcrash Link: http://lkml.kernel.org/r/20190527083425.3763-1-manut@linutronix.deSigned-off-by: Manuel Traut <manut@linutronix.de> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shakeel Butt authored
Syzbot reported following memory leak: ffffffffda RBX: 0000000000000003 RCX: 0000000000441f79 BUG: memory leak unreferenced object 0xffff888114f26040 (size 32): comm "syz-executor626", pid 7056, jiffies 4294948701 (age 39.410s) hex dump (first 32 bytes): 40 60 f2 14 81 88 ff ff 40 60 f2 14 81 88 ff ff @`......@`...... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: slab_post_alloc_hook mm/slab.h:439 [inline] slab_alloc mm/slab.c:3326 [inline] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 kmalloc include/linux/slab.h:547 [inline] __memcg_init_list_lru_node+0x58/0xf0 mm/list_lru.c:352 memcg_init_list_lru_node mm/list_lru.c:375 [inline] memcg_init_list_lru mm/list_lru.c:459 [inline] __list_lru_init+0x193/0x2a0 mm/list_lru.c:626 alloc_super+0x2e0/0x310 fs/super.c:269 sget_userns+0x94/0x2a0 fs/super.c:609 sget+0x8d/0xb0 fs/super.c:660 mount_nodev+0x31/0xb0 fs/super.c:1387 fuse_mount+0x2d/0x40 fs/fuse/inode.c:1236 legacy_get_tree+0x27/0x80 fs/fs_context.c:661 vfs_get_tree+0x2e/0x120 fs/super.c:1476 do_new_mount fs/namespace.c:2790 [inline] do_mount+0x932/0xc50 fs/namespace.c:3110 ksys_mount+0xab/0x120 fs/namespace.c:3319 __do_sys_mount fs/namespace.c:3333 [inline] __se_sys_mount fs/namespace.c:3330 [inline] __x64_sys_mount+0x26/0x30 fs/namespace.c:3330 do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is a simple off by one bug on the error path. Link: http://lkml.kernel.org/r/20190528043202.99980-1-shakeelb@google.com Fixes: 60d3fd32 ("list_lru: introduce per-memcg lists") Reported-by: syzbot+f90a420dfe2b1b03cb2c@syzkaller.appspotmail.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
The kernel test robot noticed a 26% will-it-scale pagefault regression from commit 42a30035 ("mm: memcontrol: fix recursive statistics correctness & scalabilty"). This appears to be caused by bouncing the additional cachelines from the new hierarchical statistics counters. We can fix this by getting rid of the batched local counters instead. Originally, there were *only* group-local counters, and they were fully maintained per cpu. A reader of a stats file high up in the cgroup tree would have to walk the entire subtree and collect each level's per-cpu counters to get the recursive view. This was prohibitively expensive, and so we switched to per-cpu batched updates of the local counters during a983b5eb ("mm: memcontrol: fix excessive complexity in memory.stat reporting"), reducing the complexity from nr_subgroups * nr_cpus to nr_subgroups. With growing machines and cgroup trees, the tree walk itself became too expensive for monitoring top-level groups, and this is when the culprit patch added hierarchy counters on each cgroup level. When the per-cpu batch size would be reached, both the local and the hierarchy counters would get batch-updated from the per-cpu delta simultaneously. This makes local and hierarchical counter reads blazingly fast, but it unfortunately makes the write-side too cache line intense. Since local counter reads were never a problem - we only centralized them to accelerate the hierarchy walk - and use of the local counters are becoming rarer due to replacement with hierarchical views (ongoing rework in the page reclaim and workingset code), we can make those local counters unbatched per-cpu counters again. The scheme will then be as such: when a memcg statistic changes, the writer will: - update the local counter (per-cpu) - update the batch counter (per-cpu). If the batch is full: - spill the batch into the group's atomic_t - spill the batch into all ancestors' atomic_ts - empty out the batch counter (per-cpu) when a local memcg counter is read, the reader will: - collect the local counter from all cpus when a hiearchy memcg counter is read, the reader will: - read the atomic_t We might be able to simplify this further and make the recursive counters unbatched per-cpu counters as well (batch upward propagation, but leave per-cpu collection to the readers), but that will require a more in-depth analysis and testing of all the callsites. Deal with the immediate regression for now. Link: http://lkml.kernel.org/r/20190521151647.GB2870@cmpxchg.org Fixes: 42a30035 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: kernel test robot <rong.a.chen@intel.com> Tested-by: kernel test robot <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Jun, 2019 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fixes from Jiri Kosina: - regression fixes (reverts) for module loading changes that turned out to be incompatible with some userspace, from Benjamin Tissoires - regression fix for special Logitech unifiying receiver 0xc52f, from Hans de Goede - a few device ID additions to logitech driver, from Hans de Goede - fix for Bluetooth support on 2nd-gen Wacom Intuos Pro, from Jason Gerecke * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: logitech-dj: Fix 064d:c52f receiver support Revert "HID: core: Call request_module before doing device_add" Revert "HID: core: Do not call request_module() in async context" Revert "HID: Increase maximum report size allowed by hid_field_extract()" HID: a4tech: fix horizontal scrolling HID: hyperv: Add a module description line HID: logitech-hidpp: Add support for the S510 remote control HID: multitouch: handle faulty Elo touch device HID: wacom: Sync INTUOSP2_BT touch state after each frame if necessary HID: wacom: Correct button numbering 2nd-gen Intuos Pro over Bluetooth HID: wacom: Send BTN_TOUCH in response to INTUOSP2_BT eraser contact HID: wacom: Don't report anything prior to the tool entering range HID: wacom: Don't set tool type until we're in range HID: rmi: Use SET_REPORT request on control endpoint for Acer Switch 3 and 5 HID: logitech-hidpp: add support for the MX5500 keyboard HID: logitech-dj: add support for the Logitech MX5500's Bluetooth Mini-Receiver HID: i2c-hid: add iBall Aer3 to descriptor override
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds authored
Pull selinux fixes from Paul Moore: "Three patches for v5.2. One fixes a problem where we weren't correctly logging raw SELinux labels, the other two fix problems where we weren't properly checking calls to kmemdup()" * tag 'selinux-pr-20190612' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix a missing-check bug in selinux_sb_eat_lsm_opts() selinux: fix a missing-check bug in selinux_add_mnt_opt( ) selinux: log raw contexts as untrusted strings
-
- 12 Jun, 2019 7 commits
-
-
Gen Zhang authored
In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It returns NULL when fails. So 'arg' should be checked. And 'mnt_opts' should be freed when error. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Fixes: 99dbbb59 ("selinux: rewrite selinux_sb_eat_lsm_opts()") Cc: <stable@vger.kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds authored
Pull media fixes from Mauro Carvalho Chehab: - a debug warning for satellite tuning at dvb core was producing too much noise - a regression at hfi_parser on Venus driver * tag 'media/v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: venus: hfi_parser: fix a regression in parser media: dvb: warning about dvb frequency limits produces too much noise
-
Gen Zhang authored
In selinux_add_mnt_opt(), 'val' is allocated by kmemdup_nul(). It returns NULL when fails. So 'val' should be checked. And 'mnt_opts' should be freed when error. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Fixes: 757cbe59 ("LSM: new method: ->sb_add_mnt_opt()") Cc: <stable@vger.kernel.org> [PM: fixed some indenting problems] Signed-off-by: Paul Moore <paul@paul-moore.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds authored
Pull ptrace fixes from Eric Biederman: "This is just two very minor fixes: - prevent ptrace from reading unitialized kernel memory found twice by syzkaller - restore a missing smp_rmb in ptrace_may_access and add comment tp it so it is not removed by accident again. Apologies for being a little slow about getting this to you, I am still figuring out how to develop with a little baby in the house" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: restore smp_rmb() in __ptrace_may_access() signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
-
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlbLinus Torvalds authored
Pull swiotlb fix from Konrad Rzeszutek Wilk: "One tiny fix for ARM64 where we could allocate the SWIOTLB twice" * 'stable/for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: xen/swiotlb: don't initialize swiotlb twice on arm64
-
git://github.com/awilliam/linux-vfioLinus Torvalds authored
Pull VFIO fixes from Alex Williamson: "Fix mdev device create/remove paths to provide initialized device for parent driver create callback and correct ordering of device removal from bus prior to initiating removal by parent. Also resolve races between parent removal and device create/remove paths (all from Parav Pandit)" * tag 'vfio-v5.2-rc5' of git://github.com/awilliam/linux-vfio: vfio/mdev: Synchronize device create/remove with parent removal vfio/mdev: Avoid creating sysfs remove file on stale device removal vfio/mdev: Improve the create/remove sequence
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fix from David Sterba: "One regression fix to TRIM ioctl. The range cannot be used as its meaning can be confusing regarding physical and logical addresses. This confusion in code led to potential corruptions when the range overlapped data. The original patch made it to several stable kernels and was promptly reverted, the version for master branch is different due to additional changes but the change is effectively the same" * tag 'for-5.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Always trim all unallocated space in btrfs_trim_free_extents
-
- 11 Jun, 2019 2 commits
-
-
Ondrej Mosnacek authored
These strings may come from untrusted sources (e.g. file xattrs) so they need to be properly escaped. Reproducer: # setenforce 0 # touch /tmp/test # setfattr -n security.selinux -v 'kuřecí řízek' /tmp/test # runcon system_u:system_r:sshd_t:s0 cat /tmp/test (look at the generated AVCs) Actual result: type=AVC [...] trawcon=kuřecí řízek Expected result: type=AVC [...] trawcon=6B75C5996563C3AD20C599C3AD7A656B Fixes: fede1483 ("selinux: log invalid contexts in AVCs") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Jann Horn authored
Restore the read memory barrier in __ptrace_may_access() that was deleted a couple years ago. Also add comments on this barrier and the one it pairs with to explain why they're there (as far as I understand). Fixes: bfedb589 ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks") Cc: stable@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
- 10 Jun, 2019 4 commits
-
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block cgroup symlink revert from Jens Axboe: "I talked to Tejun about this offline, and he's not a huge fan of the symlink. So let's revert this for now, and Paolo can do this properly for 5.3 instead" * tag 'for-linus-20190610' of git://git.kernel.dk/linux-block: cgroup/bfq: revert bfq.weight symlink change
-
Linus Torvalds authored
Merge tag 'regulator-fix-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Just one driver specific fix here, for a boot regression introduced during some modernization work on the tps6507x driver" * tag 'regulator-fix-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps6507x: Fix boot regression due to testing wrong init_data pointer
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds authored
Pull spi fixes from Mark Brown: "A small set of fixes here. One core fix for error handling when we fail to set up the hardware before initiating a transfer and another one reverting a change in the core which broke Raspberry Pi in common use cases as part of some optimization work. There's also a couple of driver specific fixes" * tag 'spi-fix-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: abort spi_sync if failed to prepare_transfer_hardware spi: spi-fsl-spi: call spi_finalize_current_message() at the end spi: bitbang: Fix NULL pointer dereference in spi_unregister_master spi: Fix Raspberry Pi breakage
-
Jens Axboe authored
There's some discussion on how to do this the best, and Tejun prefers that BFQ just create the file itself instead of having cgroups support a symlink feature. Hence revert commit 54b7b868 and 19e9da9e for 5.2, and this can be done properly for 5.3. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 Jun, 2019 1 commit
-
-
Linus Torvalds authored
-
- 08 Jun, 2019 10 commits
-
-
git://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph fixes from Ilya Dryomov: "A change to call iput() asynchronously to avoid a possible deadlock when iput_final() needs to wait for in-flight I/O (e.g. readahead) and a fixup for a cleanup that went into -rc1" * tag 'ceph-for-5.2-rc4' of git://github.com/ceph/ceph-client: ceph: fix error handling in ceph_get_caps() ceph: avoid iput_final() while holding mutex or in dispatch thread ceph: single workqueue for inode related works
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen fix from Juergen Gross: "Just one fix for the Xen block frontend driver avoiding allocations with order > 0" * tag 'for-linus-5.2b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-blkfront: switch kcalloc to kvcalloc for large array allocation
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Heiko Carstens: - fix stack unwinder: the stack unwinder rework has on off-by-one bug which prevents following stack backchains over more than one context (e.g. irq -> process). - fix address space detection in exception handler: if user space switches to access register mode, which is not supported anymore, the exception handler may resolve to the wrong address space. * tag 's390-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/unwind: correct stack switching during unwind s390/mm: fix address space detection in exception handling
-
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds authored
Pull MIPS fixes from Paul Burton: - Declare ginvt() __always_inline due to its use of an argument as an inline asm immediate. - A VDSO build fix following Kbuild changes made this cycle. - A fix for boot failures on txx9 systems following memory initialization changes made this cycle. - Bounds check virt_addr_valid() to prevent it spuriously indicating that bogus addresses are valid, in turn fixing hardened usercopy failures that have been present since v4.12. - Build uImage.gz for pistachio systems by default, since this is the image we need in order to actually boot on a board. - Remove an unused variable in our uprobes code. * tag 'mips_fixes_5.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: uprobes: remove set but not used variable 'epc' MIPS: pistachio: Build uImage.gz by default MIPS: Make virt_addr_valid() return bool MIPS: Bounds check virt_addr_valid MIPS: TXx9: Fix boot crash in free_initmem() MIPS: remove a space after -I to cope with header search paths for VDSO MIPS: mark ginvt() as __always_inline
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds authored
Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.2-rc4 to resolve a number of reported issues. The most "notable" one here is the kernel headers in proc^Wsysfs fixes. Those changes move the header file info into sysfs and fixes the build issues that you reported. Other than that, a bunch of small habanalabs driver fixes, some fpga driver fixes, and a few other tiny driver fixes. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: habanalabs: Read upper bits of trace buffer from RWPHI habanalabs: Fix virtual address access via debugfs for 2MB pages fpga: zynqmp-fpga: Correctly handle error pointer habanalabs: fix bug in checking huge page optimization habanalabs: Avoid using a non-initialized MMU cache mutex habanalabs: fix debugfs code uapi/habanalabs: add opcode for enable/disable device debug mode habanalabs: halt debug engines on user process close test_firmware: Use correct snprintf() limit genwqe: Prevent an integer overflow in the ioctl parport: Fix mem leak in parport_register_dev_model fpga: dfl: expand minor range when registering chrdev region fpga: dfl: Add lockdep classes for pdata->lock fpga: dfl: afu: Pass the correct device to dma_mapping_error() fpga: stratix10-soc: fix use-after-free on s10_init() w1: ds2408: Fix typo after 49695ac4 (reset on output_write retry with readback) kheaders: Do not regenerate archive if config is not changed kheaders: Move from proc to sysfs lkdtm/bugs: Adjust recursion test to avoid elision lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "I2C has a driver bugfix and a MAINTAINERS fix" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Karthikeyan Ramasubramanian is MIA i2c: xiic: Add max_read_len quirk
-
git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds authored
Pull dmaengine fixes from Vinod Koul: - jz4780 transfer fix for acking descriptors early - fsl-qdma: clean registers on error - dw-axi-dmac: null pointer dereference fix - mediatek-cqdma: fix sleeping in atomic context - tegra210-adma: fix bunch os issues like crashing in driver probe, channel FIFO configuration etc. - sprd: Fixes for possible crash on descriptor status, block length overflow. For 2-stage transfer fix incorrect start, configuration and interrupt handling. * tag 'dmaengine-fix-5.2-rc4' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: sprd: Add interrupt support for 2-stage transfer dmaengine: sprd: Fix the right place to configure 2-stage transfer dmaengine: sprd: Fix block length overflow dmaengine: sprd: Fix the incorrect start for 2-stage destination channels dmaengine: sprd: Add validation of current descriptor in irq handler dmaengine: sprd: Fix the possible crash when getting descriptor status dmaengine: tegra210-adma: Fix spelling dmaengine: tegra210-adma: Fix channel FIFO configuration dmaengine: tegra210-adma: Fix crash during probe dmaengine: mediatek-cqdma: sleeping in atomic context dmaengine: dw-axi-dmac: fix null dereference when pointer first is null dmaengine: fsl-qdma: Add improvement dmaengine: jz4780: Fix transfers being ACKed too soon
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: - Allow symlink from the bfq.weight cgroup parameter to the general weight (Angelo) - Damien is new skd maintainer (Bart) - NVMe pull request from Sagi, with a few small fixes. - Ensure we set DMA segment size properly, dma-debug is now tripping on these (Christoph) - Remove useless debugfs_create() return check (Greg) - Remove redundant unlikely() check on IS_ERR() (Kefeng) - Fixup request freeing on exit (Ming) * tag 'for-linus-20190608' of git://git.kernel.dk/linux-block: block, bfq: add weight symlink to the bfq.weight cgroup parameter cgroup: let a symlink too be created with a cftype file block: free sched's request pool in blk_cleanup_queue nvme-rdma: use dynamic dma mapping per command nvme: Fix u32 overflow in the number of namespace list calculation mmc: also set max_segment_size in the device mtip32xx: also set max_segment_size in the device rsxx: don't call dma_set_max_seg_size nvme-pci: don't limit DMA segement size block: Drop unlikely before IS_ERR(_OR_NULL) block: aoe: no need to check return value of debugfs_create functions nvmet: fix data_len to 0 for bdev-backed write_zeroes MAINTAINERS: Hand over skd maintainership nvme-tcp: fix queue mapping when queue count is limited nvme-rdma: fix queue mapping when queue count is limited
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Two bug fixes, both for fairly serious problems; the UFS one looks like it could be used to exfiltrate data from the kernel, although probably only a privileged user has access to the command management interface and the missing unlock in smartpqi is long standing and probably a little used error path" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: smartpqi: unlock on error in pqi_submit_raid_request_synchronous() scsi: ufs: Check that space was properly alloced in copy_query_response
-