- 30 Apr, 2020 2 commits
-
-
Barret Rhoden authored
Under rare circumstances, task_function_call() can repeatedly fail and cause a soft lockup. There is a slight race where the process is no longer running on the cpu we targeted by the time remote_function() runs. The code will simply try again. If we are very unlucky, this will continue to fail, until a watchdog fires. This can happen in a heavily loaded, multi-core virtual machine. Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200414222920.121401-1-brho@google.com
-
CodyYao-oc authored
Zhaoxin CPU has provided facilities for monitoring performance via PMU (Performance Monitor Unit), but the functionality is unused so far. Therefore, add support for zhaoxin pmu to make performance related hardware events available. The PMU is mostly an Intel Architectural PerfMon-v2 with a novel errata for the ZXC line. It supports the following events: ----------------------------------------------------------------------------------------------------------------------------------- Event | Event | Umask | Description | Select | | ----------------------------------------------------------------------------------------------------------------------------------- cpu-cycles | 82h | 00h | unhalt core clock instructions | 00h | 00h | number of instructions at retirement. cache-references | 15h | 05h | number of fillq pushs at the current cycle. cache-misses | 1ah | 05h | number of l2 miss pushed by fillq. branch-instructions | 28h | 00h | counts the number of branch instructions retired. branch-misses | 29h | 00h | mispredicted branch instructions at retirement. bus-cycles | 83h | 00h | unhalt bus clock stalled-cycles-frontend | 01h | 01h | Increments each cycle the # of Uops issued by the RAT to RS. stalled-cycles-backend | 0fh | 04h | RS0/1/2/3/45 empty L1-dcache-loads | 68h | 05h | number of retire/commit load. L1-dcache-load-misses | 4bh | 05h | retired load uops whose data source followed an L1 miss. L1-dcache-stores | 69h | 06h | number of retire/commit Store,no LEA L1-dcache-store-misses | 62h | 05h | cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement. L1-icache-loads | 00h | 03h | number of l1i cache access for valid normal fetch,including un-cacheable access. L1-icache-load-misses | 01h | 03h | number of l1i cache miss for valid normal fetch,including un-cacheable miss. L1-icache-prefetches | 0ah | 03h | number of prefetch. L1-icache-prefetch-misses | 0bh | 03h | number of prefetch miss. dTLB-loads | 68h | 05h | number of retire/commit load dTLB-load-misses | 2ch | 05h | number of load operations miss all level tlbs and cause a tablewalk. dTLB-stores | 69h | 06h | number of retire/commit Store,no LEA dTLB-store-misses | 30h | 05h | number of store operations miss all level tlbs and cause a tablewalk. dTLB-prefetches | 64h | 05h | number of hardware pte prefetch requests dispatched out of the prefetch FIFO. dTLB-prefetch-misses | 65h | 05h | number of hardware pte prefetch requests miss the l1d data cache. iTLB-load | 00h | 00h | actually counter instructions. iTLB-load-misses | 34h | 05h | number of code operations miss all level tlbs and cause a tablewalk. ----------------------------------------------------------------------------------------------------------------------------------- Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1586747669-4827-1-git-send-email-CodyYao-oc@zhaoxin.com
-
- 22 Apr, 2020 1 commit
-
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo: kernel + tools/perf: Alexey Budankov: - Introduce CAP_PERFMON to kernel and user space. callchains: Adrian Hunter: - Allow using Intel PT to synthesize callchains for regular events. Kan Liang: - Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. perf script: Andreas Gerstmayr: - Add flamegraph.py script BPF: Jiri Olsa: - Synthesize bpf_trampoline/dispatcher ksymbol events. perf stat: Arnaldo Carvalho de Melo: - Honour --timeout for forked workloads. Stephane Eranian: - Force error in fallback on :k events, to avoid counting nothing when the user asks for kernel events but is not allowed to. perf bench: Ian Rogers: - Add event synthesis benchmark. tools api fs: Stephane Eranian: - Make xxx__mountpoint() more scalable libtraceevent: He Zhe: - Handle return value of asprintf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 21 Apr, 2020 22 commits
-
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: tools/vm: fix cross-compile build coredump: fix null pointer dereference on coredump mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path shmem: fix possible deadlocks on shmlock_user_lock vmalloc: fix remap_vmalloc_range() bounds checks mm/shmem: fix build without THP mm/ksm: fix NULL pointer dereference when KSM zero page is enabled tools/build: tweak unused value workaround checkpatch: fix a typo in the regex for $allocFunctions mm, gup: return EINTR when gup is interrupted by fatal signals mm/hugetlb: fix a addressing exception caused by huge_pte_offset MAINTAINERS: add an entry for kfifo mm/userfaultfd: disable userfaultfd-wp on x86_32 slub: avoid redzone when choosing freepointer location sh: fix build error in mm/init.c
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "Bugfixes, and a few cleanups to the newly-introduced assembly language vmentry code for AMD" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions kvm: Disable objtool frame pointer checking for vmenter.S MAINTAINERS: add a reviewer for KVM/s390 KVM: s390: Fix PV check in deliverable_irqs() kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GP KVM: Remove CREATE_IRQCHIP/SET_PIT2 race KVM: SVM: Fix __svm_vcpu_run declaration. KVM: SVM: Do not setup frame pointer in __svm_vcpu_run KVM: SVM: Fix build error due to missing release_pages() include KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD kvm: nVMX: match comment with return type for nested_vmx_exit_reflected kvm: nVMX: reflect MTF VM-exits if injected by L1 KVM: s390: Return last valid slot if approx index is out-of-bounds KVM: Check validity of resolved slot when searching memslots KVM: VMX: Enable machine check support for 32bit targets KVM: SVM: move more vmentry code to assembly KVM: SVM: fix compilation with modular PSP and non-modular KVM
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds authored
Pull virtio fixes and cleanups from Michael Tsirkin: - Some bug fixes - Cleanup a couple of issues that surfaced meanwhile - Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits) vhost: disable for OABI virtio: drop vringh.h dependency virtio_blk: add a missing include virtio-balloon: Avoid using the word 'report' when referring to free page hinting virtio-balloon: make virtballoon_free_page_report() static vdpa: fix comment of vdpa_register_device() vdpa: make vhost, virtio depend on menu vdpa: allow a 32 bit vq alignment drm/virtio: fix up for include file changes remoteproc: pull in slab.h rpmsg: pull in slab.h virtio_input: pull in slab.h remoteproc: pull in slab.h virtio-rng: pull in slab.h virtgpu: pull in uaccess.h tools/virtio: make asm/barrier.h self contained tools/virtio: define aligned attribute virtio/test: fix up after IOTLB changes vhost: Create accessors for virtqueues private_data vdpasim: Return status in vdpasim_get_status ...
-
git://git.infradead.org/users/jjs/linux-tpmddLinus Torvalds authored
Pull tpm fixes from Jarkko Sakkinen: "A few bug fixes" * tag 'tpmdd-next-20200421' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm/tpm_tis: Free IRQ if probing fails tpm: fix wrong return value in tpm_pcr_extend tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send() tpm: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
-
git://github.com/ojeda/linuxLinus Torvalds authored
Pull clang-format fixlets from Miguel Ojeda: "Two trivial clang-format changes: - Don't indent C++ namespaces (Ian Rogers) - The usual clang-format macro list update (Miguel Ojeda)" * tag 'clang-format-for-linus-v5.7-rc3' of git://github.com/ojeda/linux: clang-format: Update with the latest for_each macro list clang-format: don't indent namespaces
-
Lucas Stach authored
Commit 7ed1c190 ("tools: fix cross-compile var clobbering") moved the setup of the CC variable to tools/scripts/Makefile.include to make the behavior consistent across all the tools Makefiles. As the vm tools missed the include we end up with the wrong CC in a cross-compiling evironment. Fixes: 7ed1c190 (tools: fix cross-compile var clobbering) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Martin Kelly <martin@martingkelly.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416104748.25243-1-l.stach@pengutronix.deSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sudip Mukherjee authored
If the core_pattern is set to "|" and any process segfaults then we get a null pointer derefernce while trying to coredump. The call stack shows: RIP: do_coredump+0x628/0x11c0 When the core_pattern has only "|" there is no use of trying the coredump and we can check that while formating the corename and exit with an error. After this change I get: format_corename failed Aborting core Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template") Reported-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Wise <pabs3@bonedaddy.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416194612.21418-1-sudipm.mukherjee@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
Syzbot reported the below lockdep splat: WARNING: possible irq lock inversion dependency detected 5.6.0-rc7-syzkaller #0 Not tainted -------------------------------------------------------- syz-executor.0/10317 just changed the state of lock: ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline] ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: shmem_mfill_atomic_pte+0x1012/0x21c0 mm/shmem.c:2407 but this lock was taken by another, SOFTIRQ-safe lock in the past: (&(&xa->xa_lock)->rlock#5){..-.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&info->lock)->rlock); local_irq_disable(); lock(&(&xa->xa_lock)->rlock#5); lock(&(&info->lock)->rlock); <Interrupt> lock(&(&xa->xa_lock)->rlock#5); *** DEADLOCK *** The full report is quite lengthy, please see: https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2004152007370.13597@eggly.anvils/T/#m813b412c5f78e25ca8c6c7734886ed4de43f241d It is because CPU 0 held info->lock with IRQ enabled in userfaultfd_copy path, then CPU 1 is splitting a THP which held xa_lock and info->lock in IRQ disabled context at the same time. If softirq comes in to acquire xa_lock, the deadlock would be triggered. The fix is to acquire/release info->lock with *_irq version instead of plain spin_{lock,unlock} to make it softirq safe. Fixes: 4c27fe4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Reported-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/r/1587061357-122619-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Recent commit 71725ed1 ("mm: huge tmpfs: try to split_huge_page() when punching hole") has allowed syzkaller to probe deeper, uncovering a long-standing lockdep issue between the irq-unsafe shmlock_user_lock, the irq-safe xa_lock on mapping->i_pages, and shmem inode's info->lock which nests inside xa_lock (or tree_lock) since 4.8's shmem_uncharge(). user_shm_lock(), servicing SysV shmctl(SHM_LOCK), wants shmlock_user_lock while its caller shmem_lock() holds info->lock with interrupts disabled; but hugetlbfs_file_setup() calls user_shm_lock() with interrupts enabled, and might be interrupted by a writeback endio wanting xa_lock on i_pages. This may not risk an actual deadlock, since shmem inodes do not take part in writeback accounting, but there are several easy ways to avoid it. Requiring interrupts disabled for shmlock_user_lock would be easy, but it's a high-level global lock for which that seems inappropriate. Instead, recall that the use of info->lock to guard info->flags in shmem_lock() dates from pre-3.1 days, when races with SHMEM_PAGEIN and SHMEM_TRUNCATE could occur: nowadays it serves no purpose, the only flag added or removed is VM_LOCKED itself, and calls to shmem_lock() an inode are already serialized by the caller. Take info->lock out of the chain and the possibility of deadlock or lockdep warning goes away. Fixes: 4595ef88 ("shmem: make shmem_inode_info::lock irq-safe") Reported-by: syzbot+c8a8197c8852f566b9d9@syzkaller.appspotmail.com Reported-by: syzbot+40b71e145e73f78f81ad@syzkaller.appspotmail.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004161707410.16322@eggly.anvils Link: https://lore.kernel.org/lkml/000000000000e5838c05a3152f53@google.com/ Link: https://lore.kernel.org/lkml/0000000000003712b305a331d3b1@google.com/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jann Horn authored
remap_vmalloc_range() has had various issues with the bounds checks it promises to perform ("This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma") over time, e.g.: - not detecting pgoff<<PAGE_SHIFT overflow - not detecting (pgoff<<PAGE_SHIFT)+usize overflow - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same vmalloc allocation - comparing a potentially wildly out-of-bounds pointer with the end of the vmalloc region In particular, since commit fc970227 ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer dereferences by calling mmap() on a BPF map with a size that is bigger than the distance from the start of the BPF map to the end of the address space. This could theoretically be used as a kernel ASLR bypass, by using whether mmap() with a given offset oopses or returns an error code to perform a binary search over the possible address range. To allow remap_vmalloc_range_partial() to verify that addr and addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset to remap_vmalloc_range_partial() instead of adding it to the pointer in remap_vmalloc_range(). In remap_vmalloc_range_partial(), fix the check against get_vm_area_size() by using size comparisons instead of pointer comparisons, and add checks for pgoff. Fixes: 83342314 ("[PATCH] mm: introduce remap_vmalloc_range()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Some optimizers don't notice that shmem_punch_compound() is always true (PageTransCompound() being false) without CONFIG_TRANSPARENT_HUGEPAGE==y. Use IS_ENABLED to help them to avoid the BUILD_BUG inside HPAGE_PMD_NR. Fixes: 71725ed1 ("mm: huge tmpfs: try to split_huge_page() when punching hole") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004142339170.10035@eggly.anvilsSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Muchun Song authored
find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm(its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server. commit e86c59b1 ("mm/ksm: improve deduplication of zero pages with colouring") So add a vma check to fix it. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9 RIP: try_to_merge_one_page+0xc7/0x760 Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48 RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246 RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000 RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000 RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577 R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000 R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40 FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ksm_scan_thread+0x115e/0x1960 kthread+0xf5/0x130 ret_from_fork+0x1f/0x30 [songmuchun@bytedance.com: if the vma is out of date, just exit] Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com [akpm@linux-foundation.org: add the conventional braces, replace /** with /*] Fixes: e86c59b1 ("mm/ksm: improve deduplication of zero pages with colouring") Co-developed-by: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Hugh Dickins <hughd@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com Link: http://lkml.kernel.org/r/20200414132905.83819-1-songmuchun@bytedance.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
George Burgess IV authored
Clang has -Wself-assign enabled by default under -Wall, which always gets -Werror'ed on this file, causing sync-compare-and-swap to be disabled by default. The generally-accepted way to spell "this value is intentionally unused," is casting it to `void`. This is accepted by both GCC and Clang with -Wall enabled: https://godbolt.org/z/qqZ9r3Signed-off-by: George Burgess IV <gbiv@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: http://lkml.kernel.org/r/20200414195638.156123-1-gbiv@google.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christophe JAILLET authored
Here, we look for function such as 'netdev_alloc_skb_ip_align', so a '_' is missing in the regex. To make sure: grep -r --include=*.c skbip_a * | wc ==> 0 results grep -r --include=*.c skb_ip_a * | wc ==> 112 results Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200407190029.892-1-christophe.jaillet@wanadoo.frSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
EINTR is the usual error code which other killable interfaces return. This is the case for the other fatal_signal_pending break out from the same function. Make the code consistent. ERESTARTSYS is also quite confusing because the signal is fatal and so no restart will happen before returning to the userspace. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: http://lkml.kernel.org/r/20200409071133.31734-1-mhocko@kernel.orgSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Longpeng authored
Our machine encountered a panic(addressing exception) after run for a long time and the calltrace is: RIP: hugetlb_fault+0x307/0xbe0 RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286 RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48 RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48 RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080 R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8 R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074 FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: follow_hugetlb_page+0x175/0x540 __get_user_pages+0x2a0/0x7e0 __get_user_pages_unlocked+0x15d/0x210 __gfn_to_pfn_memslot+0x3c5/0x460 [kvm] try_async_pf+0x6e/0x2a0 [kvm] tdp_page_fault+0x151/0x2d0 [kvm] ... kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm] kvm_vcpu_ioctl+0x309/0x6d0 [kvm] do_vfs_ioctl+0x3f0/0x540 SyS_ioctl+0xa1/0xc0 system_call_fastpath+0x22/0x27 For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it may return a wrong 'pmdp' if there is a race. Please look at the following code snippet: ... pud = pud_offset(p4d, addr); if (sz != PUD_SIZE && pud_none(*pud)) return NULL; /* hugepage or swap? */ if (pud_huge(*pud) || !pud_present(*pud)) return (pte_t *)pud; pmd = pmd_offset(pud, addr); if (sz != PMD_SIZE && pmd_none(*pmd)) return NULL; /* hugepage or swap? */ if (pmd_huge(*pmd) || !pmd_present(*pmd)) return (pte_t *)pmd; ... The following sequence would trigger this bug: - CPU0: sz = PUD_SIZE and *pud = 0 , continue - CPU0: "pud_huge(*pud)" is false - CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT) - CPU0: "!pud_present(*pud)" is false, continue - CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp However, we want CPU0 to return NULL or pudp in this case. We must make sure there is exactly one dereference of pud and pmd. Signed-off-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Bartosz Golaszewski authored
Kfifo has been written by Stefani Seibold and she's implicitly expected to Ack any changes to it. She's not however officially listed as kfifo maintainer which leads to delays in patch review. This patch proposes to add an explitic entry for kfifo to MAINTAINERS file. [akpm@linux-foundation.org: alphasort F: entries, per Joe] [akpm@linux-foundation.org: remove colon, per Bartosz] Link: http://lkml.kernel.org/r/20200124174533.21815-1-brgl@bgdev.pl Link: http://lkml.kernel.org/r/20200413104250.26683-1-brgl@bgdev.plSigned-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Userfaultfd-wp is not yet working on 32bit hosts, but it's accidentally enabled previously. Disable it. Fixes: 5a281062 ("userfaultfd: wp: add WP pagetable tracking to x86") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: http://lkml.kernel.org/r/20200413141608.109211-1-peterx@redhat.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Marco Elver reported system crashes when booting with "slub_debug=Z". The freepointer location (s->offset) was not taking into account that the "inuse" size that includes the redzone area should not be used by the freelist pointer. Change the calculation to save the area of the object that an inline freepointer may be written into. Fixes: 3202fa62 ("slub: relocate freelist pointer to middle of object") Reported-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Marco Elver <elver@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/202004151054.BD695840@keescook Link: https://lore.kernel.org/linux-mm/20200415164726.GA234932@google.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
The closing parenthesis is missing. Fixes: bfeb022f ("mm/memory_hotplug: add pgprot_t to mhp_params") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Link: http://lkml.kernel.org/r/20200413014743.16353-1-masahiroy@kernel.orgSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paolo Bonzini authored
Merge tag 'kvm-ppc-fixes-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master PPC KVM fix for 5.7 - Fix a regression introduced in the last merge window, which results in guests in HPT mode dying randomly.
-
Paolo Bonzini authored
Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fix for 5.7 and maintainer update - Silence false positive lockdep warning - add Claudio as reviewer
-
- 20 Apr, 2020 9 commits
-
-
Paul Mackerras authored
Since cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler", it's been possible in fairly rare circumstances to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl() generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead. For completeness, this fixes the radix page fault handler in the same way. For radix this didn't cause any obvious misbehaviour, because we ended up putting the non-present PTE into the guest's partition-scoped page tables, leading immediately to another hypervisor data/instruction storage interrupt, which would go through the page fault path again and fix things up. Fixes: cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler" Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402Reported-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Josh Poimboeuf authored
Frame pointers are completely broken by vmenter.S because it clobbers RBP: arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register That's unavoidable, so just skip checking that file when frame pointers are configured in. On the other hand, ORC can handle that code just fine, so leave objtool enabled in the !FRAME_POINTER case. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jarkko Sakkinen authored
Call disable_interrupts() if we have to revert to polling in order not to unnecessarily reserve the IRQ for the life-cycle of the driver. Cc: stable@vger.kernel.org # 4.5.x Reported-by: Hans de Goede <hdegoede@redhat.com> Fixes: e3837e74 ("tpm_tis: Refactor the interrupt setup") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
-
Tianjia Zhang authored
For the algorithm that does not match the bank, a positive value EINVAL is returned here. I think this is a typo error. It is necessary to return an error value. Cc: stable@vger.kernel.org # 5.4.x Fixes: 9f75c822 ("KEYS: trusted: correctly initialize digests and fix locking issue") Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
-
George Wilson authored
tpm_ibmvtpm_send() can fail during PowerVM Live Partition Mobility resume with an H_CLOSED return from ibmvtpm_send_crq(). The PAPR says, 'The "partner partition suspended" transport event disables the associated CRQ such that any H_SEND_CRQ hcall() to the associated CRQ returns H_Closed until the CRQ has been explicitly enabled using the H_ENABLE_CRQ hcall.' This patch adds a check in tpm_ibmvtpm_send() for an H_CLOSED return from ibmvtpm_send_crq() and in that case calls tpm_ibmvtpm_resume() and retries the ibmvtpm_send_crq() once. Cc: stable@vger.kernel.org # 3.7.x Fixes: 132f7629 ("drivers/char/tpm: Add new device driver to support IBM vTPM") Reported-by: Linh Pham <phaml@us.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: George Wilson <gcwilson@linux.ibm.com> Tested-by: Linh Pham <phaml@us.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
-
Stefan Berger authored
This patch fixes the following problem when the ibmvtpm driver is built as a module: ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefined! make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1 make: *** [Makefile:1298: modules] Error 2 Fixes: 18b3670d ("tpm: ibmvtpm: Add support for TPM2") Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
-
Michael S. Tsirkin authored
vhost is currently broken on the some ARM configs. The reason is that the ring element addresses are passed between components with different alignments assumptions. Thus, if guest selects a pointer and host then gets and dereferences it, then alignment assumed by the host's compiler might be greater than the actual alignment of the pointer. compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration. With this OABI, compiler assumes that all structures are 4 byte aligned - which is stronger than virtio guarantees for available and used rings, which are merely 2 bytes. Thus a guest without -mabi=apcs-gnu running on top of host with -mabi=apcs-gnu will be broken. The correct fix is to force alignment of structures - however that is an intrusive fix that's best deferred until the next release. We didn't previously support such ancient systems at all - this surfaced after vdpa support prompted removing dependency of vhost on VIRTULIZATION. So for now, let's just add something along the lines of depends on !ARM || AEABI to the virtio Kconfig declaration, and add a comment that it has to do with struct member alignment. Note: we can't make VHOST and VHOST_RING themselves have a dependency since these are selected. Add a new symbol for that. We should be able to drop this dependency down the road. Fixes: 20c384f1 ("vhost: refine vhost and vringh kconfig") Suggested-by: Ard Biesheuvel <ardb@kernel.org> Suggested-by: Richard Earnshaw <Richard.Earnshaw@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Claudio Imbrenda authored
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20200417152936.772256-1-imbrenda@linux.ibm.com
-
Eric Farman authored
The diag 0x44 handler, which handles a directed yield, goes into a a codepath that does a kvm_for_each_vcpu() and ultimately deliverable_irqs(). The new check for kvm_s390_pv_cpu_is_protected() contains an assertion that the vcpu->mutex is held, which isn't going to be the case in this scenario. The result is a plethora of these messages if the lock debugging is enabled, and thus an implication that we have a problem. WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm] ...snip... Call Trace: [<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm] ([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]) [<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm] [<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm] [<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm] [<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm] [<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm] [<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm] [<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm] [<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm] [<00000001504ced06>] ksys_ioctl+0xae/0xe8 [<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38 [<0000000150cb9034>] system_call+0xd8/0x2d8 2 locks held by CPU 2/KVM/16167: #0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm] #1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm] Last Breaking-Event-Address: [<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm] irq event stamp: 11967 hardirqs last enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650 hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650 softirqs last enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8 softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80 Considering what's being done here, let's fix this by removing the mutex assertion rather than acquiring the mutex for every other vcpu. Fixes: 201ae986 ("KVM: s390: protvirt: Implement interrupt injection") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20200415190353.63625-1-farman@linux.ibm.comSigned-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 19 Apr, 2020 6 commits
-
-
Linus Torvalds authored
-
Brian Geffon authored
When remapping a mapping where a portion of a VMA is remapped into another portion of the VMA it can cause the VMA to become split. During the copy_vma operation the VMA can actually be remerged if it's an anonymous VMA whose pages have not yet been faulted. This isn't normally a problem because at the end of the remap the original portion is unmapped causing it to become split again. However, MREMAP_DONTUNMAP leaves that original portion in place which means that the VMA which was split and then remerged is not actually split at the end of the mremap. This patch fixes a bug where we don't detect that the VMAs got remerged and we end up putting back VM_ACCOUNT on the next mapping which is completely unreleated. When that next mapping is unmapped it results in incorrectly unaccounting for the memory which was never accounted, and eventually we will underflow on the memory comittment. There is also another issue which is similar, we're currently accouting for the number of pages in the new_vma but that's wrong. We need to account for the length of the remap operation as that's all that is being added. If there was a mapping already at that location its comittment would have been adjusted as part of the munmap at the start of the mremap. A really simple repro can be seen in: https://gist.github.com/bgaff/e101ce99da7d9a8c60acc641d07f312c Fixes: e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Brian Geffon <bgeffon@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fixes from Stephen Boyd: "Two build fixes for a couple clk drivers and a fix for the Unisoc serial clk where we want to keep it on for earlycon" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sprd: don't gate uart console clock clk: mmp2: fix link error without mmp2 clk: asm9260: fix __clk_hw_register_fixed_rate_with_accuracy typo
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 and objtool fixes from Thomas Gleixner: "A set of fixes for x86 and objtool: objtool: - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP is enabled. - Support clang non-section symbols in objtool ORC dump - Fix switch table detection in .text.unlikely - Make the BP scratch register warning more robust. x86: - Increase microcode maximum patch size for AMD to cope with new CPUs which have a larger patch size. - Fix a crash in the resource control filesystem when the removal of the default resource group is attempted. - Preserve Code and Data Prioritization enabled state accross CPU hotplug. - Update split lock cpu matching to use the new X86_MATCH macros. - Change the split lock enumeration as Intel finaly decided that the IA32_CORE_CAPABILITIES bits are not architectural contrary to what the SDM claims. !@#%$^! - Add Tremont CPU models to the split lock detection cpu match. - Add a missing static attribute to make sparse happy" * tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Add Tremont family CPU models x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural x86/resctrl: Preserve CDP enable over CPU hotplug x86/resctrl: Fix invalid attempt at removing the default resource group x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL() x86/umip: Make umip_insns static x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE objtool: Make BP scratch register warning more robust objtool: Fix switch table detection in .text.unlikely objtool: Support Clang non-section symbols in ORC generation objtool: Support Clang non-section symbols in ORC dump objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull time namespace fix from Thomas Gleixner: "An update for the proc interface of time namespaces: Use symbolic names instead of clockid numbers. The usability nuisance of numbers was noticed by Michael when polishing the man page" * tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf tooling fixes and updates from Thomas Gleixner: - Fix the header line of perf stat output for '--metric-only --per-socket' - Fix the python build with clang - The usual tools UAPI header synchronization * tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools headers: Synchronize linux/bits.h with the kernel sources tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources tools headers: Update x86's syscall_64.tbl with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources tools headers UAPI: Update tools's copy of drm.h headers tools headers kvm: Sync linux/kvm.h with the kernel sources tools headers UAPI: Sync linux/fscrypt.h with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources tools arch x86: Sync asm/cpufeatures.h with the kernel sources tools headers UAPI: Sync linux/mman.h with the kernel tools headers UAPI: Sync sched.h with the kernel tools headers: Update linux/vdso.h and grab a copy of vdso/const.h perf stat: Fix no metric header if --per-socket and --metric-only set perf python: Check if clang supports -fno-semantic-interposition tools arch x86: Sync the msr-index.h copy with the kernel sources
-