1. 15 Jul, 2021 34 commits
    • Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of... · e9338abf
      Linus Torvalds authored
      Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull fallthrough fixes from Gustavo Silva:
       "This fixes many fall-through warnings when building with Clang and
        -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for
        Clang, globally.
      
        It's also important to notice that since we have adopted the use of
        the pseudo-keyword macro fallthrough, we also want to avoid having
        more /* fall through */ comments being introduced. Contrary to GCC,
        Clang doesn't recognize any comments as implicit fall-through markings
        when the -Wimplicit-fallthrough option is enabled.
      
        So, in order to avoid having more comments being introduced, we use
        the option -Wimplicit-fallthrough=5 for GCC, which, similar to Clang,
        will cause a warning in case a code comment is intended to be used as
        a fall-through marking. The patch for Makefile also enforces this.
      
        We had almost 4,000 of these issues for Clang in the beginning, and
        there might be a couple more out there when building some
        architectures with certain configurations. However, with the recent
        fixes I think we are in good shape and it is now possible to enable
        the warning for Clang"
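The pseudo-keyword approach can be shown with a minimal user-space sketch. It follows, as far as we know, the shape of the kernel's `fallthrough` definition in include/linux/compiler_attributes.h: the macro expands to the compiler's `__fallthrough__` statement attribute when available, and to an empty statement otherwise. The `count_setup_steps()` function is a hypothetical example, not kernel code.

```c
#include <assert.h>

#ifndef __has_attribute
#define __has_attribute(x) 0	/* very old compilers: assume no support */
#endif

/* Pseudo-keyword: a statement attribute if supported, else a no-op. */
#if __has_attribute(__fallthrough__)
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif

/*
 * Hypothetical example: each level also performs the steps of the levels
 * below it, so every case intentionally falls through to the next one.
 * A plain comment in place of the macro would trigger
 * -Wimplicit-fallthrough on Clang, and on GCC with -Wimplicit-fallthrough=5.
 */
static int count_setup_steps(int level)
{
	int steps = 0;

	switch (level) {
	case 3:
		steps++;
		fallthrough;
	case 2:
		steps++;
		fallthrough;
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Because the marker is a real statement and not a comment, both compilers check it the same way. A misplaced `fallthrough;` (for example, one not followed by a case label) is itself diagnosed.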
      
      * tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
        Makefile: Enable -Wimplicit-fallthrough for Clang
        powerpc/smp: Fix fall-through warning for Clang
        dmaengine: mpc512x: Fix fall-through warning for Clang
        usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang
        powerpc/powernv: Fix fall-through warning for Clang
        MIPS: Fix unreachable code issue
        MIPS: Fix fall-through warnings for Clang
        ASoC: Mediatek: MT8183: Fix fall-through warning for Clang
        power: supply: Fix fall-through warnings for Clang
        dmaengine: ti: k3-udma: Fix fall-through warning for Clang
        s390: Fix fall-through warnings for Clang
        dmaengine: ipu: Fix fall-through warning for Clang
        iommu/arm-smmu-v3: Fix fall-through warning for Clang
        mmc: jz4740: Fix fall-through warning for Clang
        PCI: Fix fall-through warning for Clang
        scsi: libsas: Fix fall-through warning for Clang
        video: fbdev: Fix fall-through warning for Clang
        math-emu: Fix fall-through warning
        cpufreq: Fix fall-through warning for Clang
        drm/msm: Fix fall-through warning in msm_gem_new_impl()
        ...
    • Merge branch 'akpm' (patches from Andrew) · dd9c7df9
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 patches.
      
        Subsystems affected by this patch series: mm (kasan, pagealloc, rmap,
        hmm, and hugetlb), and hfs"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/hugetlb: fix refs calculation from unaligned @vaddr
        hfs: add lock nesting notation to hfs_find_init
        hfs: fix high memory mapping in hfs_bnode_read
        hfs: add missing clean-up in hfs_fill_super
        lib/test_hmm: remove set but unused page variable
        mm: fix the try_to_unmap prototype for !CONFIG_MMU
        mm/page_alloc: further fix __alloc_pages_bulk() return value
        mm/page_alloc: correct return value when failing at preparing
        mm/page_alloc: avoid page allocator recursion with pagesets.lock held
        Revert "mm/page_alloc: make should_fail_alloc_page() static"
        kasan: fix build by including kernel.h
        kasan: add memzero init for unaligned size at DEBUG
        mm: move helper to check slub_debug_enabled
    • EDAC/igen6: fix core dependency AGAIN · a1c9ca5f
      Randy Dunlap authored
      My previous patch had a typo/thinko which prevents this driver
      from being enabled: change X64_64 to X86_64.
      
      Fixes: 0a9ece9b ("EDAC/igen6: fix core dependency")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Cc: bowsingbetee <bowsingbetee@protonmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 405386b0
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
      
       - Allow again loading KVM on 32-bit non-PAE builds
      
       - Fixes for host SMIs on AMD
      
       - Fixes for guest SMIs on AMD
      
       - Fixes for selftests on s390 and ARM
      
       - Fix memory leak
      
       - Enforce no-instrumentation area on vmentry when hardware breakpoints
         are in use.
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
        KVM: selftests: smm_test: Test SMM enter from L2
        KVM: nSVM: Restore nested control upon leaving SMM
        KVM: nSVM: Fix L1 state corruption upon return from SMM
        KVM: nSVM: Introduce svm_copy_vmrun_state()
        KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN
        KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA
        KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities
        KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails
        KVM: SVM: add module param to control the #SMI interception
        KVM: SVM: remove INIT intercept handler
        KVM: SVM: #SMI interception must not skip the instruction
        KVM: VMX: Remove vmx_msr_index from vmx.h
        KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()
        KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc
        kvm: debugfs: fix memory leak in kvm_create_vm_debugfs
        KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
        KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio
        KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
        KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs
        KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR
        ...
    • Merge tag 'iommu-fixes-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · f3523a22
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Revert a patch which caused boot failures with QCOM IOMMU
      
       - Two fixes for Intel VT-d context table handling
      
       - Physical address decoding fix for Rockchip IOMMU
      
       - Add a reviewer for AMD IOMMU
      
      * tag 'iommu-fixes-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        MAINTAINERS: Add Suravee Suthikulpanit as Reviewer for AMD IOMMU (AMD-Vi)
        iommu/rockchip: Fix physical address decoding
        iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries
        iommu/vt-d: Global devTLB flush when present context entry changed
        iommu/qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"
    • mm/hugetlb: fix refs calculation from unaligned @vaddr · d08af0a5
      Joao Martins authored
      Commit 82e5d378 ("mm/hugetlb: refactor subpage recording")
      refactored the count of subpages but missed an edge case when @vaddr is
      not aligned to PAGE_SIZE e.g.  when close to vma->vm_end.  It would then
      erroneously set @refs to 0, and record_subpages_vmas() wouldn't set the
      @pages array element to its value, consequently causing the reported
      null-deref by syzbot.
      
      Fix it by aligning down @vaddr by PAGE_SIZE in @refs calculation.
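A minimal arithmetic sketch of the edge case, assuming 4 KiB pages. `refs_buggy()` and `refs_fixed()` are hypothetical helpers that isolate only the `vm_end`-bound term of the @refs calculation. In the kernel this term is, as far as we know, combined with other limits by taking a minimum. `ALIGN_DOWN` is redefined here for user space.

```c
#include <assert.h>

#define PAGE_SHIFT 12				/* assumed 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)
/* Round x down to a multiple of a (a must be a power of two). */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/* Pages left before vm_end, computed from an unaligned address (the bug). */
static unsigned long refs_buggy(unsigned long vaddr, unsigned long vm_end)
{
	return (vm_end - vaddr) >> PAGE_SHIFT;
}

/* Same computation after aligning vaddr down to its page (the fix). */
static unsigned long refs_fixed(unsigned long vaddr, unsigned long vm_end)
{
	return (vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT;
}
```

Take `vaddr = 0x1ff800`, which lies inside the last page of a VMA ending at `0x200000`. The unaligned subtraction leaves only 0x800 bytes, which shifts down to 0 pages. Aligning first yields the expected 1.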
      
      Link: https://lkml.kernel.org/r/20210713152440.28650-1-joao.m.martins@oracle.com
      Fixes: 82e5d378 ("mm/hugetlb: refactor subpage recording")
      Reported-by: syzbot+a3fcd59df1b372066f5a@syzkaller.appspotmail.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hfs: add lock nesting notation to hfs_find_init · b3b2177a
      Desmond Cheong Zhi Xi authored
      Syzbot reports a possible recursive lock in [1].
      
      This happens due to missing lock nesting information.  From the logs, we
      see that a call to hfs_fill_super is made to mount the hfs filesystem.
      While searching for the root inode, the lock on the catalog btree is
      grabbed.  Then, when the parent of the root isn't found, a call to
      __hfs_bnode_create is made to create the parent of the root.  This
      eventually leads to a call to hfs_ext_read_extent which grabs a lock on
      the extents btree.
      
      Since the order of locking is catalog btree -> extents btree, this lock
      hierarchy does not lead to a deadlock.
      
      To tell lockdep that this locking is safe, we add nesting notation to
      distinguish between catalog btrees, extents btrees, and attributes
      btrees (for HFS+).  This has already been done in hfsplus.
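The idea behind the nesting notation can be modelled in a few lines. This is a simplified user-space sketch, not lockdep. It assumes one lock subclass per btree type and only checks that a sequence of acquisitions follows the catalog -> extents -> attributes order. The enum and function names are hypothetical. In the kernel, the subclass would be passed to mutex_lock_nested().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical btree kinds, standing in for the trees' CNIDs. */
enum btree_kind { BTREE_CATALOG, BTREE_EXTENTS, BTREE_ATTRIBUTES, BTREE_OTHER };

/* One lock subclass per btree kind; the order encodes the lock hierarchy. */
enum btree_mutex_class { CATALOG_BTREE_MUTEX, EXTENTS_BTREE_MUTEX, ATTR_BTREE_MUTEX };

/* Map a tree to its subclass, or -1 for trees that should not be locked. */
static int btree_lock_subclass(enum btree_kind kind)
{
	switch (kind) {
	case BTREE_CATALOG:
		return CATALOG_BTREE_MUTEX;
	case BTREE_EXTENTS:
		return EXTENTS_BTREE_MUTEX;
	case BTREE_ATTRIBUTES:
		return ATTR_BTREE_MUTEX;
	default:
		return -1;
	}
}

/*
 * True if the trees are locked in strictly increasing subclass order,
 * i.e. the nesting respects the catalog -> extents -> attributes hierarchy.
 */
static bool lock_sequence_ok(const enum btree_kind *kinds, size_t n)
{
	int prev = -1;

	for (size_t i = 0; i < n; i++) {
		int cls = btree_lock_subclass(kinds[i]);

		if (cls < 0 || cls <= prev)
			return false;
		prev = cls;
	}
	return true;
}
```

With distinct subclasses, taking the extents lock while holding the catalog lock is no longer the same lock class being acquired twice. The reverse order would still be flagged.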
      
      Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1]
      Link: https://lkml.kernel.org/r/20210701030756.58760-4-desmondcheongzx@gmail.com
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reported-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com
      Tested-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hfs: fix high memory mapping in hfs_bnode_read · 54a5ead6
      Desmond Cheong Zhi Xi authored
      Pages that we read in hfs_bnode_read need to be kmapped into kernel
      address space.  However, currently only the 0th page is kmapped.  If the
      given offset + length exceeds this 0th page, then we have an invalid
      memory access.
      
      To fix this, we kmap relevant pages one by one and copy their relevant
      portions of data.
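A user-space sketch of the page-by-page copy. The node's pages are modelled as separately allocated buffers, so, as with highmem pages, consecutive pages need not be contiguous. The per-page kmap()/kunmap() calls are reduced to plain pointer accesses. `SKETCH_PAGE_SIZE` and `bnode_read_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

/*
 * Copy len bytes starting at byte offset off of a node whose backing pages
 * are pages[0], pages[1], ...  Each page is accessed and copied separately,
 * so a read that crosses a page boundary never runs past the first page.
 */
static void bnode_read_sketch(unsigned char *const *pages, size_t off,
			      void *buf, size_t len)
{
	size_t pagenum = off / SKETCH_PAGE_SIZE;
	size_t bytes_read = 0;

	off %= SKETCH_PAGE_SIZE;		/* offset within the first page */

	while (bytes_read < len) {
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len - bytes_read)
			chunk = len - bytes_read;
		/* In the kernel this copy would sit between kmap() and kunmap(). */
		memcpy((unsigned char *)buf + bytes_read, pages[pagenum] + off, chunk);
		bytes_read += chunk;
		pagenum++;
		off = 0;			/* later pages are read from the start */
	}
}
```

Consider a two-byte read at offset `SKETCH_PAGE_SIZE - 1`. It takes its first byte from the end of page 0 and its second byte from the start of page 1, instead of reading past the end of page 0.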
      
      An example of invalid memory access occurring without this fix can be seen
      in the following crash report:
      
        ==================================================================
        BUG: KASAN: use-after-free in memcpy include/linux/fortify-string.h:191 [inline]
        BUG: KASAN: use-after-free in hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26
        Read of size 2 at addr ffff888125fdcffe by task syz-executor5/4634
      
        CPU: 0 PID: 4634 Comm: syz-executor5 Not tainted 5.13.0-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:79 [inline]
         dump_stack+0x195/0x1f8 lib/dump_stack.c:120
         print_address_description.constprop.0+0x1d/0x110 mm/kasan/report.c:233
         __kasan_report mm/kasan/report.c:419 [inline]
         kasan_report.cold+0x7b/0xd4 mm/kasan/report.c:436
         check_region_inline mm/kasan/generic.c:180 [inline]
         kasan_check_range+0x154/0x1b0 mm/kasan/generic.c:186
         memcpy+0x24/0x60 mm/kasan/shadow.c:65
         memcpy include/linux/fortify-string.h:191 [inline]
         hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26
         hfs_bnode_read_u16 fs/hfs/bnode.c:34 [inline]
         hfs_bnode_find+0x880/0xcc0 fs/hfs/bnode.c:365
         hfs_brec_find+0x2d8/0x540 fs/hfs/bfind.c:126
         hfs_brec_read+0x27/0x120 fs/hfs/bfind.c:165
         hfs_cat_find_brec+0x19a/0x3b0 fs/hfs/catalog.c:194
         hfs_fill_super+0xc13/0x1460 fs/hfs/super.c:419
         mount_bdev+0x331/0x3f0 fs/super.c:1368
         hfs_mount+0x35/0x40 fs/hfs/super.c:457
         legacy_get_tree+0x10c/0x220 fs/fs_context.c:592
         vfs_get_tree+0x93/0x300 fs/super.c:1498
         do_new_mount fs/namespace.c:2905 [inline]
         path_mount+0x13f5/0x20e0 fs/namespace.c:3235
         do_mount fs/namespace.c:3248 [inline]
         __do_sys_mount fs/namespace.c:3456 [inline]
         __se_sys_mount fs/namespace.c:3433 [inline]
         __x64_sys_mount+0x2b8/0x340 fs/namespace.c:3433
         do_syscall_64+0x37/0xc0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x45e63a
        Code: 48 c7 c2 bc ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 88 04 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
        RSP: 002b:00007f9404d410d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        RAX: ffffffffffffffda RBX: 0000000020000248 RCX: 000000000045e63a
        RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f9404d41120
        RBP: 00007f9404d41120 R08: 00000000200002c0 R09: 0000000020000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
        R13: 0000000000000003 R14: 00000000004ad5d8 R15: 0000000000000000
      
        The buggy address belongs to the page:
        page:00000000dadbcf3e refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x125fdc
        flags: 0x2fffc0000000000(node=0|zone=2|lastcpupid=0x3fff)
        raw: 02fffc0000000000 ffffea000497f748 ffffea000497f6c8 0000000000000000
        raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff888125fdce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
         ffff888125fdcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        >ffff888125fdcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                                        ^
         ffff888125fdd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
         ffff888125fdd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ==================================================================
      
      Link: https://lkml.kernel.org/r/20210701030756.58760-3-desmondcheongzx@gmail.com
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hfs: add missing clean-up in hfs_fill_super · 16ee572e
      Desmond Cheong Zhi Xi authored
      Patch series "hfs: fix various errors", v2.
      
      This series ultimately aims to address a lockdep warning in
      hfs_find_init reported by Syzbot [1].
      
      The work done for this led to the discovery of another bug, and the
      Syzkaller repro test also reveals an invalid memory access error after
      clearing the lockdep warning.  Hence, this series is broken up into
      three patches:
      
      1. Add a missing call to hfs_find_exit for an error path in
         hfs_fill_super
      
      2. Fix memory mapping in hfs_bnode_read by fixing calls to kmap
      
      3. Add lock nesting notation to tell lockdep that the observed locking
         hierarchy is safe
      
      This patch (of 3):
      
      Before exiting hfs_fill_super, the struct hfs_find_data used in
      hfs_find_init should be passed to hfs_find_exit to be cleaned up, and to
      release the lock held on the btree.
      
      The call to hfs_find_exit is missing from an error path.  We add it back
      in by consolidating calls to hfs_find_exit for error paths.
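Consolidating the calls follows the usual kernel idiom of funnelling error paths through a shared label. The sketch below is hypothetical. `find_init()` and `find_exit()` stand in for hfs_find_init() and hfs_find_exit() and only count lock acquisitions. `fill_super_sketch()` fails at a chosen step so that the balance can be checked.

```c
#include <assert.h>
#include <errno.h>

static int btree_locks_held;	/* models the lock taken by hfs_find_init() */

static int find_init(void)
{
	btree_locks_held++;
	return 0;
}

static void find_exit(void)
{
	btree_locks_held--;
}

/*
 * Hypothetical mount routine; fail_at selects which step fails (0 = none).
 * Every failure after find_init() jumps to a single label that calls
 * find_exit(), so no error path can forget the clean-up.
 */
static int fill_super_sketch(int fail_at)
{
	int res = find_init();

	if (res)
		return res;

	if (fail_at == 1) {		/* e.g. root record lookup fails */
		res = -EIO;
		goto fail_find;
	}
	if (fail_at == 2) {		/* e.g. record has an unexpected size */
		res = -EIO;
		goto fail_find;
	}
	find_exit();			/* success path releases the btree too */
	return 0;

fail_find:
	find_exit();
	return res;
}
```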
      
      Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1]
      Link: https://lkml.kernel.org/r/20210701030756.58760-1-desmondcheongzx@gmail.com
      Link: https://lkml.kernel.org/r/20210701030756.58760-2-desmondcheongzx@gmail.com
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/test_hmm: remove set but unused page variable · c52114d9
      Alistair Popple authored
      The HMM selftests use atomic_check_access() to check that atomic access to a
      page has been revoked.  It doesn't matter if the page mapping has been
      removed from the mirrored page tables as that also implies atomic access
      has been revoked.  Therefore remove the unused page variable to fix this
      compiler warning:
      
        lib/test_hmm.c:631:16: warning: variable `page' set but not used [-Wunused-but-set-variable]
      
      Link: https://lkml.kernel.org/r/20210706025603.4059-1-apopple@nvidia.com
      Fixes: b659baea ("mm: selftests for exclusive device memory")
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Reported-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix the try_to_unmap prototype for !CONFIG_MMU · ab7965de
      Christoph Hellwig authored
      Adjust the nommu stub of try_to_unmap to match the changed prototype for the
      full version.  Turn it into an inline instead of a macro to generally
      improve the type checking.
      
      Link: https://lkml.kernel.org/r/20210705053944.885828-1-hch@lst.de
      Fixes: 1fb08ac6 ("mm: rmap: make try_to_unmap() void function")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: further fix __alloc_pages_bulk() return value · 06147843
      Chuck Lever authored
      The author of commit b3b64ebd ("mm/page_alloc: do bulk array
      bounds check after checking populated elements") was possibly
      confused by the mixture of return values throughout the function.
      
      The API contract is clear that the function "Returns the number of pages
      on the list or array." It does not list zero as a unique return value with
      a special meaning.  Therefore zero is a plausible return value only if
      @nr_pages is zero or less.
      
      Clean up the return logic to make it clear that the returned value is
      always the total number of pages in the array/list, not the number of
      pages that were allocated during this call.
      
      The only change in behavior with this patch is the value returned if
      prepare_alloc_pages() fails.  To match the API contract, the number of
      pages currently in the array/list is returned in this case.
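To make the contract concrete, here is a simplified user-space model of the array interface. It is a sketch under stated assumptions, not the kernel implementation. Pages come from a small hypothetical pool, and `prepare_ok` stands in for prepare_alloc_pages() succeeding. The return value is always the number of populated slots, never the number allocated by this call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL 8
static char sketch_pool[SKETCH_POOL];	/* hypothetical page source */
static size_t sketch_pool_used;

static void *sketch_take_page(void)
{
	if (sketch_pool_used == SKETCH_POOL)
		return NULL;
	return &sketch_pool[sketch_pool_used++];
}

/* Fill NULL slots of array[0..nr_pages); return the populated count. */
static size_t bulk_array_fill_sketch(void **array, size_t nr_pages, bool prepare_ok)
{
	size_t nr_populated = 0;

	/* Skip leading elements that are already populated. */
	while (nr_populated < nr_pages && array[nr_populated])
		nr_populated++;

	/* Nothing requested, or the array is already full. */
	if (nr_populated == nr_pages)
		return nr_populated;

	/* Preparation failed: still report what the array already holds. */
	if (!prepare_ok)
		return nr_populated;

	while (nr_populated < nr_pages) {
		void *page;

		if (array[nr_populated]) {	/* keep existing pages */
			nr_populated++;
			continue;
		}
		page = sketch_take_page();
		if (!page)
			break;
		array[nr_populated++] = page;
	}
	return nr_populated;
}
```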
      
      The call site in __page_pool_alloc_pages_slow() also seems to be confused
      on this matter.  It should be attended to by someone who is familiar with
      that code.
      
      [mel@techsingularity.net: Return nr_populated if 0 pages are requested]
      
      Link: https://lkml.kernel.org/r/20210713152100.10381-4-mgorman@techsingularity.net
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Cc: Zhang Qiang <Qiang.Zhang@windriver.com>
      Cc: Yanfei Xu <yanfei.xu@windriver.com>
      Cc: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: correct return value when failing at preparing · e5c15cea
      Yanfei Xu authored
      If the array passed in is already partially populated, we should return
      "nr_populated" even when failing at the argument-preparation stage.
      
      Link: https://lkml.kernel.org/r/20210713152100.10381-3-mgorman@techsingularity.net
      Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Link: https://lore.kernel.org/r/20210709102855.55058-1-yanfei.xu@windriver.com
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: avoid page allocator recursion with pagesets.lock held · 187ad460
      Mel Gorman authored
      Syzbot is reporting potential deadlocks due to pagesets.lock when
      PAGE_OWNER is enabled.  One example from Desmond Cheong Zhi Xi is as
      follows:
      
        __alloc_pages_bulk()
          local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
          prep_new_page():
            post_alloc_hook():
              set_page_owner():
                __set_page_owner():
                  save_stack():
                    stack_depot_save():
                      alloc_pages():
                        alloc_page_interleave():
                          __alloc_pages():
                            get_page_from_freelist():
                              rm_queue():
                                rm_queue_pcplist():
                                  local_lock_irqsave(&pagesets.lock, flags);
                                  *** DEADLOCK ***
      
      Zhang, Qiang also reported
      
        BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
        in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
        .....
        __dump_stack lib/dump_stack.c:79 [inline]
        dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
        ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
        prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
        __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
        alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
        alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
        stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
        save_stack+0x15e/0x1e0 mm/page_owner.c:120
        __set_page_owner+0x50/0x290 mm/page_owner.c:181
        prep_new_page mm/page_alloc.c:2445 [inline]
        __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
        alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
        vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
        __vmalloc_area_node mm/vmalloc.c:2845 [inline]
        __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
        __vmalloc_node mm/vmalloc.c:2996 [inline]
        vzalloc+0x67/0x80 mm/vmalloc.c:3066
      
      There are a number of ways it could be fixed.  The page owner code could
      be audited to strip GFP flags that allow sleeping but it'll impair the
      functionality of PAGE_OWNER if allocations fail.  The bulk allocator could
      add a special case to release/reacquire the lock for prep_new_page and
      look up the PCP after the lock is reacquired, at the cost of performance.  The
      pages requiring prep could be tracked using the least significant bit and
      looping through the array although it is more complicated for the list
      interface.  The options are relatively complex and the second one still
      incurs a performance penalty when PAGE_OWNER is active so this patch takes
      the simple approach -- disable bulk allocation if PAGE_OWNER is active.
      The caller will be forced to allocate one page at a time incurring a
      performance penalty but PAGE_OWNER is already a performance penalty.
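The recursion and the chosen fix can be modelled with a flag standing in for the non-recursive pagesets.lock. Everything below is hypothetical; none of these names are kernel functions. When the PAGE_OWNER stand-in is active, `sketch_prep_new_page()` allocates a page for its own bookkeeping, much as the stack saving does in the traces above. In that case the bulk path falls back to handing out one page per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool page_owner_active;		/* stand-in for PAGE_OWNER being on */
static bool pagesets_locked;		/* stand-in for pagesets.lock */
static int recursive_lock_attempts;	/* each would be a real deadlock */

static void pagesets_lock(void)
{
	if (pagesets_locked)
		recursive_lock_attempts++;
	pagesets_locked = true;
}

static void pagesets_unlock(void)
{
	pagesets_locked = false;
}

/* Single-page path: takes and drops the lock around one page. */
static void alloc_one_page(void)
{
	pagesets_lock();
	pagesets_unlock();
}

/* Post-allocation hook: owner tracking allocates to save a stack. */
static void sketch_prep_new_page(void)
{
	if (page_owner_active)
		alloc_one_page();
}

/* Returns how many pages this call produced. */
static size_t sketch_alloc_bulk(size_t nr_pages)
{
	if (nr_pages == 0)
		return 0;

	/* The fix: with owner tracking on, hand out a single page. */
	if (page_owner_active) {
		alloc_one_page();
		sketch_prep_new_page();	/* runs with the lock released */
		return 1;
	}

	pagesets_lock();
	for (size_t i = 0; i < nr_pages; i++)
		sketch_prep_new_page();	/* safe: the hook does not allocate */
	pagesets_unlock();
	return nr_pages;
}
```

Without the `if (page_owner_active)` early return, `sketch_alloc_bulk(4)` with the flag set would register a recursive acquisition in `recursive_lock_attempts`. That is the model's version of the reported deadlock.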
      
      Link: https://lkml.kernel.org/r/20210708081434.GV3840@techsingularity.net
      Fixes: dbbee9d5 ("mm/page_alloc: convert per-cpu list protection to local_lock")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reported-by: "Zhang, Qiang" <Qiang.Zhang@windriver.com>
      Reported-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
      Tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "mm/page_alloc: make should_fail_alloc_page() static" · 54aa3866
      Matteo Croce authored
      This reverts commit f7173090.
      
      Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
      
          LD      vmlinux
          BTFIDS  vmlinux
        FAILED unresolved symbol should_fail_alloc_page
        make: *** [Makefile:1199: vmlinux] Error 255
        make: *** Deleting file 'vmlinux'
      
      Link: https://lkml.kernel.org/r/20210708191128.153796-1-mcroce@linux.microsoft.com
      Fixes: f7173090 ("mm/page_alloc: make should_fail_alloc_page() static")
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: fix build by including kernel.h · 2db710cc
      Marco Elver authored
      The <linux/kasan.h> header relies on _RET_IP_ being defined, and had been
      receiving that definition via inclusion of bug.h which includes kernel.h.
      However, since f39650de ("kernel.h: split out panic and oops helpers")
      that is no longer the case, and we get the following build error when
      building CONFIG_KASAN_HW_TAGS on arm64:
      
        In file included from arch/arm64/mm/kasan_init.c:10:
        include/linux/kasan.h: In function 'kasan_slab_free':
        include/linux/kasan.h:230:39: error: '_RET_IP_' undeclared (first use in this function)
          230 |   return __kasan_slab_free(s, object, _RET_IP_, init);
      
      Fix it by including kernel.h from kasan.h.
      
      Link: https://lkml.kernel.org/r/20210705072716.2125074-1-elver@google.com
      Fixes: f39650de ("kernel.h: split out panic and oops helpers")
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: add memzero init for unaligned size at DEBUG · 77a63c69
      Yee Lee authored
      Issue: when SLUB debug is on, hwtag kasan_unpoison() would overwrite the
      redzone of objects with unaligned sizes.

      An additional memzero_explicit() path is added to replace initialization
      by the hwtag instruction for objects with unaligned sizes in SLUB debug
      mode.
      
      The penalty is acceptable since they are only enabled in debug mode, not
      production builds.  A block of comment is added for explanation.
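A user-space model of the redzone problem, assuming a 16-byte tag granule. Initialising the rounded-up size clobbers the bytes after an object whose size is not a multiple of the granule, whereas zeroing the precise size does not. `GRANULE`, `REDZONE_BYTE`, and `unpoison_sketch()` are hypothetical. The kernel uses memzero_explicit() and hardware tag instructions, both replaced here by memset(), and tag assignment itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GRANULE 16		/* assumed tag granule size */
#define REDZONE_BYTE 0xbb	/* assumed redzone fill pattern */

/* Round size up to a whole number of granules. */
static size_t round_up_granule(size_t size)
{
	return (size + GRANULE - 1) / GRANULE * GRANULE;
}

/*
 * Initialise an object of 'size' bytes at 'obj'.  With debug_path set and
 * an unaligned size, zero only the precise size (the added memzero path);
 * otherwise model the tag instruction, which initialises whole granules.
 */
static void unpoison_sketch(unsigned char *obj, size_t size, bool debug_path)
{
	if (debug_path && (size % GRANULE)) {
		memset(obj, 0, size);
		return;
	}
	memset(obj, 0, round_up_granule(size));
}
```

For a 20-byte object, the granule-based path zeroes 32 bytes and so overwrites bytes 20 to 31, where the redzone lives. The debug path stops at byte 19.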
      
      Link: https://lkml.kernel.org/r/20210705103229.8505-3-yee.lee@mediatek.com
      Signed-off-by: Yee Lee <yee.lee@mediatek.com>
      Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
      Suggested-by: Marco Elver <elver@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move helper to check slub_debug_enabled · 0d4a062a
      Marco Elver authored
      Move the helper to check slub_debug_enabled, so that we can confine the
      use of #ifdef outside slub.c as well.
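Moving the helper lets callers outside slub.c test the setting without writing their own #ifdef. This is a minimal sketch of that pattern, assuming a plain boolean in place of the kernel's static key. `sketch_slub_debug_enabled()`, `slub_debug_flag`, and `init_strategy()` are hypothetical names. `CONFIG_SLUB_DEBUG` is left undefined here, so the helper compiles to a constant false.

```c
#include <assert.h>
#include <stdbool.h>

/* #define CONFIG_SLUB_DEBUG */	/* uncomment to build the debug variant */

#ifdef CONFIG_SLUB_DEBUG
static bool slub_debug_flag;	/* stand-in for the slub_debug_enabled key */
#endif

/* The only place that needs to know whether SLUB debugging is configured. */
static inline bool sketch_slub_debug_enabled(void)
{
#ifdef CONFIG_SLUB_DEBUG
	return slub_debug_flag;
#else
	return false;
#endif
}

/* A caller elsewhere (e.g. KASAN) branches on the helper, no #ifdef needed. */
static int init_strategy(void)
{
	return sketch_slub_debug_enabled() ? 1 : 0;
}
```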
      
      Link: https://lkml.kernel.org/r/20210705103229.8505-2-yee.lee@mediatek.comSigned-off-by: default avatarMarco Elver <elver@google.com>
      Signed-off-by: default avatarYee Lee <yee.lee@mediatek.com>
      Suggested-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0d4a062a
    • Vitaly Kuznetsov's avatar
      KVM: selftests: smm_test: Test SMM enter from L2 · d951b221
      Vitaly Kuznetsov authored
      Two additional tests are added:
      - SMM triggered from L2 does not corrupt L1 host state.
      - Save/restore during SMM triggered from L2 does not corrupt guest/host
        state.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-7-vkuznets@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d951b221
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Restore nested control upon leaving SMM · bb00bd9c
      Vitaly Kuznetsov authored
      If the VM was migrated while in SMM, no nested state was saved/restored,
      and therefore svm_leave_smm() has to load both the save and the control
      areas of vmcb12.  The save area is already loaded from the HSAVE area,
      so now load the control area from vmcb12 as well.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bb00bd9c
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Fix L1 state corruption upon return from SMM · 37be407b
      Vitaly Kuznetsov authored
      VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the
      nested L2 guest") broke return from SMM when we entered there from guest
      (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
      manifests itself like this:
      
        kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
        kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
          nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
          npt: on
        kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
          intercepts: fd44bfeb 0000217f 00000000
        kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
        kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
          200000006 1ab000
        kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
          0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
          error_code 0x00000000
        kvm_page_fault:       address 1ab000 error_code 6
        kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
          int_info 0 int_info_err 0
        kvm_entry:            vcpu 0, rip 0x7ffbb280
        kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_inj_exception:    #GP (0x0)
      
      Note: the return to L2 succeeded, but upon the first exit to L1 its RIP
      points to the 'RSM' instruction even though we're no longer in SMM.
      
      The problem appears to be that VMCB01 gets irreversibly destroyed during
      SMM execution. Previously, we used to have 'hsave' VMCB where regular
      (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
      switch to VMCB01 from VMCB02.
      
      Pre-split (working) flow looked like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() restores L1's state from 'hsave'
      - SMM -> RSM
      - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
        pre-SMM (and pre L2 VMRUN) L1's state there
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from 'hsave'.
      
      This was always broken with regard to svm_get_nested_state()/
      svm_set_nested_state(): 'hsave' was never part of what is saved and
      restored, so migration happening during SMM triggered from L2 would
      never restore L1's state correctly.
      
      Post-split flow (broken) looks like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() switches to VMCB01 from VMCB02
      - SMM -> RSM
      - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
        is already lost.
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from VMCB01 but it is corrupted
        (reflects the state during 'RSM' execution).
      
      VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
      and host state so when we switch back to VMCS02 L1's state is intact there.
      
      To resolve the issue we need to save L1's state somewhere. We could've
      created a third VMCB for SMM, but that would require us to modify the
      saved state format. L1's architectural HSAVE area (pointed to by
      MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of
      L1's state there. Currently, KVM does 'none'.
      
      Note, for nested state migration to succeed, both source and destination
      hypervisors must have the fix. We, however, don't need to create a new
      flag indicating the fact that HSAVE area is now populated as migration
      during SMM triggered from L2 was always broken.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      37be407b
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Introduce svm_copy_vmrun_state() · 0a758290
      Vitaly Kuznetsov authored
      Separate the code setting non-VMLOAD-VMSAVE state from
      svm_set_nested_state() into its own function. This is going to be
      re-used from svm_enter_smm()/svm_leave_smm().
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0a758290
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN · fb79f566
      Vitaly Kuznetsov authored
      APM states that "The address written to the VM_HSAVE_PA MSR, which holds
      the address of the page used to save the host state on a VMRUN, must point
      to a hypervisor-owned page. If this check fails, the WRMSR will fail with
      a #GP(0) exception. Note that a value of 0 is not considered valid for the
      VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
      fail with a #GP(0) exception."
      
      svm_set_msr() already checks that the supplied address is valid, so only
      the check for '0' is missing. Add it to nested_svm_vmrun().
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-3-vkuznets@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fb79f566
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA · fce7e152
      Vitaly Kuznetsov authored
      APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when
      the supplied address is not page-aligned or is outside of "maximum
      supported physical address for this implementation".
      A page_address_valid() check seems suitable. Also, forcefully page-align
      the address when it's written by the VMM.
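A minimal sketch of the two checks described above (page alignment and staying below the maximum physical address width), assuming 4 KiB pages and a caller-supplied MAXPHYADDR. The function names are hypothetical, and exempting VMM-initiated writes from the check (only aligning them) is an assumption drawn from the note about host-provided values, not a copy of KVM's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Valid if page-aligned and below 2^maxphyaddr. */
static bool hsave_pa_valid(uint64_t pa, unsigned int maxphyaddr)
{
	bool aligned = (pa & (SKETCH_PAGE_SIZE - 1)) == 0;
	/* Guard the shift: shifting a 64-bit value by 64 is undefined in C. */
	bool in_range = maxphyaddr >= 64 || (pa >> maxphyaddr) == 0;

	return aligned && in_range;
}

/*
 * Hypothetical WRMSR handler.  Returns 0 on success, or -1 where the guest
 * would receive #GP.  VMM-initiated writes are not rejected here; the value
 * is forcefully aligned down to a page boundary instead.
 */
static int write_hsave_pa(uint64_t *hsave_msr, uint64_t data,
			  bool host_initiated, unsigned int maxphyaddr)
{
	if (!host_initiated && !hsave_pa_valid(data, maxphyaddr))
		return -1;
	*hsave_msr = data & ~(SKETCH_PAGE_SIZE - 1);
	return 0;
}
```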
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-2-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      [Add comment about behavior for host-provided values. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fce7e152
    • Sean Christopherson's avatar
      KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities · c7a1b2b6
      Sean Christopherson authored
      Use IS_ERR() instead of checking for a NULL pointer when querying for
      sev_pin_memory() failures.  sev_pin_memory() always returns an error code
      cast to a pointer, or a valid pointer; it never returns NULL.
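The error-pointer convention can be sketched in userspace by re-creating ERR_PTR()/IS_ERR()/PTR_ERR(), assuming the kernel's layout where error codes occupy the top 4095 values of the address space. `pin_memory()` is a hypothetical stand-in for sev_pin_memory() that, like it, never returns NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for sev_pin_memory(): error pointer or valid pointer. */
static void *pin_memory(size_t npages)
{
	if (npages == 0)
		return ERR_PTR(-EINVAL);
	void *pages = calloc(npages, sizeof(void *));
	return pages ? pages : ERR_PTR(-ENOMEM);
}

static long pin_and_release(size_t npages)
{
	void *pages = pin_memory(npages);

	/* A NULL check ("if (!pages)") would never fire and would miss errors. */
	if (IS_ERR(pages))
		return PTR_ERR(pages);
	free(pages);
	return 0;
}
```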
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c7a1b2b6
    • Sean Christopherson's avatar
      KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails · b4a69392
      Sean Christopherson authored
      Return -EFAULT if copy_to_user() fails; if accessing user memory faults,
      copy_to_user() returns the number of bytes remaining, not an error code.
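A small userspace model of the return-value convention. `fake_copy_to_user()` is hypothetical: it copies at most `writable` bytes and, like the kernel helper, returns how many bytes were *not* copied, which the caller must translate into -EFAULT rather than pass through.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical model: returns the number of bytes NOT copied (0 on success). */
static unsigned long fake_copy_to_user(void *dst, const void *src,
				       unsigned long n, unsigned long writable)
{
	unsigned long done = n < writable ? n : writable;

	memcpy(dst, src, done);
	return n - done;
}

/* Any short copy becomes -EFAULT instead of leaking a byte count. */
static int send_header(void *dst, const void *hdr, unsigned long len,
		       unsigned long writable)
{
	if (fake_copy_to_user(dst, hdr, len, writable))
		return -EFAULT;
	return 0;
}
```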
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b4a69392
    • Maxim Levitsky's avatar
      KVM: SVM: add module param to control the #SMI interception · 4b639a9f
      Maxim Levitsky authored
      In theory there are no side effects of not intercepting #SMI,
      because then #SMI becomes transparent to the OS and to KVM.
      
      Plus, an observation on recent Zen2 CPUs reveals that these
      CPUs ignore #SMI interception and never deliver #SMI VM exits.
      
      This is also useful for testing nested KVM, to check that L1
      handles #SMIs correctly when L1 doesn't intercept #SMI.
      
      Finally, the default remains the same: SMIs are intercepted
      by default, so this patch has no effect unless a non-default
      module parameter value is used.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4b639a9f
    • Maxim Levitsky's avatar
      KVM: SVM: remove INIT intercept handler · 896707c2
      Maxim Levitsky authored
      The kernel never sends a real INIT to CPUs, other than on boot.
      
      Thus INIT interception is an error which should be caught
      by the check for an unknown VM exit reason.
      
      On top of that, the current INIT VM exit handler skips
      the current instruction, which is wrong.
      That was added in commit 5ff3a351 ("KVM: x86: Move trivial
      instruction-based exit handlers to common code").
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      896707c2
    • Maxim Levitsky's avatar
      KVM: SVM: #SMI interception must not skip the instruction · 991afbbe
      Maxim Levitsky authored
      Commit 5ff3a351 ("KVM: x86: Move trivial instruction-based
      exit handlers to common code") unfortunately made the mistake of
      treating nop_on_interception and nop_interception in the same way.
      
      The former does nothing at all, while the latter skips the instruction.
      
      The SMI VM exit handler should do nothing (the SMI itself is handled
      by the host when we do STGI).
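The difference between the two handlers can be sketched with a heavily simplified, hypothetical vCPU that only tracks RIP and the address of the next instruction. An SMI is not caused by a guest instruction, so advancing RIP would silently skip an instruction the guest never executed. The signatures here are illustrative, not KVM's.

```c
#include <stdint.h>

/* Hypothetical, simplified vCPU: only what the two handlers touch. */
struct vcpu {
	uint64_t rip;
	uint64_t next_rip; /* address of the instruction after the exiting one */
};

/* Does nothing at all: the guest resumes at the same RIP. */
static int nop_on_interception(struct vcpu *vcpu)
{
	(void)vcpu;
	return 1; /* 1 = resume the guest */
}

/* Emulates the exiting instruction as a no-op by skipping over it. */
static int nop_interception(struct vcpu *vcpu)
{
	vcpu->rip = vcpu->next_rip;
	return 1;
}
```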
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      991afbbe
    • Yu Zhang's avatar
      KVM: VMX: Remove vmx_msr_index from vmx.h · c0e1303e
      Yu Zhang authored
      vmx_msr_index was used to record the list of MSRs which can be lazily
      restored when kvm returns to userspace. It is now reimplemented as
      kvm_uret_msrs_list, a common x86 list which is only used inside x86.c.
      So just remove the obsolete declaration in vmx.h.
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210707235702.31595-1-yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c0e1303e
    • Lai Jiangshan's avatar
      KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() · f85d4016
      Lai Jiangshan authored
      When the host is using debug registers but the guest is not using them
      nor is the guest in guest-debug state, the kvm code does not reset
      the host debug registers before kvm_x86->run().  Rather, it relies on
      the hardware vmentry instruction to automatically reset the dr7 register,
      which ensures that the host breakpoints do not affect the guest.
      
      This, however, violates the non-instrumentable nature of the code around
      VM entry and exit; for example, a host breakpoint set on vcpu->arch.cr2
      could trigger inside that non-instrumentable code.
      
      Another issue is consistency.  When the guest debug registers are active,
      the host breakpoints are reset before kvm_x86->run(). But when the
      guest debug registers are inactive, disabling the host breakpoints is
      deferred.  The host tracing tools may see different results depending
      on what the guest is doing.
      
      To fix the problems, we clear %db7 unconditionally before kvm_x86->run()
      if the host has set any breakpoints, regardless of whether the guest is
      using them.
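The ordering described above can be modeled with a fake DR7 variable. This is a hypothetical sketch: it assumes that any of the L0-L3/G0-G3 enable bits (DR7 bits 0-7) means "the host has set a breakpoint", while the kernel's real test may use a narrower mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware DR7 register. */
static uint64_t fake_dr7;

/* Assumption: any L0-L3/G0-G3 enable bit (DR7 bits 0-7) is an active breakpoint. */
#define DR7_ENABLE_BITS 0xffULL

static void write_dr7(uint64_t val)
{
	fake_dr7 = val;
}

/* Runs before entering the guest. */
static void before_guest_run(bool guest_debug_active)
{
	if (guest_debug_active) {
		/* Disable host breakpoints; guest DR0-DR3 would be loaded next. */
		write_dr7(0);
	} else if (fake_dr7 & DR7_ENABLE_BITS) {
		/*
		 * Clear DR7 now instead of relying on VM entry, so host
		 * breakpoints cannot fire in the entry/exit path.
		 */
		write_dr7(0);
	}
}
```

With no enable bits set (for example the architectural reset value 0x400), DR7 is left untouched.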
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      [Only clear %db7 instead of reloading all debug registers. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f85d4016
    • Ricardo Koller's avatar
      KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc · 6f2f86ec
      Ricardo Koller authored
      Commit a75a895e ("KVM: selftests: Unconditionally use memslot 0 for
      vaddr allocations") removed the memslot parameters from vm_vaddr_alloc.
      It addressed all callers except one under lib/aarch64/, due to a race
      with commit e3db7579 ("KVM: selftests: Add exception handling
      support for aarch64").
      
      Fix the vm_vaddr_alloc call in lib/aarch64/processor.c.
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Ricardo Koller <ricarkol@google.com>
      Message-Id: <20210702201042.4036162-1-ricarkol@google.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6f2f86ec
    • Pavel Skripkin's avatar
      kvm: debugfs: fix memory leak in kvm_create_vm_debugfs · 004d62eb
      Pavel Skripkin authored
      In commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      the loop for filling debugfs_stat_data was copy-pasted twice, but
      the second loop saves its pointers over the pointers allocated
      in the first loop.  The result is a memory leak; fix it.
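A hypothetical model of the bug: one array holds per-VM entries followed by per-vCPU entries. If the copy-pasted second loop restarts its index at 0, it overwrites (and leaks) the first loop's allocations; offsetting by the number of VM entries keeps both sets. The names and the offset fix are illustrative assumptions, not the patch itself. Allocation failures of the individual entries are ignored for brevity.

```c
#include <stdlib.h>

/* Array layout: [0, n_vm) per-VM entries, then [n_vm, n_vm + n_vcpu) per-vCPU. */
static void **fill_stats(size_t n_vm, size_t n_vcpu, int buggy)
{
	void **data = calloc(n_vm + n_vcpu, sizeof(*data));
	if (!data)
		return NULL;

	for (size_t i = 0; i < n_vm; i++)
		data[i] = malloc(16);

	for (size_t i = 0; i < n_vcpu; i++) {
		/* Buggy copy-paste restarts at 0, overwriting (leaking) data[i]. */
		size_t idx = buggy ? i : n_vm + i;
		data[idx] = malloc(16);
	}
	return data;
}
```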
      
      Fixes: bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Reviewed-by: Jing Zhang <jingzhangos@google.com>
      Message-Id: <20210701195500.27097-1-paskripkin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      004d62eb
    • Suravee Suthikulpanit's avatar
  2. 14 Jul, 2021 6 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8096acd7
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - sock: fix parameter order in sock_setsockopt()
      
        Current release - new code bugs:
      
         - netfilter: nft_last:
             - fix incorrect arithmetic when restoring last used
             - honor NFTA_LAST_SET on restoration
      
        Previous releases - regressions:
      
         - udp: properly flush normal packet at GRO time
      
         - sfc: ensure correct number of XDP queues; don't allow enabling the
           feature if there aren't sufficient resources to Tx from any CPU
      
         - dsa: sja1105: fix address learning getting disabled on the CPU port
      
         - mptcp: address an rmem accounting issue that could keep packets in
           subflow receive buffers longer than necessary, delaying MPTCP-level
           ACKs
      
         - ip_tunnel: fix mtu calculation for ETHER tunnel devices
      
         - do not reuse skbs allocated from skbuff_fclone_cache in the napi
           skb cache, we'd try to return them to the wrong slab cache
      
         - tcp: consistently disable header prediction for mptcp
      
        Previous releases - always broken:
      
         - bpf: fix subprog poke descriptor tracking use-after-free
      
         - ipv6:
             - allocate enough headroom in ip6_finish_output2() in case
               iptables TEE is used
             - tcp: drop silly ICMPv6 packet too big messages to avoid
               expensive and pointless lookups (which may serve as a DDOS
               vector)
             - make sure fwmark is copied in SYNACK packets
             - fix 'disable_policy' for forwarded packets (align with IPv4)
      
         - netfilter: conntrack:
             - do not renew entry stuck in tcp SYN_SENT state
             - do not mark RST in the reply direction coming after SYN packet
               for an out-of-sync entry
      
         - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
      
         - mptcp: fix double free when rejecting a join due to port mismatch
      
         - validate lwtstate->data before returning from skb_tunnel_info()
      
         - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
      
         - mt76: mt7921: continue to probe driver when fw already downloaded
      
         - bonding: fix multiple issues with offloading IPsec to (thru?) bond
      
         - stmmac: ptp: fix issues around Qbv support and setting time back
      
         - bcmgenet: always clear wake-up based on energy detection
      
        Misc:
      
         - sctp: move 198 addresses from unusable to private scope
      
         - ptp: support virtual clocks and timestamping
      
         - openvswitch: optimize operation for key comparison"
      
      * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
        net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
        sfc: add logs explaining XDP_TX/REDIRECT is not available
        sfc: ensure correct number of XDP queues
        sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
        net: fddi: fix UAF in fza_probe
        net: dsa: sja1105: fix address learning getting disabled on the CPU port
        net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
        net: Use nlmsg_unicast() instead of netlink_unicast()
        octeontx2-pf: Fix uninitialized boolean variable pps
        ipv6: allocate enough headroom in ip6_finish_output2()
        net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
        net: bridge: multicast: fix MRD advertisement router port marking race
        net: bridge: multicast: fix PIM hello router port marking race
        net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
        dsa: fix for_each_child.cocci warnings
        virtio_net: check virtqueue_add_sgs() return value
        mptcp: properly account bulk freed memory
        selftests: mptcp: fix case multiple subflows limited by server
        mptcp: avoid processing packet if a subflow reset
        mptcp: fix syncookie process if mptcp can not_accept new subflow
        ...
      8096acd7
    • Christian Brauner's avatar
      fs: add vfs_parse_fs_param_source() helper · d1d488d8
      Christian Brauner authored
      Add a simple helper that filesystems can use in their parameter parser
      to parse the "source" parameter. A few places open-coded this function
      and that already caused a bug in the cgroup v1 parser that we fixed.
      Let's make it harder to get this wrong by introducing a helper which
      performs all necessary checks.
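A userspace sketch of what such a helper needs to check, using simplified stand-ins for the kernel's `fs_parameter` and `fs_context` types (assumptions, not the real definitions). The tagged union is only safe to read through the member selected by `type`, which is exactly what the cgroup v1 parser got wrong. Returning 1 for "not the source parameter" is a convention chosen for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel types (assumption). */
enum fs_value_type { fs_value_is_string, fs_value_is_file };

struct fs_parameter {
	const char *key;
	enum fs_value_type type;
	union { /* only the member selected by `type` is valid */
		char *string;
		void *file;
	};
};

struct fs_context { char *source; };

/* Hypothetical helper: 1 = not "source", 0 = consumed, <0 = error. */
static int parse_source_param(struct fs_context *fc, struct fs_parameter *param)
{
	if (strcmp(param->key, "source") != 0)
		return 1;
	if (param->type != fs_value_is_string)
		return -EINVAL; /* e.g. FSCONFIG_SET_FD: the union holds a file */
	if (fc->source)
		return -EINVAL; /* multiple sources */
	fc->source = param->string; /* take ownership ... */
	param->string = NULL;       /* ... so the string is not freed twice */
	return 0;
}
```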
      
      Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d1d488d8
    • Christian Brauner's avatar
      cgroup: verify that source is a string · 3b046272
      Christian Brauner authored
      The following sequence can be used to trigger a UAF:
      
          int fscontext_fd = fsopen("cgroup", 0);
          int fd_null = open("/dev/null", O_RDONLY);
          /* FSCONFIG_SET_FD passes the fd in 'aux'; 'value' must be NULL. */
          fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
          close_range(3, ~0U, 0);
      
      The cgroup v1 specific fs parser expects a string for the "source"
      parameter.  However, it is perfectly legitimate to, e.g., specify a file
      descriptor for the "source" parameter.  The fs parser doesn't know what
      a filesystem allows there.  So it's a bug to assume that "source" is
      always of type fs_value_is_string when it can reasonably also be
      fs_value_is_file.
      
      This assumption in the cgroup code causes a UAF because struct
      fs_parameter uses a union for the actual value.  Access to that union is
      guarded by the param->type member.  Since the cgroup parameter parser
      didn't check param->type but unconditionally moved param->string into
      fc->source, closing fscontext_fd runs put_fs_context(), which frees
      fc->source and thereby the file stashed in param->file; closing fd_null
      afterwards then causes a UAF.
      
      Fix this by verifying that param->type is actually a string and report
      an error if not.
      
      In follow up patches I'll add a new generic helper that can be used here
      and by other filesystems instead of this error-prone copy-pasta fix.
      But fixing it in here first makes backporting it to stable a lot
      easier.
      
      Fixes: 8d2451f4 ("cgroup1: switch to option-by-option parsing")
      Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@kernel.org>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b046272
    • Like Xu's avatar
      KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM · 7234c362
      Like Xu authored
      AMD platforms do not support CPUID leaf 0xA. The results returned for
      this leaf should therefore all remain zero, just as they are on the host:
      
      AMD host:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
      (uncanny) AMD guest:
         0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000
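The stray value in the guest output is a single bit: 0x00008000 is bit 15 of EDX, the AnyThread-deprecated flag named in the subject. A hypothetical sketch of filling the leaf so that a vendor without the leaf reports all zeros; the function and macro names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* EDX bit 15 of leaf 0xA: AnyThread deprecated (0x00008000). */
#define LEAF_0A_EDX_ANYTHREAD_DEPRECATED (1u << 15)

/* Hypothetical: without leaf 0xA support, every register stays zero. */
static void fill_leaf_0xa(struct cpuid_regs *r, bool vendor_supports_leaf)
{
	*r = (struct cpuid_regs){ 0 };
	if (!vendor_supports_leaf)
		return;
	/* ... other PMU fields would be filled here ... */
	r->edx |= LEAF_0A_EDX_ANYTHREAD_DEPRECATED;
}
```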
      
      Fixes: cadbaa03 ("perf/x86/intel: Make anythread filter support conditional")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20210628074354.33848-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7234c362
    • Kefeng Wang's avatar
      KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio · 23fa2e46
      Kefeng Wang authored
      BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
      Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269
      
      CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
       kmalloc include/linux/slab.h:552 [inline]
       kzalloc include/linux/slab.h:664 [inline]
       kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
       kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
       kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
       kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
       kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      If kvm_io_bus_unregister_dev() returns -ENOMEM, it has already called
      kvm_iodevice_destructor() internally to delete
      'struct kvm_coalesced_mmio_dev *dev' from the list and free the dev, so
      calling kvm_iodevice_destructor() again leads to the above issue.
      
      Let's check the return value of kvm_io_bus_unregister_dev() and only call
      kvm_iodevice_destructor() if the return value is 0.
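A hypothetical model of the ownership rule: when the unregister path fails with -ENOMEM it has already destroyed the device, so the caller may destroy it only on success. The counter makes the single destructor call observable; none of these names are KVM's.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with a destructor. */
struct dev { int id; };

static int destructor_calls;

static void dev_destructor(struct dev *d)
{
	destructor_calls++;
	free(d);
}

/* Hypothetical unregister: on -ENOMEM it has already destroyed the device. */
static int bus_unregister_dev(struct dev *d, int simulate_enomem)
{
	if (simulate_enomem) {
		dev_destructor(d);
		return -ENOMEM;
	}
	return 0;
}

static void unregister_device(struct dev *d, int simulate_enomem)
{
	int ret = bus_unregister_dev(d, simulate_enomem);

	/* Only destroy here if unregistering did not already do it. */
	if (ret == 0)
		dev_destructor(d);
}
```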
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
      Cc: stable@vger.kernel.org
      Fixes: 5d3c4c79 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      23fa2e46
    • Sean Christopherson's avatar
      KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler · 76ff371b
      Sean Christopherson authored
      Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
      non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
      hits the NPT in hardware.  Clearing the bit for non-SEV guests causes KVM
      to mishandle #NPFs on GPAs that collide with the host's C-bit.
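A small sketch of the collision, assuming an example C-bit position of 47 (the real position is reported in CPUID 0x8000001F EBX[5:0]) and a non-SEV guest with a 48-bit physical address width. A GPA with bit 47 set is legal for that guest, but clearing the bit makes KVM resolve the fault for a different GPA. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example value only: the real C-bit position comes from CPUID 0x8000001F EBX[5:0]. */
#define EXAMPLE_CBIT 47

/* A GPA is legal if it fits in the guest's physical address width. */
static bool gpa_is_legal(uint64_t gpa, unsigned int guest_maxphyaddr)
{
	return guest_maxphyaddr >= 64 || (gpa >> guest_maxphyaddr) == 0;
}

/* What the reverted #NPF code effectively did to the faulting GPA. */
static uint64_t gpa_with_cbit_cleared(uint64_t gpa)
{
	return gpa & ~(1ULL << EXAMPLE_CBIT);
}
```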
      
      Although the APM doesn't explicitly state that the C-bit is not reserved
      for non-SEV, Tom Lendacky confirmed that the following snippet about the
      effective reduction due to the C-bit does indeed apply only to SEV guests.
      
        Note that because guest physical addresses are always translated
        through the nested page tables, the size of the guest physical address
        space is not impacted by any physical address space reduction indicated
        in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
        the guest physical address space is effectively reduced by 1 bit.
      
      And for SEV guests, the APM clearly states that the bit is dropped before
      walking the nested page tables.
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
        Consequently, the hypervisor does not need to be aware of which pages
        the guest has chosen to mark private.
      
      Note, the bogus C-bit clearing was removed from legacy #PF handler in
      commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
      interception").
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210625020354.431829-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      76ff371b