1. 06 Jun, 2021 1 commit
    • Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 773ac53b
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of x86/urgent stuff accumulated for the last two weeks so
        lemme unload it to you.
      
        It should be all totally risk-free, of course. :-)
      
         - Fix out-of-spec hardware (1st gen Hygon) which does not implement
           MSR_AMD64_SEV even though the spec clearly states so, and check
           CPUID bits first.
      
         - Send only one signal to a task when it is a SEGV_PKUERR si_code
           type.
      
         - Do away with all the wankery of reserving X amount of memory in the
           first megabyte to prevent BIOS corrupting it and simply and
           unconditionally reserve the whole first megabyte.
      
         - Make alternatives NOP optimization work at an arbitrary position
           within the patched sequence because the compiler can put
           single-byte NOPs for alignment anywhere in the sequence (32-bit
           retpoline), vs our previous assumption that the NOPs are only
           appended.
      
         - Force-disable ENQCMD[S] instructions support and remove
           update_pasid() because of insufficient protection against FPU state
           modification in an interrupt context, among other xstate horrors
           which are being addressed at the moment. This one limits the
           fallout until proper enablement.
      
         - Use cpu_feature_enabled() in the idxd driver so that it can be
           build-time disabled through the defines in disabled-features.h.
      
         - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
           LVT value is read before APIC initialization so that softlockups
           during boot do not happen at least on one machine.
      
         - Mark all legacy interrupts as legacy vectors when the IO-APIC is
           disabled and when all legacy interrupts are routed through the PIC"
      
      * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev: Check SME/SEV support in CPUID first
        x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
        x86/setup: Always reserve the first 1M of RAM
        x86/alternative: Optimize single-byte NOPs at an arbitrary position
        x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
        dmaengine: idxd: Use cpu_feature_enabled()
        x86/thermal: Fix LVT thermal setup for SMI delivery mode
        x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
      773ac53b
  2. 05 Jun, 2021 18 commits
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · f5b6eb1e
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some more bugfixes from I2C for v5.13. Usual stuff"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
        i2c: qcom-geni: Add shutdown callback for i2c
        i2c: tegra-bpmp: Demote kernel-doc abuses
        i2c: altera: Fix formatting issue in struct and demote unworthy kernel-doc headers
      f5b6eb1e
    • Merge branch 'akpm' (patches from Andrew) · e5220dd1
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 patches.
      
        Subsystems affected by this patch series: mips, mm (kfence, debug,
        pagealloc, memory-hotplug, hugetlb, and kasan), init, proc, lib,
        ocfs2, and mailmap"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mailmap: use private address for Michel Lespinasse
        ocfs2: fix data corruption by fallocate
        lib: crc64: fix kernel-doc warning
        mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
        mm/kasan/init.c: fix doc warning
        proc: add .gitignore for proc-subset-pid selftest
        hugetlb: pass head page to remove_hugetlb_page()
        drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
        mm/page_alloc: fix counting of free pages after take off from buddy
        mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
        pid: take a reference when initializing `cad_pid`
        kfence: use TASK_IDLE when awaiting allocation
        Revert "MIPS: make userspace mapping young by default"
      e5220dd1
    • Merge tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · af8d9eb8
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - Build with '-mno-relax' when using LLVM's linker, which doesn't
         support linker relaxation.
      
       - A fix to build without SiFive's errata.
      
       - A fix to use PAs during init_resources()
      
       - A fix to avoid W+X mappings during boot.
      
      * tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Fix memblock_free() usages in init_resources()
        riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabled
        riscv: mm: Fix W+X mappings at boot
        riscv: Use -mno-relax when using lld linker
      af8d9eb8
    • ocfs2: fix data corruption by fallocate · 6bba4471
      Junxiao Bi authored
      When fallocate punches holes beyond the inode size and the original
      isize falls in the middle of the last cluster, the part from isize to
      the end of that cluster is zeroed with a buffered write.  At that point
      isize has not yet been updated to match the new size, so if writeback
      kicks in, it invokes ocfs2_writepage()->block_write_full_page(), which
      drops the pages beyond the inode size.  That causes file corruption.
      Fix this by zeroing out the EOF blocks when extending the inode size.
      
      Running the following command with qemu-img 4.2.1 easily produces a
      corrupted converted image file.
      
          qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
                   -O qcow2 -o compat=1.1 $qcow_image.conv
      
      qemu uses fallocate as follows: it first punches holes beyond the
      inode size, then extends the inode size.
      
          fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
          fallocate(11, 0, 2276196352, 65536) = 0
      
      v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
      v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/
      
      Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6bba4471
    • lib: crc64: fix kernel-doc warning · 415f0c83
      YueHaibing authored
      Fix W=1 kernel build warning:
      
        lib/crc64.c:40: warning:
         bad line:         or the previous crc64 value if computing incrementally.
      
      Link: https://lkml.kernel.org/r/20210601135851.15444-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Coly Li <colyli@suse.de>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      415f0c83
    • mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY · d84cf06e
      Mina Almasry authored
      The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
      happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
      an index for which we already have a page in the cache.  When this
      happens, we allocate a second page, double consuming the reservation,
      and then fail to insert the page into the cache and return -EEXIST.
      
      To fix this, we first check if there is a page in the cache which
      already consumed the reservation, and return -EEXIST immediately if so.
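
The ordering problem described above can be illustrated with a toy model. This is only a sketch: `HugetlbFile`, `mcopy_before_fix()` and `mcopy_after_fix()` are hypothetical names, the model assumes every allocation consumes exactly one reservation, and it ignores the error and free paths of the real hugetlb_mcopy_atomic_pte().

```python
import errno


class HugetlbFile:
    """Toy model of one hugetlbfs file: a reservation counter plus a page cache."""

    def __init__(self, reserved):
        self.resv_huge_pages = reserved   # stand-in for the kernel counter
        self.page_cache = set()           # indices that already hold a page

    def _alloc_huge_page(self):
        # Assumption of this model: every allocation consumes one reservation.
        self.resv_huge_pages -= 1


def mcopy_before_fix(f, idx):
    """Allocate first, notice the existing page only at insertion time."""
    f._alloc_huge_page()
    if idx in f.page_cache:
        return -errno.EEXIST              # the reservation was consumed twice
    f.page_cache.add(idx)
    return 0


def mcopy_after_fix(f, idx):
    """Check the page cache before touching the reservation."""
    if idx in f.page_cache:
        return -errno.EEXIST
    f._alloc_huge_page()
    f.page_cache.add(idx)
    return 0
```

With a single reserved page, calling the first variant twice on the same index drives the modelled counter below zero, while the second variant leaves it at zero. The kernel counter is unsigned, so there the same mistake shows up as an underflow rather than a negative value.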
      
      There is still a rare condition where we fail to copy the page contents
      AND race with a call for hugetlb_no_page() for this index and again we
      will underflow resv_huge_pages.  That is fixed in a more complicated
      patch not targeted for -stable.
      
      Test:
      
        Hacked the code locally such that resv_huge_pages underflows produce a
        warning, then:
      
        ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 \
            2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
        ./tools/testing/selftests/vm/userfaultfd hugetlb 10 \
            2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
      
      Both tests succeed and produce no warnings.  After the tests run, the
      number of free/resv hugepages is correct.
      
      [mike.kravetz@oracle.com: changelog fixes]
      
      Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
      Fixes: 8fb5debc ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d84cf06e
    • mm/kasan/init.c: fix doc warning · 7b6889f5
      Yu Kuai authored
      Fix gcc W=1 warning:
      
        mm/kasan/init.c:228: warning: Function parameter or member 'shadow_start' not described in 'kasan_populate_early_shadow'
        mm/kasan/init.c:228: warning: Function parameter or member 'shadow_end' not described in 'kasan_populate_early_shadow'
      
      Link: https://lkml.kernel.org/r/20210603140700.3045298-1-yukuai3@huawei.com
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b6889f5
    • proc: add .gitignore for proc-subset-pid selftest · 263e88d6
      David Matlack authored
      This new selftest needs an entry in the .gitignore file otherwise git
      will try to track the binary.
      
      Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
      Fixes: 268af17a ("selftests: proc: test subset=pid")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      263e88d6
    • hugetlb: pass head page to remove_hugetlb_page() · 0c5da357
      Naoya Horiguchi authored
      When memory_failure() or soft_offline_page() is called on a tail page of
      some hugetlb page, "BUG: unable to handle page fault" error can be
      triggered.
      
      remove_hugetlb_page() dereferences page->lru, so it assumes that the
      page points to a head page, but one of its callers,
      dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page',
      which could be a tail page.  So pass 'head' to it instead.
      
      Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
      Fixes: 6eb4e88a ("hugetlb: create remove_hugetlb_page() to separate functionality")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c5da357
    • drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 · 92813053
      David Hildenbrand authored
      offline_pages() properly checks for memory holes and bails out.
      However, we do a page_zone(pfn_to_page(start_pfn)) before calling
      offline_pages() when offlining a memory block.
      
      We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
      aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
      memory hole:
      
         kernel BUG at include/linux/mm.h:1383!
         Internal error: Oops - BUG: 0 [#1] SMP
         Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
         CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
         Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
         pc : memory_subsys_offline+0x1f8/0x250
         lr : memory_subsys_offline+0x1f8/0x250
         Call trace:
           memory_subsys_offline+0x1f8/0x250
           device_offline+0x154/0x1d8
           online_store+0xa4/0x118
           dev_attr_store+0x44/0x78
           sysfs_kf_write+0xe8/0x138
           kernfs_fop_write_iter+0x26c/0x3d0
           new_sync_write+0x2bc/0x4f8
           vfs_write+0x718/0xc88
           ksys_write+0xf8/0x1e0
           __arm64_sys_write+0x74/0xa8
           invoke_syscall.constprop.0+0x78/0x1e8
           do_el0_svc+0xe4/0x298
           el0_svc+0x20/0x30
           el0_sync_handler+0xb0/0xb8
           el0_sync+0x178/0x180
         Kernel panic - not syncing: Oops - BUG: Fatal exception
         SMP: stopping secondary CPUs
         Kernel Offset: disabled
         CPU features: 0x00000251,20000846
         Memory Limit: none
      
      If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
      memory that doesn't have any holes.  So call
      page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
      nr_vmemmap_pages is set and we actually adjust the present pages.
      
      Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
      Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      92813053
    • mm/page_alloc: fix counting of free pages after take off from buddy · bac9c6fa
      Ding Hui authored
      Recently we found that a lot of MemFree is left in /proc/meminfo after
      soft-offlining many pages, which is not quite correct.
      
      Before Oscar's rework of soft offline for free pages [1], soft-offlined
      free pages were left in buddy with the HWPoison flag set, and
      NR_FREE_PAGES was not updated immediately.  So the difference between
      NR_FREE_PAGES and the real number of available free pages was also
      large at the beginning.
      
      However, as the workload ran, whenever an alloc function subsequently
      caught a HWPoison page, it removed the page from buddy, updated
      NR_FREE_PAGES and tried again, so NR_FREE_PAGES got closer and closer
      to the real number of available free pages (regardless of
      unpoison_memory()).
      
      Now, for offlined free pages, after a successful call to
      take_page_off_buddy(), the page no longer belongs to the buddy
      allocator and will not be used any more, but we missed accounting
      NR_FREE_PAGES in this situation, and there is no chance for it to be
      updated later.
      
      Do the update in take_page_off_buddy() like rmqueue() does, but avoid
      double counting if someone has already called set_migratetype_isolate()
      on the page.
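
The accounting rule can be sketched with a toy model. Everything here is an assumption made for illustration: `Zone`, `isolate()` and `usable_free_pages()` are hypothetical, and the model tracks isolation per page, whereas the kernel does it per pageblock.

```python
class Zone:
    """Toy zone: per-page bookkeeping instead of the kernel's pageblocks."""

    def __init__(self, nr_pages):
        self.buddy = set(range(nr_pages))   # pfns currently on the free lists
        self.isolated = set()               # pfns already marked isolated
        self.nr_free_pages = nr_pages       # stand-in for NR_FREE_PAGES


def isolate(zone, pfn):
    """Model of isolation: an isolated free page stops counting as free."""
    if pfn in zone.buddy and pfn not in zone.isolated:
        zone.nr_free_pages -= 1
    zone.isolated.add(pfn)


def take_page_off_buddy(zone, pfn, account=True):
    """Remove a free page from buddy; account=False models the old behaviour."""
    if pfn not in zone.buddy:
        return False
    zone.buddy.remove(pfn)
    if account and pfn not in zone.isolated:
        zone.nr_free_pages -= 1             # skipped for isolated pages
    return True


def usable_free_pages(zone):
    """Pages that an allocation could actually be served from."""
    return len(zone.buddy - zone.isolated)
```

With `account=False` the counter stays one too high after every page taken off buddy, which is the MemFree drift described above. With the default, the counter matches `usable_free_pages()` whether or not the page was isolated first.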
      
      [1]: commit 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      
      Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
      Fixes: 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bac9c6fa
    • mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() · 04f7ce3f
      Gerald Schaefer authored
      In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
      entry, and so it does not match the given pmdp/pudp and (aligned down)
      pfn any more.
      
      For s390, this results in memory corruption, because the IDTE
      instruction used e.g.  in xxx_get_and_clear() will take the vaddr for
      some calculations, in combination with the given pmdp.  It will then end
      up with a wrong table origin, ending on ...ff8, and some of those
      wrongly set low-order bits will also select a wrong pagetable level for
      the index addition.  IDTE could therefore invalidate (i.e. OR 0x20
      into) something outside of the page tables, depending on the wrongly
      picked index, which in turn depends on the random vaddr.
      
      As a result, we sometimes see "BUG task_struct (Not tainted): Padding
      overwritten" on s390, where one 0x5a padding value got overwritten with
      0x7a.
      
      Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
      calculated.
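
The difference between the two roundings is plain arithmetic and can be checked in a few lines. The constants are assumptions chosen for illustration: a 2 MiB PMD size and a made-up, unaligned `vaddr`; `covers()` is a hypothetical helper.

```python
PMD_SHIFT = 21                 # assumption: one PMD entry maps 2 MiB
PMD_SIZE = 1 << PMD_SHIFT


def align_down(addr, size):
    """Round addr down to a multiple of size (size is a power of two)."""
    return addr & ~(size - 1)


def align_up(addr, size):
    """Round addr up to a multiple of size (size is a power of two)."""
    return (addr + size - 1) & ~(size - 1)


def covers(base, addr, size=PMD_SIZE):
    """True if the entry starting at base maps addr."""
    return base <= addr < base + size


vaddr = 0x7F1234567000         # hypothetical address, not PMD-aligned

down = align_down(vaddr, PMD_SIZE)
up = align_up(vaddr, PMD_SIZE)

print(hex(down), covers(down, vaddr))   # entry that really maps vaddr
print(hex(up), covers(up, vaddr))       # next entry, one PMD_SIZE further on
print(hex(0x5A | 0x20))                 # the padding byte after the stray OR
```

Rounding up lands on the following entry, so it no longer corresponds to the pmdp and the aligned-down pfn that the test passes alongside it. The last line reproduces the corrupted padding value quoted in the report.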
      
      Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.com
      Fixes: a5c3b9ff ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: <stable@vger.kernel.org>	[5.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04f7ce3f
    • pid: take a reference when initializing `cad_pid` · 0711f0d7
      Mark Rutland authored
      During boot, kernel_init_freeable() initializes `cad_pid` to the init
      task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
      when this happens proc_do_cad_pid() will increment the refcount on the
      new pid via get_pid(), and will decrement the refcount on the old pid
      via put_pid().  As we never called get_pid() when we initialized
      `cad_pid`, we decrement a reference we never incremented, and can
      therefore free the init task's struct pid early.  As there can be
      dangling references to the struct pid, we can later encounter a
      use-after-free (e.g.  when delivering signals).
      
      This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
      have been around since the conversion of `cad_pid` to struct pid in
      commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the
      pre-KASAN stone age of v2.6.19.
      
      Fix this by getting a reference to the init task's struct pid when we
      assign it to `cad_pid`.
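
The imbalance is easy to see in a toy reference-counting model. This is a sketch only: `Pid`, `change_cad_pid()` and `simulate()` are hypothetical, and the Python `get_pid()`/`put_pid()` merely mimic the counting done by the kernel functions of the same name.

```python
class Pid:
    """Toy stand-in for struct pid with a reference count."""

    def __init__(self, nr):
        self.nr = nr
        self.count = 1          # the reference held by the task itself
        self.freed = False


def get_pid(pid):
    pid.count += 1
    return pid


def put_pid(pid):
    pid.count -= 1
    if pid.count == 0:
        pid.freed = True        # the kernel would free the object here


def change_cad_pid(old, new):
    """Models the sysctl path: reference the new pid, drop the old one."""
    get_pid(new)
    put_pid(old)
    return new


def simulate(take_initial_reference):
    """Return True if init's pid is freed while init still uses it."""
    init_pid = Pid(1)
    # Boot-time assignment, with or without taking a reference.
    cad_pid = get_pid(init_pid) if take_initial_reference else init_pid
    cad_pid = change_cad_pid(cad_pid, Pid(42))
    return init_pid.freed
```

`simulate(False)` models the old boot-time assignment: the sysctl write drops the only reference and the init task is left with a dangling pointer. `simulate(True)` models the fix, where the extra reference taken at boot is the one the sysctl path later drops.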
      
      Full KASAN splat below.
      
         ==================================================================
         BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
         BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
         Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
      
         CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
         Hardware name: linux,dummy-virt (DT)
         Call trace:
          ns_of_pid include/linux/pid.h:153 [inline]
          task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
          do_notify_parent+0x308/0xe60 kernel/signal.c:1950
          exit_notify kernel/exit.c:682 [inline]
          do_exit+0x2334/0x2bd0 kernel/exit.c:845
          do_group_exit+0x108/0x2c8 kernel/exit.c:922
          get_signal+0x4e4/0x2a88 kernel/signal.c:2781
          do_signal arch/arm64/kernel/signal.c:882 [inline]
          do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
          work_pending+0xc/0x2dc
      
         Allocated by task 0:
          slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
          slab_alloc_node mm/slub.c:2907 [inline]
          slab_alloc mm/slub.c:2915 [inline]
          kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
          alloc_pid+0xdc/0xc00 kernel/pid.c:180
          copy_process+0x2794/0x5e18 kernel/fork.c:2129
          kernel_clone+0x194/0x13c8 kernel/fork.c:2500
          kernel_thread+0xd4/0x110 kernel/fork.c:2552
          rest_init+0x44/0x4a0 init/main.c:687
          arch_call_rest_init+0x1c/0x28
          start_kernel+0x520/0x554 init/main.c:1064
          0x0
      
         Freed by task 270:
          slab_free_hook mm/slub.c:1562 [inline]
          slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
          slab_free mm/slub.c:3161 [inline]
          kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
          put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
          put_pid+0x30/0x48 kernel/pid.c:109
          proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
          proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
          proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
          call_write_iter include/linux/fs.h:1977 [inline]
          new_sync_write+0x3ac/0x510 fs/read_write.c:518
          vfs_write fs/read_write.c:605 [inline]
          vfs_write+0x9c4/0x1018 fs/read_write.c:585
          ksys_write+0x124/0x240 fs/read_write.c:658
          __do_sys_write fs/read_write.c:670 [inline]
          __se_sys_write fs/read_write.c:667 [inline]
          __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
          __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
          invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
          el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
          do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
          el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
          el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
          el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
      
         The buggy address belongs to the object at ffff23794dda0000
          which belongs to the cache pid of size 224
         The buggy address is located 4 bytes inside of
          224-byte region [ffff23794dda0000, ffff23794dda00e0)
         The buggy address belongs to the page:
         page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
         head:(____ptrval____) order:1 compound_mapcount:0
         flags: 0x3fffc0000010200(slab|head)
         raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
         raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
          ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
          ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
         ==================================================================
      
      Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
      Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0711f0d7
    • kfence: use TASK_IDLE when awaiting allocation · 8fd0e995
      Marco Elver authored
      Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
      allocation counts towards load.  However, for KFENCE, this does not make
      any sense, since there is no busy work we're awaiting.
      
      Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
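
Why the state matters for the load average can be sketched as a bit test. The numeric values below are assumptions mirrored from include/linux/sched.h of that era, and `contributes_to_load()` is a simplified, hypothetical version of the kernel's check (it ignores frozen tasks and runnable tasks).

```python
# Assumed values, mirrored from include/linux/sched.h for illustration only.
TASK_UNINTERRUPTIBLE = 0x0002
TASK_NOLOAD = 0x0400
TASK_IDLE = TASK_UNINTERRUPTIBLE | TASK_NOLOAD


def contributes_to_load(state):
    """Simplified check: uninterruptible sleepers count unless marked NOLOAD."""
    return bool(state & TASK_UNINTERRUPTIBLE) and not (state & TASK_NOLOAD)


def sleeping_load(states):
    """Number of sleeping tasks in the list that would add to the load average."""
    return sum(1 for s in states if contributes_to_load(s))
```

Under these assumptions a task sleeping in TASK_UNINTERRUPTIBLE adds one to the load, while the same wait in TASK_IDLE adds nothing, which is the effect the switch to wait_event_idle() is after.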
      
      BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
      Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
      Fixes: 407f1d8c ("kfence: await for allocation using wait_event")
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: <stable@vger.kernel.org>	[5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8fd0e995
    • Revert "MIPS: make userspace mapping young by default" · 50c25ee9
      Thomas Bogendoerfer authored
      This reverts commit f685a533.
      
      The MIPS cache flush logic needs to know whether the mapping was already
      established to decide how to flush caches.  This is done by checking the
      valid bit in the PTE.  The commit above breaks this logic by setting the
      valid bit in the PTE in new mappings, which causes kernel crashes.
      
      Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
      Fixes: f685a533 ("MIPS: make userspace mapping young by default")
      Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Huang Pei <huangpei@loongson.cn>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50c25ee9
    • Merge tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9d32fa5d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from bpf, wireless, netfilter and
        wireguard trees.
      
        The bpf vs lockdown+audit fix is the most notable.
      
        Things haven't slowed down just yet, both in terms of regressions in
        current release and largish fixes for older code, but we usually see a
        slowdown only after -rc5.
      
        Current release - regressions:
      
         - virtio-net: fix page faults and crashes when XDP is enabled
      
         - mlx5e: fix HW timestamping with CQE compression, and make sure they
           are only allowed to coexist with capable devices
      
         - stmmac:
            - fix kernel panic due to NULL pointer dereference of
              mdio_bus_data
            - fix double clk unprepare when no PHY device is connected
      
        Current release - new code bugs:
      
         - mt76: a few fixes for the recent MT7921 devices and runtime power
           management
      
        Previous releases - regressions:
      
         - ice:
            - track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
            - fix allowing VF to request more/less queues via virtchnl
            - correct supported and advertised autoneg by using PHY
              capabilities
            - allow all LLDP packets from PF to Tx
      
         - kbuild: quote OBJCOPY var to avoid a pahole call breaking the build
      
        Previous releases - always broken:
      
         - bpf, lockdown, audit: fix buggy SELinux lockdown permission checks
      
         - mt76: address the recent FragAttack vulnerabilities not covered by
           generic fixes
      
         - ipv6: fix KASAN: slab-out-of-bounds Read in
           fib6_nh_flush_exceptions
      
         - Bluetooth:
            - fix the erroneous flush_work() order, to avoid double free
            - use correct lock to prevent UAF of hdev object
      
         - nfc: fix NULL ptr dereference in llcp_sock_getname() after failed
           connect
      
         - ieee802154: multiple fixes to error checking and return values
      
         - igb: fix XDP with PTP enabled
      
         - intel: add correct exception tracing for XDP
      
         - tls: fix use-after-free when TLS offload device goes down and back
           up
      
         - ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
      
         - netfilter: nft_ct: skip expectations for confirmed conntrack
      
         - mptcp: fix falling back to TCP in presence of out of order packets
           early in connection lifetime
      
         - wireguard: switch from an O(n) to an O(1) algorithm for maintaining
           peers, fixing stalls and a large memory leak in the process
      
        Misc:
      
         - devlink: correct VIRTUAL port to not have phys_port attributes
      
         - Bluetooth: fix VIRTIO_ID_BT assigned number
      
         - net: return the correct errno code ENOBUF -> ENOMEM
      
         - wireguard:
            - peer: allocate in kmem_cache saving 25% on peer memory
            - do not use -O3"
      
      * tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        cxgb4: avoid link re-train during TC-MQPRIO configuration
        sch_htb: fix refcount leak in htb_parent_to_leaf_offload
        wireguard: allowedips: free empty intermediate nodes when removing single node
        wireguard: allowedips: allocate nodes in kmem_cache
        wireguard: allowedips: remove nodes in O(1)
        wireguard: allowedips: initialize list head in selftest
        wireguard: peer: allocate in kmem_cache
        wireguard: use synchronize_net rather than synchronize_rcu
        wireguard: do not use -O3
        wireguard: selftests: make sure rp_filter is disabled on vethc
        wireguard: selftests: remove old conntrack kconfig value
        virtchnl: Add missing padding to virtchnl_proto_hdrs
        ice: Allow all LLDP packets from PF to Tx
        ice: report supported and advertised autoneg using PHY capabilities
        ice: handle the VF VSI rebuild failure
        ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
        ice: Fix allowing VF to request more/less queues via virtchnl
        virtio-net: fix for skb_over_panic inside big mode
        ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
        fib: Return the correct errno code
        ...
      9d32fa5d
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-04' of... · 2cb26c15
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix NULL pointer dereference in 'perf probe' when handling
         DW_AT_const_value when looking for a variable, which is valid.
      
       - Fix for capability querying of perf_event_attr.cgroup support in
         older kernels.
      
       - Add missing cloning of evsel->use_config_name.
      
       - Honor event config name on --no-merge in 'perf stat'.
      
       - Fix some memory leaks found using ASAN.
      
       - Fix the perf entry for perf_event_attr setup with make LIBPFM4=1 on
         s390 z/VM.
      
       - Update MIPS UAPI perf_regs.h file.
      
       - Fix 'perf stat' BPF counter load return check.
      
      * tag 'perf-tools-fixes-for-v5.13-2021-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf env: Fix memory leak of bpf_prog_info_linear member
        perf symbol-elf: Fix memory leak by freeing sdt_note.args
        perf stat: Honor event config name on --no-merge
        perf evsel: Add missing cloning of evsel->use_config_name
        perf test: Test 17 fails with make LIBPFM4=1 on s390 z/VM
        perf stat: Fix error return code in bperf__load()
        perf record: Move probing cgroup sampling support
        perf probe: Fix NULL pointer dereference in convert_variable_location()
        perf tools: Copy uapi/asm/perf_regs.h from the kernel for MIPS
      2cb26c15
  3. 04 Jun, 2021 21 commits
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · ff609107
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix MSIs for platforms with "msi-map" device-tree property, which we
         broke in v5.13-rc1 (Jean-Philippe Brucker)
      
       - Add Krzysztof Wilczyński as PCI reviewer (Lorenzo Pieralisi)
      
      * tag 'pci-v5.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/MSI: Fix MSIs for generic hosts that use device-tree's "msi-map"
        MAINTAINERS: Add Krzysztof as PCI host/endpoint controllers reviewer
      ff609107
    • Rahul Lakkireddy's avatar
      cxgb4: avoid link re-train during TC-MQPRIO configuration · 3822d067
      Rahul Lakkireddy authored
      When configuring TC-MQPRIO offload, only turn off netdev carrier and
      don't bring physical link down in hardware. Otherwise, when the
      physical link is brought up again after configuration, it gets
      re-trained and stalls ongoing traffic.
      
      Also, when firmware is no longer accessible or crashed, avoid sending
      FLOWC and waiting for reply that will never come.
      
      Fix following hung_task_timeout_secs trace seen in these cases.
      
      INFO: task tc:20807 blocked for more than 122 seconds.
            Tainted: G S                5.13.0-rc3+ #122
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:tc   state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000
      Call Trace:
       __schedule+0x27b/0x6a0
       schedule+0x37/0xa0
       schedule_preempt_disabled+0x5/0x10
       __mutex_lock.isra.14+0x2a0/0x4a0
       ? netlink_lookup+0x120/0x1a0
       ? rtnl_fill_ifinfo+0x10f0/0x10f0
       __netlink_dump_start+0x70/0x250
       rtnetlink_rcv_msg+0x28b/0x380
       ? rtnl_fill_ifinfo+0x10f0/0x10f0
       ? rtnl_calcit.isra.42+0x120/0x120
       netlink_rcv_skb+0x4b/0xf0
       netlink_unicast+0x1a0/0x280
       netlink_sendmsg+0x216/0x440
       sock_sendmsg+0x56/0x60
       __sys_sendto+0xe9/0x150
       ? handle_mm_fault+0x6d/0x1b0
       ? do_user_addr_fault+0x1c5/0x620
       __x64_sys_sendto+0x1f/0x30
       do_syscall_64+0x3c/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f7f73218321
      RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321
      RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003
      RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6
      R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0
      
      Fixes: b1396c2b ("cxgb4: parse and configure TC-MQPRIO offload")
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3822d067
    • Yunjian Wang's avatar
      sch_htb: fix refcount leak in htb_parent_to_leaf_offload · 944d671d
      Yunjian Wang authored
      The commit ae81feb7 ("sch_htb: fix null pointer dereference
      on a null new_q") fixes a NULL pointer dereference bug, but it
      is not correct.
      
      htb_graft_helper properly handles the case when new_q is NULL, so
      skipping the call, as the previous patch did, creates an
      inconsistency: dev_queue->qdisc will still point to the old qdisc,
      but cl->parent->leaf.q will point to the new one (which will be
      noop_qdisc, because new_q was NULL). The code is based on an
      assumption that these two pointers are the same, so it can lead
      to refcount leaks.
      
      The correct fix is to add a NULL pointer check to protect
      qdisc_refcount_inc inside htb_parent_to_leaf_offload.
      
      Fixes: ae81feb7 ("sch_htb: fix null pointer dereference on a null new_q")
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      944d671d
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 26821ecd
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-06-04
      
      This series contains updates to virtchnl header file and ice driver.
      
      Brett fixes VF being unable to request a different number of queues than
      allocated and adds clearing of VF_MBX_ATQLEN register for VF reset.
      
      Haiyue handles error of rebuilding VF VSI during reset.
      
      Paul fixes reporting of autoneg to use the PHY capabilities.
      
      Dave allows LLDP packets without priority of TC_PRIO_CONTROL to be
      transmitted.
      
      Geert Uytterhoeven adds explicit padding to virtchnl_proto_hdrs
      structure in the virtchnl header file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26821ecd
    • David S. Miller's avatar
      Merge branch 'wireguard-fixes' · 6fd815bb
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard fixes for 5.13-rc5
      
      Here are bug fixes to WireGuard for 5.13-rc5:
      
      1-2,6) These are small, trivial tweaks to our test harness.
      
      3) Linus thinks -O3 is still dangerous to enable. The code gen wasn't so
         much different with -O2 either.
      
      4) We were accidentally calling synchronize_rcu instead of
         synchronize_net while holding the rtnl_lock, resulting in some rather
         large stalls that hit production machines.
      
      5) Peer allocation was wasting literally hundreds of megabytes on real
         world deployments, due to oddly sized large objects not fitting
         nicely into a kmalloc slab.
      
      7-9) We move from an insanely expensive O(n) algorithm to a fast O(1)
           algorithm, and clean up a massive memory leak in the process, in
           which allowed ips churn would leave dangling nodes hanging around
           without cleanup until the interface was removed. The O(1) algorithm
           eliminates packet stalls and high latency issues, in addition to
           bringing operations that took as much as 10 minutes down to less
           than a second.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fd815bb
    • Jason A. Donenfeld's avatar
      wireguard: allowedips: free empty intermediate nodes when removing single node · bf7b042d
      Jason A. Donenfeld authored
      When removing single nodes, it's possible that that node's parent is an
      empty intermediate node, in which case, it too should be removed.
      Otherwise the trie fills up and never is fully emptied, leading to
      gradual memory leaks over time for tries that are modified often. There
      was originally code to do this, but it was removed during refactoring in
      2016 and never reworked. Now that we have proper parent pointers from
      the previous commits, we can implement this properly.
      
      In order to reduce branching and expensive comparisons, we want to keep
      the double pointer for parent assignment (which lets us easily chain up
      to the root), but we still need to actually get the parent's base
      address. So encode the bit number into the last two bits of the pointer,
      and pack and unpack it as needed. This is a little bit clumsy but is the
      fastest and least memory-wasteful of the compromises. Note that we align
      the root struct here to a minimum of 4, because it's embedded into a
      larger struct, and we're relying on having the bottom two bits for our
      flag, which would only be 16-bit aligned on m68k.
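
The packing described above can be sketched in plain userspace C. This is a minimal illustration, not the WireGuard code: `struct node`, `pack_parent_bit()`, `unpack_slot()`, `unpack_bit()` and `parent_of()` are hypothetical names, and the sketch assumes that child-slot addresses are at least 4-byte aligned so the low two bits are free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a trie node (not the kernel's struct). */
struct node {
	struct node *bit[2];          /* child slots */
	uintptr_t parent_bit_packed;  /* slot address | bit number */
};

/* Pack the address of a parent's child slot together with a 2-bit tag.
 * This relies on the slot being at least 4-byte aligned, so the low two
 * bits of its address are zero and free to carry the tag. */
static uintptr_t pack_parent_bit(struct node **slot, unsigned int bit)
{
	assert(((uintptr_t)slot & 3u) == 0 && bit < 4);
	return (uintptr_t)slot | bit;
}

/* Strip the tag to get the slot address back. */
static struct node **unpack_slot(uintptr_t packed)
{
	return (struct node **)(packed & ~(uintptr_t)3);
}

/* Extract the 2-bit tag. */
static unsigned int unpack_bit(uintptr_t packed)
{
	return (unsigned int)(packed & 3u);
}

/* Recover the parent's base address from the slot address and bit index. */
static struct node *parent_of(uintptr_t packed)
{
	char *slot = (char *)unpack_slot(packed);

	return (struct node *)(slot - offsetof(struct node, bit)
				    - unpack_bit(packed) * sizeof(struct node *));
}
```

Storing the slot address keeps the "assign through a double pointer" idiom for relinking, while the tag lets the parent's base address be recovered with one subtraction instead of a comparison against both child slots.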
      
      The existing macro-based helpers were a bit unwieldy for adding the bit
      packing to, so this commit replaces them with safer and clearer ordinary
      functions.
      
      We add a test to the randomized/fuzzer part of the selftests, to free
      the randomized tries by-peer, refuzz it, and repeat, until it's supposed
      to be empty, and then see if that actually resulted in the whole
      thing being emptied. That combined with kmemcheck should hopefully make
      sure this commit is doing what it should. Along the way this resulted in
      various other cleanups of the tests and fixes for recent graphviz.
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf7b042d
    • Jason A. Donenfeld's avatar
      wireguard: allowedips: allocate nodes in kmem_cache · dc680de2
      Jason A. Donenfeld authored
      The previous commit moved from O(n) to O(1) for removal, but in the
      process introduced an additional pointer member to a struct that
      increased the size from 60 to 68 bytes, putting nodes in the 128-byte
      slab. With deployed systems having as many as 2 million nodes, this
      represents a significant doubling in memory usage (128 MiB -> 256 MiB).
      Fix this by using our own kmem_cache, that's sized exactly right. This
      also makes wireguard's memory usage more transparent in tools like
      slabtop and /proc/slabinfo.
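
The arithmetic behind those figures can be checked with a short sketch. It assumes the simplified power-of-two size-class model that the message describes (real slab allocators may offer other size classes), and it takes "2 million" to mean 2^21 nodes so that the totals come out as whole MiB; `round_up_pow2()` and `total_bytes()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power of two, the simplified size-class
 * model the commit message uses (an assumption; real slab allocators may
 * offer additional size classes). */
static size_t round_up_pow2(size_t size)
{
	size_t cls = 1;

	while (cls < size)
		cls <<= 1;
	return cls;
}

/* Total bytes consumed by `count` objects of `size` bytes each. */
static size_t total_bytes(size_t count, size_t size)
{
	return count * round_up_pow2(size);
}
```

Under these assumptions, `total_bytes((size_t)1 << 21, 60)` is 128 MiB and `total_bytes((size_t)1 << 21, 68)` is 256 MiB, which is the doubling that a cache sized to the object avoids.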
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc680de2
    • Jason A. Donenfeld's avatar
      wireguard: allowedips: remove nodes in O(1) · f634f418
      Jason A. Donenfeld authored
      Previously, deleting peers would require traversing the entire trie in
      order to rebalance nodes and safely free them. This meant that removing
      1000 peers from a trie with a half million nodes would take an extremely
      long time, during which we're holding the rtnl lock. Large-scale users
      were reporting 200ms latencies added to the networking stack as a whole
      every time their userspace software would queue up significant removals.
      That's a serious situation.
      
      This commit fixes that by maintaining a double pointer to the parent's
      bit pointer for each node, and then using the already existing node list
      belonging to each peer to go directly to the node, fix up its pointers,
      and free it with RCU. This means removal is O(1) instead of O(n), and we
      don't use gobs of stack.
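
A simplified userspace model of this removal scheme is sketched here. It is an illustration under assumptions, not the kernel code: `struct node`, `connect_node()` and `remove_node()` are hypothetical, there is no RCU or locking, and freeing is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified node: two child slots plus a pointer to the
 * slot in the parent (or the root pointer) that refers to this node. */
struct node {
	struct node *bit[2];
	struct node **parent_bit;
};

/* Link `n` into `*slot` and remember where it was linked. */
static void connect_node(struct node **slot, struct node *n)
{
	n->parent_bit = slot;
	*slot = n;
}

/* Unlink `n` without walking the trie. A node with two children must
 * stay as a filler, so it is left in place and 0 is returned. */
static int remove_node(struct node *n)
{
	struct node *child;

	if (n->bit[0] && n->bit[1])
		return 0;
	child = n->bit[0] ? n->bit[0] : n->bit[1];
	if (child)
		child->parent_bit = n->parent_bit;
	*n->parent_bit = child;
	return 1;
}
```

Because each node records the slot that points at it, unlinking touches only the node, its single child (if any) and that one slot, regardless of how many nodes the trie holds.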
      
      The removal algorithm has the same downside as the code that it fixes:
      it won't collapse needlessly long runs of fillers.  We can enhance that
      in the future if it ever becomes a problem. This commit documents that
      limitation with a TODO comment in code, a small but meaningful
      improvement over the prior situation.
      
      Currently the biggest flaw, which the next commit addresses, is that
      this increases the node size on 64-bit machines from 60 bytes to
      68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
      using twice as much memory per node, because of power-of-two
      allocations, which is a big bummer. We'll need to figure something out
      there.
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f634f418
    • Jason A. Donenfeld's avatar
      wireguard: allowedips: initialize list head in selftest · 46cfe8ee
      Jason A. Donenfeld authored
      The randomized trie tests weren't initializing the dummy peer list head,
      resulting in a NULL pointer dereference when used. Fix this by
      initializing it in the randomized trie test, just like we do for the
      static unit test.
      
      While we're at it, all of the other strings like this have the word
      "self-test", so add it to the missing place here.
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46cfe8ee
    • Jason A. Donenfeld's avatar
      wireguard: peer: allocate in kmem_cache · a4e9f8e3
      Jason A. Donenfeld authored
      With deployments having upwards of 600k peers now, this somewhat heavy
      structure could benefit from more fine-grained allocations.
      Specifically, instead of using a 2048-byte slab for a 1544-byte object,
      we can now use 1544-byte objects directly, thus saving almost 25%
      per-peer, or with 600k peers, that's a savings of 303 MiB. This also
      makes wireguard's memory usage more transparent in tools like slabtop
      and /proc/slabinfo.
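
The per-peer saving can be worked through as follows. This is a sketch that ignores any per-slab overhead (an assumption), and `saved_per_object()` and `saved_percent()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes saved per object when moving from a fixed size class to an
 * exactly sized cache (ignoring any per-slab overhead, an assumption). */
static size_t saved_per_object(size_t class_size, size_t object_size)
{
	return class_size - object_size;
}

/* Savings as a percentage of the old per-object footprint. */
static double saved_percent(size_t class_size, size_t object_size)
{
	return 100.0 * (double)saved_per_object(class_size, object_size)
		     / (double)class_size;
}
```

For a 1544-byte object in a 2048-byte slab this gives 504 bytes per peer, or about 24.6% of the old footprint; multiplied by 600,000 peers that is 302,400,000 bytes.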
      
      Fixes: 8b5553ac ("wireguard: queueing: get rid of per-peer ring buffers")
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4e9f8e3
    • Jason A. Donenfeld's avatar
      wireguard: use synchronize_net rather than synchronize_rcu · 24b70eee
      Jason A. Donenfeld authored
      Many of the synchronization points are sometimes called under the rtnl
      lock, which means we should use synchronize_net rather than
      synchronize_rcu. Under the hood, this expands to using the expedited
      flavor of function in the event that rtnl is held, in order to not stall
      other concurrent changes.
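
The dispatch described above can be modelled in userspace like this. Everything here is a stand-in: the `_stub` functions and `synchronize_net_model()` are hypothetical and only count calls, whereas the real kernel primitives block until an RCU grace period has elapsed.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state and primitives (all hypothetical
 * stubs; the real functions block until an RCU grace period elapses). */
static bool rtnl_held;
static int normal_calls, expedited_calls;

static bool rtnl_is_locked_stub(void)            { return rtnl_held; }
static void synchronize_rcu_stub(void)           { normal_calls++; }
static void synchronize_rcu_expedited_stub(void) { expedited_calls++; }

/* Model of the behaviour described above: pick the expedited grace
 * period when rtnl is held, the normal one otherwise. */
static void synchronize_net_model(void)
{
	if (rtnl_is_locked_stub())
		synchronize_rcu_expedited_stub();
	else
		synchronize_rcu_stub();
}
```

The point of the expedited path is that other rtnl users are blocked for the duration of the wait, so a shorter grace period matters most exactly when the lock is held.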
      
      This fixes some very, very long delays when removing multiple peers at
      once, which would cause some operations to take several minutes.
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24b70eee
    • Jason A. Donenfeld's avatar
      wireguard: do not use -O3 · cc5060ca
      Jason A. Donenfeld authored
      Apparently, various versions of gcc have O3-related miscompiles. Looking
      at the difference between -O2 and -O3 for gcc 11 doesn't indicate
      miscompiles, but the difference also doesn't seem so significant for
      performance that it's worth risking.
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/
      Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cc5060ca
    • Jason A. Donenfeld's avatar
      wireguard: selftests: make sure rp_filter is disabled on vethc · f8873d11
      Jason A. Donenfeld authored
      Some distros may enable strict rp_filter by default, which will prevent
      vethc from receiving the packets with an unrouteable reverse path address.
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8873d11
    • Jason A. Donenfeld's avatar
      wireguard: selftests: remove old conntrack kconfig value · acf2492b
      Jason A. Donenfeld authored
      On recent kernels, this config symbol is no longer used.
      Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acf2492b
    • Roja Rani Yarubandi's avatar
      i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops · 57648e86
      Roja Rani Yarubandi authored
      Mark bus as suspended during system suspend to block the future
      transfers. Implement geni_i2c_resume_noirq() to resume the bus.
      
      Fixes: 37692de5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
      Signed-off-by: Roja Rani Yarubandi <rojay@codeaurora.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      57648e86
    • Roja Rani Yarubandi's avatar
      i2c: qcom-geni: Add shutdown callback for i2c · 9f78c607
      Roja Rani Yarubandi authored
      If the hardware is still accessing memory after SMMU translation
      is disabled (as part of smmu shutdown callback), then the
      IOVAs (I/O virtual address) which it was using will go on the bus
      as the physical addresses which will result in unknown crashes
      like NoC/interconnect errors.
      
      So, implement shutdown callback for i2c driver to suspend the bus
      during system "reboot" or "shutdown".
      
      Fixes: 37692de5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
      Signed-off-by: Roja Rani Yarubandi <rojay@codeaurora.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      9f78c607
    • Linus Torvalds's avatar
      Merge tag 'sound-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 16f0596f
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A couple of small fixes are found in the ALSA core side at this time;
        a fix in the new LED handling code and a long-standing (and likely no
        one would notice) ioctl bug.
      
        The rest are usual HD-audio fixes, mostly device-specific quirks but
        also one fix for a major regression that was introduced in 5.13"
      
      * tag 'sound-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda: update the power_state during the direct-complete
        ALSA: timer: Fix master timer notification
        ALSA: control led: fix memory leak in snd_ctl_led_register
        ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
        ALSA: hda/cirrus: Set Initial DMIC volume to -26 dB
        ALSA: hda: Fix a regression in Capture Switch mixer read
        ALSA: hda: Add AlderLake-M PCI ID
      16f0596f
    • Pu Wen's avatar
      x86/sev: Check SME/SEV support in CPUID first · 009767db
      Pu Wen authored
      The first two bits of the CPUID leaf 0x8000001F EAX indicate whether SME
      or SEV is supported, respectively. It's better to check whether SEV or
      SME is actually supported before accessing the MSR_AMD64_SEV to check
      whether SEV or SME is enabled.
      
      This is both a bare-metal issue and a guest/VM issue. Since the first
      generation Hygon Dhyana CPU doesn't support the MSR_AMD64_SEV, reading that
      MSR results in a #GP - either directly from hardware in the bare-metal
      case or via the hypervisor (because the RDMSR is actually intercepted)
      in the guest/VM case, resulting in a failed boot. And since this is very
      early in the boot phase, rdmsrl_safe()/native_read_msr_safe() can't be
      used.
      
      So check the CPUID bits first, before accessing the MSR.
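
The ordering of the checks can be sketched as a pure function over the EAX value returned by CPUID leaf 0x8000001F, where bit 0 reports SME support and bit 1 reports SEV support. The macro and function names are hypothetical, and obtaining `eax` from the CPU is deliberately left out so that the sketch stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SME_BIT (1u << 0)  /* CPUID 0x8000001F EAX[0]: SME supported */
#define SEV_BIT (1u << 1)  /* CPUID 0x8000001F EAX[1]: SEV supported */

static bool sme_supported(uint32_t eax) { return (eax & SME_BIT) != 0; }
static bool sev_supported(uint32_t eax) { return (eax & SEV_BIT) != 0; }

/* Only when this returns true is it reasonable to go on and read the
 * MSR; otherwise the read may fault on hardware that lacks it. */
static bool may_read_sev_msr(uint32_t eax)
{
	return sme_supported(eax) || sev_supported(eax);
}
```

A caller would bail out early when `may_read_sev_msr()` is false and only then proceed to the MSR read, which mirrors the "CPUID first, MSR second" order described above.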
      
       [ tlendacky: Expand and improve commit message. ]
       [ bp: Massage commit message. ]
      
      Fixes: eab696d8 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
      Signed-off-by: Pu Wen <puwen@hygon.cn>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: <stable@vger.kernel.org> # v5.10+
      Link: https://lkml.kernel.org/r/20210602070207.2480-1-puwen@hygon.cn
      009767db
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2021-06-04-1' of git://anongit.freedesktop.org/drm/drm · 3a3c5ab3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Two big regression reverts in here, one for fbdev and one i915.
        Otherwise it's mostly amdgpu display fixes, and tegra fixes.
      
        fb:
         - revert broken fb_defio patch
      
        amdgpu:
         - Display fixes
         - FRU EEPROM error handling fix
         - RAS fix
         - PSP fix
         - Releasing pinned BO fix
      
        i915:
         - Revert conversion to io_mapping_map_user() which led to BUG_ON()
         - Fix check for error valued returns in a selftest
      
        tegra:
         - SOR power domain race condition fix
         - build warning fix
         - runtime pm ref leak fix
         - modifier fix"
      
      * tag 'drm-fixes-2021-06-04-1' of git://anongit.freedesktop.org/drm/drm:
        amd/display: convert DRM_DEBUG_ATOMIC to drm_dbg_atomic
        drm/amdgpu: make sure we unpin the UVD BO
        drm/amd/amdgpu:save psp ring wptr to avoid attack
        drm/amd/display: Fix potential memory leak in DMUB hw_init
        drm/amdgpu: Don't query CE and UE errors
        drm/amd/display: Fix overlay validation by considering cursors
        drm/amdgpu: refine amdgpu_fru_get_product_info
        drm/amdgpu: add judgement for dc support
        drm/amd/display: Fix GPU scaling regression by FS video support
        drm/amd/display: Allow bandwidth validation for 0 streams.
        Revert "i915: use io_mapping_map_user"
        drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
        Revert "fb_defio: Remove custom address_space_operations"
        drm/tegra: Correct DRM_FORMAT_MOD_NVIDIA_SECTOR_LAYOUT
        drm/tegra: sor: Fix AUX device reference leak
        drm/tegra: Get ref for DP AUX channel, not its ddc adapter
        drm/tegra: Fix shift overflow in tegra_shared_plane_atomic_update
        drm/tegra: sor: Fully initialize SOR before registration
        gpu: host1x: Split up client initalization and registration
        drm/tegra: sor: Do not leak runtime PM reference
      3a3c5ab3
    • Geert Uytterhoeven's avatar
      virtchnl: Add missing padding to virtchnl_proto_hdrs · 519d8ab1
      Geert Uytterhoeven authored
      On m68k (Coldfire M547x):
      
            CC      drivers/net/ethernet/intel/i40e/i40e_main.o
          In file included from drivers/net/ethernet/intel/i40e/i40e_prototype.h:9,
      		     from drivers/net/ethernet/intel/i40e/i40e.h:41,
      		     from drivers/net/ethernet/intel/i40e/i40e_main.c:12:
          include/linux/avf/virtchnl.h:153:36: warning: division by zero [-Wdiv-by-zero]
            153 |  { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) }
      	  |                                    ^
          include/linux/avf/virtchnl.h:844:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’
            844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
      	  | ^~~~~~~~~~~~~~~~~~~~~~~~~
          include/linux/avf/virtchnl.h:844:33: error: enumerator value for ‘virtchnl_static_assert_virtchnl_proto_hdrs’ is not an integer constant
            844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
      	  |                                 ^~~~~~~~~~~~~~~~~~~
      
      On m68k, integers are aligned on addresses that are multiples of two,
      not four, bytes.  Hence the size of a structure containing integers may
      not be divisible by 4.
      
      Fix this by adding explicit padding.
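
The effect of explicit padding can be illustrated with a small, made-up structure. `struct hdr_example` and `CHECK_STRUCT_LEN` are hypothetical; the macro is only modelled on the division-by-zero size check visible in the compiler output above, and this is not the layout of virtchnl_proto_hdrs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure: without `pad`, sizeof would be 8 where
 * uint32_t is 4-byte aligned but 6 where it is only 2-byte aligned. */
struct hdr_example {
	uint32_t type;
	uint16_t count;
	uint16_t pad;   /* explicit padding keeps the size at 8 everywhere */
};

/* Size check in the style of the macro quoted above: a wrong size
 * divides by zero, which is not a valid enumerator value. */
#define CHECK_STRUCT_LEN(n, X) \
	enum len_check_enum_##X { len_check_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }

CHECK_STRUCT_LEN(8, hdr_example);
```

With the padding spelled out, the size no longer depends on how the compiler rounds the structure up to the alignment of its widest member, so the same check passes on both kinds of targets.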
      
      Fixes: 1f7ea1cd ("ice: Enable FDIR Configure for AVF")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      519d8ab1
    • Dave Ertman's avatar
      ice: Allow all LLDP packets from PF to Tx · f9f83202
      Dave Ertman authored
      Currently in the ice driver, the check whether to
      allow an LLDP packet to egress the interface from the
      PF_VSI is based on the SKB's priority field.
      It checks to see if the packet's priority is equal to
      TC_PRIO_CONTROL.  Injected LLDP packets do not always
      meet this condition.
      
      SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL
      (0x0003) and does not set the priority field.  There will
      be other injection methods (even ones used by end users)
      that will not correctly configure the socket so that
      SKB fields are correctly populated.
      
      The ethernet header has to have the correct value for
      the protocol though.
      
      Add a check to also allow packets whose ethhdr->h_proto
      matches ETH_P_LLDP (0x88CC).
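
The combined condition can be sketched in userspace C over a raw Ethernet frame. This is an illustration, not the driver code: `frame_ethertype()` and `lldp_tx_allowed()` are hypothetical, the constants are redefined locally, and the value 7 for TC_PRIO_CONTROL is taken from linux/pkt_sched.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_LLDP       0x88CC  /* EtherType from the text above */
#define TC_PRIO_CONTROL  7       /* value as defined in linux/pkt_sched.h */
#define ETH_HLEN         14      /* dst MAC + src MAC + EtherType */

/* Hypothetical helper: read the big-endian EtherType at offset 12. */
static uint16_t frame_ethertype(const uint8_t *frame)
{
	return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Allow the frame if either the priority marks it as control traffic or
 * the Ethernet header itself says it is LLDP. */
static bool lldp_tx_allowed(uint32_t priority, const uint8_t *frame, size_t len)
{
	if (priority == TC_PRIO_CONTROL)
		return true;
	return len >= ETH_HLEN && frame_ethertype(frame) == ETH_P_LLDP;
}
```

Checking the header in addition to the priority means a frame injected by a tool that never sets the socket priority is still recognised, as long as its EtherType is correct.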
      
      Fixes: 0c3a6101 ("ice: Allow egress control packets from PF_VSI")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f9f83202