1. 22 Apr, 2022 24 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 7200095f
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "There's no real pattern to the fixes, but the main one fixes our
        pmd_leaf() definition to resolve a NULL dereference on the migration
        path.
      
         - Fix PMU event validation in the absence of any event counters
      
         - Fix allmodconfig build using clang in conjunction with binutils
      
         - Fix definitions of pXd_leaf() to handle PROT_NONE entries
      
         - More typo fixes"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: fix p?d_leaf()
        arm64: fix typos in comments
        arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
        arm_pmu: Validate single/group leader events
    • Merge tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray · 22f19f67
      Linus Torvalds authored
      Pull xarray fixes from Matthew Wilcox:
       "Syzbot found a nasty race between large page splitting and page
        lookup. Details in the commit log, but fortunately it has a reliable
        reproducer. I thought it better to send this one to you straight away.
      
        Also fix the test suite build for kmem_cache_alloc_lru()"
      
      * tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray:
        XArray: Disallow sibling entries of nodes
        tools: Add kmem_cache_alloc_lru()
    • Merge tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 88c5060d
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Four fixes, two of them for stable:
      
         - fcollapse fix
      
         - reconnect lock fix
      
         - DFS oops fix
      
         - minor cleanup patch"
      
      * tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: destage any unwritten data to the server before calling copychunk_write
        cifs: use correct lock type in cifs_reconnect()
        cifs: fix NULL ptr dereference in refresh_mounts()
        cifs: Use kzalloc instead of kmalloc/memset
    • Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 279b83c6
      Linus Torvalds authored
      Pull mount_setattr fix from Christian Brauner:
       "The recent cleanup in e257039f ("mount_setattr(): clean the
        control flow and calling conventions") switched the mount attribute
        codepaths from do-while to for loops as they are more idiomatic when
        walking mounts.
      
        However, we did originally choose do-while constructs because if we
        request a mount or mount tree to be made read-only we need to hold
        writers in the following way: The mount attribute code will grab
        lock_mount_hash() and then call mnt_hold_writers() which will
        _unconditionally_ set MNT_WRITE_HOLD on the mount.
      
        Any callers that need write access have to call mnt_want_write(). They
        will immediately see that MNT_WRITE_HOLD is set on the mount and the
        caller will then either spin (on non-preempt-rt) or wait on
        lock_mount_hash() (on preempt-rt).
      
        The fact that MNT_WRITE_HOLD is set unconditionally means that once
        mnt_hold_writers() returns we need to _always_ pair it with
        mnt_unhold_writers() in both the failure and success paths.
      
        The do-while constructs did take care of this. But Al's change to a
        for loop in the failure path stops on the first mount we failed to
        change mount attributes _without_ going into the loop to call
        mnt_unhold_writers().
      
        This in turn means that once we failed to make a mount read-only via
        mount_setattr() - i.e. there are already writers on that mount - we
        will block any writers indefinitely. Fix this by ensuring that the for
        loop always unsets MNT_WRITE_HOLD including the first mount we failed
        to change to read-only. Also sprinkle a few comments into the cleanup
        code to remind people about what is happening including myself. After
        all, I didn't catch it during review.
      
        This is only relevant on mainline and was reported by syzbot. Details
        about the syzbot reports are all in the commit message"
      
      * tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fs: unset MNT_WRITE_HOLD on failure
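
      The pairing rule described above can be sketched in plain userspace C.
      This is a toy model, not the kernel code: struct toy_mount and the
      toy_* helpers are hypothetical names, and the write_hold flag only
      stands in for MNT_WRITE_HOLD.  What it tries to show is that the cleanup
      loop must also visit the mount on which the hold attempt failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a mount; not the kernel's struct mount. */
struct toy_mount {
    bool write_hold;   /* models MNT_WRITE_HOLD */
    bool has_writers;  /* an active writer prevents going read-only */
    bool read_only;
};

/*
 * Models the "hold writers" step: the hold flag is set unconditionally,
 * even when the call then fails because writers are present.
 */
static int toy_hold_writers(struct toy_mount *m)
{
    m->write_hold = true;
    return m->has_writers ? -1 : 0;
}

/* Try to make mounts[0..n) read-only. Returns 0 on success, -1 on failure. */
int toy_make_readonly(struct toy_mount *mounts, size_t n)
{
    size_t i, last = n;  /* index of the mount we failed on, or n if none */
    int err = 0;

    assert(mounts != NULL || n == 0);

    for (i = 0; i < n; i++) {
        err = toy_hold_writers(&mounts[i]);
        if (err) {
            last = i;
            break;
        }
    }

    /*
     * Cleanup: visit every mount that had the hold flag set, including
     * the one at index 'last' where the hold attempt failed.
     */
    for (i = 0; i < n; i++) {
        if (!err)
            mounts[i].read_only = true;
        mounts[i].write_hold = false;
        if (i == last)
            break;
    }
    return err;
}
```

      With three mounts where the second one has a writer, toy_make_readonly()
      returns -1 and no mount is left with the hold flag set.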
    • Merge tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2d230968
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "At this time, the majority of changes are for pending ASoC fixes while
        a few usual HD-audio and USB-audio quirks are found.
      
        Almost all patches are small device-specific fixes, and nothing
        worrisome stands out, so far"
      
      * tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
        ALSA: hda/realtek: Add quirk for Clevo NP70PNP
        ALSA: hda: intel-dsp-config: Add RaptorLake PCI IDs
        ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9
        ALSA: usb-audio: Clear MIDI port active flag after draining
        ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX.
        ALSA: hda/i915: Fix one too many pci_dev_put()
        ALSA: hda/hdmi: add HDMI codec VID for Raptorlake-P
        ALSA: hda/hdmi: fix warning about PCM count when used with SOF
        sound/oss/dmasound: fix 'dmasound_setup' defined but not used
        firmware: cs_dsp: Fix overrun of unterminated control name string
        ASoC: codecs: Fix an error handling path in (rx|tx|va)_macro_probe()
        ASoC: Intel: sof_es8336: Add a quirk for Huawei Matebook D15
        ASoC: Intel: sof_es8336: add a quirk for headset at mic1 port
        ASoC: Intel: sof_es8336: support a separate gpio to control headphone
        ASoC: Intel: sof_es8336: simplify speaker gpio naming
        ASoC: wm8731: Disable the regulator when probing fails
        ASoC: Intel: soc-acpi: correct device endpoints for max98373
        ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
        ASoC: SOF: topology: Fix memory leak in sof_control_load()
        ASoC: SOF: topology: cleanup dailinks on widget unload
        ...
    • XArray: Disallow sibling entries of nodes · 63b1898f
      Matthew Wilcox (Oracle) authored
      There is a race between xas_split() and xas_load() which can result in
      the wrong page being returned, and thus data corruption.  Fortunately,
      it's hard to hit (syzbot took three months to find it) and often guarded
      with VM_BUG_ON().
      
      The anatomy of this race is:
      
      thread A			thread B
      order-9 page is stored at index 0x200
      				lookup of page at index 0x274
      page split starts
      				load of sibling entry at offset 9
      stores nodes at offsets 8-15
      				load of entry at offset 8
      
      The entry at offset 8 turns out to be a node, and so we descend into it,
      and load the page at index 0x234 instead of 0x274.  This is hard to fix
      on the split side; we could replace the entire node that contains the
      order-9 page instead of replacing the eight entries.  Fixing it on
      the lookup side is easier; just disallow sibling entries that point
      to nodes.  This cannot ever be a useful thing as the descent would not
      know the correct offset to use within the new node.
      
      The test suite continues to pass, but I have not added a new test for
      this bug.
      
      Reported-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
      Tested-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
      Fixes: 6b24ca4a ("mm: Use multi-index entries in the page cache")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
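
      A minimal sketch of the lookup-side rule, assuming a flat array of
      slots rather than the real XArray node layout.  The toy_* types and the
      TOY_RETRY result are illustrative names, not the XArray API; the sketch
      only shows a sibling entry that resolves to a node being rejected
      instead of descended into.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_EMPTY, TOY_VALUE, TOY_SIBLING, TOY_NODE };

struct toy_entry {
    enum toy_kind kind;
    unsigned int  sibling_of; /* TOY_SIBLING only: offset of the canonical slot */
    int           value;      /* TOY_VALUE only */
};

enum toy_status { TOY_OK, TOY_RETRY };

/*
 * Load the entry at 'offset'. A sibling entry is resolved to its canonical
 * slot. If that slot now holds a node, a split has raced with the lookup:
 * the offset to use inside the new node is unknown, so report TOY_RETRY
 * instead of descending.
 */
enum toy_status toy_load_slot(const struct toy_entry *slots, size_t nslots,
                              unsigned int offset, struct toy_entry *out)
{
    struct toy_entry e;

    assert(slots != NULL && out != NULL);
    assert(offset < nslots);

    e = slots[offset];
    if (e.kind == TOY_SIBLING) {
        assert(e.sibling_of < nslots);
        e = slots[e.sibling_of];
        if (e.kind == TOY_NODE)
            return TOY_RETRY;
    }
    *out = e;
    return TOY_OK;
}
```

      In this model the caller is expected to restart the lookup on TOY_RETRY,
      by which time the split has replaced the sibling entries.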
    • tools: Add kmem_cache_alloc_lru() · b9663a6f
      Matthew Wilcox (Oracle) authored
      Turn kmem_cache_alloc() into a wrapper around kmem_cache_alloc_lru().
      
      Fixes: 9bbdc0f3 ("xarray: use kmem_cache_alloc_lru to allocate xa_node")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Li Wang <liwang@redhat.com>
    • Merge branch 'akpm' (patches from Andrew) · 281b9d9a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 patches.
      
        Subsystems affected by this patch series: mm (memory-failure, memcg,
        userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
        kcov: don't generate a warning on vm_insert_page()'s failure
        MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
        oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
        selftest/vm: add skip support to mremap_test
        selftest/vm: support xfail in mremap_test
        selftest/vm: verify remap destination address in mremap_test
        selftest/vm: verify mmap addr in mremap_test
        mm, hugetlb: allow for "high" userspace addresses
        userfaultfd: mark uffd_wp regardless of VM_WRITE flag
        memcg: sync flush only if periodic flush is delayed
        mm/memory-failure.c: skip huge_zero_page in memory_failure()
        mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
    • mm/vmalloc: huge vmalloc backing pages should be split rather than compound · 3b8000ae
      Nicholas Piggin authored
      Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
      in order to allow the sub-pages to be refcounted by callers such as
      "remap_vmalloc_page [sic]" (remap_vmalloc_range).
      
      However a similar problem exists for other struct page fields callers
      use, for example fb_deferred_io_fault() takes a vmalloc'ed page and
      not only refcounts it but uses ->lru, ->mapping, ->index.
      
      This is not compatible with compound sub-pages, and can cause bad page
      state issues like
      
        BUG: Bad page state in process swapper/0  pfn:00743
        page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
        flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
        raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
        raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
        page dumped because: corrupted mapping in tail page
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
        Call Trace:
          dump_stack_lvl+0x74/0xa8 (unreliable)
          bad_page+0x12c/0x170
          free_tail_pages_check+0xe8/0x190
          free_pcp_prepare+0x31c/0x4e0
          free_unref_page+0x40/0x1b0
          __vunmap+0x1d8/0x420
          ...
      
      The correct approach is to use split high-order pages for the huge
      vmalloc backing. These allow callers to treat them in exactly the same
      way as individually-allocated order-0 pages.
      
      Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Menzel <pmenzel@molgen.mpg.de>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm64: mm: fix p?d_leaf() · 23bc8f69
      Muchun Song authored
      pmd_leaf() is used to test for a leaf-mapped PMD, but on arm64 it
      misses PROT_NONE-mapped PMDs.  Fix it.  A real-world issue [1] caused
      by this was reported by Qian Cai.  Also fix pud_leaf().
      
      Link: https://patchwork.kernel.org/comment/24798260/ [1]
      Fixes: 8aa82df3 ("arm64: mm: add p?d_leaf() definitions")
      Reported-by: Qian Cai <quic_qiancai@quicinc.com>
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Link: https://lore.kernel.org/r/20220422060033.48711-1-songmuchun@bytedance.com
      Signed-off-by: Will Deacon <will@kernel.org>
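
      The effect of the fix can be illustrated with a simplified model.  The
      bit positions and toy_* helpers below are assumptions made for this
      sketch and are not the arm64 definitions; in particular, modelling the
      old check as "valid bit set and table bit clear" is only a guess at
      the shape of the problem, namely that a PROT_NONE entry has its valid
      bit cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout for this sketch only. */
#define TOY_VALID      (UINT64_C(1) << 0)
#define TOY_TABLE_BIT  (UINT64_C(1) << 1)
#define TOY_PROT_NONE  (UINT64_C(1) << 58)

/* An entry is present if it is valid or marked PROT_NONE. */
bool toy_present(uint64_t pmd)
{
    return (pmd & (TOY_VALID | TOY_PROT_NONE)) != 0;
}

/* Old-style check: a leaf must have the valid bit set and the table bit clear. */
bool toy_leaf_old(uint64_t pmd)
{
    return (pmd & (TOY_VALID | TOY_TABLE_BIT)) == TOY_VALID;
}

/* Fixed-style check: any present entry that is not a table is a leaf. */
bool toy_leaf_new(uint64_t pmd)
{
    return toy_present(pmd) && !(pmd & TOY_TABLE_BIT);
}

/* Usage: a PROT_NONE block mapping is missed by the old check only. */
void toy_leaf_demo(void)
{
    uint64_t prot_none_block = TOY_PROT_NONE | UINT64_C(0x200000);

    assert(!toy_leaf_old(prot_none_block));
    assert(toy_leaf_new(prot_none_block));
}
```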
    • Merge tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm · d569e869
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Extra quiet after Easter, only have minor i915 and msm pulls. However
        I haven't seen a PR from our misc tree in a little while, I've cc'ed
        all the suspects. Once that unblocks I expect a bit larger bunch of
        patches to arrive.
      
        Otherwise as I said, one msm revert and two i915 fixes.
      
        msm:
      
         - revert iommu change that broke some platforms.
      
        i915:
      
         - Unset enable_psr2_sel_fetch if PSR2 detection fails
      
         - Fix to detect when VRR is turned off from panel settings"
      
      * tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm:
        drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails
        drm/msm: Revert "drm/msm: Stop using iommu_present()"
        drm/i915/display/vrr: Reset VRR capable property on a long hpd
    • mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() · 31956166
      Alistair Popple authored
      In some cases it is possible for mmu_interval_notifier_remove() to race
      with mn_tree_inv_end() allowing it to return while the notifier data
      structure is still in use.  Consider the following sequence:
      
        CPU0 - mn_tree_inv_end()            CPU1 - mmu_interval_notifier_remove()
        ----------------------------------- ------------------------------------
                                            spin_lock(subscriptions->lock);
                                            seq = subscriptions->invalidate_seq;
        spin_lock(subscriptions->lock);     spin_unlock(subscriptions->lock);
        subscriptions->invalidate_seq++;
                                            wait_event(invalidate_seq != seq);
                                            return;
        interval_tree_remove(interval_sub); kfree(interval_sub);
        spin_unlock(subscriptions->lock);
        wake_up_all();
      
      As the wait_event() condition is true it will return immediately.  This
      can lead to use-after-free type errors if the caller frees the data
      structure containing the interval notifier subscription while it is
      still on a deferred list.  Fix this by taking the appropriate lock when
      reading invalidate_seq to ensure proper synchronisation.
      
      I observed this whilst running stress testing during some development.
      You do have to be pretty unlucky, but it leads to the usual problems of
      use-after-free (memory corruption, kernel crash, difficult to diagnose
      WARN_ON, etc).
      
      Link: https://lkml.kernel.org/r/20220420043734.476348-1-apopple@nvidia.com
      Fixes: 99cb252f ("mm/mmu_notifier: add an interval tree notifier")
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
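
      The idea of reading the sequence number under the lock can be sketched
      in userspace with a pthread mutex standing in for the spinlock.  The
      struct and function names are hypothetical.  Because the invalidation
      side holds the lock from the increment until the removal is finished,
      a waiter that takes the same lock to evaluate its condition cannot see
      the new sequence number before that work is done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for the subscriptions structure. */
struct toy_subscriptions {
    pthread_mutex_t lock;
    unsigned long   invalidate_seq;
};

/*
 * Wait condition: has the sequence moved on from 'seq'?
 * The comparison is done with the lock held, so it serialises against an
 * updater that bumps invalidate_seq and finishes its work under that lock.
 */
bool toy_seq_released(struct toy_subscriptions *s, unsigned long seq)
{
    bool released;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    released = (s->invalidate_seq != seq);
    pthread_mutex_unlock(&s->lock);
    return released;
}
```

      A waiter would re-evaluate toy_seq_released() each time it is woken,
      in place of an unlocked read of invalidate_seq.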
    • kcov: don't generate a warning on vm_insert_page()'s failure · ecc04463
      Aleksandr Nogikh authored
      vm_insert_page()'s failure is not an unexpected condition, so don't do
      WARN_ONCE() in such a case.
      
      Instead, print a kernel message and just return an error code.
      
      This flaw has been reported under an OOM condition by syzbot [1].
      
      The message is mainly for the benefit of the test log, in this case the
      fuzzer's log so that humans inspecting the log can figure out what was
      going on.  KCOV is a testing tool, so I think being a little more chatty
      when KCOV unexpectedly is about to fail will save someone debugging
      time.
      
      We don't want the WARN, because it's not a kernel bug that syzbot should
      report, and failure can happen if the fuzzer tries hard enough (as
      above).
      
      Link: https://lkml.kernel.org/r/Ylkr2xrVbhQYwNLf@elver.google.com [1]
      Link: https://lkml.kernel.org/r/20220401182512.249282-1-nogikh@google.com
      Fixes: b3d7fe86 ("kcov: properly handle subsequent mmap calls")
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Taras Madan <tarasmadan@google.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
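
      The pattern of "log and return an error instead of warning" could look
      roughly like the userspace sketch below.  toy_insert_page() is a
      hypothetical stand-in for vm_insert_page() that fails once a page
      budget is exhausted; none of these names come from the kcov code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: fails with -ENOMEM once 'budget' pages are used. */
static int toy_insert_page(unsigned long idx, unsigned long budget)
{
    return idx < budget ? 0 : -ENOMEM;
}

/*
 * Map 'npages' pages. A failed insertion is an expected condition under
 * memory pressure, so print a message for whoever reads the log and
 * return the error code; do not treat it as a bug.
 */
int toy_map_area(unsigned long npages, unsigned long budget)
{
    unsigned long i;

    assert(npages > 0);
    for (i = 0; i < npages; i++) {
        int res = toy_insert_page(i, budget);

        if (res) {
            fprintf(stderr, "toy_map_area: insert failed at page %lu: %d\n",
                    i, res);
            return res;
        }
    }
    return 0;
}
```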
    • MAINTAINERS: add Vincenzo Frascino to KASAN reviewers · 415fccf8
      Vincenzo Frascino authored
      Add my email address to KASAN reviewers list to make sure that I am
      Cc'ed in all the KASAN changes that may affect arm64 MTE.
      
      Link: https://lkml.kernel.org/r/20220419170640.21404-1-vincenzo.frascino@arm.com
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup · e4a38402
      Nico Pache authored
      The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
      can be targeted by the oom reaper.  This mapping is used to store the
      futex robust list head; the kernel does not keep a copy of the robust
      list and instead references a userspace address to maintain the
      robustness during a process death.
      
      A race can occur between exit_mm and the oom reaper that allows the oom
      reaper to free the memory of the futex robust list before the exit path
      has handled the futex death:
      
          CPU1                               CPU2
          --------------------------------------------------------------------
          page_fault
          do_exit "signal"
          wake_oom_reaper
                                              oom_reaper
                                              oom_reap_task_mm (invalidates mm)
          exit_mm
          exit_mm_release
          futex_exit_release
          futex_cleanup
          exit_robust_list
          get_user (EFAULT- can't access memory)
      
      If the get_user EFAULT's, the kernel will be unable to recover the
      waiters on the robust_list, leaving userspace mutexes hung indefinitely.
      
      Delay the OOM reaper, allowing more time for the exit path to perform
      the futex cleanup.
      
      Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
      
      Based on a patch by Michal Hocko.
      
      Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1]
      Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com
      Fixes: 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
      Signed-off-by: Joel Savitz <jsavitz@redhat.com>
      Signed-off-by: Nico Pache <npache@redhat.com>
      Co-developed-by: Joel Savitz <jsavitz@redhat.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joel Savitz <jsavitz@redhat.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftest/vm: add skip support to mremap_test · 80df2fb9
      Sidhartha Kumar authored
      Allow the mremap test to be skipped due to errors such as failing to
      parse the mmap_min_addr sysctl.
      
      Link: https://lkml.kernel.org/r/20220420215721.4868-4-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
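
      A rough sketch of the skip path, assuming the test reads the sysctl
      from a file and reports a skip when it cannot be parsed.  The toy_*
      helpers are hypothetical and this is not the mremap_test code; the
      value 4 mirrors the exit code kselftest uses for a skipped test.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_KSFT_SKIP 4  /* kselftest's "skipped" exit code */

/* Parse a file holding one unsigned integer. Returns 0 on success, -1 on failure. */
int toy_read_ulong(const char *path, unsigned long *out)
{
    FILE *fp;
    int n;

    assert(path != NULL && out != NULL);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    n = fscanf(fp, "%lu", out);
    fclose(fp);
    return n == 1 ? 0 : -1;
}

/*
 * Establish the test precondition. Returns 0 to continue, or TOY_KSFT_SKIP
 * when the sysctl cannot be read or parsed, so the caller can exit with a
 * skip instead of a failure.
 */
int toy_setup(const char *min_addr_path, unsigned long *min_addr)
{
    if (toy_read_ulong(min_addr_path, min_addr) != 0) {
        fprintf(stderr, "skip: cannot parse %s\n", min_addr_path);
        return TOY_KSFT_SKIP;
    }
    return 0;
}
```

      A test main() would typically call toy_setup() with
      /proc/sys/vm/mmap_min_addr and return its result directly when it is
      non-zero.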
    • selftest/vm: support xfail in mremap_test · e5508fc5
    • selftest/vm: verify remap destination address in mremap_test · 18d609da
      Sidhartha Kumar authored
      Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy
      existing mappings.  This causes a segfault when regions such as text are
      remapped and the permissions are changed.
      
      Verify the requested mremap destination address does not overlap any
      existing mappings by using mmap's MAP_FIXED_NOREPLACE flag.  Keep
      incrementing the destination address until a valid mapping is found or
      fail the current test once the max address is reached.
      
      Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftest/vm: verify mmap addr in mremap_test · 9c85a9ba
      Sidhartha Kumar authored
      Avoid calling mmap with requested addresses that are less than the
      system's mmap_min_addr.  When run as root, mmap returns EACCES when
      trying to map addresses < mmap_min_addr.  This is not one of the error
      codes for the condition to retry the mmap in the test.
      
      Rather than arbitrarily retrying on EACCES, don't attempt an mmap until
      addr > vm.mmap_min_addr.
      
      Add a munmap call after an alignment check as the mappings are retained
      after the retry and can reach the vm.max_map_count sysctl.
      
      Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hugetlb: allow for "high" userspace addresses · 5f24d5a5
      Christophe Leroy authored
      This is a fix for commit f6795053 ("mm: mmap: Allow for "high"
      userspace addresses") for hugetlb.
      
      This patch adds support for "high" userspace addresses that are
      optionally supported on the system and have to be requested via a hint
      mechanism ("high" addr parameter to mmap).
      
      Architectures such as powerpc and x86 achieve this by making changes to
      their architectural versions of hugetlb_get_unmapped_area() function.
      However, arm64 uses the generic version of that function.
      
      So take into account arch_get_mmap_base() and arch_get_mmap_end() in
      hugetlb_get_unmapped_area().  To allow that, move those two macros out
      of mm/mmap.c into include/linux/sched/mm.h
      
      If these macros are not defined in architectural code then they default
      to (TASK_SIZE) and (base) so should not introduce any behavioural
      changes to architectures that do not define them.
      
      For the time being, only ARM64 is affected by this change.
      
      Catalin (ARM64) said
       "We should have fixed hugetlb_get_unmapped_area() as well when we added
        support for 52-bit VA. The reason for commit f6795053 was to
        prevent normal mmap() from returning addresses above 48-bit by default
        as some user-space had hard assumptions about this.
      
        It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
        but I doubt anyone would notice. It's more likely that the current
        behaviour would cause issues, so I'd rather have them consistent.
      
        Basically when arm64 gained support for 52-bit addresses we did not
        want user-space calling mmap() to suddenly get such high addresses,
        otherwise we could have inadvertently broken some programs (similar
        behaviour to x86 here). Hence we added commit f6795053. But we
        missed hugetlbfs which could still get such high mmap() addresses. So
        in theory that's a potential regression that should have been addressed
        at the same time as commit f6795053 (and before arm64 enabled
        52-bit addresses)"
      
      Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu
      Fixes: f6795053 ("mm: mmap: Allow for "high" userspace addresses")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>	[5.0.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
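
      To make the defaults and the hint mechanism concrete, here is a small
      model written as ordinary functions instead of macros.  The address
      space sizes and the toy_* names are assumptions for this sketch; the
      hinted variants are only loosely modelled on the idea that a hint above
      the default window opts in to the larger range.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TASK_SIZE          (UINT64_C(1) << 52)  /* hypothetical full VA size */
#define TOY_DEFAULT_MAP_WINDOW (UINT64_C(1) << 48)  /* hypothetical default window */

/* Generic fallbacks: the end is the task size and the base is unchanged. */
uint64_t toy_generic_mmap_end(uint64_t hint)
{
    (void)hint;
    return TOY_TASK_SIZE;
}

uint64_t toy_generic_mmap_base(uint64_t hint, uint64_t base)
{
    (void)hint;
    return base;
}

/* Hinted variant: only a hint above the default window unlocks high addresses. */
uint64_t toy_hinted_mmap_end(uint64_t hint)
{
    return hint > TOY_DEFAULT_MAP_WINDOW ? TOY_TASK_SIZE
                                         : TOY_DEFAULT_MAP_WINDOW;
}

uint64_t toy_hinted_mmap_base(uint64_t hint, uint64_t base)
{
    assert(base <= TOY_DEFAULT_MAP_WINDOW);
    return hint > TOY_DEFAULT_MAP_WINDOW
               ? base + TOY_TASK_SIZE - TOY_DEFAULT_MAP_WINDOW
               : base;
}
```

      A search for an unmapped area would then be bounded by the values these
      helpers return for the caller's address hint.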
    • userfaultfd: mark uffd_wp regardless of VM_WRITE flag · 0e88904c
      Nadav Amit authored
      When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
      currently only marked as write-protected if the VMA has VM_WRITE flag
      set.  This seems incorrect or at least would be unexpected by the users.
      
      Consider the following sequence of operations that are being performed
      on a certain page:
      
      	mprotect(PROT_READ)
      	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
      	mprotect(PROT_READ|PROT_WRITE)
      
      At this point the user would expect to still get UFFD notification when
      the page is accessed for write, but the user would not get one, since
      the PTE was not marked as UFFD_WP during UFFDIO_COPY.
      
      Fix it by always marking PTEs as UFFD_WP regardless of the
      write-permission in the VMA flags.
      
      Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com
      Fixes: 292924b2 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
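
      The behavioural change can be modelled with two tiny functions.  This
      is a deliberately simplified sketch: struct toy_pte, the flag value and
      both toy_install_* helpers are hypothetical and do not reproduce the
      kernel's PTE handling.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VM_WRITE 0x2u  /* illustrative flag value */

struct toy_pte {
    bool writable;
    bool uffd_wp;
};

/* Before: the write-protect marker is applied only inside the VM_WRITE branch. */
struct toy_pte toy_install_old(unsigned int vm_flags, bool wp_copy)
{
    struct toy_pte pte = { false, false };

    if (vm_flags & TOY_VM_WRITE) {
        if (wp_copy)
            pte.uffd_wp = true;
        else
            pte.writable = true;
    }
    return pte;
}

/* After: the marker is applied whenever WP mode is requested. */
struct toy_pte toy_install_new(unsigned int vm_flags, bool wp_copy)
{
    struct toy_pte pte = { false, false };

    if (wp_copy)
        pte.uffd_wp = true;
    else if (vm_flags & TOY_VM_WRITE)
        pte.writable = true;
    return pte;
}

/* Usage: a copy in WP mode into a read-only VMA keeps the marker only after the fix. */
void toy_uffd_wp_demo(void)
{
    assert(!toy_install_old(0, true).uffd_wp);
    assert(toy_install_new(0, true).uffd_wp);
}
```

      With the marker in place, a later mprotect(PROT_READ|PROT_WRITE) in
      the sequence above still leaves the page write-protected for
      userfaultfd purposes.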
    • memcg: sync flush only if periodic flush is delayed · 9b301615
      Shakeel Butt authored
      Daniel Dao has reported [1] a regression on workloads that may trigger a
      lot of refaults (anon and file).  The underlying issue is that flushing
      rstat is expensive.  Although rstat flushes are batched with (nr_cpus *
      MEMCG_BATCH) stat updates, it seems there are workloads which genuinely
      do more stat updates than the batch value within a short amount of
      time.  Since the rstat flush can happen in performance-critical
      codepaths like page faults, such workloads can suffer greatly.
      
      This patch fixes this regression by making the rstat flushing
      conditional in the performance critical codepaths.  More specifically,
      the kernel relies on the async periodic rstat flusher to flush the stats
      and only if the periodic flusher is delayed by more than twice the
      amount of its normal time window then the kernel allows rstat flushing
      from the performance critical codepaths.
      
      Now the question: what are the side-effects of this change? The worst
      that can happen is the refault codepath will see 4sec old lruvec stats
      and may cause false (or missed) activations of the refaulted page which
      may under-or-overestimate the workingset size.  Though that is not very
      concerning as the kernel can already miss or do false activations.
      
      There are two more codepaths whose flushing behavior is not changed by
      this patch and we may need to come back to them in the future.  One is
      the writeback stats used by dirty throttling and the second is the
      deactivation heuristic in reclaim.  For now we are keeping an eye on
      them, and if there is a report of a regression due to these codepaths,
      we will reevaluate then.
      
      Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1]
      Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
      Fixes: 1f828223 ("memcg: flush lruvec stats in the refault")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: Daniel Dao <dqminh@cloudflare.com>
      Tested-by: Ivan Babrou <ivan@cloudflare.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Frank Hofmann <fhofmann@cloudflare.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b301615
    • Xu Yu's avatar
      mm/memory-failure.c: skip huge_zero_page in memory_failure() · d173d541
      Xu Yu authored
      The kernel panics when injecting a memory failure for the global
      huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.
      
        Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
        page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
        head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
        flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
        raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
        raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
        page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:2499!
        invalid opcode: 0000 [#1] PREEMPT SMP PTI
        CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
        Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
        RIP: 0010:split_huge_page_to_list+0x66a/0x880
        Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
        RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
        RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
        RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
        R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
        R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
        FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         try_to_split_thp_page+0x3a/0x130
         memory_failure+0x128/0x800
         madvise_inject_error.cold+0x8b/0xa1
         __x64_sys_madvise+0x54/0x60
         do_syscall_64+0x35/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fc3754f8bf9
        Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
        RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
        RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
        RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
        R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
        R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
      
      This makes memory_failure() bail out explicitly for huge_zero_page
      before the split, so the panic above cannot happen again.
      
      Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
      Fixes: 6a46079c ("HWPOISON: The high level memory error handler in the VM v7")
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d173d541
    • Naoya Horiguchi's avatar
      mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() · 405ce051
      Naoya Horiguchi authored
      There is a race condition between memory_failure_hugetlb() and hugetlb
      free/demotion, which causes the PageHWPoison flag to be set on the wrong
      page.  One simple result is that the wrong processes can be killed, but
      another (more serious) one is that the actual error is left unhandled, so
      nothing prevents later access to it, which might lead to more serious
      results such as consuming corrupted data.
      
      Consider the race window below:
      
        CPU 1                                   CPU 2
        memory_failure_hugetlb
        struct page *head = compound_head(p);
                                                hugetlb page might be freed to
                                                buddy, or even changed to another
                                                compound page.
      
        get_hwpoison_page -- page is not what we want now...
      
      The current code first does rough prechecks and then reconfirms after
      taking a refcount, but this was found to make the code overly
      complicated, so move the prechecks into a single hugetlb_lock range.
      
      A newly introduced function, try_memory_failure_hugetlb(), always takes
      hugetlb_lock (even for non-hugetlb pages).  That can be improved, but
      memory_failure() is rare in principle, so this should not be a big problem.
      
      Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev
      Fixes: 761ad8d7 ("mm: hwpoison: introduce memory_failure_hugetlb()")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      405ce051
  2. 21 Apr, 2022 12 commits
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine · b05a5683
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "A bunch of driver fixes:
      
         - idxd device RO checks and device cleanup
      
         - dw-edma unaligned access and alignment
      
         - qcom: missing minItems in binding
      
         - mediatek pm usage fix
      
         - imx init script"
      
      * tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
        dt-bindings: dmaengine: qcom: gpi: Add minItems for interrupts
        dmaengine: idxd: skip clearing device context when device is read-only
        dmaengine: idxd: add RO check for wq max_transfer_size write
        dmaengine: idxd: add RO check for wq max_batch_size write
        dmaengine: idxd: fix retry value to be constant for duration of function call
        dmaengine: idxd: match type for retries var in idxd_enqcmds()
        dmaengine: dw-edma: Fix inconsistent indenting
        dmaengine: dw-edma: Fix unaligned 64bit access
        dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources
        dmaengine: imx-sdma: Fix error checking in sdma_event_remap
        dma: at_xdmac: fix a missing check on list iterator
        dmaengine: imx-sdma: fix init of uart scripts
        dmaengine: idxd: fix device cleanup on disable
      b05a5683
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2022-04-20' of... · e827d149
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2022-04-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      - Unset enable_psr2_sel_fetch if PSR2 detection fails
      - Fix to detect when VRR is turned off from panel settings
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/YmAKuHwon7hGyIoC@jlahtine-mobl.ger.corp.intel.com
      e827d149
    • Linus Torvalds's avatar
      Merge tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 59f0c244
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from xfrm and can.
      
        Current release - regressions:
      
         - rxrpc: restore removed timer deletion
      
        Current release - new code bugs:
      
         - gre: fix device lookup for l3mdev use-case
      
         - xfrm: fix egress device lookup for l3mdev use-case
      
        Previous releases - regressions:
      
         - sched: cls_u32: fix netns refcount changes in u32_change()
      
         - smc: fix sock leak when release after smc_shutdown()
      
         - xfrm: limit skb_page_frag_refill use to a single page
      
         - eth: atlantic: invert deep par in pm functions, preventing null
           derefs
      
         - eth: stmmac: use readl_poll_timeout_atomic() in atomic state
      
        Previous releases - always broken:
      
         - gre: fix skb_under_panic on xmit
      
         - openvswitch: fix OOB access in reserve_sfa_size()
      
         - dsa: hellcreek: calculate checksums in tagger
      
         - eth: ice: fix crash in switchdev mode
      
         - eth: igc:
            - fix infinite loop in release_swfw_sync
            - fix scheduling while atomic"
      
      * tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
        drivers: net: hippi: Fix deadlock in rr_close()
        selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets
        selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
        nfc: MAINTAINERS: add Bug entry
        net: stmmac: Use readl_poll_timeout_atomic() in atomic state
        doc/ip-sysctl: add bc_forwarding
        netlink: reset network and mac headers in netlink_dump()
        net: mscc: ocelot: fix broken IP multicast flooding
        net: dsa: hellcreek: Calculate checksums in tagger
        net: atlantic: invert deep par in pm functions, preventing null derefs
        can: isotp: stop timeout monitoring when no first frame was sent
        bonding: do not discard lowest hash bit for non layer3+4 hashing
        net: lan966x: Make sure to release ptp interrupt
        ipv6: make ip6_rt_gc_expire an atomic_t
        net: Handle l3mdev in ip_tunnel_init_flow
        l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
        net/sched: cls_u32: fix possible leak in u32_init_knode()
        net/sched: cls_u32: fix netns refcount changes in u32_change()
        powerpc: Update MAINTAINERS for ibmvnic and VAS
        net: restore alpha order to Ethernet devices in config
        ...
      59f0c244
    • Tim Crawford's avatar
      ALSA: hda/realtek: Add quirk for Clevo NP70PNP · 86222af0
      Tim Crawford authored
      Fixes headset detection on Clevo NP70PNP.
      Signed-off-by: Tim Crawford <tcrawford@system76.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220421170412.3697-1-tcrawford@system76.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      86222af0
    • Christian Brauner's avatar
      fs: unset MNT_WRITE_HOLD on failure · 0014edae
      Christian Brauner authored
      After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD
      and consequently we always need to pair mnt_hold_writers() with
      mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a
      do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for
      the first mount that was changed. Fix this and make sure that the first mount
      will be cleaned up and add some comments to make it more obvious.
      
      Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com
      Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com
      Link: https://lore.kernel.org/r/20220420131925.2464685-1-brauner@kernel.org
      Fixes: e257039f ("mount_setattr(): clean the control flow and calling conventions") [1]
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reported-by: syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com
      Reported-by: syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      0014edae
    • Duoming Zhou's avatar
      drivers: net: hippi: Fix deadlock in rr_close() · bc6de287
      Duoming Zhou authored
      There is a deadlock in rr_close(), which is shown below:
      
         (Thread 1)                |      (Thread 2)
                                   | rr_open()
      rr_close()                   |  add_timer()
       spin_lock_irqsave() //(1)   |  (wait a time)
       ...                         | rr_timer()
       del_timer_sync()            |  spin_lock_irqsave() //(2)
       (wait timer to stop)        |  ...
      
      We hold rrpriv->lock at position (1) of thread 1 and use
      del_timer_sync() to wait for the timer to stop, but the timer handler
      also needs rrpriv->lock at position (2) of thread 2.
      As a result, rr_close() will block forever.
      
      This patch moves del_timer_sync() out from under the protection of
      spin_lock_irqsave(), which lets the timer handler obtain the needed
      lock.
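      
      The lock ordering can be illustrated with a userspace analogy,
      assuming a pthread mutex stands in for rrpriv->lock and
      pthread_join() stands in for del_timer_sync().  All names below are
      hypothetical; this is a sketch of the ordering, not the driver code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

struct dev {
    pthread_mutex_t lock;  /* plays the role of rrpriv->lock */
    pthread_t timer;       /* plays the role of the kernel timer */
    bool stopping;         /* set by dev_close() to stop the timer */
    unsigned long ticks;   /* state the timer handler updates under the lock */
};

/* "Timer handler": needs the lock on every run, like rr_timer(). */
static void *timer_fn(void *arg)
{
    struct dev *d = arg;

    for (;;) {
        pthread_mutex_lock(&d->lock);
        bool stop = d->stopping;
        if (!stop)
            d->ticks++;
        pthread_mutex_unlock(&d->lock);
        if (stop)
            return NULL;
        usleep(1000);
    }
}

static int dev_open(struct dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->stopping = false;
    d->ticks = 0;
    return pthread_create(&d->timer, NULL, timer_fn, d);
}

static int dev_close(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->stopping = true;
    pthread_mutex_unlock(&d->lock);

    /*
     * Wait for the handler only after dropping the lock.  Joining while
     * still holding it could block forever, because the handler needs
     * the same lock before it can observe 'stopping' and exit.
     */
    int ret = pthread_join(d->timer, NULL);

    pthread_mutex_destroy(&d->lock);
    return ret;
}
```

      Moving the pthread_join() call inside the locked region would
      recreate the same shape of deadlock the commit describes.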
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20220417125519.82618-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      bc6de287
    • Andy Chi's avatar
      ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9 · b3fbe536
      Andy Chi authored
      On HP EliteBook 845 G9 and EliteBook 865 G9, the audio LEDs can be enabled by
      ALC285_FIXUP_HP_MUTE_LED. So use it accordingly.
      Signed-off-by: Andy Chi <andy.chi@canonical.com>
      Fixes: 07bcab93 ("ALSA: hda/realtek: Add support for HP Laptops")
      Link: https://lore.kernel.org/r/20220421063606.39772-1-andy.chi@canonical.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      b3fbe536
    • Ronnie Sahlberg's avatar
      cifs: destage any unwritten data to the server before calling copychunk_write · f5d0f921
      Ronnie Sahlberg authored
      This is needed because the copychunk_write might cover a region of the
      file that has not yet been sent to the server, and would thus fail.
      
      A simple way to reproduce this is:
      truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile
      
      The issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with
      the 'fcollapse 16k 24k' due to write-back caching.
      
      fcollapse is implemented in cifs.ko as an SMB2 IOCTL(COPYCHUNK_WRITE) call
      and it will fail on the server side, since the file is still 0 bytes in
      size there until the writes have been destaged.
      To avoid this we must ensure that we destage any unwritten data to the
      server before calling COPYCHUNK_WRITE.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373
      Reported-by: Xiaoli Feng <xifeng@redhat.com>
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      f5d0f921
    • Paulo Alcantara's avatar
      cifs: use correct lock type in cifs_reconnect() · cd70a3e8
      Paulo Alcantara authored
      TCP_Server_Info::origin_fullpath and TCP_Server_Info::leaf_fullpath
      are protected by refpath_lock mutex and not cifs_tcp_ses_lock
      spinlock.
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      cd70a3e8
    • Paulo Alcantara's avatar
      cifs: fix NULL ptr dereference in refresh_mounts() · 41f10081
      Paulo Alcantara authored
      Either mount(2) or automount might not have server->origin_fullpath
      set yet while refresh_cache_worker() is attempting to refresh DFS
      referrals.  Add missing NULL check and locking around it.
      
      This fixes the crash below:
      
      [ 1070.276835] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [ 1070.277676] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      [ 1070.278219] CPU: 1 PID: 8506 Comm: kworker/u8:1 Not tainted 5.18.0-rc3 #10
      [ 1070.278701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [ 1070.279495] Workqueue: cifs-dfscache refresh_cache_worker [cifs]
      [ 1070.280044] RIP: 0010:strcasecmp+0x34/0x150
      [ 1070.280359] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
      [ 1070.281729] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
      [ 1070.282114] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
      [ 1070.282691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [ 1070.283273] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
      [ 1070.283857] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
      [ 1070.284436] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
      [ 1070.284990] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
      [ 1070.285625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1070.286100] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
      [ 1070.286683] Call Trace:
      [ 1070.286890]  <TASK>
      [ 1070.287070]  refresh_cache_worker+0x895/0xd20 [cifs]
      [ 1070.287475]  ? __refresh_tcon.isra.0+0xfb0/0xfb0 [cifs]
      [ 1070.287905]  ? __lock_acquire+0xcd1/0x6960
      [ 1070.288247]  ? is_dynamic_key+0x1a0/0x1a0
      [ 1070.288591]  ? lockdep_hardirqs_on_prepare+0x410/0x410
      [ 1070.289012]  ? lock_downgrade+0x6f0/0x6f0
      [ 1070.289318]  process_one_work+0x7bd/0x12d0
      [ 1070.289637]  ? worker_thread+0x160/0xec0
      [ 1070.289970]  ? pwq_dec_nr_in_flight+0x230/0x230
      [ 1070.290318]  ? _raw_spin_lock_irq+0x5e/0x90
      [ 1070.290619]  worker_thread+0x5ac/0xec0
      [ 1070.290891]  ? process_one_work+0x12d0/0x12d0
      [ 1070.291199]  kthread+0x2a5/0x350
      [ 1070.291430]  ? kthread_complete_and_exit+0x20/0x20
      [ 1070.291770]  ret_from_fork+0x22/0x30
      [ 1070.292050]  </TASK>
      [ 1070.292223] Modules linked in: bpfilter cifs cifs_arc4 cifs_md4
      [ 1070.292765] ---[ end trace 0000000000000000 ]---
      [ 1070.293108] RIP: 0010:strcasecmp+0x34/0x150
      [ 1070.293471] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
      [ 1070.297718] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
      [ 1070.298622] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
      [ 1070.299428] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [ 1070.300296] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
      [ 1070.301204] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
      [ 1070.301932] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
      [ 1070.302645] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
      [ 1070.303462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1070.304131] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
      [ 1070.305004] Kernel panic - not syncing: Fatal exception
      [ 1070.305711] Kernel Offset: disabled
      [ 1070.305971] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      41f10081
  3. 20 Apr, 2022 4 commits
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa · b2534357
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix patching CPU selection in patch_text
      
       - fix potential deadlock in ISS platform serial driver
      
       - fix potential register clobbering in coprocessor exception handler
      
      * tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix a7 clobbering in coprocessor context load/store
        arch: xtensa: platforms: Fix deadlock in rs_close()
        xtensa: patch_text: Fixup last cpu should be master
      b2534357
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 10c5f102
      Linus Torvalds authored
      Pull erofs fixes from Gao Xiang:
       "One patch to fix a use-after-free race related to the on-stack
        z_erofs_decompressqueue, which happens very rarely but needs to be
        fixed properly soon.
      
        The other patch fixes some sysfs Sphinx warnings"
      
      * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
        erofs: fix use-after-free of on-stack io[]
      10c5f102
    • Linus Torvalds's avatar
      Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array" · 906f9040
      Linus Torvalds authored
      This reverts commit 5a519c8f.
      
      It turns out that making the pipe almost arbitrarily large has some
      rather unexpected downsides.  The kernel test robot reports a kernel
      warning that is due to pipe->max_usage now growing to the point where
      the iter_file_splice_write() buffer allocation can no longer be
      satisfied as a slab allocation, and the
      
              int nbufs = pipe->max_usage;
              struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
                                              GFP_KERNEL);
      
      code sequence there will now always fail as a result.
      
      That code could be modified to use kvcalloc() too, but I feel very
      uncomfortable making those kinds of changes for a very niche use case
      that really should have other options than make these kinds of
      fundamental changes to pipe behavior.
      
      Maybe the CRIU process dumping should be multi-threaded, and use
      multiple pipes and multiple cores, rather than try to use one larger
      pipe to minimize splice() calls.
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      906f9040
    • Mikulas Patocka's avatar
      x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 · a6823e4e
      Mikulas Patocka authored
      The first "if" condition in __memcpy_flushcache is supposed to align the
      "dest" variable to 8 bytes and copy data up to this alignment.  However,
      this condition may misbehave if "size" is greater than 4GiB.
      
      The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both
      arguments to unsigned int and selects the smaller one.  However, the
      cast truncates high bits in "size" and it results in misbehavior.
      
      For example:
      
      	suppose that size == 0x100000001, dest == 0x200000002
      	min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0x6) == 0x1;
      	...
      	dest += 0x1;
      
      so we copy just one byte "and" dest remains unaligned.
      
      This patch fixes the bug by replacing unsigned with size_t.
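      
      The truncation can be reproduced in userspace with a small sketch.
      The helpers align8(), head_len_buggy() and head_len_fixed() are
      hypothetical stand-ins for ALIGN(x, 8) and the two min_t() variants;
      uint64_t is used to model the 64-bit size_t of x86-64.

```c
#include <stdint.h>

/* Round x up to the next multiple of 8, as ALIGN(x, 8) does. */
static uint64_t align8(uint64_t x)
{
    return (x + 7) & ~(uint64_t)7;
}

/*
 * Models min_t(unsigned, size, ALIGN(dest, 8) - dest): both operands
 * are cast to unsigned int first, so the high 32 bits of size are lost.
 */
static uint64_t head_len_buggy(uint64_t size, uint64_t dest)
{
    unsigned int a = (unsigned int)size;
    unsigned int b = (unsigned int)(align8(dest) - dest);

    return a < b ? a : b;
}

/*
 * Models min_t(size_t, size, ALIGN(dest, 8) - dest) on a 64-bit
 * system: the comparison is done at full width.
 */
static uint64_t head_len_fixed(uint64_t size, uint64_t dest)
{
    uint64_t a = size;
    uint64_t b = align8(dest) - dest;

    return a < b ? a : b;
}
```

      For the values in the example, the buggy variant advances dest by a
      single byte and leaves it unaligned, while the full-width variant
      advances it to the next 8-byte boundary.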
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6823e4e