1. 12 Jun, 2023 11 commits
    • ocfs2: fix use-after-free when unmounting read-only filesystem · 50d92788
      Luís Henriques authored
      It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using
      fstest generic/452.  After a read-only remount, quotas are suspended and
      ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info().  When unmounting
      the filesystem, a UAF access to the oinfo will eventually cause a crash.
       
      BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0
      Read of size 8 at addr ffff8880389a8208 by task umount/669
      ...
      Call Trace:
       <TASK>
       ...
       timer_delete+0x54/0xc0
       try_to_grab_pending+0x31/0x230
       __cancel_work_timer+0x6c/0x270
       ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2]
       ocfs2_dismount_volume+0xdd/0x450 [ocfs2]
       generic_shutdown_super+0xaa/0x280
       kill_block_super+0x46/0x70
       deactivate_locked_super+0x4d/0xb0
       cleanup_mnt+0x135/0x1f0
       ...
       </TASK>
      
      Allocated by task 632:
       kasan_save_stack+0x1c/0x40
       kasan_set_track+0x21/0x30
       __kasan_kmalloc+0x8b/0x90
       ocfs2_local_read_info+0xe3/0x9a0 [ocfs2]
       dquot_load_quota_sb+0x34b/0x680
       dquot_load_quota_inode+0xfe/0x1a0
       ocfs2_enable_quotas+0x190/0x2f0 [ocfs2]
       ocfs2_fill_super+0x14ef/0x2120 [ocfs2]
       mount_bdev+0x1be/0x200
       legacy_get_tree+0x6c/0xb0
       vfs_get_tree+0x3e/0x110
       path_mount+0xa90/0xe10
       __x64_sys_mount+0x16f/0x1a0
       do_syscall_64+0x43/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Freed by task 650:
       kasan_save_stack+0x1c/0x40
       kasan_set_track+0x21/0x30
       kasan_save_free_info+0x2a/0x50
       __kasan_slab_free+0xf9/0x150
       __kmem_cache_free+0x89/0x180
       ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2]
       dquot_disable+0x35f/0xa70
       ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2]
       ocfs2_remount+0x150/0x580 [ocfs2]
       reconfigure_super+0x1a5/0x3a0
       path_mount+0xc8a/0xe10
       __x64_sys_mount+0x16f/0x1a0
       do_syscall_64+0x43/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Link: https://lkml.kernel.org/r/20230522102112.9031-1-lhenriques@suse.de
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lib/test_vmalloc.c: avoid garbage in page array · 9f6c6ad1
      Lorenzo Stoakes authored
      It turns out that alloc_pages_bulk_array() does not treat the page_array
      parameter as an output parameter, but rather reads the array and skips any
      entries that have already been allocated.
      
      This is somewhat unexpected and breaks this test, as we allocate the pages
      array uninitialised on the assumption it will be overwritten.
      
      As a result, the test was referencing uninitialised data and causing the
      PFN to not be valid and thus a WARN_ON() followed by a null pointer deref
      and panic.
      
      In addition, this is an array of pointers not of struct page objects, so we
      need only allocate an array with elements of pointer size.
      
      We solve both problems by simply using kcalloc() and referencing
      sizeof(struct page *) rather than sizeof(struct page).
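
The two problems described above can be modelled in userspace. This is a minimal sketch, not the kernel test: `struct page` here is a stand-in, and `bulk_fill()`, `alloc_page_array()` and `free_page_array()` are hypothetical helpers that only mimic the "skip slots that are already populated" behaviour attributed to alloc_pages_bulk_array(). Because the filler reads the array before writing to it, the array has to start out zeroed, and it only needs room for pointers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct page (assumption for this sketch). */
struct page {
	int id;
};

/*
 * Toy bulk filler: the array is an input as well as an output.
 * Non-NULL slots are assumed to be populated already and are skipped.
 * Returns the number of populated slots.
 */
static size_t bulk_fill(struct page **pages, size_t nr)
{
	size_t i, populated = 0;

	for (i = 0; i < nr; i++) {
		if (!pages[i])
			pages[i] = calloc(1, sizeof(struct page));
		if (pages[i])
			populated++;
	}
	return populated;
}

/* Zeroed array of pointers, sized by the pointer and not by the object. */
static struct page **alloc_page_array(size_t nr)
{
	return calloc(nr, sizeof(struct page *));
}

/* Release every populated slot, then the array itself. */
static void free_page_array(struct page **pages, size_t nr)
{
	size_t i;

	if (!pages)
		return;
	for (i = 0; i < nr; i++)
		free(pages[i]);
	free(pages);
}
```

With an uninitialised array (for example from malloc()), `bulk_fill()` would treat whatever garbage is in the slots as valid pointers, which is the failure mode the commit message describes.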
      
      Link: https://lkml.kernel.org/r/20230524082424.10022-1-lstoakes@gmail.com
      Fixes: 869cb29a ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case")
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix possible out-of-bounds segment allocation in resize ioctl · fee5eaec
      Ryusuke Konishi authored
      Syzbot reports that in its stress test for resize ioctl, the log writing
      function nilfs_segctor_do_construct hits a WARN_ON in
      nilfs_segctor_truncate_segments().
      
      It turned out that there is a problem with the current implementation of
      the resize ioctl, which changes the writable range on the device (the
      range of allocatable segments) at the end of the resize process.
      
      This order is necessary for file system expansion to avoid corrupting the
      superblock at trailing edge.  However, in the case of a file system
      shrink, if log writes occur after truncating out-of-bounds trailing
      segments and before the resize is complete, segments may be allocated from
      the truncated space.
      
      The userspace resize tool was fine as it limits the range of allocatable
      segments before performing the resize, but it can run into this issue if
      the resize ioctl is called alone.
      
      Fix this issue by changing nilfs_sufile_resize() to update the range of
      allocatable segments immediately after successful truncation of segment
      space in case of file system shrink.
      
      Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com
      Fixes: 4e33f9ea ("nilfs2: implement resize ioctl")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com
      Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • riscv/purgatory: remove PGO flags · 88ac3bbc
      Ricardo Ribalda authored
      If profile-guided optimization is enabled, the purgatory ends up with
      multiple .text sections.  This is not supported by kexec and crashes the
      system.
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-4-b05c520b7296@chromium.org
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: <stable@vger.kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • powerpc/purgatory: remove PGO flags · 20188bac
      Ricardo Ribalda authored
      If profile-guided optimization is enabled, the purgatory ends up with
      multiple .text sections.  This is not supported by kexec and crashes the
      system.
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-3-b05c520b7296@chromium.org
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: <stable@vger.kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/purgatory: remove PGO flags · 97b6b9cb
      Ricardo Ribalda authored
      If profile-guided optimization is enabled, the purgatory ends up with
      multiple .text sections.  This is not supported by kexec and crashes the
      system.
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.org
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec: support purgatories with .text.hot sections · 8652d44f
      Ricardo Ribalda authored
      Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.
      
      When upreving llvm I realised that kexec stopped working on my test
      platform.
      
      The reason seems to be that due to PGO there are multiple .text sections
      in the purgatory, and kexec does not support that.
      
      
      This patch (of 4):
      
      Clang16 links the purgatory text in two sections when PGO is in use:
      
        [ 1] .text             PROGBITS         0000000000000000  00000040
             00000000000011a1  0000000000000000  AX       0     0     16
        [ 2] .rela.text        RELA             0000000000000000  00003498
             0000000000000648  0000000000000018   I      24     1     8
        ...
        [17] .text.hot.        PROGBITS         0000000000000000  00003220
             000000000000020b  0000000000000000  AX       0     0     1
        [18] .rela.text.hot.   RELA             0000000000000000  00004428
             0000000000000078  0000000000000018   I      24    17     8
      
      Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
      address pointed to by `e_entry`.

      As a result, image->start is calculated twice, once for .text and once
      for .text.hot. The second calculation leaves image->start at a random
      location.
      
      Because of this, the system crashes immediately after:
      
      kexec_core: Starting new kernel
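
The double calculation can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kexec code: `struct toy_shdr`, `toy_image_start()`, the `load_addr` values and the entry offset used in the usage note are all hypothetical. Both sections have `sh_addr` 0, as in the section listing above, so the entry point falls inside both ranges. The `guard` flag shows one possible way to make sure the start address is relocated only once; it is an illustration rather than a description of the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SHF_EXECINSTR 0x4u /* same value as ELF SHF_EXECINSTR */

/* Toy section header; load_addr is a hypothetical final load address. */
struct toy_shdr {
	uint64_t sh_flags;
	uint64_t sh_addr;
	uint64_t sh_size;
	uint64_t load_addr;
};

/*
 * Relocate the entry point into whichever executable section contains it.
 * With guard == 0, every matching section adjusts the start address again.
 * With guard != 0, a section is skipped once the start address no longer
 * equals e_entry, i.e. once it has already been relocated.
 */
static uint64_t toy_image_start(const struct toy_shdr *sec, size_t nr,
				uint64_t e_entry, int guard)
{
	uint64_t start = e_entry;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!(sec[i].sh_flags & TOY_SHF_EXECINSTR))
			continue;
		if (e_entry < sec[i].sh_addr ||
		    e_entry >= sec[i].sh_addr + sec[i].sh_size)
			continue;
		if (guard && start != e_entry)
			continue;
		start -= sec[i].sh_addr;
		start += sec[i].load_addr;
	}
	return start;
}
```

With two sections of sizes 0x11a1 and 0x20b loaded at the made-up addresses 0x100000 and 0x200000, and an entry offset of 0x40, the unguarded loop returns 0x300040, which lies in neither section, while the guarded loop returns 0x100040.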
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Reviewed-by: Ross Zwisler <zwisler@google.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Reviewed-by: Philipp Rudo <prudo@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/uffd: allow vma to merge as much as possible · 5543d3c4
      Peter Xu authored
      We did not pass in the pgoff correctly when registering/unregistering uffd
      regions.  This caused incorrect vma merging behavior and could leave
      mergeable vmas separate after the ioctls return.
      
      For example, when we have:
      
        vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
      
      Then someone unregisters uffd on range (5-9), it should logically become:
      
        vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
      
      But with current code we'll have:
      
        vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
      
      This patch allows such merge to happen correctly before ioctl returns.
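
The example above can be checked with a toy model of the merge condition. This is a sketch under assumptions, not the mm code: `struct toy_vma`, `toy_pgoff_at()` and `toy_can_merge_next()` are hypothetical, ranges are half-open and counted in pages, and the only conditions modelled are adjacency, matching uffd state and contiguous pgoff.

```c
#include <stdbool.h>

/* Toy vma covering pages [start, end); pgoff is the offset of 'start'. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long pgoff;
	bool uffd;
};

/* pgoff that corresponds to an address in the middle of a vma. */
static unsigned long toy_pgoff_at(const struct toy_vma *vma,
				  unsigned long start)
{
	return vma->pgoff + (start - vma->start);
}

/*
 * Can the range [start, end) with the given pgoff and uffd state be
 * merged with the vma that follows it?
 */
static bool toy_can_merge_next(unsigned long start, unsigned long end,
			       unsigned long pgoff, bool uffd,
			       const struct toy_vma *next)
{
	return end == next->start &&
	       uffd == next->uffd &&
	       pgoff + (end - start) == next->pgoff;
}
```

For vma1 covering pages 0-9 and vma2 covering pages 10-19, unregistering pages 5-9 with the pgoff of vma1's start (too small, as described below) makes the contiguity check fail, while the pgoff computed by `toy_pgoff_at()` lets the range merge into vma2.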
      
      This behavior seems to have existed since the 1st day of uffd.  The
      pgoff for vma_merge() is only used to identify the possibility of vma
      merging, and what we did here was always pass in a pgoff smaller than
      what we should, so there should be no other side effect besides not
      merging.  Let's still tentatively copy stable for this, even though I
      don't see anything that will go wrong besides the vma being split (which
      is mostly not user visible).
      
      Link: https://lkml.kernel.org/r/20230517190916.3429499-3-peterx@redhat.com
      Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/uffd: fix vma operation where start addr cuts part of vma · 270aa010
      Peter Xu authored
      Patch series "mm/uffd: Fix vma merge/split", v2.
      
      This series contains two patches that fix vma merge/split for userfaultfd
      on two separate issues.
      
      Patch 1 fixes a regression since 6.1+ due to something we overlooked when
      converting to maple tree apis.  The plan is we use patch 1 to replace the
      commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to
      vma_merge())" in mm-hotfixes-unstable tree if possible, so as to bring
      uffd vma operations back aligned with the rest of the code again.

      Patch 2 fixes a long standing issue where a vma can be left unmerged even
      if it could be merged, for either uffd register or unregister.

      Many thanks to Lorenzo for noticing this issue from the assert movement
      patch, looking at this problem, and also providing a reproducer for the
      unmerged vma issue [1].
      
      [1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e
      
      
      This patch (of 2):
      
      It seems vma merging with uffd paths is broken with either
      register/unregister, where right now we can feed wrong parameters to
      vma_merge().  This was found by a recent patch from Lorenzo Stoakes which
      moved asserts upwards in vma_merge():
      
      https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
      
      It's possible that "start" is contained within vma but not clamped to its
      start.  We need to convert this into either "cannot merge" case or "can
      merge" case 4 which permits subdivision of prev by assigning vma to prev. 
      As we loop, each subsequent VMA will be clamped to the start.
      
      This patch will eliminate the report and make sure vma_merge() calls will
      become legal again.
      
      One thing to mention is that the "Fixes: 29417d29" below is there only
      to help explain where the warning can start to trigger, the real commit to
      fix should be 69dbe6da.  Commit 29417d29 helps us to identify the
      issue, but unfortunately we may want to keep it in Fixes too just to ease
      kernel backporters for easier tracking.
      
      Link: https://lkml.kernel.org/r/20230517190916.3429499-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20230517190916.3429499-2-peterx@redhat.com
      Fixes: 69dbe6da ("userfaultfd: use maple tree iterator to iterate VMAs")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • radix-tree: move declarations to header · bde1597d
      Arnd Bergmann authored
      The xarray.c file contains the only call to radix_tree_node_rcu_free(),
      and it comes with its own extern declaration for it.  This means the
      function definition causes a missing-prototype warning:
      
      lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]
      
      Instead, move the declaration for this function to a new header that can
      be included by both, and do the same for the radix_tree_node_cachep
      variable that has the same underlying problem but does not cause a warning
      with gcc.
      
      [zhangpeng.00@bytedance.com: fix building radix tree test suite]
        Link: https://lkml.kernel.org/r/20230521095450.21332-1-zhangpeng.00@bytedance.com
      Link: https://lkml.kernel.org/r/20230516194212.548910-1-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key() · 2f012f2b
      Ryusuke Konishi authored
      A syzbot fault injection test reported that nilfs_btnode_create_block, a
      helper function that allocates a new node block for b-trees, causes a
      kernel BUG for disk images where the file system block size is smaller
      than the page size.
      
      This was due to unexpected flags on the newly allocated buffer head, and
      it turned out to be because the buffer flags were not cleared by
      nilfs_btnode_abort_change_key() after an error occurred during a b-tree
      update operation and the buffer was later reused in that state.
      
      Fix this issue by using nilfs_btnode_delete() to abandon the unused
      preallocated buffer in nilfs_btnode_abort_change_key().
      
      Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+b0a35a5c1f7e846d3b09@syzkaller.appspotmail.com
      Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 29 May, 2023 9 commits
    • Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 8b817fde
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "User events:
      
         - Use long instead of int for storing the enable set/clear bit, as it
           was found that big endian machines could end up using the wrong
           bits.
      
         - Split allocating mm and attaching it. This keeps the allocation
           separate from the registration and avoids various races.
      
         - Remove RCU locking around pin_user_pages_remote() as that can
           schedule. The RCU protection is no longer needed with the above
           split of mm allocation and attaching.
      
         - Rename the "link" fields of the various structs to something more
           meaningful.
      
         - Add comments around user_event_mm struct usage and locking
           requirements.
      
        Timerlat tracer:
      
         - Fix missed wakeup of timerlat thread caused by the timerlat
           interrupt triggering when tracing is off. The timer interrupt
           handler needs to always wake up the timerlat thread regardless if
           tracing is enabled or not, otherwise, it will never wake up.
      
        Histograms:
      
         - Fix regression of breaking the "stacktrace" modifier for variables.
           That modifier cannot be used for values, but can be used for
           variables that are passed from one histogram to the next. This was
           broken when adding the restriction to values as the variable logic
           used the same code.
      
         - Rename the special field "stacktrace" to "common_stacktrace".
      
           Special fields (that are not actually part of the event, but can
           act just like event fields, like 'comm' and 'timestamp') should be
           prefixed with 'common_' for consistency. To keep backward
           compatibility, 'stacktrace' can still be used (as with the special
           field 'cpu'), but can be overridden if the event has a field called
           'stacktrace'.
      
         - Update the synthetic event selftests to use the new name (synthetic
           events are created by histograms)
      
        Tracing bootup selftests:
      
         - Reorganize the code to keep artifacts of the selftests not compiled
           in when selftests are not configured.
      
         - Add various cond_resched() around the selftest code, as the
           softlock watchdog was triggering much more often. It appears that
           the kernel runs slower now with full debugging enabled.
      
         - While debugging ftrace with ftrace (using an instance ring buffer
           instead of the top level one), I found that the selftests were
           disabling prints to the debug instance.
      
           This should not happen, as the selftests only disable printing to
           the main buffer as the selftests examine the main buffer to see if
           it has what it expects, and prints can make the tests fail.
      
           Make the selftests only disable printing to the toplevel buffer,
           and leave the instance buffers alone"
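
The first user events item above (long instead of int for the enable bit) comes down to byte layout. The sketch below is only a model of that layout, not the user_events code: `toy_store_be64()` and `toy_load_be32()` are hypothetical helpers that write a 64-bit value in big endian byte order and read back 32-bit halves, which is roughly what happens when a value stored as a long is accessed as an int at the same address on a 64-bit big endian machine.

```c
#include <stdint.h>

/* Store v in big endian byte order, most significant byte first. */
static void toy_store_be64(uint8_t buf[8], uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read four bytes starting at buf as a big endian 32-bit value. */
static uint32_t toy_load_be32(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}
```

Setting bit 0 of the 64-bit value and then reading the first four bytes as a 32-bit value gives 0; the bit lives in the second half of the buffer. On a little endian layout both views would agree on bit 0, which is why the mismatch only shows up on big endian machines.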
      
      * tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Have function_graph selftest call cond_resched()
        tracing: Only make selftest conditionals affect the global_trace
        tracing: Make tracing_selftest_running/delete nops when not used
        tracing: Have tracer selftests call cond_resched() before running
        tracing: Move setting of tracing_selftest_running out of register_tracer()
        tracing/selftests: Update synthetic event selftest to use common_stacktrace
        tracing: Rename stacktrace field to common_stacktrace
        tracing/histograms: Allow variables to have some modifiers
        tracing/user_events: Document user_event_mm one-shot list usage
        tracing/user_events: Rename link fields for clarity
        tracing/user_events: Remove RCU lock while pinning pages
        tracing/user_events: Split up mm alloc and attach
        tracing/timerlat: Always wakeup the timerlat thread
        tracing/user_events: Use long vs int for atomic bit ops
    • Merge tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 7a6c8e51
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix an alignment crash in x86/aria"
      
      * tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
    • Revert "module: error out early on concurrent load of the same module file" · ac2263b5
      Linus Torvalds authored
      This reverts commit 9828ed3f.
      
      Sadly, it does seem to cause failures to load modules. Johan Hovold reports:
      
       "This change breaks module loading during boot on the Lenovo Thinkpad
        X13s (aarch64).
      
        Specifically it results in indefinite probe deferral of the display
        and USB (ethernet) which makes it a pain to debug. Typing in the dark
        to acquire some logs reveals that other modules are missing as well"
      
      Since this was applied late as a "let's try this", I'm reverting it
      asap, and we can try to figure out what goes wrong later.  The excessive
      parallel module loading problem is annoying, but not noticeable in
      normal situations, and this was only meant as an optimistic workaround
      for a user-space bug.
      
      One possible solution may be to do the optimistic exclusive open first,
      and then use a lock to serialize loading if that fails.

      Reported-by: Johan Hovold <johan@kernel.org>
      Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracing: Have function_graph selftest call cond_resched() · a2d910f0
      Steven Rostedt (Google) authored
      When all kernel debugging is enabled (lockdep, KSAN, etc), the function
      graph enabling and disabling can take several seconds to complete. The
      function_graph selftest enables and disables function graph tracing
      several times. With full debugging enabled, the soft lockup watchdog was
      triggering because the selftest was running without ever scheduling.
      
      Add cond_resched() throughout the test to make sure it does not trigger
      the soft lockup detector.
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Only make selftest conditionals affect the global_trace · ac9d2cb1
      Steven Rostedt (Google) authored
      The tracing_selftest_running and tracing_selftest_disabled variables were
      to keep trace_printk() and other writes from affecting the tracing
      selftests, as the tracing selftests would examine the ring buffer to see
      if it contained what it expected or not. trace_printk() and friends could
      add to the ring buffer and cause the selftests to fail (and then disable
      the tracer that was being tested). To keep that from happening, these
      variables were added and would keep trace_printk() and friends from
      writing to the ring buffer while the tests were going on.
      
      But this was only the top level ring buffer (owned by the global_trace
      instance). There is no reason to prevent writing into ring buffers of
      other instances via the trace_array_printk() and friends. For the
      functions that could be used by other instances, check if the global_trace
      is the tracer instance that is being written to before deciding to not
      allow the write.
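
The check described above can be sketched in userspace as follows. Everything here is a toy stand-in: `struct toy_trace_array`, `toy_array_printk()` and the `entries` counter are hypothetical, and `global_trace` and `tracing_selftest_running` only borrow the names used in the commit message. The point is the order of the test: a write is refused only when the target is the top level instance and a selftest is running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy trace instance; entries counts accepted writes. */
struct toy_trace_array {
	size_t entries;
};

static struct toy_trace_array global_trace;	/* top level instance */
static bool tracing_selftest_running;

/*
 * Accept a write unless it targets the top level buffer while a
 * selftest is examining it. Other instances are never blocked.
 */
static bool toy_array_printk(struct toy_trace_array *tr)
{
	if (tr == &global_trace && tracing_selftest_running)
		return false;

	tr->entries++;
	return true;
}
```

A separately allocated instance keeps accepting writes while the flag is set, which matches the debugging scenario of tracing the selftests from an instance buffer.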
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Make tracing_selftest_running/delete nops when not used · a3ae76d7
      Steven Rostedt (Google) authored
      There's no reason to test the condition variables tracing_selftest_running
      or tracing_selftest_disabled when tracing selftests are not enabled. Make
      them define 0s when the selftests are not configured in.
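
A minimal sketch of the "define 0s" pattern, assuming a made-up switch `TOY_STARTUP_TEST` in place of the real configuration option and a hypothetical caller `toy_write_allowed()`. When the switch is off, the names become the constant 0, so the compiler can drop the test entirely and callers need no #ifdef of their own.

```c
#include <stdbool.h>

/* TOY_STARTUP_TEST is a hypothetical stand-in for the selftest config. */
#ifdef TOY_STARTUP_TEST
static bool tracing_selftest_running;
static bool tracing_selftest_disabled;
#else
#define tracing_selftest_running	0
#define tracing_selftest_disabled	0
#endif

/* Caller code is identical in both configurations. */
static int toy_write_allowed(void)
{
	if (tracing_selftest_running || tracing_selftest_disabled)
		return 0;
	return 1;
}
```

Built without `TOY_STARTUP_TEST`, the condition reduces to `0 || 0` and the function always returns 1.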
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-4-rostedt@goodmis.org
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Have tracer selftests call cond_resched() before running · 9da705d4
      Steven Rostedt (Google) authored
      As there are more and more internal selftests being added to the Linux
      kernel (KSAN, lockdep, etc) the selftests are taking longer to run when
      these are enabled. Add a cond_resched() to the calling of
      do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
      otherwise the soft lockup watchdog may trigger on boot up.
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Move setting of tracing_selftest_running out of register_tracer() · e8352cf5
      Steven Rostedt (Google) authored
      The variables tracing_selftest_running and tracing_selftest_disabled are
      only used when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
      visible within the selftest code. Those variables are set in the
      register_tracer() call, in a location where they do not need
      to be. Create a wrapper around run_tracer_selftest() called
      do_run_tracer_selftest() which sets those variables, and have
      register_tracer() call that instead.
      
      Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST
      scope gets rid of them (and also makes it possible to remove the tests
      against them) when the startup tests are not enabled (most cases).
      
      Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      e8352cf5
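The wrapper arrangement described above might look roughly like this in a user-space sketch. The function names are taken from the commit message, but the bodies, the `struct tracer` layout, and the `sample_selftest()` helper are simplified assumptions rather than the kernel's code.

```c
#include <stdbool.h>

/* Flag kept private to the selftest code in this sketch. */
static bool tracing_selftest_running;

/* Hypothetical, pared-down tracer descriptor. */
struct tracer {
    const char *name;
    int (*selftest)(void);
};

/* Runs the tracer's selftest if it has one; 0 means success. */
static int run_tracer_selftest(struct tracer *type)
{
    if (!type->selftest)
        return 0;
    return type->selftest();
}

/* Wrapper: the flag is set and cleared here, not in register_tracer(). */
static int do_run_tracer_selftest(struct tracer *type)
{
    int ret;

    tracing_selftest_running = true;
    ret = run_tracer_selftest(type);
    tracing_selftest_running = false;
    return ret;
}

/* Registration only calls the wrapper and never touches the flag itself. */
static int register_tracer(struct tracer *type)
{
    return do_run_tracer_selftest(type);
}

/* Illustrative selftest that records whether the flag was set while it ran. */
static bool seen_running;
static int sample_selftest(void)
{
    seen_running = tracing_selftest_running;
    return 0;
}
```

Registering a tracer whose `selftest` is `sample_selftest` leaves `seen_running` true and `tracing_selftest_running` false afterwards, showing that the flag is scoped to the wrapper.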
    • Linus Torvalds's avatar
      Merge tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · e338142b
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - init count imbalance fix in qcom-qmp-pcie and combo drivers
      
       - kernel doc header fix for qcom-snps driver
      
       - mediatek floating point comparison fix
      
       - amlogic fix register value
      
      * tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: qcom-snps: correct struct qcom_snps_hsphy kerneldoc
        phy: amlogic: phy-meson-g12a-mipi-dphy-analog: fix CNTL2_DIF_TX_CTL0 value
        phy: mediatek: rework the floating point comparisons to fixed point
        phy: qcom-qmp-pcie-msm8996: fix init-count imbalance
        phy: qcom-qmp-combo: fix init-count imbalance
      e338142b
  3. 28 May, 2023 9 commits
  4. 27 May, 2023 3 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 4e893b5a
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - a double free fix in the Xen pvcalls backend driver
      
       - a fix for a regression causing the MSI related sysfs entries to not
         be created in Xen PV guests
      
       - a fix in the Xen blkfront driver for handling insane input data
         better
      
      * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        x86/pci/xen: populate MSI sysfs entries
        xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
        xen/blkfront: Only check REQ_FUA for writes
      4e893b5a
    • Linus Torvalds's avatar
      Merge tag 'char-misc-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 957f3f8e
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Here are some small driver fixes for 6.4-rc4. They are just two
        different types:
      
         - binder fixes and reverts for reported problems and regressions in
           the binder "driver".
      
         - coresight driver fixes for reported problems.
      
        All of these have been in linux-next for over a week with no reported
        problems"
      
      * tag 'char-misc-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        binder: fix UAF of alloc->vma in race with munmap()
        binder: add lockless binder_alloc_(set|get)_vma()
        Revert "android: binder: stop saving a pointer to the VMA"
        Revert "binder_alloc: add missing mmap_lock calls when using the VMA"
        binder: fix UAF caused by faulty buffer cleanup
        coresight: perf: Release Coresight path when alloc trace id failed
        coresight: Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
      957f3f8e
    • Linus Torvalds's avatar
      Merge tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 49572d53
      Linus Torvalds authored
      Pull compute express link fixes from Dan Williams:
       "The 'media ready' series prevents the driver from acting on bad
        capacity information, and it moves some checks earlier in the init
        sequence which impacts topics in the queue for 6.5.
      
        Additional hotplug testing uncovered a missing enable for memory
        decode. A debug crash fix is also included.
      
        Summary:
      
         - Stop trusting capacity data before the "media ready" indication
      
         - Add missing HDM decoder capability enable for the cold-plug case
      
         - Fix a debug message induced crash"
      
      * tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl: Explicitly initialize resources when media is not ready
        cxl/port: Fix NULL pointer access in devm_cxl_add_port()
        cxl: Move cxl_await_media_ready() to before capacity info retrieval
        cxl: Wait Memory_Info_Valid before access memory related info
        cxl/port: Enable the HDM decoder capability for switch ports
      49572d53
  5. 26 May, 2023 8 commits
    • Linus Torvalds's avatar
      Merge tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 18713e8a
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "There have not been a lot of fixes for the soc tree in 6.4, but
        these have been sitting here for too long.
      
        For the devicetree side, there is one minor warning fix for vexpress,
        the rest are all for the NXP i.MX platforms: SoC specific bugfixes
        for the iMX8 clocks and its USB-3.0 gadget device, as well as board
        specific fixes for regulators and the phy on some of the i.MX boards.
      
        The microchip risc-v and arm32 maintainers now also add a shared
        maintainer file entry for the arm64 parts.
      
        The remaining fixes are all for firmware drivers, addressing mistakes
        in the optee, scmi and ff-a firmware driver implementation, mostly in
        the error handling code, incorrect use of the alloc_workqueue()
        interface in SCMI, and compatibility with corner cases of the firmware
        implementation"
      
      * tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        MAINTAINERS: update arm64 Microchip entries
        arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed
        dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type
        arm64: dts: colibri-imx8x: delete adc1 and dsp
        arm64: dts: colibri-imx8x: fix iris pinctrl configuration
        arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board
        arm64: dts: colibri-imx8x: fix eval board pin configuration
        arm64: dts: imx8mp: Fix video clock parents
        ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator
        ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3
        arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay
        arm64: dts: imx8mn: Fix video clock parents
        firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
        firmware: arm_ffa: Fix FFA device names for logical partitions
        firmware: arm_ffa: Fix usage of partition info get count flag
        firmware: arm_ffa: Check if ffa_driver remove is present before executing
        arm64: dts: arm: add missing cache properties
        ARM: dts: vexpress: add missing cache properties
        firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation
        optee: fix uninited async notif value
      18713e8a
    • Linus Torvalds's avatar
      Merge tag 'pci-v6.4-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 96f15fc6
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
      
       - Quirk Ice Lake Root Ports to work around DPC log size issue (Mika
         Westerberg)
      
      * tag 'pci-v6.4-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
      96f15fc6
    • Linus Torvalds's avatar
      Merge tag 'vfio-v6.4-rc4' of https://github.com/awilliam/linux-vfio · 8846af75
      Linus Torvalds authored
      Pull VFIO fix from Alex Williamson:
      
       - Test for and return error for invalid pfns through the pin pages
         interface (Yan Zhao)
      
      * tag 'vfio-v6.4-rc4' of https://github.com/awilliam/linux-vfio:
        vfio/type1: check pfn valid before converting to struct page
      8846af75
    • Linus Torvalds's avatar
      Merge tag 'block-6.4-2023-05-26' of git://git.kernel.dk/linux · a92c9ab6
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes for the storage side of things:
      
         - Fix bio caching condition for passthrough IO (Anuj)
      
         - end-of-device check fix for zero sized devices (Christoph)
      
         - Update Paolo's email address
      
         - NVMe pull request via Keith with a single quirk addition
      
         - Fix regression in how wbt enablement is done (Yu)
      
         - Fix race in active queue accounting (Tian)"
      
      * tag 'block-6.4-2023-05-26' of git://git.kernel.dk/linux:
        NVMe: Add MAXIO 1602 to bogus nid list.
        block: make bio_check_eod work for zero sized devices
        block: fix bio-cache for passthru IO
        block, bfq: update Paolo's address in maintainer list
        blk-mq: fix race condition in active queue accounting
        blk-wbt: fix that wbt can't be disabled by default
      a92c9ab6
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.4-2023-05-26' of git://git.kernel.dk/linux · 6fae9129
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Just a single fix for the conditional schedule with the SQPOLL thread,
        dropping the uring_lock if we do need to reschedule"
      
      * tag 'io_uring-6.4-2023-05-26' of git://git.kernel.dk/linux:
        io_uring: unlock sqd->lock before sq thread release CPU
      6fae9129
    • Linus Torvalds's avatar
      Merge tag 'thermal-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 77af1f2b
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix a regression introduced inadvertently during the 6.3 cycle by a
        commit making the Intel int340x thermal driver use sysfs_emit_at()
        instead of scnprintf() (Srinivas Pandruvada)"
      
      * tag 'thermal-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: intel: int340x: Add new line for UUID display
      77af1f2b
    • Linus Torvalds's avatar
      Merge tag 'pm-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c551afcd
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Fix three issues related to the ->fast_switch callback in the AMD
        P-state cpufreq driver (Gautham R. Shenoy and Wyes Karny)"
      
      * tag 'pm-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()
        cpufreq: amd-pstate: Remove fast_switch_possible flag from active driver
        cpufreq: amd-pstate: Add ->fast_switch() callback
      c551afcd
    • Dave Jiang's avatar
      cxl: Explicitly initialize resources when media is not ready · 793a539a
      Dave Jiang authored
      When media is not ready do not assume that the capacity information from
      the identify command is valid, i.e. ->total_bytes
      ->partition_align_bytes ->{volatile,persistent}_only_bytes. Explicitly
      zero out the capacity resources and exit early.
      
      Given zero-init of those fields this patch is functionally equivalent to
      the prior state, but it improves readability and robustness going
      forward.
      
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/168506118166.3004974.13523455340007852589.stgit@djiang5-mobl3
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      793a539a