1. 13 Jun, 2022 2 commits
    • Jann Horn's avatar
      mm/slub: add missing TID updates on slab deactivation · eeaa345e
      Jann Horn authored
      The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
      the TID stays the same. However, two places in __slab_alloc() currently
      don't update the TID when deactivating the CPU slab.
      
      If multiple operations race the right way, this could lead to an object
      getting lost; or, in an even more unlikely situation, it could even lead to
      an object being freed onto the wrong slab's freelist, messing up the
      `inuse` counter and eventually causing a page to be freed to the page
      allocator while it still contains slab objects.
      
      (I haven't actually tested these cases though, this is just based on
      looking at the code. Writing testcases for this stuff seems like it'd be
      a pain...)
      
      The race leading to state inconsistency is (all operations on the same CPU
      and kmem_cache):
      
       - task A: begin do_slab_free():
          - read TID
          - read pcpu freelist (==NULL)
          - check `slab == c->slab` (true)
       - [PREEMPT A->B]
       - task B: begin slab_alloc_node():
          - fastpath fails (`c->freelist` is NULL)
          - enter __slab_alloc()
          - slub_get_cpu_ptr() (disables preemption)
          - enter ___slab_alloc()
          - take local_lock_irqsave()
          - read c->freelist as NULL
          - get_freelist() returns NULL
          - write `c->slab = NULL`
          - drop local_unlock_irqrestore()
          - goto new_slab
          - slub_percpu_partial() is NULL
          - get_partial() returns NULL
          - slub_put_cpu_ptr() (enables preemption)
       - [PREEMPT B->A]
       - task A: finish do_slab_free():
          - this_cpu_cmpxchg_double() succeeds()
          - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]
      
      From there, the object on c->freelist will get lost if task B is allowed to
      continue from here: It will proceed to the retry_load_slab label,
      set c->slab, then jump to load_freelist, which clobbers c->freelist.
      
      But if we instead continue as follows, we get worse corruption:
      
       - task A: run __slab_free() on object from other struct slab:
          - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
       - task A: run slab_alloc_node() with NUMA node constraint:
          - fastpath fails (c->slab is NULL)
          - call __slab_alloc()
          - slub_get_cpu_ptr() (disables preemption)
          - enter ___slab_alloc()
          - c->slab is NULL: goto new_slab
          - slub_percpu_partial() is non-NULL
          - set c->slab to slub_percpu_partial(c)
          - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
            from slab-2]
          - goto redo
          - node_match() fails
          - goto deactivate_slab
          - existing c->freelist is passed into deactivate_slab()
          - inuse count of slab-1 is decremented to account for object from
            slab-2
      
      At this point, the inuse count of slab-1 is 1 lower than it should be.
      This means that if we free all allocated objects in slab-1 except for one,
      SLUB will think that slab-1 is completely unused, and may free its page,
      leading to use-after-free.
      
      Fixes: c17dda40 ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
      Fixes: 03e404af ("slub: fast release on full slab")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Reviewed-by: default avatarMuchun Song <songmuchun@bytedance.com>
      Tested-by: default avatarHyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
      eeaa345e
    • Sebastian Andrzej Siewior's avatar
      mm/slub: Move the stackdepot related allocation out of IRQ-off section. · c4cf6785
      Sebastian Andrzej Siewior authored
      The set_track() invocation in free_debug_processing() is invoked with
      acquired slab_lock(). The lock disables interrupts on PREEMPT_RT and
      this forbids to allocate memory which is done in stack_depot_save().
      
      Split set_track() into two parts: set_track_prepare() which allocate
      memory and set_track_update() which only performs the assignment of the
      trace data structure. Use set_track_prepare() before disabling
      interrupts.
      
      [ vbabka@suse.cz: make set_track() call set_track_update() instead of
        open-coded assignments ]
      
      Fixes: 5cf909c5 ("mm/slub: use stackdepot to save stack trace in objects")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarHyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Link: https://lore.kernel.org/r/Yp9sqoUi4fVa5ExF@linutronix.de
      c4cf6785
  2. 12 Jun, 2022 10 commits
  3. 11 Jun, 2022 9 commits
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 7a68065e
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
       "A set of fixes. Most address the new warning we emit at build time
        when irq chips are not immutable with some additional tweaks to
        gpio-crystalcove from Andy and a small tweak to gpio-dwapd.
      
         - make irq_chip structs immutable in several Diolan and intel drivers
           to get rid of the new warning we emit when fiddling with irq chips
      
         - don't print error messages on probe deferral in gpio-dwapb"
      
      * tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: dwapb: Don't print error on -EPROBE_DEFER
        gpio: dln2: make irq_chip immutable
        gpio: sch: make irq_chip immutable
        gpio: merrifield: make irq_chip immutable
        gpio: wcove: make irq_chip immutable
        gpio: crystalcove: Join function declarations and long lines
        gpio: crystalcove: Use specific type and API for IRQ number
        gpio: crystalcove: make irq_chip immutable
      7a68065e
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · cecb3540
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Driver fixes and and one core patch.
      
        Nine of the driver patches are minor fixes and reworks to lpfc and the
        rest are trivial and minor fixes elsewhere"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: pmcraid: Fix missing resource cleanup in error case
        scsi: ipr: Fix missing/incorrect resource cleanup in error case
        scsi: mpt3sas: Fix out-of-bounds compiler warning
        scsi: lpfc: Update lpfc version to 14.2.0.4
        scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion
        scsi: lpfc: Add more logging of cmd and cqe information for aborted NVMe cmds
        scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology
        scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted
        scsi: lpfc: Address NULL pointer dereference after starget_to_rport()
        scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
        scsi: lpfc: Resolve some cleanup issues following abort path refactoring
        scsi: lpfc: Correct BDE type for XMIT_SEQ64_WQE in lpfc_ct_reject_event()
        scsi: vmw_pvscsi: Expand vcpuHint to 16 bits
        scsi: sd: Fix interpretation of VPD B9h length
      cecb3540
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · abe71eb3
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Fixes all over the place, most notably fixes for latent bugs in
        drivers that got exposed by suppressing interrupts before DRIVER_OK,
        which in turn has been done by 8b4ec69d ("virtio: harden vring
        IRQ")"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        um: virt-pci: set device ready in probe()
        vdpa: make get_vq_group and set_group_asid optional
        virtio: Fix all occurences of the "the the" typo
        vduse: Fix NULL pointer dereference on sysfs access
        vringh: Fix loop descriptors check in the indirect cases
        vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
        vdpa/mlx5: fix error code for deleting vlan
        virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
        vdpa/mlx5: Fix syntax errors in comments
        virtio-rng: make device ready before making request
      abe71eb3
    • Linus Torvalds's avatar
      Merge tag 'loongarch-fixes-5.19-1' of... · 0678afa6
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen.
       "Fix build errors and a stale comment"
      
      * tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Remove MIPS comment about cycle counter
        LoongArch: Fix copy_thread() build errors
        LoongArch: Fix the !CONFIG_SMP build
      0678afa6
    • Linus Torvalds's avatar
      iov_iter: fix build issue due to possible type mis-match · 1c27f1fc
      Linus Torvalds authored
      Commit 6c776766 ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
      introduced a problem on some 32-bit architectures (at least arm, xtensa,
      csky,sparc and mips), that have a 'size_t' that is 'unsigned int'.
      
      The reason is that we now do
      
          min(nr * PAGE_SIZE - offset, maxsize);
      
      where 'nr' and 'offset' and both 'unsigned int', and PAGE_SIZE is
      'unsigned long'.  As a result, the normal C type rules means that the
      first argument to 'min()' ends up being 'unsigned long'.
      
      In contrast, 'maxsize' is of type 'size_t'.
      
      Now, 'size_t' and 'unsigned long' are always the same physical type in
      the kernel, so you'd think this doesn't matter, and from an actual
      arithmetic standpoint it doesn't.
      
      But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
      it could also be 'unsigned long'.  In that situation, both are unsigned
      32-bit types, but they are not the *same* type.
      
      And as a result 'min()' will complain about the distinct types (ignore
      the "pointer types" part of the error message: that's an artifact of the
      way we have made 'min()' check types for being the same):
      
        lib/iov_iter.c: In function 'iter_xarray_get_pages':
        include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
           20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
              |                                   ^~
        lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
         1464 |         return min(nr * PAGE_SIZE - offset, maxsize);
              |                ^~~
      
      This was not visible on 64-bit architectures (where we always define
      'size_t' to be 'unsigned long').
      
      Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
      and avoid the issue.
      
      [ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
        long' arithmetically. We've certainly historically seen environments
        with 16-bit address spaces and 32-bit 'unsigned long'.
      
        Similarly, even in 64-bit modern environments, 'size_t' could be its
        own type distinct from 'unsigned long', even if it were arithmetically
        identical.
      
        So the above type commentary is only really descriptive of the kernel
        environment, not some kind of universal truth for the kinds of wild
        and crazy situations that are allowed by the C standard ]
      Reported-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c27f1fc
    • Jason A. Donenfeld's avatar
      wireguard: selftests: use maximum cpu features and allow rng seeding · 17b0128a
      Jason A. Donenfeld authored
      By forcing the maximum CPU that QEMU has available, we expose additional
      capabilities, such as the RNDR instruction, which increases test
      coverage. This then allows the CI to skip the fake seeding step in some
      cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
      jump labels when the RNG is initialized at boot.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      17b0128a
    • Kuan-Ying Lee's avatar
      scripts/gdb: change kernel config dumping method · 1f7a6cf6
      Kuan-Ying Lee authored
      MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") are moved out
      from the kernel_config_data variable.
      
      Thus, we parse kernel_config_data directly instead of considering
      offset of MAGIC_START and MAGIC_END.
      
      Fixes: 13610aa9 ("kernel/configs: use .incbin directive to embed config_data.gz")
      Signed-off-by: default avatarKuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      1f7a6cf6
    • Vincent Whitchurch's avatar
      um: virt-pci: set device ready in probe() · eacea844
      Vincent Whitchurch authored
      Call virtio_device_ready() to make this driver work after commit
      b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
      virtqueues in the probe function.  (The virtio core sets the device
      ready when probe returns.)
      
      Fixes: 8b4ec69d ("virtio: harden vring IRQ")
      Fixes: 68f5d3f3 ("um: add PCI over virtio emulation driver")
      Signed-off-by: default avatarVincent Whitchurch <vincent.whitchurch@axis.com>
      Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      eacea844
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 0885eacd
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Notable changes:
      
         - There is now a backup maintainer for NFSD
      
        Notable fixes:
      
         - Prevent array overruns in svc_rdma_build_writes()
      
         - Prevent buffer overruns when encoding NFSv3 READDIR results
      
         - Fix a potential UAF in nfsd_file_put()"
      
      * tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_commit_encode()
        SUNRPC: Optimize xdr_reserve_space()
        SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
        SUNRPC: Trap RDMA segment overflows
        NFSD: Fix potential use-after-free in nfsd_file_put()
        MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
      0885eacd
  4. 10 Jun, 2022 19 commits