1. 20 Dec, 2023 4 commits
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs · 74d8fc2b
      Linus Torvalds authored
      Pull more bcachefs fixes from Kent Overstreet:
      
       - Fix a deadlock in the data move path with nocow locks (vs. update in
         place writes); when trylock failed we were incorrectly waiting for in
         flight ios to flush.
      
       - Fix reporting of NFS file handle length
      
       - Fix early error path in bch2_fs_alloc() - list head wasn't being
         initialized early enough
      
       - Make sure correct (hardware accelerated) crc modules get loaded
      
       - Fix a rare overflow in the btree split path, when the packed bkey
         format grows and all the keys have no value (LRU btree).
      
       - Fix error handling in the sector allocator
      
         This was causing writes to spuriously fail in multidevice setups, and
         another bug meant that the errors weren't being logged, only reported
         via fsync.
      
      * tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: Fix bch2_alloc_sectors_start_trans() error handling
        bcachefs; guard against overflow in btree node split
        bcachefs: btree_node_u64s_with_format() takes nr keys
        bcachefs: print explicit recovery pass message only once
        bcachefs: improve modprobe support by providing softdeps
        bcachefs: fix invalid memory access in bch2_fs_alloc() error path
        bcachefs: Fix determining required file handle length
        bcachefs: Fix nocow locks deadlock
      74d8fc2b
    • Linus Torvalds's avatar
      Merge tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · ac1c13e2
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Address a few recently-introduced issues
      
      * tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Revert 5f7fc5d6
        NFSD: Revert 738401a9
        NFSD: Revert 6c41d9a9
        nfsd: hold nfsd_mutex across entire netlink operation
        nfsd: call nfsd_last_thread() before final nfsd_put()
      ac1c13e2
    • Linus Torvalds's avatar
      Merge tag 'dm-6.7/dm-fixes-3' of... · 0a7a93d9
      Linus Torvalds authored
      Merge tag 'dm-6.7/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - DM raid target (and MD raid) fix for reconfig_mutex MD deadlock that
         should have been merged along with recent v6.7-rc6 MD fixes (see MD
         related commits: f2d87a75^..b3911334)
      
       - DM integrity target fix to avoid modifying immutable biovec in the
         integrity_metadata() edge case where kmalloc fails.
      
       - Fix drivers/md/Kconfig so DM_AUDIT depends on BLK_DEV_DM.
      
       - Update DM entry in MAINTAINERS to remove stale info.
      
      * tag 'dm-6.7/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        MAINTAINERS: remove stale info for DEVICE-MAPPER
        dm audit: fix Kconfig so DM_AUDIT depends on BLK_DEV_DM
        dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata()
        dm-raid: delay flushing event_work() after reconfig_mutex is released
      0a7a93d9
    • Kent Overstreet's avatar
      bcachefs: Fix bch2_alloc_sectors_start_trans() error handling · 247ce5f1
      Kent Overstreet authored
      When we fail to allocate because of insufficient open buckets, we don't
      want to retry from the full set of devices - we just want to retry in
      blocking mode.
      
      But if the retry in blocking mode fails with a different error code, we
      end up squashing the -BCH_ERR_open_buckets_empty error with an error
      that makes us thing we won't be able to allocate (insufficient_devices)
      - which is incorrect when we didn't try to allocate from the full set of
      devices, and causes the write to fail.
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      247ce5f1
  2. 19 Dec, 2023 6 commits
    • Kent Overstreet's avatar
    • Kent Overstreet's avatar
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 55cb5f43
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "While working on the ring buffer, I found one more bug with the
        timestamp code, and the fix for this removed the need for the final
        64-bit cmpxchg!
      
        The ring buffer events hold a "delta" from the previous event. If it
        is determined that the delta can not be calculated, it falls back to
        adding an absolute timestamp value. The way to know if the delta can
        be used is via two stored timestamps in the per-cpu buffer meta data:
      
         before_stamp and write_stamp
      
        The before_stamp is written by every event before it tries to allocate
        its space on the ring buffer. The write_stamp is written after it
        allocates its space and knows that nothing came in after it read the
        previous before_stamp and write_stamp and the two matched.
      
        A previous fix dd939425 ("ring-buffer: Do not try to put back
        write_stamp") removed putting back the write_stamp to match the
        before_stamp so that the next event could use the delta, but races
        were found where the two would match, but not be for of the previous
        event.
      
        It was determined to allow the event reservation to not have a valid
        write_stamp when it is finished, and this fixed a lot of races.
      
        The last use of the 64-bit timestamp cmpxchg depended on the
        write_stamp being valid after an interruption. But this is no longer
        the case, as if an event is interrupted by a softirq that writes an
        event, and that event gets interrupted by a hardirq or NMI and that
        writes an event, then the softirq could finish its reservation without
        a valid write_stamp.
      
        In the slow path of the event reservation, a delta can still be used
        if the write_stamp is valid. Instead of using a cmpxchg against the
        write stamp, the before_stamp needs to be read again to validate the
        write_stamp. The cmpxchg is not needed.
      
        This updates the slowpath to validate the write_stamp by comparing it
        to the before_stamp and removes all rb_time_cmpxchg() as there are no
        more users of that function.
      
        The removal of the 32-bit updates of rb_time_t will be done in the
        next merge window"
      
      * tag 'trace-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Fix slowpath of interrupted event
      55cb5f43
    • Linus Torvalds's avatar
      Merge tag 'arc-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 9c749e61
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
      
       - build error for hugetlb, sparse and smatch fixes
      
       - removal of VIPT aliasing cache code
      
      * tag 'arc-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: add hugetlb definitions
        ARC: fix smatch warning
        ARC: fix spare error
        ARC: mm: retire support for aliasing VIPT D$
        ARC: entry: move ARCompact specific bits out of entry.h
        ARC: entry: SAVE_ABI_CALLEE_REG: ISA/ABI specific helper
      9c749e61
    • Steven Rostedt (Google)'s avatar
      ring-buffer: Fix slowpath of interrupted event · b803d7c6
      Steven Rostedt (Google) authored
      To synchronize the timestamps with the ring buffer reservation, there are
      two timestamps that are saved in the buffer meta data.
      
      1. before_stamp
      2. write_stamp
      
      When the two are equal, the write_stamp is considered valid, as in, it may
      be used to calculate the delta of the next event as the write_stamp is the
      timestamp of the previous reserved event on the buffer.
      
      This is done by the following:
      
       /*A*/	w = current position on the ring buffer
      	before = before_stamp
      	after = write_stamp
      	ts = read current timestamp
      
      	if (before != after) {
      		write_stamp is not valid, force adding an absolute
      		timestamp.
      	}
      
       /*B*/	before_stamp = ts
      
       /*C*/	write = local_add_return(event length, position on ring buffer)
      
      	if (w == write - event length) {
      		/* Nothing interrupted between A and C */
       /*E*/		write_stamp = ts;
      		delta = ts - after
      		/*
      		 * If nothing interrupted again,
      		 * before_stamp == write_stamp and write_stamp
      		 * can be used to calculate the delta for
      		 * events that come in after this one.
      		 */
      	} else {
      
      		/*
      		 * The slow path!
      		 * Was interrupted between A and C.
      		 */
      
      This is the place that there's a bug. We currently have:
      
      		after = write_stamp
      		ts = read current timestamp
      
       /*F*/		if (write == current position on the ring buffer &&
      		    after < ts && cmpxchg(write_stamp, after, ts)) {
      
      			delta = ts - after;
      
      		} else {
      			delta = 0;
      		}
      
      The assumption is that if the current position on the ring buffer hasn't
      moved between C and F, then it also was not interrupted, and that the last
      event written has a timestamp that matches the write_stamp. That is the
      write_stamp is valid.
      
      But this may not be the case:
      
      If a task context event was interrupted by softirq between B and C.
      
      And the softirq wrote an event that got interrupted by a hard irq between
      C and E.
      
      and the hard irq wrote an event (does not need to be interrupted)
      
      We have:
      
       /*B*/ before_stamp = ts of normal context
      
         ---> interrupted by softirq
      
      	/*B*/ before_stamp = ts of softirq context
      
      	  ---> interrupted by hardirq
      
      		/*B*/ before_stamp = ts of hard irq context
      		/*E*/ write_stamp = ts of hard irq context
      
      		/* matches and write_stamp valid */
      	  <----
      
      	/*E*/ write_stamp = ts of softirq context
      
      	/* No longer matches before_stamp, write_stamp is not valid! */
      
         <---
      
       w != write - length, go to slow path
      
      // Right now the order of events in the ring buffer is:
      //
      // |-- softirq event --|-- hard irq event --|-- normal context event --|
      //
      
       after = write_stamp (this is the ts of softirq)
       ts = read current timestamp
      
       if (write == current position on the ring buffer [true] &&
           after < ts [true] && cmpxchg(write_stamp, after, ts) [true]) {
      
      	delta = ts - after  [Wrong!]
      
      The delta is to be between the hard irq event and the normal context
      event, but the above logic made the delta between the softirq event and
      the normal context event, where the hard irq event is between the two. This
      will shift all the remaining event timestamps on the sub-buffer
      incorrectly.
      
      The write_stamp is only valid if it matches the before_stamp. The cmpxchg
      does nothing to help this.
      
      Instead, the following logic can be done to fix this:
      
      	before = before_stamp
      	ts = read current timestamp
      	before_stamp = ts
      
      	after = write_stamp
      
      	if (write == current position on the ring buffer &&
      	    after == before && after < ts) {
      
      		delta = ts - after
      
      	} else {
      		delta = 0;
      	}
      
      The above will only use the write_stamp if it still matches before_stamp
      and was tested to not have changed since C.
      
      As a bonus, with this logic we do not need any 64-bit cmpxchg() at all!
      
      This means the 32-bit rb_time_t workaround can finally be removed. But
      that's for a later time.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231218175229.58ec3daf@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20231218230712.3a76b081@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: dd939425 ("ring-buffer: Do not try to put back write_stamp")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      b803d7c6
    • Linus Torvalds's avatar
      Merge tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 3f10e214
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - fix for division by zero in Nintendo driver when generic joycon is
         attached, reported and fixed by SteamOS folks (Guilherme G. Piccoli)
      
       - GCC-7 build fix (which is a good cleanup anyway) for Nintendo driver
         (Ryan McClelland)
      
      * tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: nintendo: Prevent divide-by-zero on code
        HID: nintendo: fix initializer element is not constant error
      3f10e214
  3. 18 Dec, 2023 12 commits
  4. 17 Dec, 2023 10 commits
  5. 16 Dec, 2023 3 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 3b8a9b2e
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix eventfs to check creating new files for events with names greater
         than NAME_MAX. The eventfs lookup needs to check the return result of
         simple_lookup().
      
       - Fix the ring buffer to check the proper max data size. Events must be
         able to fit on the ring buffer sub-buffer, if it cannot, then it
         fails to be written and the logic to add the event is avoided. The
         code to check if an event can fit failed to add the possible absolute
         timestamp which may make the event not be able to fit. This causes
         the ring buffer to go into an infinite loop trying to find a
         sub-buffer that would fit the event. Luckily, there's a check that
         will bail out if it looped over a 1000 times and it also warns.
      
         The real fix is not to add the absolute timestamp to an event that is
         starting at the beginning of a sub-buffer because it uses the
         sub-buffer timestamp.
      
         By avoiding the timestamp at the start of the sub-buffer allows
         events that pass the first check to always find a sub-buffer that it
         can fit on.
      
       - Have large events that do not fit on a trace_seq to print "LINE TOO
         BIG" like it does for the trace_pipe instead of what it does now
         which is to silently drop the output.
      
       - Fix a memory leak of forgetting to free the spare page that is saved
         by a trace instance.
      
       - Update the size of the snapshot buffer when the main buffer is
         updated if the snapshot buffer is allocated.
      
       - Fix ring buffer timestamp logic by removing all the places that tried
         to put the before_stamp back to the write stamp so that the next
         event doesn't add an absolute timestamp. But each of these updates
         added a race where by making the two timestamp equal, it was
         validating the write_stamp so that it can be incorrectly used for
         calculating the delta of an event.
      
       - There's a temp buffer used for printing the event that was using the
         event data size for allocation when it needed to use the size of the
         entire event (meta-data and payload data)
      
       - For hardening, use "%.*s" for printing the trace_marker output, to
         limit the amount that is printed by the size of the event. This was
         discovered by development that added a bug that truncated the '\0'
         and caused a crash.
      
       - Fix a use-after-free bug in the use of the histogram files when an
         instance is being removed.
      
       - Remove a useless update in the rb_try_to_discard of the write_stamp.
         The before_stamp was already changed to force the next event to add
         an absolute timestamp that the write_stamp is not used. But the
         write_stamp is modified again using an unneeded 64-bit cmpxchg.
      
       - Fix several races in the 32-bit implementation of the
         rb_time_cmpxchg() that does a 64-bit cmpxchg.
      
       - While looking at fixing the 64-bit cmpxchg, I noticed that because
         the ring buffer uses normal cmpxchg, and this can be done in NMI
         context, there's some architectures that do not have a working
         cmpxchg in NMI context. For these architectures, fail recording
         events that happen in NMI context.
      
      * tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
        ring-buffer: Have rb_time_cmpxchg() set the msb counter too
        ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
        ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
        ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
        ring-buffer: Do not try to put back write_stamp
        tracing: Fix uaf issue when open the hist or hist_debug file
        tracing: Add size check when printing trace_marker output
        ring-buffer: Have saved event hold the entire event
        ring-buffer: Do not update before stamp when switching sub-buffers
        tracing: Update snapshot buffer on resize if it is allocated
        ring-buffer: Fix memory leak of free page
        eventfs: Fix events beyond NAME_MAX blocking tasks
        tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
        ring-buffer: Fix writing to the buffer with max_data_size
      3b8a9b2e
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c8e97fc6
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Arm CMN perf: fix the DTC allocation failure path which can end up
         erroneously clearing live counters
      
       - arm64/mm: fix hugetlb handling of the dirty page state leading to a
         continuous fault loop in user on hardware without dirty bit
         management (DBM). That's caused by the dirty+writeable information
         not being properly preserved across a series of mprotect(PROT_NONE),
         mprotect(PROT_READ|PROT_WRITE)
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
        perf/arm-cmn: Fail DTC counter allocation correctly
      c8e97fc6
    • Linus Torvalds's avatar
      Merge tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 2e3f280b
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Limit Max_Read_Request_Size (MRRS) on some MIPS Loongson systems
         because they don't all support MRRS > 256, and firmware doesn't
         always initialize it correctly, which meant some PCIe devices didn't
         work (Jiaxun Yang)
      
       - Add and use pci_enable_link_state_locked() to prevent potential
         deadlocks in vmd and qcom drivers (Johan Hovold)
      
       - Revert recent (v6.5) acpiphp resource assignment changes that fixed
         issues with hot-adding devices on a root bus or with large BARs, but
         introduced new issues with GPU initialization and hot-adding SCSI
         disks in QEMU VMs and (Bjorn Helgaas)
      
      * tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
        PCI/ASPM: Add pci_disable_link_state_locked() lockdep assert
        PCI/ASPM: Clean up __pci_disable_link_state() 'sem' parameter
        PCI: qcom: Clean up ASPM comment
        PCI: qcom: Fix potential deadlock when enabling ASPM
        PCI: vmd: Fix potential deadlock when enabling ASPM
        PCI/ASPM: Add pci_enable_link_state_locked()
        PCI: loongson: Limit MRRS to 256
      2e3f280b
  6. 15 Dec, 2023 5 commits