1. 02 Nov, 2023 1 commit
    • bcachefs: rebalance_work · fb3f57bb
      Kent Overstreet authored
      This adds a new btree, rebalance_work, to eliminate scanning required
      for finding extents that need work done on them in the background - i.e.
      for the background_target and background_compression options.
      
      rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
      extent in the extents or reflink btree at the same pos.
      
      A new extent field is added, bch_extent_rebalance, which indicates that
      this extent has work that needs to be done in the background - and which
      options to use. This allows per-inode options to be propagated to
      indirect extents - at least in some circumstances. In this patch,
      changing IO options on a file will not propagate the new options to
      indirect extents pointed to by that file.
      
      Updating (setting/clearing) the rebalance_work btree is done by the
      extent trigger, which looks at the bch_extent_rebalance field.
      
      Scanning is still required after changing IO path options - either just
      for a given inode, or for the whole filesystem. We indicate that
      scanning is required by adding a KEY_TYPE_cookie key to the
      rebalance_work btree: the cookie counter is so that we can detect that
      scanning is still required when an option has been flipped mid-way
      through an existing scan.
      
      Future possible work:
       - Propagate options to indirect extents when being changed
       - Add other IO path options - nr_replicas, ec, to rebalance_work so
         they can be applied in the background when they change
       - Add a counter, for bcachefs fs usage output, showing the pending
         amount of rebalance work: we'll probably want to do this after the
         disk space accounting rewrite (moving it to a new btree)
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
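The cookie mechanism described in the rebalance_work commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not bcachefs code: the class `RebalanceWork`, its methods, and the use of a single filesystem-wide cookie are all hypothetical simplifications. The set of positions stands in for the KEY_TYPE_set keys, and the integer counter stands in for the value stored in a KEY_TYPE_cookie key.

```python
from dataclasses import dataclass, field


@dataclass
class RebalanceWork:
    """Toy model of the rebalance_work btree (hypothetical, for illustration)."""

    # Positions that carry a "set" key, i.e. extents with pending background work.
    pending: set = field(default_factory=set)
    # Counter held by the cookie key; 0 models "no cookie key present".
    cookie: int = 0

    @property
    def needs_scan(self) -> bool:
        return self.cookie != 0

    def extent_trigger(self, pos: int, needs_rebalance: bool) -> None:
        # Models the extent trigger: set or clear the bit at the extent's position.
        if needs_rebalance:
            self.pending.add(pos)
        else:
            self.pending.discard(pos)

    def option_changed(self) -> None:
        # Changing an IO path option bumps the cookie to request a scan.
        self.cookie += 1

    def scan(self, extents: dict, on_extent=None) -> bool:
        """Walk all extents; return True only if no option changed mid-scan."""
        cookie_at_start = self.cookie
        for pos in sorted(extents):
            self.extent_trigger(pos, extents[pos])
            if on_extent is not None:
                on_extent(pos)  # hook used to simulate a concurrent option flip
        if self.cookie == cookie_at_start:
            self.cookie = 0  # scan is complete, drop the cookie
            return True
        return False  # cookie moved during the scan: another scan is needed
```

In this model, a scan that starts with cookie value N only clears the cookie if it still reads N at the end, which is how a counter (rather than a plain flag) lets an option flip during a scan be detected.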
2. 31 Oct, 2023 27 commits
3. 30 Oct, 2023 12 commits
    • Merge tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cd063c8b
      Linus Torvalds authored
      Pull objtool updates from Ingo Molnar:
       "Misc fixes and cleanups:
      
         - Fix potential MAX_NAME_LEN limit related build failures
      
         - Fix scripts/faddr2line symbol filtering bug
      
         - Fix scripts/faddr2line on LLVM=1
      
         - Fix scripts/faddr2line to accept readelf output with mapping
           symbols
      
         - Minor cleanups"
      
      * tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        scripts/faddr2line: Skip over mapping symbols in output from readelf
        scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
        scripts/faddr2line: Don't filter out non-function symbols from readelf
        objtool: Remove max symbol name length limitation
        objtool: Propagate early errors
        objtool: Use 'the fallthrough' pseudo-keyword
        x86/speculation, objtool: Use absolute relocations for annotations
        x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
    • Merge tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63ce50ff
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "Fair scheduler (SCHED_OTHER) improvements:
         - Remove the old and now unused SIS_PROP code & option
         - Scan cluster before LLC in the wake-up path
         - Use candidate prev/recent_used CPU if scanning failed for cluster
           wakeup
      
        NUMA scheduling improvements:
         - Improve the VMA access-PID code to better skip/scan VMAs
         - Extend tracing to cover VMA-skipping decisions
         - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
         - Generalize numa_map_to_online_node()
      
        Energy scheduling improvements:
         - Remove the EM_MAX_COMPLEXITY limit
         - Add tracepoints to track energy computation
         - Make the behavior of the 'sched_energy_aware' sysctl more
           consistent
         - Consolidate and clean up access to a CPU's max compute capacity
         - Fix uclamp code corner cases
      
        RT scheduling improvements:
         - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
         - Drive the ->rto_mask with rt_rq->pushable_tasks updates
      
        Scheduler scalability improvements:
         - Rate-limit updates to tg->load_avg
         - On x86 disable IBRS when CPU is offline to improve single-threaded
           performance
         - Micro-optimize in_task() and in_interrupt()
         - Micro-optimize the PSI code
         - Avoid updating PSI triggers and ->rtpoll_total when there are no
           state changes
      
        Core scheduler infrastructure improvements:
         - Use saved_state to reduce some spurious freezer wakeups
         - Bring in a handful of fast-headers improvements to scheduler
           headers
         - Make the scheduler UAPI headers more widely usable by user-space
         - Simplify the control flow of scheduler syscalls by using lock
           guards
         - Fix sched_setaffinity() vs. CPU hotplug race
      
        Scheduler debuggability improvements:
         - Disallow writing invalid values to sched_rt_period_us
         - Fix a race in the rq-clock debugging code triggering warnings
         - Fix a warning in the bandwidth distribution code
         - Micro-optimize in_atomic_preempt_off() checks
         - Enforce that the tasklist_lock is held in for_each_thread()
         - Print the TGID in sched_show_task()
         - Remove the /proc/sys/kernel/sched_child_runs_first sysctl
      
        ... and misc cleanups & fixes"
      
      * tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
        sched/fair: Remove SIS_PROP
        sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
        sched/fair: Scan cluster before scanning LLC in wake-up path
        sched: Add cpus_share_resources API
        sched/core: Fix RQCF_ACT_SKIP leak
        sched/fair: Remove unused 'curr' argument from pick_next_entity()
        sched/nohz: Update comments about NEWILB_KICK
        sched/fair: Remove duplicate #include
        sched/psi: Update poll => rtpoll in relevant comments
        sched: Make PELT acronym definition searchable
        sched: Fix stop_one_cpu_nowait() vs hotplug
        sched/psi: Bail out early from irq time accounting
        sched/topology: Rename 'DIE' domain to 'PKG'
        sched/psi: Delete the 'update_total' function parameter from update_triggers()
        sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
        sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
        sched/numa: Complete scanning of inactive VMAs when there is no alternative
        sched/numa: Complete scanning of partial VMAs regardless of PID activity
        sched/numa: Move up the access pid reset logic
        sched/numa: Trace decisions related to skipping VMAs
        ...
    • Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3cf3fabc
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "Futex improvements:
      
         - Add the 'futex2' syscall ABI, which is an attempt to get away from
           the multiplex syscall and adds a little room for extensions, while
           lifting some limitations.
      
         - Fix futex PI recursive rt_mutex waiter state bug
      
         - Fix inter-process shared futexes on no-MMU systems
      
         - Use folios instead of pages
      
        Micro-optimizations of locking primitives:
      
         - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
           architectures, to improve lockref code generation
      
         - Improve the x86-32 lockref_get_not_zero() main loop by adding
           build-time CMPXCHG8B support detection for the relevant lockref
           code, and by better interfacing the CMPXCHG8B assembly code with
           the compiler
      
         - Introduce arch_sync_try_cmpxchg() on x86 to improve
           sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
           users to sync_try_cmpxchg().
      
         - Micro-optimize rcuref_put_slowpath()
      
        Locking debuggability improvements:
      
         - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
      
         - Enforce atomicity of sched_submit_work(), which is de-facto atomic
           but was un-enforced previously.
      
         - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
           semantics
      
         - Fix ww_mutex self-tests
      
         - Clean up const-propagation in <linux/seqlock.h> and simplify the
           API-instantiation macros a bit
      
        RT locking improvements:
      
         - Provide the rt_mutex_*_schedule() primitives/helpers and use them
           in the rtmutex code to avoid recursion vs. rtlock on the PI state.
      
         - Add nested blocking lockdep asserts to rt_mutex_lock(),
           rtlock_lock() and rwbase_read_lock()
      
        .. plus misc fixes & cleanups"
      
      * tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
        futex: Don't include process MM in futex key on no-MMU
        locking/seqlock: Fix grammar in comment
        alpha: Fix up new futex syscall numbers
        locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
        locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
        locking/seqlock: Change __seqprop() to return the function pointer
        locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
        locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
        locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
        locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
        locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
        locking/seqlock: Fix typo in comment
        futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
        locking/local, arch: Rewrite local_add_unless() as a static inline function
        locking/debug: Fix debugfs API return value checks to use IS_ERR()
        locking/ww_mutex/test: Make sure we bail out instead of livelock
        locking/ww_mutex/test: Fix potential workqueue corruption
        locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
        futex: Add sys_futex_requeue()
        futex: Add flags2 argument to futex_requeue()
        ...
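The locking pull converts some sync_cmpxchg() users to sync_try_cmpxchg(). The difference between the two calling conventions can be sketched with a single-threaded Python model. The names `Cell`, `cmpxchg`, `try_cmpxchg` and `increment` below are illustrative stand-ins, not kernel APIs, and nothing here is atomic: the point is only the interface. A cmpxchg-style call returns the previous value and leaves the comparison to the caller, while a try_cmpxchg-style call returns a boolean and, on failure, writes the observed value back into the caller's "old" variable, so a retry loop does not need to reload it.

```python
class Cell:
    """Stand-in for a memory location (hypothetical helper)."""

    def __init__(self, value):
        self.value = value


def cmpxchg(cell, old, new):
    # Returns the value found in the cell; the swap happened iff it equals `old`.
    prev = cell.value
    if prev == old:
        cell.value = new
    return prev


def try_cmpxchg(cell, old_ref, new):
    # `old_ref` is a one-element list modelling a C pointer to the expected value.
    # Returns True on success; on failure, refreshes old_ref[0] with the current value.
    prev = cmpxchg(cell, old_ref[0], new)
    if prev != old_ref[0]:
        old_ref[0] = prev
        return False
    return True


def increment(cell):
    # Typical retry loop: the expected value is refreshed by try_cmpxchg itself.
    old = [cell.value]
    while not try_cmpxchg(cell, old, old[0] + 1):
        pass
    return old[0] + 1
```

In real kernel code the benefit is in code generation: the success flag produced by the compare-and-exchange instruction can be used directly, instead of comparing the returned value against the expected one again.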
    • Merge tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9cda4eb0
      Linus Torvalds authored
      Pull x86 fpu fixlet from Borislav Petkov:
      
       - kernel-doc fix
      
      * tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu/xstate: Address kernel-doc warning
    • Merge tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f155f3b3
      Linus Torvalds authored
      Pull x86 platform updates from Borislav Petkov:
      
       - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
         actually used in the amd_nb.c enumeration
      
       - Add support for extracting NUMA information from devicetree for
         Hyper-V usages
      
       - Add PCI device IDs for the new AMD MI300 AI accelerators
      
       - Annotate an array in struct uv_rtc_timer_head with the new
         __counted_by attribute
      
       - Rework UV's NMI action parameter handling
      
      * tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
        x86/numa: Add Devicetree support
        x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
        x86/amd_nb: Add AMD Family MI300 PCI IDs
        x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by
        x86/platform/uv: Rework NMI "action" modparam handling
    • Merge tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ca2e9c3b
      Linus Torvalds authored
      Pull x86 cpuid updates from Borislav Petkov:
      
       - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
         virtualization support is disabled in the BIOS on AMD and Hygon
         platforms
      
       - A minor cleanup
      
      * tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu/amd: Remove redundant 'break' statement
        x86/cpu: Clear SVM feature if disabled by BIOS
    • Merge tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9ab021a1
      Linus Torvalds authored
      Pull x86 resource control updates from Borislav Petkov:
      
       - Add support for non-contiguous capacity bitmasks being added to
         Intel's CAT implementation
      
       - Other improvements to resctrl code: better configuration,
         simplifications, debugging support, fixes
      
      * tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/resctrl: Display RMID of resource group
        x86/resctrl: Add support for the files of MON groups only
        x86/resctrl: Display CLOSID for resource group
        x86/resctrl: Introduce "-o debug" mount option
        x86/resctrl: Move default group file creation to mount
        x86/resctrl: Unwind properly from rdt_enable_ctx()
        x86/resctrl: Rename rftype flags for consistency
        x86/resctrl: Simplify rftype flag definitions
        x86/resctrl: Add multiple tasks to the resctrl group at once
        Documentation/x86: Document resctrl's new sparse_masks
        x86/resctrl: Add sparse_masks file in info
        x86/resctrl: Enable non-contiguous CBMs in Intel CAT
        x86/resctrl: Rename arch_has_sparse_bitmaps
        x86/resctrl: Fix remaining kernel-doc warnings
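The resctrl pull adds support for non-contiguous capacity bitmasks (CBMs) in Intel CAT. What "contiguous" means for a bitmask can be shown with a short check. This is a sketch only: the function names and the `cbm_len` and `sparse_masks` parameters are assumptions chosen for illustration (the latter borrows the name of the new info file), and the real resctrl validation has further constraints that are omitted here.

```python
def is_contiguous(mask: int) -> bool:
    """True if the set bits of `mask` form a single unbroken run."""
    if mask <= 0:
        return False
    # Fill every bit below the lowest set bit, then add 1. For a single run of
    # ones this carries past the top of the run, leaving no overlap with `mask`.
    filled = mask | (mask - 1)
    return ((filled + 1) & mask) == 0


def cbm_is_valid(mask: int, cbm_len: int, sparse_masks: bool) -> bool:
    """Hypothetical validity check: in range, and contiguous unless sparse masks are allowed."""
    if mask <= 0 or mask >= (1 << cbm_len):
        return False
    return sparse_masks or is_contiguous(mask)
```

For example, `0b0110` is one run of ones and passes either way, while `0b0101` has a hole and is accepted in this model only when `sparse_masks` is true.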
    • Merge tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f84a52ee
      Linus Torvalds authored
      Pull x86 hw mitigation updates from Borislav Petkov:
      
       - A bunch of improvements, cleanups and fixlets to the SRSO mitigation
         machinery and other, general cleanups to the hw mitigations code, by
         Josh Poimboeuf
      
       - Improve the return thunk detection by objtool as it is absolutely
         important that the default return thunk is not used after returns
         have been patched. Future work to detect and report this better is
         pending
      
       - Other misc cleanups and fixes
      
      * tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        x86/retpoline: Document some thunk handling aspects
        x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
        x86/callthunks: Delete unused "struct thunk_desc"
        x86/vdso: Run objtool on vdso32-setup.o
        objtool: Fix return thunk patching in retpolines
        x86/srso: Remove unnecessary semicolon
        x86/pti: Fix kernel warnings for pti= and nopti cmdline options
        x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
        x86/nospec: Refactor UNTRAIN_RET[_*]
        x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
        x86/srso: Disentangle rethunk-dependent options
        x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
        x86/bugs: Remove default case for fully switched enums
        x86/srso: Remove 'pred_cmd' label
        x86/srso: Unexport untraining functions
        x86/srso: Improve i-cache locality for alias mitigation
        x86/srso: Fix unret validation dependencies
        x86/srso: Fix vulnerability reporting for missing microcode
        x86/srso: Print mitigation for retbleed IBPB case
        x86/srso: Print actual mitigation if requested mitigation isn't possible
        ...
    • Merge tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01ae815c
      Linus Torvalds authored
      Pull x86 RAS updates from Borislav Petkov:
      
       - Specify what error addresses reported on AMD are actually usable
         memory error addresses for further decoding
      
      * tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Cleanup mce_usable_address()
        x86/mce: Define amd_mce_usable_address()
        x86/MCE/AMD: Split amd_mce_is_memory_error()
    • Merge tag 'edac_updates_for_v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 66cc8838
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - A new EDAC driver for Xilinx's Versal integrated memory controller
      
      * tag 'edac_updates_for_v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/versal: Add a Xilinx Versal memory controller driver
        dt-bindings: memory-controllers: Add support for Xilinx Versal EDAC for DDRMC
    • Merge tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs · 9e877052
      Linus Torvalds authored
      Pull initial bcachefs updates from Kent Overstreet:
       "Here's the bcachefs filesystem pull request.
      
        One new patch since last week: the exportfs constants ended up
        conflicting with other filesystems that are also getting added to the
        global enum, so switched to new constants picked by Amir.
      
        The only new non fs/bcachefs/ patch is the objtool patch that adds
        bcachefs functions to the list of noreturns. The patch that exports
        osq_lock() has been dropped for now, per Ingo"
      
      * tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs: (2781 commits)
        exportfs: Change bcachefs fid_type enum to avoid conflicts
        bcachefs: Refactor memcpy into direct assignment
        bcachefs: Fix drop_alloc_keys()
        bcachefs: snapshot_create_lock
        bcachefs: Fix snapshot skiplists during snapshot deletion
        bcachefs: bch2_sb_field_get() refactoring
        bcachefs: KEY_TYPE_error now counts towards i_sectors
        bcachefs: Fix handling of unknown bkey types
        bcachefs: Switch to unsafe_memcpy() in a few places
        bcachefs: Use struct_size()
        bcachefs: Correctly initialize new buckets on device resize
        bcachefs: Fix another smatch complaint
        bcachefs: Use strsep() in split_devs()
        bcachefs: Add iops fields to bch_member
        bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
        bcachefs: New superblock section members_v2
        bcachefs: Add new helper to retrieve bch_member from sb
        bcachefs: bucket_lock() is now a sleepable lock
        bcachefs: fix crc32c checksum merge byte order problem
        bcachefs: Fix bch2_inode_delete_keys()
        ...
    • Merge tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d5acbc60
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "New features:
      
         - raid-stripe-tree
      
           New tree for logical file extent mapping where the physical mapping
           may not match on multiple devices. This is now used in zoned mode
           to implement RAID0/RAID1* profiles, but can be used in non-zoned
           mode as well. The support for RAID56 is in development and will
           eventually fix the problems with the current implementation. This
           is a backward incompatible feature and has to be enabled at mkfs
           time.
      
         - simple quota accounting (squota)
      
           A simplified mode of qgroup that accounts all space on the initial
           extent owners (a subvolume), the snapshots are then cheap to create
           and delete. The deletion of snapshots in fully accounting qgroups
           is a known CPU/IO performance bottleneck.
      
           The squota is not suitable for the general use case but works well
           for containers where the original subvolume exists for the whole
           time. This is a backward incompatible feature as it needs extending
           some structures, but can be enabled on an existing filesystem.
      
         - temporary filesystem fsid (temp_fsid)
      
           The fsid identifies a filesystem and is hard coded in the
           structures, which disallows mounting the same fsid found on
           different devices.
      
           For a single device filesystem this is not strictly necessary, a
           new temporary fsid can be generated on mount e.g. after a device is
           cloned. This will be used by Steam Deck for root partition A/B
           testing, or can be used for VM root images.
      
        Other user visible changes:
      
         - filesystems with partially finished metadata_uuid conversion cannot
           be mounted anymore and the uuid fixup has to be done by btrfs-progs
           (btrfstune).
      
        Performance improvements:
      
         - reduce reservations for checksum deletions (with enabled free space
           tree by factor of 4), on a sample workload on file with many
           extents the deletion time decreased by 12%
      
         - make extent state merges more efficient during insertions, reduce
           rb-tree iterations (run time of critical functions reduced by 5%)
      
        Core changes:
      
         - the integrity check functionality has been removed, this was a
           debugging feature and removal does not affect other integrity
           checks like checksums or tree-checker
      
         - space reservation changes:
      
            - more efficient delayed ref reservations, this avoids building up
              too much work or overusing or exhausting the global block
              reserve in some situations
      
            - move delayed refs reservation to the transaction start time,
              this prevents some ENOSPC corner cases related to exhaustion of
              global reserve
      
            - improvements in reducing excessive reservations for block group
              items
      
            - adjust overcommit logic in near full situations, account for one
              more chunk to eventually allocate metadata chunk, this is mostly
              relevant for small filesystems (<10GiB)
      
         - single device filesystems are scanned but not registered (except
           seed devices), this allows temp_fsid to work
      
         - qgroup iterations do not need GFP_ATOMIC allocations anymore
      
         - cleanups, refactoring, reduced data structure size, function
           parameter simplifications, error handling fixes"
      
      * tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits)
        btrfs: open code timespec64 in struct btrfs_inode
        btrfs: remove redundant log root tree index assignment during log sync
        btrfs: remove redundant initialization of variable dirty in btrfs_update_time()
        btrfs: sysfs: show temp_fsid feature
        btrfs: disable the device add feature for temp-fsid
        btrfs: disable the seed feature for temp-fsid
        btrfs: update comment for temp-fsid, fsid, and metadata_uuid
        btrfs: remove pointless empty log context list check when syncing log
        btrfs: update comment for struct btrfs_inode::lock
        btrfs: remove pointless barrier from btrfs_sync_file()
        btrfs: add and use helpers for reading and writing last_trans_committed
        btrfs: add and use helpers for reading and writing fs_info->generation
        btrfs: add and use helpers for reading and writing log_transid
        btrfs: add and use helpers for reading and writing last_log_commit
        btrfs: support cloned-device mount capability
        btrfs: add helper function find_fsid_by_disk
        btrfs: stop reserving excessive space for block group item insertions
        btrfs: stop reserving excessive space for block group item updates
        btrfs: reorder btrfs_inode to fill gaps
        btrfs: open code btrfs_ordered_inode_tree in btrfs_inode
        ...