1. 08 Aug, 2023 11 commits
    • Tejun Heo's avatar
      workqueue: Make unbound workqueues to use per-cpu pool_workqueues · 636b927e
      Tejun Heo authored
      A pwq (pool_workqueue) represents an association between a workqueue and a
      worker_pool. When a work item is queued, the workqueue selects the pwq to
      use, which in turn determines the pool, and queues the work item to the pool
      through the pwq. pwq is also what implements the maximum concurrency limit -
      @max_active.
      
      As a per-cpu workqueue should be assocaited with a different worker_pool on
      each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq.
      However, unbound workqueues were sharing a pwq within each NUMA node by
      default. The sharing has several downsides:
      
      * Because @max_active is per-pwq, the meaning of @max_active changes
        depending on the machine configuration and whether workqueue NUMA locality
        support is enabled.
      
      * Makes per-cpu and unbound code deviate.
      
      * Gets in the way of making workqueue CPU locality awareness more flexible.
      
      This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu
      workqueues do by making the following changes:
      
      * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq
        just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound
        workqueues.
      
      * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs
        the specified pwq to the target CPU's wq->cpu_pwq.
      
      * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU
        unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq.
        This makes the return value of wq_calc_node_cpumask() unnecessary. It now
        returns void.
      
      * @max_active now means the same thing for both per-cpu and unbound
        workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and
        documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer
        used in workqueue implementation and will be removed later.
      
      * All unbound pwq operations which used to be per-numa-node are now per-cpu.
      
      For most unbound workqueue users, this shouldn't cause noticeable changes.
      Work item issue and completion will be a small bit faster, flush_workqueue()
      would become a bit more expensive, and the total concurrency limit would
      likely become higher. All @max_active==1 use cases are currently being
      audited for conversion into alloc_ordered_workqueue() and they shouldn't be
      affected once the audit and conversion is complete.
      
      One area where the behavior change may be more noticeable is
      workqueue_congested() as the reported congestion state is now per CPU
      instead of NUMA node. There are only two users of this interface -
      drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are
      cc'd. Inputs on the behavior change would be very much appreciated.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarDennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: Karsten Graul <kgraul@linux.ibm.com>
      Cc: Wenjia Zhang <wenjia@linux.ibm.com>
      Cc: Jan Karcher <jaka@linux.ibm.com>
      636b927e
    • Tejun Heo's avatar
      workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug · 4cbfd3de
      Tejun Heo authored
      When a CPU went online or offline, wq_update_unbound_numa() was called only
      on the CPU which was going up or down. This works fine because all CPUs on
      the same NUMA node share the same pool_workqueue slot - one CPU updating it
      updates it for everyone in the node.
      
      However, future changes will make each CPU use a separate pool_workqueue
      even when they're sharing the same worker_pool, which requires updating
      pool_workqueue's for all CPUs which may be sharing the same pool_workqueue
      on hotplug.
      
      To accommodate the planned changes, this patch updates
      workqueue_on/offline_cpu() so that they call wq_update_unbound_numa() for
      all CPUs sharing the same NUMA node as the CPU going up or down. In the
      current code, the second+ calls would be noops and there shouldn't be any
      behavior changes.
      
      * As wq_update_unbound_numa() is now called on multiple CPUs per each
        hotplug event, @cpu is renamed to @hotplug_cpu and another @cpu argument
        is added. The former indicates the CPU being hot[un]plugged and the latter
        the CPU whose pool_workqueue is being updated.
      
      * In wq_update_unbound_numa(), cpu_off is renamed to off_cpu for consistency
        with the new @hotplug_cpu.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      4cbfd3de
    • Tejun Heo's avatar
      workqueue: Make per-cpu pool_workqueues allocated and released like unbound ones · 687a9aa5
      Tejun Heo authored
      Currently, all per-cpu pwq's (pool_workqueue's) are allocated directly
      through a per-cpu allocation and thus, unlike unbound workqueues, not
      reference counted. This difference in lifetime management between the two
      types is a bit confusing.
      
      Unbound workqueues are currently accessed through wq->numa_pwq_tbl[] which
      isn't suitiable for the planned CPU locality related improvements. The plan
      is to unify pwq handling across per-cpu and unbound workqueues so that
      they're always accessed through wq->cpu_pwq.
      
      In preparation, this patch makes per-cpu pwq's to be allocated, reference
      counted and released the same way as unbound pwq's. wq->cpu_pwq now holds
      pointers to pwq's instead of containing them directly.
      
      pwq_unbound_release_workfn() is renamed to pwq_release_workfn() as it's now
      also used for per-cpu work items.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      687a9aa5
    • Tejun Heo's avatar
      workqueue: Use a kthread_worker to release pool_workqueues · 967b494e
      Tejun Heo authored
      pool_workqueue release path is currently bounced to system_wq; however, this
      is a bit tricky because this bouncing occurs while holding a pool lock and
      thus has risk of causing a A-A deadlock. This is currently addressed by the
      fact that only unbound workqueues use this bouncing path and system_wq is a
      per-cpu workqueue.
      
      While this works, it's brittle and requires a work-around like setting the
      lockdep subclass for the lock of unbound pools. Besides, future changes will
      use the bouncing path for per-cpu workqueues too making the current approach
      unusable.
      
      Let's just use a dedicated kthread_worker to untangle the dependency. This
      is just one more kthread for all workqueues and makes the pwq release logic
      simpler and more robust.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      967b494e
    • Tejun Heo's avatar
      workqueue: Remove module param disable_numa and sysfs knobs pool_ids and numa · fcecfa8f
      Tejun Heo authored
      Unbound workqueue CPU affinity is going to receive an overhaul and the NUMA
      specific knobs won't make sense anymore. Remove them. Also, the pool_ids
      knob was used for debugging and not really meaningful given that there is no
      visibility into the pools associated with those IDs. Remove it too. A future
      patch will improve overall visibility.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      fcecfa8f
    • Tejun Heo's avatar
      workqueue: Relocate worker and work management functions · 797e8345
      Tejun Heo authored
      Collect first_idle_worker(), worker_enter/leave_idle(),
      find_worker_executing_work(), move_linked_works() and wake_up_worker() into
      one place. These functions will later be used to implement higher level
      worker management logic.
      
      No functional changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      797e8345
    • Tejun Heo's avatar
      workqueue: Rename wq->cpu_pwqs to wq->cpu_pwq · ee1ceef7
      Tejun Heo authored
      wq->cpu_pwqs is a percpu variable carraying one pointer to a pool_workqueue.
      The field name being plural is unusual and confusing. Rename it to singular.
      
      This patch doesn't cause any functional changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      ee1ceef7
    • Tejun Heo's avatar
      workqueue: Not all work insertion needs to wake up a worker · fe089f87
      Tejun Heo authored
      insert_work() always tried to wake up a worker; however, the only time it
      needs to try to wake up a worker is when a new active work item is queued.
      When a work item goes on the inactive list or queueing a flush work item,
      there's no reason to try to wake up a worker.
      
      This patch moves the worker wakeup logic out of insert_work() and places it
      in the active new work item queueing path in __queue_work().
      
      While at it:
      
      * __queue_work() is dereferencing pwq->pool repeatedly. Add local variable
        pool.
      
      * Every caller of insert_work() calls debug_work_activate(). Consolidate the
        invocations into insert_work().
      
      * In __queue_work() pool->watchdog_ts update is relocated slightly. This is
        to better accommodate future changes.
      
      This makes wakeups more precise and will help the planned change to assign
      work items to workers before waking them up. No behavior changes intended.
      
      v2: WARN_ON_ONCE(pool != last_pool) added in __queue_work() to clarify as
          suggested by Lai.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      fe089f87
    • Tejun Heo's avatar
      workqueue: Cleanups around process_scheduled_works() · c0ab017d
      Tejun Heo authored
      * Drop the trivial optimization in worker_thread() where it bypasses calling
        process_scheduled_works() if the first work item isn't linked. This is a
        mostly pointless micro optimization and gets in the way of improving the
        work processing path.
      
      * Consolidate pool->watchdog_ts updates in the two callers into
        process_scheduled_works().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      c0ab017d
    • Tejun Heo's avatar
      workqueue: Drop the special locking rule for worker->flags and worker_pool->flags · bc8b50c2
      Tejun Heo authored
      worker->flags used to be accessed from scheduler hooks without grabbing
      pool->lock for concurrency management. This is no longer true since
      6d25be57 ("sched/core, workqueues: Distangle worker accounting from rq
      lock"). Also, it's unclear why worker_pool->flags was using the "X" rule.
      All relevant users are accessing it under the pool lock.
      
      Let's drop the special "X" rule and use the "L" rule for these flag fields
      instead. While at it, replace the CONTEXT comment with
      lockdep_assert_held().
      
      This allows worker_set/clr_flags() to be used from context which isn't the
      worker itself. This will be used later to implement assinging work items to
      workers before waking them up so that workqueue can have better control over
      which worker executes which work item on which CPU.
      
      The only actual changes are sanity checks. There shouldn't be any visible
      behavior changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      bc8b50c2
    • Tejun Heo's avatar
      workqueue: Merge branch 'for-6.5-fixes' into for-6.6 · 87437656
      Tejun Heo authored
      Unbound workqueue execution locality improvement patchset is about to
      applied which will cause merge conflicts with changes in for-6.5-fixes.
      Let's avoid future merge conflict by pulling in for-6.5-fixes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      87437656
  2. 07 Aug, 2023 1 commit
  3. 25 Jul, 2023 1 commit
    • Tejun Heo's avatar
      workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 · aa6fde93
      Tejun Heo authored
      wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items.
      Once detected, they're excluded from concurrency management to prevent them
      from blocking other per-cpu work items. If CONFIG_WQ_CPU_INTENSIVE_REPORT is
      enabled, repeat offenders are also reported so that the code can be updated.
      
      The default threshold is 10ms which is long enough to do fair bit of work on
      modern CPUs while short enough to be usually not noticeable. This
      unfortunately leads to a lot of, arguable spurious, detections on very slow
      CPUs. Using the same threshold across CPUs whose performance levels may be
      apart by multiple levels of magnitude doesn't make whole lot of sense.
      
      This patch scales up wq_cpu_intensive_thresh_us upto 1 second when BogoMIPS
      is below 4000. This is obviously very inaccurate but it doesn't have to be
      accurate to be useful. The mechanism is still useful when the threshold is
      fully scaled up and the benefits of reports are usually shared with everyone
      regardless of who's reporting, so as long as there are sufficient number of
      fast machines reporting, we don't lose much.
      
      Some (or is it all?) ARM CPUs systemtically report significantly lower
      BogoMIPS. While this doesn't break anything, given how widespread ARM CPUs
      are, it's at least a missed opportunity and it probably would be a good idea
      to teach workqueue about it.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-and-Tested-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      aa6fde93
  4. 11 Jul, 2023 1 commit
  5. 10 Jul, 2023 4 commits
  6. 09 Jul, 2023 10 commits
  7. 08 Jul, 2023 12 commits
    • Hugh Dickins's avatar
      mm: lock newly mapped VMA with corrected ordering · 1c7873e3
      Hugh Dickins authored
      Lockdep is certainly right to complain about
      
        (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
                       but task is already holding lock:
        (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
      
      Invert those to the usual ordering.
      
      Fixes: 33313a74 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Tested-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c7873e3
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of... · 946c6b59
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull hotfixes from Andrew Morton:
       "16 hotfixes. Six are cc:stable and the remainder address post-6.4
        issues"
      
      The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
      it was all hopefully fixed in mainline.
      
      * tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        lib: dhry: fix sleeping allocations inside non-preemptable section
        kasan, slub: fix HW_TAGS zeroing with slub_debug
        kasan: fix type cast in memory_is_poisoned_n
        mailmap: add entries for Heiko Stuebner
        mailmap: update manpage link
        bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
        MAINTAINERS: add linux-next info
        mailmap: add Markus Schneider-Pargmann
        writeback: account the number of pages written back
        mm: call arch_swap_restore() from do_swap_page()
        squashfs: fix cache race with migration
        mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
        docs: update ocfs2-devel mailing list address
        MAINTAINERS: update ocfs2-devel mailing list address
        mm: disable CONFIG_PER_VMA_LOCK until its fixed
        fork: lock VMAs of the parent process when forking
      946c6b59
    • Suren Baghdasaryan's avatar
      fork: lock VMAs of the parent process when forking · fb49c455
      Suren Baghdasaryan authored
      When forking a child process, the parent write-protects anonymous pages
      and COW-shares them with the child being forked using copy_present_pte().
      
      We must not take any concurrent page faults on the source vma's as they
      are being processed, as we expect both the vma and the pte's behind it
      to be stable.  For example, the anon_vma_fork() expects the parents
      vma->anon_vma to not change during the vma copy.
      
      A concurrent page fault on a page newly marked read-only by the page
      copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
      source vma, defeating the anon_vma_clone() that wasn't done because the
      parent vma originally didn't have an anon_vma, but we now might end up
      copying a pte entry for a page that has one.
      
      Before the per-vma lock based changes, the mmap_lock guaranteed
      exclusion with concurrent page faults.  But now we need to do a
      vma_start_write() to make sure no concurrent faults happen on this vma
      while it is being processed.
      
      This fix can potentially regress some fork-heavy workloads.  Kernel
      build time did not show noticeable regression on a 56-core machine while
      a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
      shows ~5% regression.  If such fork time regression is unacceptable,
      disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
      optimizations are possible if this regression proves to be problematic.
      Suggested-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reported-by: default avatarJiri Slaby <jirislaby@kernel.org>
      Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/Reported-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/Reported-by: default avatarJacob Young <jacobly.alt@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
      Fixes: 0bff0aae ("x86/mm: try VMA lock-based page fault handling first")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fb49c455
    • Suren Baghdasaryan's avatar
      mm: lock newly mapped VMA which can be modified after it becomes visible · 33313a74
      Suren Baghdasaryan authored
      mmap_region adds a newly created VMA into VMA tree and might modify it
      afterwards before dropping the mmap_lock.  This poses a problem for page
      faults handled under per-VMA locks because they don't take the mmap_lock
      and can stumble on this VMA while it's still being modified.  Currently
      this does not pose a problem since post-addition modifications are done
      only for file-backed VMAs, which are not handled under per-VMA lock.
      However, once support for handling file-backed page faults with per-VMA
      locks is added, this will become a race.
      
      Fix this by write-locking the VMA before inserting it into the VMA tree.
      Other places where a new VMA is added into VMA tree do not modify it
      after the insertion, so do not need the same locking.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      33313a74
    • Suren Baghdasaryan's avatar
      mm: lock a vma before stack expansion · c137381f
      Suren Baghdasaryan authored
      With recent changes necessitating mmap_lock to be held for write while
      expanding a stack, per-VMA locks should follow the same rules and be
      write-locked to prevent page faults into the VMA being expanded. Add
      the necessary locking.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c137381f
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7fcd473a
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "A few late arriving patches that missed the initial pull request. It's
        mostly bug fixes (the dt-bindings is a fix for the initial pull)"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Remove unused function declaration
        scsi: target: docs: Remove tcm_mod_builder.py
        scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
        scsi: dt-bindings: ufs: qcom: Fix ICE phandle
        scsi: core: Simplify scsi_cdl_check_cmd()
        scsi: isci: Fix comment typo
        scsi: smartpqi: Replace one-element arrays with flexible-array members
        scsi: target: tcmu: Replace strlcpy() with strscpy()
        scsi: ncr53c8xx: Replace strlcpy() with strscpy()
        scsi: lpfc: Fix lpfc_name struct packing
      7fcd473a
    • Linus Torvalds's avatar
      Merge tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 84dc5aa3
      Linus Torvalds authored
      Pull more i2c updates from Wolfram Sang:
      
       - xiic patch should have been in the original pull but slipped through
      
       - mpc patch fixes a build regression
      
       - nomadik cleanup
      
      * tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mpc: Drop unused variable
        i2c: nomadik: Remove a useless call in the remove function
        i2c: xiic: Don't try to handle more interrupt events after error
      84dc5aa3
    • Linus Torvalds's avatar
      Merge tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 8fc3b8f0
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - Check for NULL bdev in LoadPin (Matthias Kaehlcke)
      
       - Revert unwanted KUnit FORTIFY build default
      
       - Fix 1-element array causing boot warnings with xhci-hub
      
      * tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
        Revert "fortify: Allow KUnit test to build without FORTIFY"
        dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
      8fc3b8f0
    • Anup Sharma's avatar
      ntb: hw: amd: Fix debugfs_create_dir error checking · bff6efc5
      Anup Sharma authored
      The debugfs_create_dir function returns ERR_PTR in case of error, and the
      only correct way to check if an error occurred is 'IS_ERR' inline function.
      This patch will replace the null-comparison with IS_ERR.
      Signed-off-by: default avatarAnup Sharma <anupnewsmail@gmail.com>
      Suggested-by: default avatarIvan Orlov <ivan.orlov0322@gmail.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      bff6efc5
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of... · c206353d
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
      
      Pull more perf tools updates from Namhyung Kim:
       "These are remaining changes and fixes for this cycle.
      
        Build:
      
         - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
           and skip if the vmlinux has no BTF.
      
         - Replace deprecated clang -target xxx option by --target=xxx.
      
        perf record:
      
         - Print event attributes with well known type and config symbols in
           the debug output like below:
      
             # perf record -e cycles,cpu-clock -C0 -vv true
             <SNIP>
             ------------------------------------------------------------
             perf_event_attr:
               type                             0 (PERF_TYPE_HARDWARE)
               size                             136
               config                           0 (PERF_COUNT_HW_CPU_CYCLES)
               { sample_period, sample_freq }   4000
               sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
               read_format                      ID
               disabled                         1
               inherit                          1
               freq                             1
               sample_id_all                    1
               exclude_guest                    1
             ------------------------------------------------------------
             sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
             ------------------------------------------------------------
             perf_event_attr:
               type                             1 (PERF_TYPE_SOFTWARE)
               size                             136
               config                           0 (PERF_COUNT_SW_CPU_CLOCK)
               { sample_period, sample_freq }   4000
               sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
               read_format                      ID
               disabled                         1
               inherit                          1
               freq                             1
               sample_id_all                    1
               exclude_guest                    1
      
         - Update AMD IBS event error message since it now support per-process
           profiling but no priviledge filters.
      
             $ sudo perf record -e ibs_op//k -C 0
             Error:
             AMD IBS doesn't support privilege filtering. Try again without
             the privilege modifiers (like 'k') at the end.
      
        perf lock contention:
      
         - Support CSV style output using -x option
      
             $ sudo perf lock con -ab -x, sleep 1
             # output: contended, total wait, max wait, avg wait, type, caller
             19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
             15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
             4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
             1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
             8, 67608, 27404, 8451, spinlock, __queue_work+0x174
             3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
             3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
             2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
             1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
      
         - Add --output option to save the data to a file not to be interfered
           by other debug messages.
      
        Test:
      
         - Fix event parsing test on ARM where there's no raw PMU nor supports
           PERF_PMU_CAP_EXTENDED_HW_TYPE.
      
         - Update the lock contention test case for CSV output.
      
         - Fix a segfault in the daemon command test.
      
        Vendor events (JSON):
      
         - Add has_event() to check if the given event is available on system
           at runtime. On Intel machines, some transaction events may not be
           present when TSC extensions are disabled.
      
         - Update Intel event metrics.
      
        Misc:
      
         - Sort symbols by name using an external array of pointers instead of
           a rbtree node in the symbol. This will save 16-bytes or 24-bytes
           per symbol whether the sorting is actually requested or not.
      
         - Fix unwinding DWARF callstacks using libdw when --symfs option is
           used"
      
      * tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
        perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
        perf test: Fix event parsing test on Arm
        perf evsel amd: Fix IBS error message
        perf: unwind: Fix symfs with libdw
        perf symbol: Fix uninitialized return value in symbols__find_by_name()
        perf test: Test perf lock contention CSV output
        perf lock contention: Add --output option
        perf lock contention: Add -x option for CSV style output
        perf lock: Remove stale comments
        perf vendor events intel: Update tigerlake to 1.13
        perf vendor events intel: Update skylakex to 1.31
        perf vendor events intel: Update skylake to 57
        perf vendor events intel: Update sapphirerapids to 1.14
        perf vendor events intel: Update icelakex to 1.21
        perf vendor events intel: Update icelake to 1.19
        perf vendor events intel: Update cascadelakex to 1.19
        perf vendor events intel: Update meteorlake to 1.03
        perf vendor events intel: Add rocketlake events/metrics
        perf vendor metrics intel: Make transaction metrics conditional
        perf jevents: Support for has_event function
        ...
      c206353d
    • Linus Torvalds's avatar
      Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux · ad8258e8
      Linus Torvalds authored
      Pull bitmap updates from Yury Norov:
       "Fixes for different bitmap pieces:
      
         - lib/test_bitmap: increment failure counter properly
      
           The tests that don't use expect_eq() macro to determine that a test
           is failured must increment failed_tests explicitly.
      
         - lib/bitmap: drop optimization of bitmap_{from,to}_arr64
      
           bitmap_{from,to}_arr64() optimization is overly optimistic
           on 32-bit LE architectures when it's wired to
           bitmap_copy_clear_tail().
      
         - nodemask: Drop duplicate check in for_each_node_mask()
      
           As the return value type of first_node() became unsigned, the node
           >= 0 became unnecessary.
      
         - cpumask: fix function description kernel-doc notation
      
         - MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
      
           Add linux/bits.h and linux/bitfield.h for visibility"
      
      * tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
        MAINTAINERS: Add bitfield.h to the BITMAP API record
        MAINTAINERS: Add bits.h to the BITMAP API record
        cpumask: fix function description kernel-doc notation
        nodemask: Drop duplicate check in for_each_node_mask()
        lib/bitmap: drop optimization of bitmap_{from,to}_arr64
        lib/test_bitmap: increment failure counter properly
      ad8258e8
    • Geert Uytterhoeven's avatar
      lib: dhry: fix sleeping allocations inside non-preemptable section · 8ba388c0
      Geert Uytterhoeven authored
      The Smatch static checker reports the following warnings:
      
          lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
          lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context
      
      Indeed, dhry() does sleeping allocations inside the non-preemptable
      section delimited by get_cpu()/put_cpu().
      
      Fix this by using atomic allocations instead.
      Add error handling, as atomic these allocations may fail.
      
      Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
      Fixes: 13684e96 ("lib: dhry: fix unstable smp_processor_id(_) usage")
      Reported-by: default avatarDan Carpenter <dan.carpenter@linaro.org>
      Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountainSigned-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8ba388c0