1. 14 Oct, 2024 1 commit
  2. 10 Oct, 2024 7 commits
    • sched_ext: Don't hold scx_tasks_lock for too long · b07996c7
      Tejun Heo authored
      While enabling and disabling a BPF scheduler, every task is iterated a
      couple times by walking scx_tasks. Except for one, all iterations keep
      holding scx_tasks_lock. On multi-socket systems under heavy rq lock
      contention and with a high number of threads, this can lead to RCU and
      other stalls.
      
      The following is triggered on a 2 x AMD EPYC 7642 system (192 logical CPUs)
      running `stress-ng --workload 150 --workload-threads 10` with >400k idle
      threads and RCU stall period reduced to 5s:
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     91-...!: (10 ticks this GP) idle=0754/1/0x4000000000000000 softirq=18204/18206 fqs=17
        rcu:     186-...!: (17 ticks this GP) idle=ec54/1/0x4000000000000000 softirq=25863/25866 fqs=17
        rcu:     (detected by 80, t=10042 jiffies, g=89305, q=33 ncpus=192)
        Sending NMI from CPU 80 to CPUs 91:
        NMI backtrace for cpu 91
        CPU: 91 UID: 0 PID: 284038 Comm: sched_ext_ops_h Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
        Hardware name: Supermicro Super Server/H11DSi, BIOS 2.8 12/14/2023
        Sched_ext: simple (disabling+all)
        RIP: 0010:queued_spin_lock_slowpath+0x17b/0x2f0
        Code: 02 c0 10 03 00 83 79 08 00 75 08 f3 90 83 79 08 00 74 f8 48 8b 11 48 85 d2 74 09 0f 0d 0a eb 0a 31 d2 eb 06 31 d2 eb 02 f3 90 <8b> 07 66 85 c0 75 f7 39 d8 75 0d be 01 00 00 00 89 d8 f0 0f b1 37
        RSP: 0018:ffffc9000fadfcb8 EFLAGS: 00000002
        RAX: 0000000001700001 RBX: 0000000001700000 RCX: ffff88bfcaaf10c0
        RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffff88bfca8f0080
        RBP: 0000000001700000 R08: 0000000000000090 R09: ffffffffffffffff
        R10: ffff88a74761b268 R11: 0000000000000000 R12: ffff88a6b6765460
        R13: ffffc9000fadfd60 R14: ffff88bfca8f0080 R15: ffff88bfcaac0000
        FS:  0000000000000000(0000) GS:ffff88bfcaac0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f5c55f526a0 CR3: 0000000afd474000 CR4: 0000000000350eb0
        Call Trace:
         <NMI>
         </NMI>
         <TASK>
         do_raw_spin_lock+0x9c/0xb0
         task_rq_lock+0x50/0x190
         scx_task_iter_next_locked+0x157/0x170
         scx_ops_disable_workfn+0x2c2/0xbf0
         kthread_worker_fn+0x108/0x2a0
         kthread+0xeb/0x110
         ret_from_fork+0x36/0x40
         ret_from_fork_asm+0x1a/0x30
         </TASK>
        Sending NMI from CPU 80 to CPUs 186:
        NMI backtrace for cpu 186
        CPU: 186 UID: 0 PID: 51248 Comm: fish Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
      
      scx_task_iter can safely drop locks while iterating. Make
      scx_task_iter_next() drop scx_tasks_lock every 32 iterations to avoid
      stalls.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      b07996c7
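The batching idea in this commit can be modeled in user space. The following is a minimal sketch, not the kernel implementation: a pthread mutex stands in for scx_tasks_lock, an array index stands in for the list cursor, and the names task_iter, ITER_BATCH, breathers and walk_all() are hypothetical. It only shows the lock being dropped and re-taken after every 32 visited items.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Batch size taken from the commit text ("every 32 iterations"). */
#define ITER_BATCH 32

/* Hypothetical user-space stand-in for an iterator that owns its lock. */
struct task_iter {
	pthread_mutex_t *lock;   /* plays the role of scx_tasks_lock */
	size_t pos;              /* next item to visit */
	size_t nr_items;         /* total items in the walk */
	unsigned int cnt;        /* items visited since the lock was last taken */
	unsigned int breathers;  /* how often the lock was dropped mid-walk */
};

static void task_iter_start(struct task_iter *it, pthread_mutex_t *lock,
			    size_t nr_items)
{
	it->lock = lock;
	it->pos = 0;
	it->nr_items = nr_items;
	it->cnt = 0;
	it->breathers = 0;
	pthread_mutex_lock(it->lock);
}

/* Returns 1 and stores the next index in *idx, or 0 when the walk is done. */
static int task_iter_next(struct task_iter *it, size_t *idx)
{
	if (it->pos >= it->nr_items)
		return 0;

	if (it->cnt == ITER_BATCH) {
		/* Let other lock waiters make progress, then continue. */
		pthread_mutex_unlock(it->lock);
		pthread_mutex_lock(it->lock);
		it->cnt = 0;
		it->breathers++;
	}

	it->cnt++;
	*idx = it->pos++;
	return 1;
}

static void task_iter_stop(struct task_iter *it)
{
	pthread_mutex_unlock(it->lock);
}

/* Walks nr_items items and reports how many times the lock was dropped. */
static unsigned int walk_all(size_t nr_items)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct task_iter it;
	size_t idx, visited = 0;

	task_iter_start(&it, &lock, nr_items);
	while (task_iter_next(&it, &idx))
		visited++;
	task_iter_stop(&it);

	assert(visited == nr_items);
	return it.breathers;
}
```

In this model, walk_all(100) drops the lock three times. An index trivially survives the unlock window; how the kernel iterator keeps its position valid while the lock is dropped is outside what this sketch shows.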
    • sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers · 967da578
      Tejun Heo authored
      Iterating with scx_task_iter involves scx_tasks_lock and optionally the rq
      lock of the task being iterated. Both locks can be released during iteration
      and the iteration can be continued after re-grabbing scx_tasks_lock.
      Currently, all lock handling is pushed to the caller which is a bit
      cumbersome and makes it difficult to add lock-aware behaviors. Make the
      scx_task_iter helpers handle scx_tasks_lock.
      
      - scx_task_iter_init/scx_task_iter_exit() now grab and release
        scx_tasks_lock, respectively. Renamed to
        scx_task_iter_start/scx_task_iter_stop() to more clearly indicate that
        there are non-trivial side-effects.
      
      - Add __ prefix to scx_task_iter_rq_unlock() to indicate that the function
        is internal.
      
      - Add scx_task_iter_unlock/relock(). The former drops both rq lock (if held)
        and scx_tasks_lock and the latter re-locks only scx_tasks_lock.
      
      This doesn't cause behavior changes and will be used to implement stall
      avoidance.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      967da578
    • sched_ext: bypass mode shouldn't depend on ops.select_cpu() · aebe7ae4
      Tejun Heo authored
      Bypass mode was depending on ops.select_cpu() which can't be trusted as with
      the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
      bypass mode.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      aebe7ae4
    • sched_ext: Move scx_builtin_idle_enabled check to scx_bpf_select_cpu_dfl() · cc3e1cac
      Tejun Heo authored
      Move the sanity check from the inner function scx_select_cpu_dfl() to the
      exported kfunc scx_bpf_select_cpu_dfl(). This doesn't cause behavior
      differences and will allow using scx_select_cpu_dfl() in bypass mode
      regardless of scx_builtin_idle_enabled.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      cc3e1cac
    • sched_ext: Start schedulers with consistent p->scx.slice values · 3fdb9ebc
      Tejun Heo authored
      The disable path caps p->scx.slice to SCX_SLICE_DFL. As the field is already
      being ignored at this stage during disable, the only effect this has is that
      when the next BPF scheduler is loaded, it won't see unreasonable left-over
      slices. Ultimately, this shouldn't matter but it's better to start in a
      known state. Drop p->scx.slice capping from the disable path and instead
      reset it to SCX_SLICE_DFL in the enable path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      3fdb9ebc
    • Revert "sched_ext: Use shorter slice while bypassing" · 54baa7ac
      Tejun Heo authored
      This reverts commit 6f34d8d3.
      
      Slice length is ignored while bypassing and tasks are switched on every tick
      and thus the patch does not make any difference. The perceived difference
      was from test noise.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      54baa7ac
    • sched_ext: use correct function name in pick_task_scx() warning message · c425180d
      Honglei Wang authored
      pick_next_task_scx() was turned into pick_task_scx() since
      commit 753e2836 ("sched_ext: Unify regular and core-sched pick
      task paths"). Update the outdated message.
      Signed-off-by: Honglei Wang <jameshongleiwang@126.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      c425180d
  3. 09 Oct, 2024 1 commit
  4. 08 Oct, 2024 1 commit
  5. 07 Oct, 2024 3 commits
    • sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED · 9b671793
      Tejun Heo authored
      scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
      tell whether ops.select_cpu() was called. This is incorrect as
      ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
      incorrectly skipping direct dispatch for tasks that are bound to a single
      CPU.
      
      sched core has been updated to specify ENQUEUE_RQ_SELECTED if
      ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
      scx_qmap to test it instead of SCX_ENQ_WAKEUP.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
      Cc: Changwoo Min <multics69@gmail.com>
      Cc: Andrea Righi <andrea.righi@linux.dev>
      Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
      9b671793
    • sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called · f207dc2d
      Tejun Heo authored
      During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
      migration is disabled. sched_ext schedulers may perform operations such as
      direct dispatch from ->select_task_rq() path and it is useful for them to
      know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
      
      Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
      and end up assuming incorrectly that ->select_task_rq() was called for tasks
      that are bound to a single CPU or migration disabled.
      
      Make select_task_rq() indicate whether ->select_task_rq() was called by
      setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
      to ENQUEUE_RQ_SELECTED for ->enqueue_task().
      
      This will be used by sched_ext to fix ->select_task_rq() skip detection.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      f207dc2d
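The flag plumbing described in this commit can be sketched as a small user-space model. This is an illustration under stated assumptions, not kernel code: the bit values are made up, struct model_task, class_select_task_rq(), ttwu_enqueue_flags() and enqueue_flags_for() are hypothetical names, and only the skip condition given in the message (one allowed CPU or migration disabled) is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's actual values may differ. */
#define WF_TTWU              0x08
#define WF_RQ_SELECTED       0x80
#define ENQUEUE_WAKEUP       0x01
#define ENQUEUE_RQ_SELECTED  0x200

/* Hypothetical, stripped-down task: just what the skip decision needs. */
struct model_task {
	int nr_cpus_allowed;
	bool migration_disabled;
	int cpu;                 /* CPU the task last ran on */
};

/* Stand-in for the scheduling class's ->select_task_rq() callback. */
static int class_select_task_rq(const struct model_task *p)
{
	(void)p;
	return 0;
}

/* Records in *wake_flags whether the class callback actually ran. */
static int select_task_rq(const struct model_task *p, int *wake_flags)
{
	if (p->nr_cpus_allowed > 1 && !p->migration_disabled) {
		*wake_flags |= WF_RQ_SELECTED;
		return class_select_task_rq(p);
	}
	return p->cpu;	/* callback skipped */
}

/* Translates wake flags into the flags passed to ->enqueue_task(). */
static int ttwu_enqueue_flags(int wake_flags)
{
	int en_flags = ENQUEUE_WAKEUP;

	if (wake_flags & WF_RQ_SELECTED)
		en_flags |= ENQUEUE_RQ_SELECTED;
	return en_flags;
}

/* Runs the modeled wakeup and returns the resulting enqueue flags. */
static int enqueue_flags_for(int nr_cpus_allowed, bool migration_disabled)
{
	struct model_task p = { nr_cpus_allowed, migration_disabled, 0 };
	int wake_flags = WF_TTWU;

	assert(nr_cpus_allowed >= 1);
	select_task_rq(&p, &wake_flags);
	return ttwu_enqueue_flags(wake_flags);
}
```

In the model, a wakeup of a task pinned to one CPU still carries ENQUEUE_WAKEUP but not ENQUEUE_RQ_SELECTED, which is the distinction that ENQUEUE_WAKEUP alone could not express.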
    • sched/core: Make select_task_rq() take the pointer to wake_flags instead of value · b62933ee
      Tejun Heo authored
      This will be used to allow select_task_rq() to indicate whether
      ->select_task_rq() was called by modifying *wake_flags.
      
      This makes try_to_wake_up() call all functions that take wake_flags with
      WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
      more consistent, and, as the flag is only tested by ->select_task_rq()
      implementations, it doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      b62933ee
  6. 04 Oct, 2024 2 commits
    • sched_ext: scx_cgroup_exit() may be called without successful scx_cgroup_init() · ec010333
      Tejun Heo authored
      568894ed ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations
      and fix scx_tg_online()") assumed that scx_cgroup_exit() is only called
      after scx_cgroup_init() finished successfully. This isn't true.
      scx_cgroup_exit() can be called without scx_cgroup_init() being called at
      all or after scx_cgroup_init() failed in the middle.
      
      As init state is tracked per cgroup, scx_cgroup_exit() can be used safely to
      clean up in all cases. Remove the incorrect WARN_ON_ONCE().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 568894ed ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()")
      ec010333
    • sched_ext: Improve error reporting during loading · cc9877fb
      Tejun Heo authored
      When the BPF scheduler fails, ops.exit() allows rich error reporting through
      scx_exit_info. Use the ops.exit() path consistently for all failures which
      can be caused by the BPF scheduler:
      
      - scx_ops_error() is called after ops.init() and ops.cgroup_init() failure
        to record error information.
      
      - ops.init_task() failure now uses scx_ops_error() instead of pr_err().
      
      - The err_disable path is updated to automatically trigger scx_ops_error()
        to cover cases where the error message hasn't already been generated,
        and to always return 0 indicating init success so that the error is
        reported through ops.exit().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
      Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
      Cc: Changwoo Min <multics69@gmail.com>
      Cc: Andrea Righi <andrea.righi@linux.dev>
      Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
      cc9877fb
  7. 02 Oct, 2024 1 commit
  8. 27 Sep, 2024 9 commits
    • sched_ext: Remove redundant p->nr_cpus_allowed checker · 95b87369
      Zhang Qiao authored
      select_task_rq() already checks that 'p->nr_cpus_allowed > 1', so the
      'p->nr_cpus_allowed == 1' check in scx_select_cpu_dfl() is redundant.
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      95b87369
    • sched_ext: Decouple locks in scx_ops_enable() · efe231d9
      Tejun Heo authored
      The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
      cpus_read_lock. Currently, the locks are grabbed together which is prone to
      locking order problems.
      
      For example, currently, there is a possible deadlock involving
      scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
      scx_fork_rwsem due to locking order existing in other subsystems. However,
      there exists a dependency in the other direction during hotplug if hotplug
      needs to fork a new task, which happens in some cases. This leads to the
      following deadlock:
      
             scx_ops_enable()                               hotplug
      
                                                percpu_down_write(&cpu_hotplug_lock)
         percpu_down_write(&scx_fork_rwsem)
         block on cpu_hotplug_lock
                                                kthread_create() waits for kthreadd
                                                kthreadd blocks on scx_fork_rwsem
      
      Note that this doesn't trigger lockdep because the hotplug side dependency
      bounces through kthreadd.
      
      With the preceding scx_cgroup_enabled change, this can be solved by
      decoupling cpus_read_lock, which is needed for static_key manipulations,
      from the other two locks.
      
      - Move the first block of static_key manipulations outside of scx_fork_rwsem
        and scx_cgroup_rwsem. This is now safe with the preceding
        scx_cgroup_enabled change.
      
      - Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
        blocks so that __scx_ops_enabled static_key enabling is outside the two
        rwsems.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
      Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
      efe231d9
    • sched_ext: Decouple locks in scx_ops_disable_workfn() · 16021656
      Tejun Heo authored
      The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
      cpus_read_lock. Currently, the locks are grabbed together which is prone to
      locking order problems. With the preceding scx_cgroup_enabled change, we can
      decouple them:
      
      - As cgroup disabling no longer requires modifying a static_key which
        requires cpus_read_lock(), no need to grab cpus_read_lock() before
        grabbing scx_cgroup_rwsem.
      
      - cgroup can now be independently disabled before tasks are moved back to
        the fair class.
      
      Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
      now unnecessary cpus_read_lock() and move static_key operations out of
      scx_fork_rwsem. This decouples all three locks in the disable path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
      Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
      16021656
    • sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online() · 568894ed
      Tejun Heo authored
      If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
      doesn't set SCX_TG_INITED, which means that ops.cgroup_exit(), even if
      implemented, won't be called from scx_tg_offline(). This is because
      SCX_HAS_OP(cgroup_init) is used to test both whether SCX cgroup operations
      are enabled and whether ops.cgroup_init() exists.
      
      Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
      operations and use SCX_HAS_OP(cgroup_init) only to test whether
      ops.cgroup_init() exists. Make all cgroup operations consistently use
      scx_cgroup_enabled to test whether cgroup operations are enabled.
      scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
      locking updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      568894ed
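The bug and its fix can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel code: struct model_ops, struct model_tg, TG_INITED and exit_calls_for() are hypothetical names, the callbacks are reduced to call counters, and the offline check (exit callback present and the INITED flag set) is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define TG_INITED 0x1	/* stands in for SCX_TG_INITED */

/* Which optional callbacks the modeled BPF scheduler implements. */
struct model_ops {
	bool has_cgroup_init;
	bool has_cgroup_exit;
};

/* Modeled task group: init flag plus counters for callback invocations. */
struct model_tg {
	unsigned int flags;
	int init_calls;
	int exit_calls;
};

/* Old behavior: one test answers both "enabled?" and "has init op?". */
static void tg_online_old(const struct model_ops *ops, struct model_tg *tg)
{
	if (ops->has_cgroup_init) {
		tg->init_calls++;
		tg->flags |= TG_INITED;
	}
}

/* Fixed behavior: a separate bool gates cgroup operations as a whole. */
static void tg_online_fixed(const struct model_ops *ops, bool cgroup_enabled,
			    struct model_tg *tg)
{
	if (!cgroup_enabled)
		return;
	if (ops->has_cgroup_init)
		tg->init_calls++;
	tg->flags |= TG_INITED;
}

/* The exit callback only runs for groups that were marked as inited. */
static void tg_offline(const struct model_ops *ops, struct model_tg *tg)
{
	if (ops->has_cgroup_exit && (tg->flags & TG_INITED))
		tg->exit_calls++;
	tg->flags &= ~TG_INITED;
}

/* Brings one group online and offline; returns how often exit was called. */
static int exit_calls_for(bool fixed, bool has_init, bool has_exit)
{
	struct model_ops ops = { has_init, has_exit };
	struct model_tg tg = { 0, 0, 0 };

	if (fixed)
		tg_online_fixed(&ops, true, &tg);
	else
		tg_online_old(&ops, &tg);
	assert(tg.init_calls == (has_init ? 1 : 0));

	tg_offline(&ops, &tg);
	return tg.exit_calls;
}
```

With a scheduler that implements only the exit callback, the old path never marks the group as inited, so the exit callback is skipped; the gated path marks it regardless of whether an init callback exists.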
    • sched_ext: Enable scx_ops_init_task() separately · 4269c603
      Tejun Heo authored
      scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
      were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
      on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
      if an external entity causes sched_class switch before the loop is complete,
      tasks which are not initialized could be switched to SCX.
      
      The following can be reproduced by running a program which keeps toggling a
      process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).
      
        sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
        WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
        ...
        Sched_ext: simple (enabling)
        RIP: 0010:scx_ops_enable_task+0x1a1/0x200
        ...
         switching_to_scx+0x13/0xa0
         __sched_setscheduler+0x850/0xa50
         do_sched_setscheduler+0x104/0x1c0
         __x64_sys_sched_setscheduler+0x18/0x30
         do_syscall_64+0x7b/0x140
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Fix it by gating scx_ops_init_task() separately using
      scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
      finished with scx_ops_init_task().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      4269c603
    • sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable() · 9753358a
      Tejun Heo authored
      scx_ops_enable() has two task iteration loops. The first one calls
      scx_ops_init_task() on every task and the latter switches the eligible ones
      into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
      second loop switched it into READY before switching the task into SCX.
      
      The distinction between INIT and READY is only meaningful in the fork path
      where it's used to tell whether the task finished forking so that we can
      tell ops.exit_task() accordingly. Leaving task in INIT state between the two
      loops is inconsistent with the fork path and incorrect. The following can be
      triggered by running a program which keeps toggling a task between
      SCHED_OTHER and SCHED_EXT while enabling a task:
      
        sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
        WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
        ...
        Sched_ext: qmap (enabling+all)
        RIP: 0010:scx_ops_enable_task+0x1a1/0x200
        ...
         switching_to_scx+0x13/0xa0
         __sched_setscheduler+0x850/0xa50
         do_sched_setscheduler+0x104/0x1c0
         __x64_sys_sched_setscheduler+0x18/0x30
         do_syscall_64+0x7b/0x140
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Fix it by transitioning to READY in the first loop right after
      scx_ops_init_task() succeeds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
      9753358a
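The state problem in this commit can be sketched as a tiny transition check. This is a model under assumptions, not the kernel's validation logic: the numbering 0..3 for NONE, INIT, READY and ENABLED is inferred from the quoted "1 -> 3" warning, the full rule set in transition_valid() is a guess beyond the one rule the warning implies (ENABLED is only reachable from READY), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering inferred from the quoted warning; an assumption, not a fact. */
enum task_state {
	TASK_NONE = 0,
	TASK_INIT = 1,
	TASK_READY = 2,
	TASK_ENABLED = 3,
};

/* Assumed transition rules; only "ENABLED requires READY" is from the text. */
static bool transition_valid(enum task_state prev, enum task_state next)
{
	switch (next) {
	case TASK_NONE:
		return true;
	case TASK_INIT:
		return prev == TASK_NONE;
	case TASK_READY:
		return prev != TASK_NONE;
	case TASK_ENABLED:
		return prev == TASK_READY;
	}
	return false;
}

/*
 * State a task is left in after the first enable loop.
 * fixed == false: the loop stops at INIT (old behavior).
 * fixed == true:  the loop moves on to READY right after init succeeds.
 */
static enum task_state state_after_first_loop(bool fixed)
{
	enum task_state st = TASK_NONE;

	assert(transition_valid(st, TASK_INIT));
	st = TASK_INIT;

	if (fixed) {
		assert(transition_valid(st, TASK_READY));
		st = TASK_READY;
	}
	return st;
}
```

If something switches the task into the enabled state between the two loops, the old behavior attempts INIT to ENABLED, which the check rejects, while the fixed behavior attempts READY to ENABLED, which it accepts.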
    • sched_ext: Initialize in bypass mode · 8c2090c5
      Tejun Heo authored
      scx_ops_enable() used preempt_disable() around the task iteration loop to
      switch tasks into SCX to guarantee forward progress of the task which is
      running scx_ops_enable(). However, in the gap between setting
      __scx_ops_enabled and preempt_disable(), an external entity can put tasks
      including the enabling one into SCX prematurely, which can lead to
      malfunctions including stalls.
      
      The bypass mode can wrap the entire enabling operation and guarantee forward
      progress no matter what the BPF scheduler does. Use the bypass mode instead
      to guarantee forward progress while enabling.
      
      While at it, release and regrab scx_tasks_lock between the two task
      iteration loops in scx_ops_enable() for clarity as there is no reason to
      keep holding the lock between them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      8c2090c5
    • sched_ext: Remove SCX_OPS_PREPPING · fc1fcebe
      Tejun Heo authored
      The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used
      anywhere and only adds confusion. Drop SCX_OPS_PREPPING.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      fc1fcebe
    • sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable() · 1bbcfe62
      Tejun Heo authored
      check_hotplug_seq() is used to detect CPU hotplug event which occurred while
      the BPF scheduler is being loaded so that initialization can be retried if
      CPU hotplug events take place before the CPU hotplug callbacks are online.
      
      As such, the best place to call it is in the same cpus_read_lock() section
      that enables the CPU hotplug ops. Currently, it is called in the next
      cpus_read_lock() block in scx_ops_enable(). The side effect of this
      placement is a small window in which hotplug sequence detection can trigger
      unnecessarily, which isn't critical.
      
      Move check_hotplug_seq() invocation to the same cpus_read_lock() block as
      the hotplug operation enablement to close the window and get the invocation
      out of the way for planned locking updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
      1bbcfe62
  9. 26 Sep, 2024 5 commits
    • sched_ext: Use shorter slice while bypassing · 6f34d8d3
      Tejun Heo authored
      While bypassing, tasks are scheduled in FIFO order which favors tasks that
      hog CPUs. This can slow down e.g. unloading of the BPF scheduler. While
      bypassing, guaranteeing timely forward progress is the main goal. There's no
      point in giving long slices. Shorten the time slice used while bypassing
      from 20ms to 5ms.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      6f34d8d3
    • sched_ext: Split the global DSQ per NUMA node · b7b3b2db
      Tejun Heo authored
      In the bypass mode, the global DSQ is used to schedule all tasks in simple
      FIFO order. All tasks are queued into the global DSQ and all CPUs try to
      execute tasks from it. This creates a lot of cross-node cacheline accesses
      and scheduling across the node boundaries, and can lead to live-lock
      conditions where the system takes tens of minutes to disable the BPF
      scheduler while executing in the bypass mode.
      
      Split the global DSQ per NUMA node. Each node has its own global DSQ. When a
      task is dispatched to SCX_DSQ_GLOBAL, it's put into the global DSQ local to
      the task's CPU and all CPUs in a node only consume its node-local global
      DSQ.
      
      This resolves a livelock condition which could be reliably triggered on a
      2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
      `stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
      and disabling a SCX scheduler.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      b7b3b2db
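The per-node split can be pictured with a small user-space model. This is a sketch under assumptions, not kernel code: the topology (2 nodes, 4 CPUs each), struct model_dsq, model_cpu_to_node(), dispatch_global() and consume_global() are hypothetical, a DSQ is reduced to a counter, and find_global_dsq() here takes a CPU number rather than whatever the kernel helper of that name takes.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed toy topology: 2 NUMA nodes with 4 CPUs each. */
#define NR_NODES       2
#define CPUS_PER_NODE  4

/* A dispatch queue reduced to the number of queued tasks. */
struct model_dsq {
	int nr_queued;
};

/* One "global" DSQ per node instead of a single system-wide one. */
static struct model_dsq global_dsqs[NR_NODES];

static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_NODES * CPUS_PER_NODE);
	return cpu / CPUS_PER_NODE;
}

/* Picks the global DSQ that is local to the given CPU's node. */
static struct model_dsq *find_global_dsq(int cpu)
{
	return &global_dsqs[model_cpu_to_node(cpu)];
}

/* Queues a task on the node-local global DSQ; returns the node used. */
static int dispatch_global(int task_cpu)
{
	find_global_dsq(task_cpu)->nr_queued++;
	return model_cpu_to_node(task_cpu);
}

/* A CPU only consumes from its own node's global DSQ. */
static bool consume_global(int cpu)
{
	struct model_dsq *dsq = find_global_dsq(cpu);

	if (dsq->nr_queued == 0)
		return false;
	dsq->nr_queued--;
	return true;
}
```

In the model, a task dispatched from CPU 1 lands on node 0's queue, so CPU 5 on node 1 finds nothing to consume while CPU 2 on node 0 picks it up, which is the locality property the commit describes.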
    • sched_ext: Relocate find_user_dsq() · bba26bf3
      Tejun Heo authored
      To prepare for the addition of find_global_dsq(). No functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      bba26bf3
    • sched_ext: Allow only user DSQs for scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new() · 63fb3ec8
      Tejun Heo authored
      SCX_DSQ_GLOBAL is special in that it can't be used as a priority queue and
      is consumed implicitly, but all BPF DSQ related kfuncs could be used on it.
      SCX_DSQ_GLOBAL will be split per-node for scalability and those operations
      won't make sense anymore. Disallow SCX_DSQ_GLOBAL on scx_bpf_consume(),
      scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new(). This means that
      SCX_DSQ_GLOBAL can only be used as a dispatch target from BPF schedulers.
      
      With scx_flatcg, which was using SCX_DSQ_GLOBAL as the fallback DSQ,
      updated, this shouldn't affect any schedulers.
      
      This leaves find_dsq_for_dispatch() the only user of find_non_local_dsq().
      Open code and remove find_non_local_dsq().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      63fb3ec8
    • scx_flatcg: Use a user DSQ for fallback instead of SCX_DSQ_GLOBAL · c9c809f4
      Tejun Heo authored
      scx_flatcg was using SCX_DSQ_GLOBAL for fallback handling. However, it is
      assuming that SCX_DSQ_GLOBAL isn't automatically consumed, which was true a
      while ago but is no longer the case. Also, there are further changes planned
      for SCX_DSQ_GLOBAL which will disallow explicit consumption from it. Switch
      to a user DSQ for fallback.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      c9c809f4
  10. 25 Sep, 2024 2 commits
  11. 24 Sep, 2024 8 commits
    • sched_ext: Build fix for !CONFIG_SMP · 42268ad0
      Tejun Heo authored
      move_remote_task_to_local_dsq() is only defined on SMP configs but
      scx_dispatch_from_dsq() was calling move_remote_task_to_local_dsq() on UP
      configs too causing build failures. Add a dummy
      move_remote_task_to_local_dsq() which triggers a warning.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 4c30f5ce ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202409241108.jaocHiDJ-lkp@intel.com/
      42268ad0
    • Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 68e5c7d4
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - Support cross-compiling linux-headers Debian package and kernel-devel
         RPM package
      
       - Add support for the linux-debug Pacman package
      
       - Improve module rebuilding speed by factoring out the common code to
         scripts/module-common.c
      
       - Separate device tree build rules into scripts/Makefile.dtbs
      
       - Add a new script to generate modules.builtin.ranges, which is useful
         for tracing tools to find symbols in built-in modules
      
       - Refactor Kconfig and misc tools
      
       - Update Kbuild and Kconfig documentation
      
      * tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
        kbuild: doc: replace "gcc" in external module description
        kbuild: doc: describe the -C option precisely for external module builds
        kbuild: doc: remove the description about shipped files
        kbuild: doc: drop section numbering, use references in modules.rst
        kbuild: doc: throw out the local table of contents in modules.rst
        kbuild: doc: remove outdated description of the limitation on -I usage
        kbuild: doc: remove description about grepping CONFIG options
        kbuild: doc: update the description about Kbuild/Makefile split
        kbuild: remove unnecessary export of RUST_LIB_SRC
        kbuild: remove append operation on cmd_ld_ko_o
        kconfig: cache expression values
        kconfig: use hash table to reuse expressions
        kconfig: refactor expr_eliminate_dups()
        kconfig: add comments to expression transformations
        kconfig: change some expr_*() functions to bool
        scripts: move hash function from scripts/kconfig/ to scripts/include/
        kallsyms: change overflow variable to bool type
        kallsyms: squash output_address()
        kbuild: add install target for modules.builtin.ranges
        scripts: add verifier script for builtin module range data
        ...
      68e5c7d4
    • Merge tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux · 7f8de2bf
      Linus Torvalds authored
      Pull cpupower updates from Shuah Khan:
       "The 'raw_pylibcpupower.i' file was being removed by "make mrproper".
      
        That was because '*.i', '*.s' and '*.o' files are generated during
        kernel compile and removed when the repo is cleaned by mrproper.
      
        Rename it to use .swg extension instead to avoid the problem.
      
        A second patch removes references to it from .gitignore"
      
      * tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
        pm: cpupower: Clean up bindings gitignore
        pm: cpupower: rename raw_pylibcpupower.i
      7f8de2bf
    • Merge tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · cd3d6477
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "This adds support for the I3C HCI controller of the AMD SoC which as
        expected requires quirks. Also fixes for the other drivers, including
        rate selection fixes for svc.
      
        Core:
         - allow adjusting first broadcast address speed
      
        Drivers:
         - cdns: few fixes
         - mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix
           get_i3c_mode
         - svc: adjust rates, fix race condition"
      
      * tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition
        i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition
        i3c: master: svc: adjust SDR according to i3c spec
        i3c: master: svc: use slow speed for first broadcast address
        i3c: master: support to adjust first broadcast address speed
        i3c/master: cmd_v1: Fix the rule for getting i3c mode
        i3c: master: cdns: fix module autoloading
        i3c: mipi-i3c-hci: Add a quirk to set Response buffer threshold
        i3c: mipi-i3c-hci: Add a quirk to set timing parameters
        i3c: mipi-i3c-hci: Relocate helper macros to HCI header file
        i3c: mipi-i3c-hci: Add a quirk to set PIO mode
        i3c: mipi-i3c-hci: Read HC_CONTROL_PIO_MODE only after i3c hci v1.1
        i3c: mipi-i3c-hci: Add AMDI5017 ACPI ID to the I3C Support List
      cd3d6477
    • remoteproc: k3-m4: use the proper dependencies · ba0c0cb5
      Linus Torvalds authored
      The TI_K3_M4_REMOTEPROC Kconfig entry selects OMAP2PLUS_MBOX, but that
      driver in turn depends on other things, which the k3-m4 driver didn't.
      
      This causes a Kconfig time warning:
      
        WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
          Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
          Selected by [m]:
          - TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])
      
      because you can't select something that is unavailable.
      
      Make the dependencies for TI_K3_M4_REMOTEPROC match those of the
      OMAP2PLUS_MBOX driver that it needs.
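
The warning above can be reproduced with a small model of Kconfig's tristate logic. This is only a sketch: the helper names and the `struct config` snapshot are hypothetical, and `m4_deps_after()` assumes the fix simply makes the selector inherit the mailbox driver's dependencies. It relies on the documented Kconfig rule that `&&` is the minimum and `||` the maximum of n < m < y.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig evaluates "A && B" as min(A, B) and "A || B" as max(A, B). */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Hypothetical snapshot of the symbols named in the warning. */
struct config {
	enum tristate mailbox, arch_omap2plus, arch_k3, compile_test, remoteproc;
};

/* OMAP2PLUS_MBOX depends on: MAILBOX && (ARCH_OMAP2PLUS || ARCH_K3) */
static enum tristate mbox_deps(const struct config *c)
{
	return tri_and(c->mailbox, tri_or(c->arch_omap2plus, c->arch_k3));
}

/* Selector before the fix: REMOTEPROC && (ARCH_K3 || COMPILE_TEST) */
static enum tristate m4_deps_before(const struct config *c)
{
	return tri_and(c->remoteproc, tri_or(c->arch_k3, c->compile_test));
}

/* Assumed shape of the fix: the selector inherits the mailbox dependencies. */
static enum tristate m4_deps_after(const struct config *c)
{
	return tri_and(c->remoteproc, mbox_deps(c));
}

/* The value a symbol ends up with: the request, capped by its dependencies. */
static enum tristate resolve(enum tristate requested, enum tristate deps)
{
	return tri_and(requested, deps);
}

/* A select is unmet when the selector is enabled above the selected symbol's deps. */
static int select_unmet(enum tristate selector, enum tristate selected_deps)
{
	return selector > selected_deps;
}
```

With MAILBOX=y, COMPILE_TEST=y and neither architecture enabled, the old dependencies let the driver resolve to m while the mailbox dependencies evaluate to n, which is the unmet select. Under the assumed fix the driver itself resolves to n, so nothing is selected.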
      
      Fixes: ebcf9008 ("remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem")
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Martyn Welch <martyn.welch@collabora.com>
      Cc: Hari Nagalla <hnagalla@ti.com>
      Cc: Andrew Davis <afd@ti.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba0c0cb5
    • Linus Torvalds's avatar
      Merge tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 9ae2940c
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - support for PixArt PS/2 touchpad
      
       - updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
      
       - support for GPIO-only mode for ADP5588 controller
      
       - support for touch keys in Zinitix driver
      
       - support for querying density of Synaptics sensors
      
       - sysfs interface for Goodix "Berlin" devices to read and write touch
         IC registers
      
       - more quirks to i8042 to handle various Tuxedo laptops
      
       - a number of drivers have been converted to using "guard" notation
         when acquiring various locks, as well as using other cleanup
         functions to simplify releasing of resources (with more drivers to
         follow)
      
       - evdev will limit the amount of data that can be written into an evdev
         instance at a given time to 4096 bytes (170 input events) to avoid
         holding evdev->mutex for too long and starving other users
      
       - Spitz has been converted to use software nodes/properties to describe
         its matrix keypad and GPIO-connected LEDs
      
       - mcs5000_ts, mcs_touchkey and keypad-nomadik-ske drivers have been
         removed since no one in mainline has been using them
      
       - other assorted cleanups and fixes
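
The 170-event figure in the evdev item follows from the size of one event. A minimal sketch, assuming the 24-byte layout that `struct input_event` has on 64-bit kernels; the mirror struct `input_event_64` and the helper `max_events_per_write()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the 64-bit layout of struct input_event:
 * a 16-byte timestamp followed by type, code and value.
 */
struct input_event_64 {
	int64_t  sec;
	int64_t  usec;
	uint16_t type;
	uint16_t code;
	int32_t  value;
};

/* The arithmetic below only holds for a 24-byte event. */
_Static_assert(sizeof(struct input_event_64) == 24, "expected a 24-byte event");

/* Number of whole events that fit into a write of at most `limit` bytes. */
static size_t max_events_per_write(size_t limit)
{
	return limit / sizeof(struct input_event_64);
}
```

Calling `max_events_per_write(4096)` gives 170, since 170 events occupy 4080 bytes and a 171st would not fit.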
      
      * tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
        ARM: spitz: fix compile error when matrix keypad driver is enabled
        Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
        Input: adp5588-keys - fix check on return code
        Input: Convert comma to semicolon
        Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
        Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
        Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
        Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
        Input: ims-pcu - fix calling interruptible mutex
        Input: zforce_ts - switch to using asynchronous probing
        Input: zforce_ts - remove assert/deassert wrappers
        Input: zforce_ts - do not hardcode interrupt level
        Input: zforce_ts - switch to using devm_regulator_get_enable()
        Input: zforce_ts - stop treating VDD regulator as optional
        Input: zforce_ts - make zforce_idtable constant
        Input: zforce_ts - use dev_err_probe() where appropriate
        Input: zforce_ts - do not ignore errors when acquiring regulator
        Input: zforce_ts - make parsing of contacts less confusing
        Input: zforce_ts - switch to using get_unaligned_le16
        Input: zforce_ts - use guard notation when acquiring mutexes
        ...
      9ae2940c
    • Linus Torvalds's avatar
      Merge tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 6db6a19f
      Linus Torvalds authored
      Pull hwspinlock update from Bjorn Andersson:
       "This converts the Spreadtrum hardware spinlock DeviceTree binding to
        YAML, to allow validation of related DeviceTree source"
      
      * tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        dt-bindings: hwlock: sprd-hwspinlock: convert to YAML
      6db6a19f
    • Linus Torvalds's avatar
      Merge tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 6e10aa1f
      Linus Torvalds authored
      Pull rpmsg updates from Bjorn Andersson:
      
       - Minor cleanup/refactor to the Qualcomm GLINK code, in order to add
         trace events related to the messages exchanged with the remote side,
         useful for debugging a range of interoperability issues
      
       - Rewrite the nested structs with flexible array members in order to
         avoid the risk of invalid accesses
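
The flexible-array item can be illustrated with a generic layout. This is a sketch of one common way to avoid a flexible array member ending up in the middle of an enclosing struct, not the GLINK code itself; every `demo_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a hypothetical message, split out so it can be embedded anywhere. */
struct demo_msg_hdr {
	uint16_t cmd;
	uint16_t param1;
	uint32_t param2;
};

/* Full message: the header followed by a variable-length payload. */
struct demo_msg {
	struct demo_msg_hdr hdr;
	uint8_t data[];
};

/*
 * Embedding struct demo_msg here would put data[] in the middle of the
 * struct, where it would overlap `length`. Embedding only the header
 * keeps every member at a well-defined offset.
 */
struct demo_request {
	struct demo_msg_hdr hdr;
	uint32_t length;
};

/* The flexible array adds no storage, so both views share the same header size. */
_Static_assert(sizeof(struct demo_msg) == sizeof(struct demo_msg_hdr),
	       "payload must start right after the header");
```

Code that needs the payload keeps using `struct demo_msg`, while containers that only need the fixed fields embed `struct demo_msg_hdr`.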
      
      * tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings
        rpmsg: glink: Introduce packet tracepoints
        rpmsg: glink: Pass channel to qcom_glink_send_close_ack()
        rpmsg: glink: Tidy up RX advance handling
      6e10aa1f