  1. 29 Feb, 2024 1 commit
  2. 28 Feb, 2024 1 commit
  3. 15 Dec, 2023 1 commit
    • kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy() · ff6d413b
      Kees Cook authored
      One of the last remaining users of strlcpy() in the kernel is
      kernfs_path_from_node_locked(), which passes back the problematic "length
      we _would_ have copied" return value to indicate truncation.  Convert the
      chain of all callers to use the negative return value (some of which
      were already doing this explicitly). All callers were already also
      checking for negative return values, so the risk of missed checks looks
      very low.
      
      In this analysis, it was found that cgroup1_release_agent() actually
      didn't handle the "too large" condition, so this is technically also a
      bug fix. :)
      
      Here's the chain of callers, and resolution identifying each one as now
      handling the correct return value:
      
      kernfs_path_from_node_locked()
              kernfs_path_from_node()
                      pr_cont_kernfs_path()
                              returns void
                      kernfs_path()
                              sysfs_warn_dup()
                                      return value ignored
                              cgroup_path()
                                      blkg_path()
                                              bfq_bic_update_cgroup()
                                                      return value ignored
                                      TRACE_IOCG_PATH()
                                              return value ignored
                                      TRACE_CGROUP_PATH()
                                              return value ignored
                                      perf_event_cgroup()
                                              return value ignored
                                      task_group_path()
                                              return value ignored
                                      damon_sysfs_memcg_path_eq()
                                              return value ignored
                                      get_mm_memcg_path()
                                              return value ignored
                                      lru_gen_seq_show()
                                              return value ignored
                              cgroup_path_from_kernfs_id()
                                      return value ignored
                      cgroup_show_path()
                              already converted "too large" error to negative value
                      cgroup_path_ns_locked()
                              cgroup_path_ns()
                                      bpf_iter_cgroup_show_fdinfo()
                                              return value ignored
                                      cgroup1_release_agent()
                                              wasn't checking "too large" error
                              proc_cgroup_show()
                                      already converted "too large" to negative value
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: <cgroups@vger.kernel.org>
      Co-developed-by: Azeem Shaikh <azeemshaikh38@gmail.com>
      Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
      Link: https://lore.kernel.org/r/20231116192127.1558276-3-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20231212211741.164376-3-keescook@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
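The two return conventions at stake in this conversion can be illustrated with a small user-space model. This is only a sketch of the documented semantics, not kernel code: `strlcpy_model` and `strscpy_model` are hypothetical names, and Python `bytes` stand in for C strings.

```python
import errno


def strlcpy_model(src: bytes, size: int) -> tuple[bytes, int]:
    """Model of strlcpy(): copy at most size-1 bytes, return len(src).

    The caller has to compare the return value with size to notice truncation.
    """
    copied = src[: max(size - 1, 0)]
    return copied, len(src)


def strscpy_model(src: bytes, size: int) -> tuple[bytes, int]:
    """Model of strscpy(): return the bytes copied, or -E2BIG on truncation."""
    if size <= 0:
        return b"", -errno.E2BIG
    if len(src) >= size:
        return src[: size - 1], -errno.E2BIG
    return src, len(src)
```

With the first convention a caller must remember to test `ret >= size`; with the second, the usual `ret < 0` error check already covers truncation.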
  4. 06 Dec, 2023 1 commit
    • cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check · 3232e7aa
      Waiman Long authored
      Currently, the cpu_is_isolated() function checks only the statically
      isolated CPUs specified via the "isolcpus" and "nohz_full" kernel
      command line options. This function is used by vmstat and memcg to
      reduce interference with isolated CPUs by not flushing stats
      or scheduling work on those CPUs.
      
      Workloads running on isolated CPUs within isolated cpuset
      partitions should receive the same treatment to reduce unnecessary
      interference. This patch introduces a new cpuset_cpu_is_isolated()
      function to be called by cpu_is_isolated() so that the set of dynamically
      created cpuset isolated CPUs will be included in the check.
      
      Assuming that testing a bit in a cpumask is atomic, no synchronization
      primitive is currently used to synchronize access to the cpuset's
      isolated_cpus mask.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
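The combined check described in this commit can be modelled roughly as follows. This is a sketch under assumptions: CPU sets are plain Python sets, `cpu_is_isolated_model` is a hypothetical name, and treating static isolation as "missing from the domain or tick housekeeping set" is an assumption about the existing helper rather than something the commit message states.

```python
def cpu_is_isolated_model(cpu, hk_domain, hk_tick, cpuset_isolated):
    """Return True if cpu is isolated statically or through a cpuset partition.

    hk_domain / hk_tick: housekeeping CPU sets left over after the
    "isolcpus" and "nohz_full" boot options (assumed representation).
    cpuset_isolated: CPUs currently in isolated cpuset partitions.
    """
    statically_isolated = cpu not in hk_domain or cpu not in hk_tick
    return statically_isolated or cpu in cpuset_isolated
```

For an 8-CPU machine booted with CPUs 6-7 statically isolated and CPUs 4-5 placed in an isolated partition at runtime, the model reports CPUs 4-7 as isolated.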
  5. 28 Nov, 2023 1 commit
    • cgroup/cpuset: Expose cpuset.cpus.isolated · 877c737d
      Waiman Long authored
      The root-only cpuset.cpus.isolated control file shows the current set
      of isolated CPUs in isolated partitions. This control file is currently
      exposed only with the cgroup_debug boot command line option which also
      adds the ".__DEBUG__." prefix. This control file is useful in its own
      right for users who want to find out which CPUs the cpuset controller
      currently holds in an isolated state. Remove the CFTYPE_DEBUG flag for
      this control file and make it available by default without any prefix.
      
      The test_cpuset_prs.sh test script and the cgroup-v2.rst documentation
      file are also updated accordingly. A minor code change is also made in
      test_cpuset_prs.sh to avoid false test failures when running on a debug
      kernel.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
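A user-space reader for the newly exposed file might look like the sketch below. It assumes cgroup v2 is mounted at `/sys/fs/cgroup` and that the file uses the usual CPU list format (for example `2-3,8`); `parse_cpu_list` and `read_isolated_cpus` are hypothetical helper names.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means no isolated CPUs
        lo, _, hi = chunk.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_isolated_cpus(cgroup_root: str = "/sys/fs/cgroup") -> set[int]:
    """Read cpuset.cpus.isolated from the (assumed) cgroup v2 mount point."""
    return parse_cpu_list(Path(cgroup_root, "cpuset.cpus.isolated").read_text())
```

On kernels that predate this commit the file is absent (or only present with the debug prefix), so a caller should be prepared for `FileNotFoundError`.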
  6. 12 Nov, 2023 2 commits
    • cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask · 72c6303a
      Waiman Long authored
      To bring CPUs in an isolated cpuset partition closer in isolation to
      the boot time isolated CPUs specified in the "isolcpus" boot command
      line option, we need to take those CPUs out of the workqueue unbound
      cpumask so that work functions from the unbound workqueues won't run
      on those CPUs. Otherwise, they will interfere with the user tasks
      running on those isolated CPUs.
      
      With the introduction of the workqueue_unbound_exclude_cpumask() helper
      function in an earlier commit, those isolated CPUs can now be taken
      out of the workqueue unbound cpumask.
      
      This patch also updates cgroup-v2.rst to mention that isolated
      CPUs will be excluded from the unbound workqueue cpumask, and
      updates test_cpuset_prs.sh to verify the correctness of the new
      *cpuset.cpus.isolated file, if available via the cgroup_debug option.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
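The effect on the unbound cpumask amounts to an "and-not" of two CPU sets. The sketch below models that and adds a parser for the comma-separated hex mask format the kernel uses when printing cpumasks; both function names are hypothetical, and the real helper's handling of corner cases (such as an empty result) is not modelled here.

```python
def parse_hex_cpumask(text: str) -> set[int]:
    """Parse a hex cpumask such as 'ff' or '1,00000000' into CPU numbers."""
    value = int(text.strip().replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def unbound_mask_excluding(requested: set[int], isolated: set[int]) -> set[int]:
    """Model: the unbound workqueue mask is the requested mask minus isolated CPUs."""
    return requested - isolated
```

For example, with a requested mask of `ff` (CPUs 0-7) and CPUs 2-3 in an isolated partition, the modelled unbound mask is CPUs 0-1 and 4-7.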
    • cgroup/cpuset: Keep track of CPUs in isolated partitions · 11e5f407
      Waiman Long authored
      Add a new internal isolated_cpus mask to keep track of the CPUs that are in
      isolated partitions. Expose that new cpumask as a new root-only control file
      ".cpuset.cpus.isolated".
      
      tj: Updated patch description to reflect dropping __DEBUG__ prefix.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 04 Oct, 2023 2 commits
  8. 18 Sep, 2023 5 commits
    • cgroup/cpuset: Check partition conflict with housekeeping setup · 4a74e418
      Waiman Long authored
      A user can pre-configure certain CPUs in an isolated state at boot time
      with the "isolcpus" kernel boot command line option. Those CPUs will
      not be in the housekeeping_cpumask(HK_TYPE_DOMAIN) and so will not
      be in any sched domains. This may conflict with the partition setup
      at runtime. Those boot time isolated CPUs should only be used in an
      isolated partition.
      
      This patch adds the necessary check and disallows partition setup if the
      check fails.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
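The rule stated above (boot-time isolated CPUs may only appear in an isolated partition) can be sketched as a simple predicate. The function name and the string-valued partition state are illustrative assumptions, not the kernel's actual interface.

```python
def housekeeping_conflict(new_state: str, cpus: set[int], hk_domain: set[int]) -> bool:
    """Return True if the requested partition conflicts with boot-time isolation.

    hk_domain models housekeeping_cpumask(HK_TYPE_DOMAIN): the CPUs that were
    NOT isolated with "isolcpus" at boot.
    """
    if new_state == "isolated":
        return False  # an isolated partition may contain boot-time isolated CPUs
    # Any other partition type must stay within the housekeeping CPUs.
    return not cpus <= hk_domain
```

With CPUs 6-7 isolated at boot on an 8-CPU system, a "root" partition requesting CPUs 4-6 is rejected in this model, while an "isolated" partition on CPUs 6-7 is accepted.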
    • cgroup/cpuset: Introduce remote partition · 181c8e09
      Waiman Long authored
      One can use "cpuset.cpus.partition" to create multiple scheduling domains
      or to produce a set of isolated CPUs where load balancing is disabled.
      The former use case is less common, but the latter is frequently
      used, especially in Telco use cases such as DPDK.
      
      The existing "isolated" partition can be used to produce isolated
      CPUs if the applications have full control of a system. However, in a
      containerized environment where all the apps are run in a container,
      it is hard to distribute isolated CPUs from the root down given
      the unified hierarchy nature of cgroup v2.
      
      The container running on isolated CPUs can be several layers down from
      the root. The current partition feature requires that all the ancestors
      of a leaf partition root must be partition roots themselves. This can
      be hard to configure.
      
      This patch introduces a new type of partition called remote partition.
      A remote partition is a partition whose parent is not a partition root
      itself and its CPUs are acquired directly from available CPUs in the
      top cpuset through a hierarchical distribution of exclusive CPUs down
      from it.
      
      By contrast, partitions of the existing type, whose parents have to
      be valid partition roots, are referred to as local partitions as they
      have to be clustered around a parent partition root.
      
      Child local partitions can be created under a remote partition, but
      a remote partition cannot be created under a local partition. We may
      relax this limitation in the future if there are use cases for such
      configuration.
      
      Manually writing to the "cpuset.cpus.exclusive" file is not necessary
      when creating local partitions.  However, writing proper values to
      "cpuset.cpus.exclusive" down the cgroup hierarchy before the target
      remote partition root is mandatory for the creation of a remote
      partition.
      
      The value in "cpuset.cpus.exclusive.effective" may change if its
      "cpuset.cpus" or its parent's "cpuset.cpus.exclusive.effective" changes.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
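The ordering requirement for a remote partition (exclusive CPUs written down the hierarchy first, partition state last) can be sketched as a function that only plans the writes. Everything here is illustrative: the helper name and the cgroup path `pods/app` are hypothetical, and the sketch assumes the cpuset controller is already enabled along the path and that each cgroup's "cpuset.cpus" covers the requested CPUs.

```python
from pathlib import PurePosixPath


def remote_partition_writes(cgroup_root, target, exclusive_cpus, state="isolated"):
    """Return the ordered (file, value) writes for creating a remote partition."""
    writes = []
    path = PurePosixPath(cgroup_root)
    # Hand the exclusive CPUs down from the top cpuset, one level at a time.
    for part in PurePosixPath(target).parts:
        path = path / part
        writes.append((str(path / "cpuset.cpus.exclusive"), exclusive_cpus))
    # Only then turn the target cgroup into a partition root.
    writes.append((str(path / "cpuset.cpus.partition"), state))
    return writes
```

Calling `remote_partition_writes("/sys/fs/cgroup", "pods/app", "2-3")` yields two "cpuset.cpus.exclusive" writes (for `pods` and `pods/app`) followed by one "cpuset.cpus.partition" write. Returning a plan rather than touching cgroupfs keeps the sketch usable without root privileges.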
    • cgroup/cpuset: Add cpuset.cpus.exclusive for v2 · e2ffe502
      Waiman Long authored
      This patch introduces a new writable "cpuset.cpus.exclusive" control
      file for v2 which will be added to non-root cpuset enabled cgroups. This new
      file enables users to set a smaller list of exclusive CPUs to be used in
      the creation of a cpuset partition.
      
      The value written to "cpuset.cpus.exclusive" may not be the effective
      value being used for the creation of a cpuset partition. The effective
      value will show up in "cpuset.cpus.exclusive.effective" and it is
      subject to the constraint that it must also be a subset of cpus_allowed
      and parent's "cpuset.cpus.exclusive.effective".
      
      By writing to "cpuset.cpus.exclusive", "cpuset.cpus.exclusive.effective"
      may be set to a non-empty value even for cgroups that are not valid
      partition roots yet.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
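The subset constraint on the effective value can be written as a set intersection. This is a minimal model with CPU masks as Python sets and a hypothetical function name; it covers only the case where a value has been written to "cpuset.cpus.exclusive".

```python
def effective_xcpus(exclusive, cpus_allowed, parent_effective_xcpus):
    """Model of "cpuset.cpus.exclusive.effective" for a written exclusive value.

    The result is limited to CPUs that are also in this cpuset's cpus_allowed
    and in the parent's effective exclusive CPUs.
    """
    return exclusive & cpus_allowed & parent_effective_xcpus
```

For instance, writing CPUs 2-5 when cpus_allowed is 0-4 and the parent's effective exclusive set is 3-7 leaves an effective value of CPUs 3-4 in this model.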
    • cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2 · 0c7f293e
      Waiman Long authored
      The creation of a cpuset partition means dedicating a set of exclusive
      CPUs to be used by a particular partition only. These exclusive CPUs
      will not be used by any cpusets outside of that partition.
      
      To enable more flexibility in creating partitions, we need a way to
      distribute exclusive CPUs that can be used in new partitions. Currently,
      we have a subparts_cpus cpumask in struct cpuset that tracks only
      the exclusive CPUs used by all the sub-partitions underneath a given
      cpuset.
      
      This patch reworks the way we track exclusive CPUs. The
      subparts_cpus is now renamed to effective_xcpus which tracks the
      exclusive CPUs allocated to a partition root including those that are
      further distributed down to sub-partitions underneath it. IOW, it also
      includes the exclusive CPUs used by the current partition root. Note
      that effective_xcpus can contain offline CPUs and it will always be a
      subset of cpus_allowed.
      
      The renamed effective_xcpus is now exposed via a new read-only
      "cpuset.cpus.exclusive.effective" control file. The new effective_xcpus
      cpumask should be set to cpus_allowed when a cpuset becomes a partition
      root and be cleared if it is not a valid partition root.
      
      In the next patch, we will enable writes to another new control file to
      allow further control of what can get into effective_xcpus.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Fix load balance state in update_partition_sd_lb() · 6fcdb018
      Waiman Long authored
      Commit a86ce680 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE
      & CS_SCHED_LOAD_BALANCE handling") adds a new helper function
      update_partition_sd_lb() to update the load balance state of the
      cpuset. However, the new load balance state is determined by just looking at
      whether the cpuset is a valid isolated partition root or not.  That is
      not enough if the cpuset is not a valid partition root but its parent
      is in the isolated state (load balance off). Update the function to
      set the new state to be the same as its parent in this case like what
      has been done in commit c8c92620 ("cgroup/cpuset: Inherit parent's
      load balance state in v2").
      
      Fixes: a86ce680 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
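The corrected decision can be sketched as follows. The enum values are an assumption modelled on the kernel's partition root states, and `new_load_balance` is a hypothetical name; the sketch only mirrors the rule described in the commit message.

```python
from enum import IntEnum


class Prs(IntEnum):
    """Partition root states (assumed values; positive means a valid partition root)."""
    INVALID_ISOLATED = -2
    INVALID_ROOT = -1
    MEMBER = 0
    ROOT = 1
    ISOLATED = 2


def new_load_balance(state: Prs, parent_load_balance: bool) -> bool:
    """Return the load balance state a cpuset should end up with."""
    if state > 0:
        # Valid partition root: balancing is off only for isolated partitions.
        return state != Prs.ISOLATED
    # Not a valid partition root: inherit the parent's state (the fix).
    return parent_load_balance
```

The second branch is the behaviour this commit adds: a plain member cpuset under an isolated parent keeps load balancing off instead of turning it back on.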
  9. 02 Aug, 2023 1 commit
  10. 10 Jul, 2023 7 commits
  11. 12 Jun, 2023 1 commit
  12. 21 May, 2023 1 commit
  13. 08 May, 2023 5 commits
  14. 12 Apr, 2023 4 commits
  15. 29 Mar, 2023 3 commits
  16. 24 Mar, 2023 1 commit
  17. 06 Feb, 2023 2 commits
    • cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task · 7a2127e6
      Will Deacon authored
      set_cpus_allowed_ptr() will fail with -EINVAL if the requested
      affinity mask is not a subset of the task_cpu_possible_mask() for the
      task being updated. Consequently, on a heterogeneous system with cpusets
      spanning the different CPU types, updates to the cgroup hierarchy can
      silently fail to update task affinities when the effective affinity
      mask for the cpuset is expanded.
      
      For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
      the only cores capable of executing 32-bit tasks. Attaching a 32-bit
      task to a cpuset containing CPUs 0-2 will correctly affine the task to
      CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
      the affinity mask of the 32-bit task because update_tasks_cpumask() will
      pass the full 0-3 mask to set_cpus_allowed_ptr().
      
      Extend update_tasks_cpumask() to take a temporary 'cpumask' parameter
      and use it to mask the 'effective_cpus' mask with the possible mask for
      each task being updated.
      
      Fixes: 431c69fa ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
      Signed-off-by: Will Deacon <will@kernel.org>
      Acked-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
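The arm64 example from this commit message can be replayed with a small model. Masks are Python sets, and both function names are hypothetical stand-ins for the kernel functions mentioned above; only the subset check and the added masking step are modelled.

```python
import errno


def set_cpus_allowed_model(requested: set[int], task_possible: set[int]):
    """Model: reject any mask that is not a subset of the task's possible CPUs."""
    if not requested <= task_possible:
        return -errno.EINVAL, None
    return 0, requested


def update_task_mask(effective_cpus: set[int], task_possible: set[int]):
    """Model of the fix: intersect with the task's possible mask before applying."""
    return set_cpus_allowed_model(effective_cpus & task_possible, task_possible)
```

For a 32-bit task whose possible CPUs are 2-3, passing the full cpuset mask 0-3 fails with `-EINVAL` in this model, while the masked update succeeds and yields CPUs 2-3.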
    • cgroup/cpuset: Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks · 3fb906e7
      Waiman Long authored
      Since commit 8f9ea86f ("sched: Always preserve the user
      requested cpumask"), relax_compatible_cpus_allowed_ptr() is calling
      __sched_setaffinity() unconditionally. This helps to expose a bug in
      the current cpuset hotplug code where the cpumasks of the tasks in
      the top cpuset are not updated at all when some CPUs become online or
      offline. It is likely caused by the fact that some of the tasks in the
      top cpuset, like percpu kthreads, cannot have their cpu affinity changed.
      
      One way to reproduce this as suggested by Peter is:
       - boot machine
       - offline all CPUs except one
       - taskset -p ffffffff $$
       - online all CPUs
      
      Fix this by allowing cpuset_cpus_allowed() to return a wider mask that
      includes offline CPUs for those tasks that are in the top cpuset. For
      tasks not in the top cpuset, the old rule applies and only online CPUs
      will be returned in the mask since hotplug events will update their
      cpumasks accordingly.
      
      Fixes: 8f9ea86f ("sched: Always preserve the user requested cpumask")
      Reported-by: Will Deacon <will@kernel.org>
      Originally-from: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Will Deacon <will@kernel.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
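The rule introduced by this fix can be approximated with the model below. It is deliberately simplified: the function name is hypothetical, masks are Python sets, and details of the real implementation (such as falling back to ancestor cpusets when the result is empty) are left out.

```python
def cpuset_cpus_allowed_model(in_top_cpuset, task_possible, effective_cpus, online):
    """Return the CPUs a task may be affined to, per the rule described above."""
    if in_top_cpuset:
        # Top cpuset tasks: keep offline CPUs, since hotplug does not
        # update the cpumasks of these tasks.
        return set(task_possible)
    # Other tasks: only online CPUs of the cpuset; hotplug events keep
    # their cpumasks up to date.
    return effective_cpus & online & task_possible
```

In the reproduction scenario above (all CPUs but one offline), a top cpuset task still gets the full mask in this model, so the wide affinity requested with taskset is not narrowed down to the single online CPU.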
  18. 31 Jan, 2023 1 commit