    sched: Implement interface for cgroup unified hierarchy · 0d593634
    Tejun Heo authored
    There are a couple of interface issues which can be addressed in the
    cgroup2 interface.
    
    * Stats from cpuacct being reported separately from the cpu stats.
    
    * Use of different time units.  Writable control knobs use
      microseconds, some stat fields use nanoseconds while other cpuacct
      stat fields use centiseconds.
    
    * Control knobs which can't be used in the root cgroup still show up
      in the root.
    
    * Control knob names and semantics aren't consistent with other
      controllers.
    
    This patchset implements the cpu controller's interface on cgroup2, which
    adheres to the controller file conventions described in
    Documentation/cgroups/cgroup-v2.txt.  Overall, the following changes
    are made.
    
    * cpuacct is implicitly enabled and disabled by cpu and its information
      is reported through "cpu.stat" which now uses microseconds for all
      time durations.  All time duration fields now have "_usec" appended
      to them for clarity.
    
      Note that cpuacct.usage_percpu is currently not included in
      "cpu.stat".  If this information is actually called for, it will be
      added later.
    
    * "cpu.shares" is replaced with "cpu.weight" and operates on the
      standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
      The weight is scaled to scheduler weight so that 100 maps to 1024
      and the ratio relationship is preserved - if weight is W and its
      scaled value is S, W / 100 == S / 1024.  While the mapped range is a
      bit smaller than the original scheduler weight range, the dead zones
      on both sides are relatively small and it still covers a wider range
      than the nice value mappings.  This file doesn't make sense in the
      root cgroup and isn't created on root.
    
    * "cpu.weight.nice" is added. When read, it reads back the nice value
      which is closest to the current "cpu.weight".  When written, it sets
      "cpu.weight" to the weight value which matches the nice value.  This
      makes it easy to configure cgroups when they're competing against
      threads in threaded subtrees.
    
    * "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max"
      which contains both quota and period.
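
    The weight scaling, the nice mapping and the combined quota/period
    file described above can be sketched in userspace Python.  This is an
    illustration under stated assumptions, not the kernel code: the
    function names are hypothetical, rounding to nearest is assumed, the
    nice table is reproduced from the scheduler's sched_prio_to_weight
    table and should be checked against the kernel source, and the
    "<quota> <period>" layout of "cpu.max" with "max" meaning no limit
    is taken from the cgroup v2 documentation rather than from this
    description.

```python
CGROUP_WEIGHT_MIN, CGROUP_WEIGHT_DFL, CGROUP_WEIGHT_MAX = 1, 100, 10000
SCHED_WEIGHT_DFL = 1024  # scheduler weight that cgroup weight 100 maps to

# Assumption: scheduler weights for nice -20..19, reproduced from the
# kernel's sched_prio_to_weight table; verify against the kernel source.
NICE_TO_SCHED_WEIGHT = [
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
    9548, 7620, 6100, 4904, 3906,
    3121, 2501, 1991, 1586, 1277,
    1024, 820, 655, 526, 423,
    335, 272, 215, 172, 137,
    110, 87, 70, 56, 45,
    36, 29, 23, 18, 15,
]


def weight_to_sched(weight):
    """Scale cgroup weight W to scheduler weight S so that W/100 == S/1024."""
    if not CGROUP_WEIGHT_MIN <= weight <= CGROUP_WEIGHT_MAX:
        raise ValueError(f"weight {weight} outside [1, 10000]")
    # Integer round-to-nearest; the rounding mode is an assumption.
    return (weight * SCHED_WEIGHT_DFL + CGROUP_WEIGHT_DFL // 2) // CGROUP_WEIGHT_DFL


def sched_to_weight(sched_weight):
    """Inverse scaling, clamped to the cgroup weight range."""
    w = (sched_weight * CGROUP_WEIGHT_DFL + SCHED_WEIGHT_DFL // 2) // SCHED_WEIGHT_DFL
    return max(CGROUP_WEIGHT_MIN, min(CGROUP_WEIGHT_MAX, w))


def nice_to_weight(nice):
    """Model a write to cpu.weight.nice: the cgroup weight matching a nice value."""
    if not -20 <= nice <= 19:
        raise ValueError(f"nice {nice} outside [-20, 19]")
    return sched_to_weight(NICE_TO_SCHED_WEIGHT[nice + 20])


def weight_to_nice(weight):
    """Model a read of cpu.weight.nice: the nice value closest to a cgroup weight."""
    target = weight_to_sched(weight)
    idx = min(range(len(NICE_TO_SCHED_WEIGHT)),
              key=lambda i: abs(NICE_TO_SCHED_WEIGHT[i] - target))
    return idx - 20


def parse_cpu_max(text):
    """Parse "<quota> <period>" in microseconds; quota "max" becomes None."""
    quota, period = text.split()
    return (None if quota == "max" else int(quota)), int(period)
```

    With these helpers, weight_to_sched(100) gives 1024 and the nice
    range maps into roughly 1 to 8668 on the cgroup scale, which sits
    inside the 1 to 10000 weight range.  Several adjacent nice values
    at the low end collapse onto the same small weight, which is the
    dead zone mentioned above.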
    
    v4: - Use cgroup2 basic usage stat as the information source instead
          of cpuacct.
    
    v3: - Added "cpu.weight.nice" to allow using nice values when
          configuring the weight.  The feature was requested by PeterZ.
        - Merge the patch to enable threaded support on cpu and cpuacct.
        - Dropped the bits about getting rid of cpuacct from patch
          description as there is a pretty strong case for making cpuacct
          an implicit controller so that basic cpu usage stats are always
          available.
        - Documentation updated accordingly.  "cpu.rt.max" section is
          dropped for now.
    
    v2: - cpu_stats_show() was incorrectly using CONFIG_FAIR_GROUP_SCHED
          for CFS bandwidth stats and also using raw division for u64.
          Use CONFIG_CFS_BANDWIDTH and do_div() instead.  "cpu.rt.max" is
          not included yet.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Li Zefan <lizefan@huawei.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>