Commit 633b11be authored by Mauro Carvalho Chehab, committed by Jonathan Corbet

cgroup-v2.txt: standardize document format

Each text file under Documentation follows a different
format. Some don't even have titles!

Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:

- Comment the internal index;
- Use :Date: and :Author: for authorship;
- Mark titles;
- Mark literal blocks;
- Adjust whitespaces;
- Mark notes;
- Use table notation for the existing tables.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 58ef0e5b

================
Control Group v2
================

:Date: October, 2015
:Author: Tejun Heo <tj@kernel.org>

This is the authoritative documentation on the design, interface and
conventions of cgroup v2. It describes all userland-visible aspects
@@ -9,12 +11,12 @@ of cgroup including core and specific controller behaviors. All
future changes must be reflected in this document. Documentation for
v1 is available under Documentation/cgroup-v1/.

.. CONTENTS

   1. Introduction
     1-1. Terminology
     1-2. What is cgroup?
   2. Basic Operations
     2-1. Mounting
     2-2. Organizing Processes
     2-3. [Un]populated Notification
@@ -28,16 +30,16 @@ CONTENTS
     2-6. Guidelines
       2-6-1. Organize Once and Control
       2-6-2. Avoid Name Collisions
   3. Resource Distribution Models
     3-1. Weights
     3-2. Limits
     3-3. Protections
     3-4. Allocations
   4. Interface Files
     4-1. Format
     4-2. Conventions
     4-3. Core Interface Files
   5. Controllers
     5-1. CPU
       5-1-1. CPU Interface Files
     5-2. Memory
@@ -53,15 +55,15 @@ CONTENTS
       5-5-1. RDMA Interface Files
     5-6. Misc
       5-6-1. perf_event
   6. Namespace
     6-1. Basics
     6-2. The Root and Views
     6-3. Migration and setns(2)
     6-4. Interaction with Other Namespaces
   P. Information on Kernel Programming
     P-1. Filesystem Support for Writeback
   D. Deprecated v1 Core Features
   R. Issues with v1 and Rationales for v2
     R-1. Multiple Hierarchies
     R-2. Thread Granularity
     R-3. Competition Between Inner Nodes and Threads
@@ -70,9 +72,11 @@ R. Issues with v1 and Rationales for v2
       R-5-1. Memory

Introduction
============

Terminology
-----------

"cgroup" stands for "control group" and is never capitalized. The
singular form is used to designate the whole feature and also as a
@@ -80,7 +84,8 @@ qualifier as in "cgroup controllers". When explicitly referring to
multiple individual control groups, the plural form "cgroups" is used.

What is cgroup?
---------------

cgroup is a mechanism to organize processes hierarchically and
distribute system resources along the hierarchy in a controlled and
@@ -110,12 +115,14 @@ restrictions set closer to the root in the hierarchy can not be
overridden from further away.

Basic Operations
================

Mounting
--------

Unlike v1, cgroup v2 has only a single hierarchy. The cgroup v2
hierarchy can be mounted with the following mount command::

  # mount -t cgroup2 none $MOUNT_POINT
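
A script may first want to confirm that a cgroup2 hierarchy is mounted and find its mount point. The following is a minimal sketch. It assumes a mounts table in the /proc/self/mounts layout (device, mount point, filesystem type, options, ...). The function name "cg2_mountpoint" is hypothetical. The optional file argument exists only so the sketch can also be run against a sample table.

```shell
# Print the mount point of the first cgroup2 entry in a mounts table laid
# out like /proc/self/mounts ("DEVICE MOUNT_POINT FSTYPE OPTIONS ...").
# $1: mounts file to scan (defaults to /proc/self/mounts).
# Exits non-zero when no cgroup2 entry is present.
cg2_mountpoint() (
    mounts=${1:-/proc/self/mounts}
    # Field 3 is the filesystem type and field 2 the mount point. Spaces in
    # mount points appear octal-escaped (\040) and are not decoded here.
    awk '$3 == "cgroup2" { print $2; found = 1; exit }
         END { exit (found ? 0 : 1) }' "$mounts"
)
```

Only the first matching entry is reported, and the mount point is printed exactly as it appears in the table.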

@@ -160,10 +167,11 @@ cgroup v2 currently supports the following mount options.
  Delegation section for details.

Organizing Processes
--------------------

Initially, only the root cgroup exists, and all processes belong to it.
A child cgroup can be created by creating a sub-directory::

  # mkdir $CGROUP_NAME

@@ -190,28 +198,29 @@ moved to another cgroup.
A cgroup which doesn't have any children or live processes can be
destroyed by removing the directory. Note that a cgroup which doesn't
have any children and is associated only with zombie processes is
considered empty and can be removed::

  # rmdir $CGROUP_NAME

"/proc/$PID/cgroup" lists a process's cgroup membership. If legacy
cgroup is in use in the system, this file may contain multiple lines,
one for each hierarchy. The entry for cgroup v2 is always in the
format "0::$PATH"::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested

If the process becomes a zombie and the cgroup it was associated with
is removed subsequently, " (deleted)" is appended to the path::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested (deleted)
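
Scripts often need only the v2 entry of "/proc/$PID/cgroup". Below is a minimal sketch that assumes the file layout described above. "cg2_path_of" and "cg2_is_deleted" are hypothetical helper names. The file is passed as an argument rather than derived from a PID, so the helpers also work on saved copies.

```shell
# Print the cgroup v2 path from a /proc/$PID/cgroup-style file.
# $1: file to read, e.g. /proc/842/cgroup.
# Only the "0::$PATH" entry is used; legacy hierarchy lines are ignored and
# a trailing " (deleted)" marker is stripped from the printed path.
cg2_path_of() (
    sed -n 's/^0:://p' "$1" | head -n 1 | sed 's/ (deleted)$//'
)

# Succeed if the v2 entry carries the " (deleted)" suffix, i.e. the process
# is a zombie whose cgroup has since been removed.
cg2_is_deleted() (
    grep -q '^0::.* (deleted)$' "$1"
)
```

For the process in the example above, cg2_path_of /proc/842/cgroup would print /test-cgroup/test-cgroup-nested in both cases. Only cg2_is_deleted tells the two situations apart.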

[Un]populated Notification
--------------------------

Each non-root cgroup has a "cgroup.events" file which contains a
"populated" field indicating whether the cgroup's sub-hierarchy has
@@ -222,7 +231,7 @@ example, to start a clean-up operation after all processes of a given
sub-hierarchy have exited. The populated state updates and
notifications are recursive. Consider the following sub-hierarchy
where the numbers in the parentheses represent the numbers of processes
in each cgroup::

  A(4) - B(0) - C(1)
              \ D(0)

@@ -233,18 +242,20 @@ file modified events will be generated on the "cgroup.events" files of
both cgroups.
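
The sketch below shows one way a script can read the "populated" field. It assumes the "populated 0|1" line format of "cgroup.events", and the helper name "cg_populated" is hypothetical. A real monitor would wait for the file modified event (for example with inotify) instead of re-reading the file in a loop.

```shell
# Print the "populated" value (1 or 0) from a cgroup's cgroup.events file.
# $1: cgroup directory, e.g. $MOUNT_POINT/test-cgroup.
# Other keys in the file are ignored; exits non-zero if the key is missing.
cg_populated() (
    awk '$1 == "populated" { print $2; found = 1 }
         END { exit (found ? 0 : 1) }' "$1/cgroup.events"
)
```

For example, while [ "$(cg_populated "$CG")" = 1 ]; do sleep 1; done keeps waiting until every process in the sub-hierarchy of $CG has exited, at the cost of up to a second of latency.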

Controlling Controllers
-----------------------

Enabling and Disabling
~~~~~~~~~~~~~~~~~~~~~~

Each cgroup has a "cgroup.controllers" file which lists all
controllers available for the cgroup to enable::

  # cat cgroup.controllers
  cpu io memory

No controller is enabled by default. Controllers can be enabled and
disabled by writing to the "cgroup.subtree_control" file::

  # echo "+cpu +memory -io" > cgroup.subtree_control
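
A controller can only be enabled if "cgroup.controllers" lists it, so a script can check availability before writing to "cgroup.subtree_control". Here is a minimal sketch based on that rule, using the hypothetical helper names "cg_controller_available" and "cg_enable_controller". On a real cgroup2 mount the kernel may still reject the write, for example because of the constraints described below. In that case the echo fails and the function returns non-zero.

```shell
# Succeed if controller $2 is listed in $1/cgroup.controllers, i.e. the
# cgroup at $1 may enable it for its children.
cg_controller_available() (
    # The file is one space separated line; split it into one name per line
    # and look for an exact, fixed-string match.
    tr ' ' '\n' < "$1/cgroup.controllers" | grep -qxF "$2"
)

# Enable controller $2 for the children of cgroup $1 if it is available.
cg_enable_controller() (
    if ! cg_controller_available "$1" "$2"; then
        echo "$2 is not available in $1" >&2
        exit 1
    fi
    # Writing "+NAME" enables the controller; "-NAME" would disable it.
    echo "+$2" > "$1/cgroup.subtree_control"
)
```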

@@ -256,7 +267,7 @@ are specified, the last one is effective.
Enabling a controller in a cgroup indicates that the distribution of
the target resource across its immediate children will be controlled.
Consider the following sub-hierarchy. The enabled controllers are
listed in parentheses::

  A(cpu,memory) - B(memory) - C()
                            \ D()

@@ -276,7 +287,8 @@ controller interface files - anything which doesn't start with
"cgroup." are owned by the parent rather than the cgroup itself.

Top-down Constraint
~~~~~~~~~~~~~~~~~~~

Resources are distributed top-down and a cgroup can further distribute
a resource only if the resource has been distributed to it from the
@@ -287,7 +299,8 @@ the parent has the controller enabled and a controller can't be
disabled if one or more children have it enabled.
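
A script can check the disabling rule in the same way. The sketch below uses the hypothetical name "cg_can_disable". It assumes that reading a child's "cgroup.subtree_control" lists its enabled controllers separated by spaces. Checking only the immediate children is enough. Under the top-down constraint, a deeper descendant can only have the controller enabled if the immediate child above it does too.

```shell
# Succeed if controller $2 could be disabled in cgroup $1, i.e. no immediate
# child of $1 has it enabled in its own cgroup.subtree_control.
cg_can_disable() (
    for child in "$1"/*/; do
        # Skips non-cgroup directories and the unexpanded glob when $1 has
        # no sub-directories.
        [ -f "${child}cgroup.subtree_control" ] || continue
        if tr ' ' '\n' < "${child}cgroup.subtree_control" | grep -qxF "$2"; then
            echo "${child%/} still has $2 enabled" >&2
            exit 1
        fi
    done
    exit 0
)
```

In the A/B/C/D example above, this check would refuse to disable "memory" in A while B still has it enabled.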

No Internal Process Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Non-root cgroups can only distribute resources to their children when
they don't have any processes of their own. In other words, only
@@ -314,9 +327,11 @@ children before enabling controllers in its "cgroup.subtree_control"
file.

Delegation
----------

Model of Delegation
~~~~~~~~~~~~~~~~~~~

A cgroup can be delegated in two ways. First, to a less privileged
user by granting write access to the directory and its "cgroup.procs"
@@ -345,7 +360,8 @@ cgroups in or nesting depth of a delegated sub-hierarchy; however,
this may be limited explicitly in the future.

Delegation Containment
~~~~~~~~~~~~~~~~~~~~~~

A delegated sub-hierarchy is contained in the sense that processes
can't be moved into or out of the sub-hierarchy by the delegatee.
@@ -366,7 +382,7 @@ in from or push out to outside the sub-hierarchy.
For example, let's assume cgroups C0 and C1 have been delegated to
user U0 who created C00, C01 under C0 and C10 under C1 as follows and
all processes under C0 and C1 belong to U0::

  ~~~~~~~~~~~~~ - C0 - C00
  ~ cgroup    ~      \ C01

@@ -386,9 +402,11 @@ namespace of the process which is attempting the migration. If either
is not reachable, the migration is rejected with -ENOENT.

Guidelines
----------

Organize Once and Control
~~~~~~~~~~~~~~~~~~~~~~~~~

Migrating a process across cgroups is a relatively expensive operation
and stateful resources such as memory are not moved together with the
@@ -404,7 +422,8 @@ distribution can be made by changing controller configuration through
the interface files.

Avoid Name Collisions
~~~~~~~~~~~~~~~~~~~~~

Interface files for a cgroup and its children cgroups occupy the same
directory and it is possible to create children cgroups which collide
@@ -422,14 +441,16 @@ cgroup doesn't do anything to prevent name collisions and it's the
user's responsibility to avoid them.

Resource Distribution Models
============================

cgroup controllers implement several resource distribution schemes
depending on the resource type and expected use cases. This section
describes major schemes in use along with their expected behaviors.

Weights
-------

A parent's resource is distributed by adding up the weights of all
active children and giving each the fraction matching the ratio of its
@@ -450,7 +471,8 @@ process migrations.
and is an example of this type.
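
The weight model is plain proportional arithmetic: a child with weight w receives total * w / (sum of the active children's weights). The sketch below uses the hypothetical helper "weight_share" and shell integer arithmetic, so results are rounded down. The total could be, for instance, an amount of CPU time in microseconds.

```shell
# Print the share of a resource one child receives under the weight model.
# $1: amount available at the parent; $2: the child's weight;
# remaining args: weights of all *active* children, including this one.
weight_share() (
    total=$1; own=$2; shift 2
    sum=0
    for w in "$@"; do sum=$((sum + w)); done
    # Refuse to divide by zero when no active weights were given.
    [ "$sum" -gt 0 ] || exit 1
    echo $((total * own / sum))
)
```

With two active children at weights 100 and 300, weight_share 1000000 100 100 300 gives the first child 250000. If the second child becomes inactive it drops out of the sum, and weight_share 1000000 100 100 hands the first child all 1000000. Only active children take part in the split.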

Limits
------

A child can only consume up to the configured amount of the resource.
Limits can be over-committed - the sum of the limits of children can
@@ -466,7 +488,8 @@ process migrations.
on an IO device and is an example of this type.

Protections
-----------

A cgroup is protected to be allocated up to the configured amount of
the resource if the usages of all its ancestors are under their
@@ -486,7 +509,8 @@ process migrations.
example of this type.

Allocations
-----------

A cgroup is exclusively allocated a certain amount of a finite
resource. Allocations can't be over-committed - the sum of the
@@ -505,12 +529,14 @@ may be rejected.
type.
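
Unlike limits, allocations must not be over-committed. A configuration can therefore be validated by summing the children's allocations and comparing the total against what the parent has. The following is a minimal sketch with the hypothetical helper "alloc_fits". The amounts are plain integers in whatever unit the resource uses.

```shell
# Succeed if the child allocations fit within the parent's amount, i.e. the
# configuration is not over-committed.
# $1: amount available to the parent; remaining args: child allocations.
alloc_fits() (
    avail=$1; shift
    sum=0
    for a in "$@"; do sum=$((sum + a)); done
    [ "$sum" -le "$avail" ]
)
```

The same check applied to limits would often fail, and that is expected: limits may legitimately be over-committed.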

Interface Files
===============

Format
------

All interface files should be in one of the following formats whenever
possible::

  New-line separated values
  (when only one value can be written at once)

@@ -545,7 +571,8 @@ can be written at a time. For nested keyed files, the sub key pairs
may be specified in any order and not all pairs have to be specified.
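
Scripts can read flat keyed and nested keyed files by key rather than by position. The sketch assumes "KEY VAL" lines for flat keyed files and "KEY SUB_KEY=VAL ..." lines for nested keyed files. "cg_flat_get" and "cg_nested_get" are hypothetical helper names. Because sub key pairs may appear in any order, the nested lookup searches every field of the matching line.

```shell
# Look up KEY ($2) in a flat keyed file ($1, "KEY VAL" per line), print VAL.
cg_flat_get() (
    awk -v k="$2" '$1 == k { print $2; found = 1; exit }
                   END { exit (found ? 0 : 1) }' "$1"
)

# Look up SUB_KEY ($3) of KEY ($2) in a nested keyed file ($1,
# "KEY SUB_KEY0=VAL0 SUB_KEY1=VAL1 ..." per line) and print its value.
cg_nested_get() (
    awk -v k="$2" -v s="$3" '
        $1 == k {
            for (i = 2; i <= NF; i++) {
                # Split "SUB_KEY=VAL" at the first "=".
                eq = index($i, "=")
                if (eq > 0 && substr($i, 1, eq - 1) == s) {
                    print substr($i, eq + 1); found = 1; exit
                }
            }
        }
        END { exit (found ? 0 : 1) }' "$1"
)
```

For example, cg_flat_get cgroup.events populated prints the populated state of a cgroup. For an illustrative nested keyed line such as "8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353", cg_nested_get FILE 8:16 wios prints 353.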

Conventions
-----------

- Settings for a single feature should be contained in a single file.

@@ -581,25 +608,25 @@ may be specified in any order and not all pairs have to be specified.
  with "default" as the value must not appear when read.

  For example, a setting which is keyed by major:minor device numbers
  with integer values may look like the following::

    # cat cgroup-example-interface-file
    default 150
    8:0 300

  The default value can be updated by::

    # echo 125 > cgroup-example-interface-file

  or::

    # echo "default 125" > cgroup-example-interface-file

  An override can be set by::

    # echo "8:16 170" > cgroup-example-interface-file

  and cleared by::

    # echo "8:0 default" > cgroup-example-interface-file
    # cat cgroup-example-interface-file

@@ -612,12 +639,12 @@ may be specified in any order and not all pairs have to be specified.
  generated on the file.
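
A script can resolve the "default" plus per-key override convention as follows: an explicit line for the key wins, and otherwise the "default" line applies. This is a minimal sketch with the hypothetical helper "cg_effective_value". It assumes the two-column layout of cgroup-example-interface-file.

```shell
# Print the effective value for key $2 (e.g. a MAJ:MIN device) in file $1.
# A per-key override line takes precedence over the "default" line;
# exits non-zero if neither is present.
cg_effective_value() (
    awk -v dev="$2" '
        $1 == "default" { def = $2; have_def = 1 }
        $1 == dev       { val = $2; have_val = 1 }
        END {
            if (have_val) print val
            else if (have_def) print def
            else exit 1
        }' "$1"
)
```

With the example contents "default 150" and "8:0 300", device 8:0 resolves to 300. Any other device, such as 8:16, resolves to 150.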

Core Interface Files
--------------------

All cgroup core files are prefixed with "cgroup."

cgroup.procs
  A read-write new-line separated values file which exists on
  all cgroups.
@@ -643,7 +670,6 @@ All cgroup core files are prefixed with "cgroup."
  should be granted along with the containing directory.

cgroup.controllers
  A read-only space separated values file which exists on all
  cgroups.
@@ -651,7 +677,6 @@ All cgroup core files are prefixed with "cgroup."
  the cgroup. The controllers are not ordered.

cgroup.subtree_control
  A read-write space separated values file which exists on all
  cgroups. Starts out empty.
@@ -667,23 +692,25 @@ All cgroup core files are prefixed with "cgroup."
  operations are specified, either all succeed or all fail.

cgroup.events
  A read-only flat-keyed file which exists on non-root cgroups.
  The following entries are defined. Unless specified
  otherwise, a value change in this file generates a file
  modified event.

  populated
    1 if the cgroup or its descendants contain any live
    processes; otherwise, 0.

Controllers
===========

CPU
---

.. note::
   The interface for the cpu controller hasn't been merged yet

The "cpu" controller regulates distribution of CPU cycles. This
controller implements weight and absolute bandwidth limit models for
@@ -691,36 +718,34 @@ normal scheduling policy and absolute bandwidth allocation model for
realtime scheduling policy.

CPU Interface Files
~~~~~~~~~~~~~~~~~~~

All time durations are in microseconds.

cpu.stat
  A read-only flat-keyed file which exists on non-root cgroups.
  It reports the following six stats:

  - usage_usec
  - user_usec
  - system_usec
  - nr_periods
  - nr_throttled
  - throttled_usec
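
A common use of "cpu.stat" is to estimate how often bandwidth limiting kicks in. The sketch below, with the hypothetical name "cpu_throttled_pct", divides nr_throttled by nr_periods. It assumes the usual meaning of these keys: nr_periods counts elapsed enforcement periods, and nr_throttled counts the periods in which the cgroup was throttled.

```shell
# Print the percentage of enforcement periods in which the cgroup was
# throttled, from the nr_periods and nr_throttled keys of a cpu.stat file.
# Prints 0.0 when no periods have elapsed (e.g. no bandwidth limit set).
cpu_throttled_pct() (
    awk '
        $1 == "nr_periods"   { p = $2 }
        $1 == "nr_throttled" { t = $2 }
        END {
            if (p > 0) printf "%.1f\n", 100 * t / p
            else print "0.0"
        }' "$1"
)
```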

cpu.weight
  A read-write single value file which exists on non-root
  cgroups. The default is "100".

  The weight is in the range [1, 10000].

cpu.max
  A read-write two value file which exists on non-root cgroups.
  The default is "max 100000".

  The maximum bandwidth limit. It's in the following format::

    $MAX $PERIOD

@@ -729,9 +754,10 @@ All time durations are in microseconds.
  one number is written, $MAX is updated.
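
cpu.max expresses a limit as $MAX out of every $PERIOD, so dividing the two gives the limit in units of CPUs. Here is a minimal sketch with the hypothetical helper "cpu_max_cpus". It assumes the two-field format above, with the literal "max" meaning no limit. It reads a file argument, so it works on a saved copy as well as on a live cpu.max where that interface is available.

```shell
# Print the bandwidth limit of a cpu.max-style "$MAX $PERIOD" file as a
# number of CPUs with two decimals, or "max" when no limit is set.
cpu_max_cpus() (
    read -r max period < "$1"
    if [ "$max" = max ]; then
        echo max
    else
        # Both fields are in microseconds, so their ratio is unitless.
        awk -v m="$max" -v p="$period" 'BEGIN { printf "%.2f\n", m / p }'
    fi
)
```

Under this reading, "50000 100000" corresponds to half a CPU (0.50) and "200000 100000" to two CPUs (2.00).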

cpu.rt.max
  .. note::
     The semantics of this file are still under discussion and the
     interface hasn't been merged yet

  A read-write two value file which exists on all cgroups.
  The default is "0 100000".
@@ -739,7 +765,7 @@ All time durations are in microseconds.
  The maximum realtime runtime allocation. Over-committing
  configurations are disallowed and process migrations are
  rejected if not enough bandwidth is available. It's in the
  following format::

    $MAX $PERIOD

@@ -748,7 +774,8 @@ All time durations are in microseconds.
  updated.

Memory
------

The "memory" controller regulates distribution of memory. Memory is
stateful and implements both limit and protection models. Due to the
@@ -770,14 +797,14 @@ following types of memory usages are tracked.
The above list may expand in the future for better coverage.

Memory Interface Files
~~~~~~~~~~~~~~~~~~~~~~

All memory amounts are in bytes. If a value which is not aligned to
PAGE_SIZE is written, the value may be rounded up to the closest
PAGE_SIZE multiple when read back.
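
The rounding rule is ordinary round-up-to-a-multiple arithmetic. Below is a sketch with the hypothetical helper "round_up_to_page". When no page size is given it queries "getconf PAGESIZE". The 4096-byte page used in the example is an assumption that holds on many, but not all, architectures.

```shell
# Round a byte count up to the next multiple of the page size.
# $1: bytes; $2: page size (defaults to the value reported by getconf).
round_up_to_page() (
    page=${2:-$(getconf PAGESIZE)}
    echo $(( ($1 + page - 1) / page * page ))
)
```

On a system with 4096-byte pages, writing 5000 may therefore read back as 8192. An already aligned value such as 8192 is unchanged.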

memory.current
  A read-only single value file which exists on non-root
  cgroups.
@@ -785,7 +812,6 @@ PAGE_SIZE multiple when read back.
  and its descendants.

memory.low
  A read-write single value file which exists on non-root
  cgroups. The default is "0".
@@ -798,7 +824,6 @@ PAGE_SIZE multiple when read back.
  protection is discouraged.

memory.high
  A read-write single value file which exists on non-root
  cgroups. The default is "max".
@@ -811,7 +836,6 @@ PAGE_SIZE multiple when read back.
  under extreme conditions the limit may be breached.

memory.max
  A read-write single value file which exists on non-root
  cgroups. The default is "max".
@@ -826,21 +850,18 @@ PAGE_SIZE multiple when read back.
  utility is limited to providing the final safety net.

memory.events
  A read-only flat-keyed file which exists on non-root cgroups.
  The following entries are defined. Unless specified
  otherwise, a value change in this file generates a file
  modified event.

  low
    The number of times the cgroup is reclaimed due to
    high memory pressure even though its usage is under
    the low boundary. This usually indicates that the low
    boundary is over-committed.

  high
    The number of times processes of the cgroup are
    throttled and routed to perform direct memory reclaim
    because the high memory boundary was exceeded. For a
@@ -849,13 +870,11 @@ PAGE_SIZE multiple when read back.
    occurrences are expected.

  max
    The number of times the cgroup's memory usage was
    about to go over the max boundary. If direct reclaim
    fails to bring it down, the cgroup goes to OOM state.

  oom
    The number of times the cgroup's memory usage
    reached the limit and allocation was about to fail.
@@ -868,12 +887,10 @@ PAGE_SIZE multiple when read back.
    tasks iff shortage has happened inside page fault.

  oom_kill
    The number of processes belonging to this cgroup
    killed by any kind of OOM killer.

memory.stat
  A read-only flat-keyed file which exists on non-root cgroups.

  This breaks down the cgroup's memory footprint into different
@@ -887,73 +904,55 @@ PAGE_SIZE multiple when read back.
  fixed position; use the keys to look up specific values!

  anon
    Amount of memory used in anonymous mappings such as
    brk(), sbrk(), and mmap(MAP_ANONYMOUS)

  file
    Amount of memory used to cache filesystem data,
    including tmpfs and shared memory.

  kernel_stack
    Amount of memory allocated to kernel stacks.

  slab
    Amount of memory used for storing in-kernel data
    structures.

  sock
    Amount of memory used in network transmission buffers

  shmem
    Amount of cached filesystem data that is swap-backed,
    such as tmpfs, shm segments, shared anonymous mmap()s

  file_mapped
    Amount of cached filesystem data mapped with mmap()

  file_dirty
    Amount of cached filesystem data that was modified but
    not yet written back to disk

  file_writeback
    Amount of cached filesystem data that was modified and
    is currently being written back to disk

  inactive_anon, active_anon, inactive_file, active_file, unevictable
    Amount of memory, swap-backed and filesystem-backed,
    on the internal memory management lists used by the
    page reclaim algorithm

  slab_reclaimable
    Part of "slab" that might be reclaimed, such as
    dentries and inodes.

  slab_unreclaimable
    Part of "slab" that cannot be reclaimed on memory
    pressure.

  pgfault
    Total number of page faults incurred

  pgmajfault
    Number of major page faults incurred

  workingset_refault
@@ -997,7 +996,6 @@ PAGE_SIZE multiple when read back.
    Amount of reclaimed lazyfree pages
memory.swap.current memory.swap.current
A read-only single value file which exists on non-root A read-only single value file which exists on non-root
cgroups. cgroups.
...@@ -1005,7 +1003,6 @@ PAGE_SIZE multiple when read back. ...@@ -1005,7 +1003,6 @@ PAGE_SIZE multiple when read back.
and its descendants. and its descendants.
memory.swap.max memory.swap.max
A read-write single value file which exists on non-root A read-write single value file which exists on non-root
cgroups. The default is "max". cgroups. The default is "max".
...@@ -1013,7 +1010,8 @@ PAGE_SIZE multiple when read back. ...@@ -1013,7 +1010,8 @@ PAGE_SIZE multiple when read back.
limit, anonymous memory of the cgroup will not be swapped out. limit, anonymous memory of the cgroup will not be swapped out.
5-2-2. Usage Guidelines Usage Guidelines
~~~~~~~~~~~~~~~~
"memory.high" is the main mechanism to control memory usage. "memory.high" is the main mechanism to control memory usage.
Over-committing on high limit (sum of high limits > available memory) Over-committing on high limit (sum of high limits > available memory)
...@@ -1036,7 +1034,8 @@ memory; unfortunately, memory pressure monitoring mechanism isn't ...@@ -1036,7 +1034,8 @@ memory; unfortunately, memory pressure monitoring mechanism isn't
implemented yet. implemented yet.
5-2-3. Memory Ownership Memory Ownership
~~~~~~~~~~~~~~~~
A memory area is charged to the cgroup which instantiated it and stays A memory area is charged to the cgroup which instantiated it and stays
charged to the cgroup until the area is released. Migrating a process charged to the cgroup until the area is released. Migrating a process
...@@ -1054,7 +1053,8 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas ...@@ -1054,7 +1053,8 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
belonging to the affected files to ensure correct memory ownership. belonging to the affected files to ensure correct memory ownership.
5-3. IO IO
--
The "io" controller regulates the distribution of IO resources. This The "io" controller regulates the distribution of IO resources. This
controller implements both weight based and absolute bandwidth or IOPS controller implements both weight based and absolute bandwidth or IOPS
...@@ -1063,28 +1063,29 @@ only if cfq-iosched is in use and neither scheme is available for ...@@ -1063,28 +1063,29 @@ only if cfq-iosched is in use and neither scheme is available for
blk-mq devices. blk-mq devices.
5-3-1. IO Interface Files IO Interface Files
~~~~~~~~~~~~~~~~~~
io.stat io.stat
A read-only nested-keyed file which exists on non-root A read-only nested-keyed file which exists on non-root
cgroups. cgroups.
Lines are keyed by $MAJ:$MIN device numbers and not ordered. Lines are keyed by $MAJ:$MIN device numbers and not ordered.
The following nested keys are defined. The following nested keys are defined.
====== ===================
rbytes Bytes read rbytes Bytes read
wbytes Bytes written wbytes Bytes written
rios Number of read IOs rios Number of read IOs
wios Number of write IOs wios Number of write IOs
====== ===================
An example read output follows. An example read output follows::
8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353
8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252
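io.stat lines are keyed by device and not ordered, so a consumer should look a device up by its $MAJ:$MIN key rather than by line position. The following is a minimal sketch assuming bash and awk; `io_stat_get` is a hypothetical helper, not part of any cgroup tooling.

```shell
# io_stat_get DEV KEY
# Print the value of nested key KEY (e.g. rbytes, wios) for device DEV
# ($MAJ:$MIN) from io.stat-formatted text read on stdin.
# Lines are not ordered, so every line is scanned until DEV is found.
io_stat_get() {
    awk -v dev="$1" -v key="$2" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")          # "wios=1252" -> kv[1]="wios", kv[2]="1252"
                if (kv[1] == key) { print kv[2]; exit }
            }
        }'
}
```

With the example output above on stdin, `io_stat_get 8:0 wios` prints `1252`. In practice stdin would be the io.stat file of the cgroup of interest.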
io.weight io.weight
A read-write flat-keyed file which exists on non-root cgroups. A read-write flat-keyed file which exists on non-root cgroups.
The default is "default 100". The default is "default 100".
...@@ -1098,14 +1099,13 @@ blk-mq devices. ...@@ -1098,14 +1099,13 @@ blk-mq devices.
$WEIGHT" or simply "$WEIGHT". Overrides can be set by writing $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing
"$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default".
An example read output follows. An example read output follows::
default 100 default 100
8:16 200 8:16 200
8:0 50 8:0 50
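The weight in effect for a device is its "$MAJ:$MIN" override if one is present, and otherwise the "default" line. A small sketch of that lookup, assuming bash and awk; `io_weight_for` is a hypothetical name used only for illustration.

```shell
# io_weight_for DEV
# Read io.weight-formatted text on stdin and print the weight that applies
# to device DEV: a per-device override if present, else the default.
io_weight_for() {
    awk -v dev="$1" '
        $1 == "default" { def = $2 }             # "default 100"
        $1 == dev       { w = $2; found = 1 }    # "8:16 200"
        END             { print (found ? w : def) }'
}
```

For the example read output above, `io_weight_for 8:16` prints `200`, while a device with no override line, such as `8:32`, falls back to `100`.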
io.max io.max
A read-write nested-keyed file which exists on non-root A read-write nested-keyed file which exists on non-root
cgroups. cgroups.
...@@ -1113,10 +1113,12 @@ blk-mq devices. ...@@ -1113,10 +1113,12 @@ blk-mq devices.
device numbers and not ordered. The following nested keys are device numbers and not ordered. The following nested keys are
defined. defined.
===== ==================================
rbps Max read bytes per second rbps Max read bytes per second
wbps Max write bytes per second wbps Max write bytes per second
riops Max read IO operations per second riops Max read IO operations per second
wiops Max write IO operations per second wiops Max write IO operations per second
===== ==================================
When writing, any number of nested key-value pairs can be When writing, any number of nested key-value pairs can be
specified in any order. "max" can be specified as the value specified in any order. "max" can be specified as the value
...@@ -1126,24 +1128,25 @@ blk-mq devices. ...@@ -1126,24 +1128,25 @@ blk-mq devices.
BPS and IOPS are measured in each IO direction and IOs are BPS and IOPS are measured in each IO direction and IOs are
delayed if limit is reached. Temporary bursts are allowed. delayed if limit is reached. Temporary bursts are allowed.
Setting read limit at 2M BPS and write at 120 IOPS for 8:16. Setting read limit at 2M BPS and write at 120 IOPS for 8:16::
echo "8:16 rbps=2097152 wiops=120" > io.max echo "8:16 rbps=2097152 wiops=120" > io.max
Reading returns the following. Reading returns the following::
8:16 rbps=2097152 wbps=max riops=max wiops=120 8:16 rbps=2097152 wbps=max riops=max wiops=120
Write IOPS limit can be removed by writing the following. Write IOPS limit can be removed by writing the following::
echo "8:16 wiops=max" > io.max echo "8:16 wiops=max" > io.max
Reading now returns the following. Reading now returns the following::
8:16 rbps=2097152 wbps=max riops=max wiops=max 8:16 rbps=2097152 wbps=max riops=max wiops=max
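The write/read sequence above can be modelled as a merge. Keys named in a write replace the previous value, keys left out keep it, and every key that was never set reads back as "max". The sketch below is a userspace model of that behaviour for a single device, assuming bash 4+ for associative arrays. `io_max_apply` is hypothetical and does not touch any cgroup file.

```shell
# io_max_apply CURRENT WRITE
# CURRENT: the device's line as read back from io.max (may be empty)
# WRITE:   the string written to io.max for the same device
# Prints the line io.max would be expected to read back afterwards.
io_max_apply() {
    local -A lim=([rbps]=max [wbps]=max [riops]=max [wiops]=max)
    local dev="" tok
    # Unquoted on purpose: split both lines into whitespace-separated tokens.
    for tok in $1 $2; do
        case $tok in
            *=*) lim[${tok%%=*}]=${tok#*=} ;;   # "wiops=120" -> lim[wiops]=120
            *)   dev=$tok ;;                    # "$MAJ:$MIN" device key
        esac
    done
    echo "$dev rbps=${lim[rbps]} wbps=${lim[wbps]} riops=${lim[riops]} wiops=${lim[wiops]}"
}
```

Feeding it the two writes from the example reproduces both read-backs shown above, which is a quick way to check expectations before writing to a real io.max file.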
5-3-2. Writeback Writeback
~~~~~~~~~
Page cache is dirtied through buffered writes and shared mmaps and Page cache is dirtied through buffered writes and shared mmaps and
written asynchronously to the backing filesystem by the writeback written asynchronously to the backing filesystem by the writeback
...@@ -1191,22 +1194,19 @@ patterns. ...@@ -1191,22 +1194,19 @@ patterns.
The sysctl knobs which affect writeback behavior are applied to cgroup The sysctl knobs which affect writeback behavior are applied to cgroup
writeback as follows. writeback as follows.
vm.dirty_background_ratio vm.dirty_background_ratio, vm.dirty_ratio
vm.dirty_ratio
These ratios apply the same to cgroup writeback with the These ratios apply the same to cgroup writeback with the
amount of available memory capped by limits imposed by the amount of available memory capped by limits imposed by the
memory controller and system-wide clean memory. memory controller and system-wide clean memory.
vm.dirty_background_bytes vm.dirty_background_bytes, vm.dirty_bytes
vm.dirty_bytes
For cgroup writeback, this is calculated into ratio against For cgroup writeback, this is calculated into ratio against
total available memory and applied the same way as total available memory and applied the same way as
vm.dirty[_background]_ratio. vm.dirty[_background]_ratio.
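The *_bytes rule above amounts to two steps. First the byte value is turned into a ratio of total available memory, then that ratio is applied to the memory available to the cgroup. The arithmetic below is a simplified model only. It assumes byte-granular integer math and ignores the page granularity and rounding the kernel actually uses, and `dirty_thresh_from_bytes` is a hypothetical name.

```shell
# dirty_thresh_from_bytes DIRTY_BYTES TOTAL_AVAIL CGROUP_AVAIL
# Simplified model: ratio = DIRTY_BYTES / TOTAL_AVAIL, applied to
# CGROUP_AVAIL (memory available to the cgroup after memory controller
# limits). Multiplies before dividing to limit integer truncation.
dirty_thresh_from_bytes() {
    local bytes=$1 total=$2 cg=$3
    echo $(( bytes * cg / total ))
}
```

For example, vm.dirty_bytes of 400MiB against 4GiB of total available memory is a 10% ratio, so a cgroup with 1GiB available gets a 100MiB threshold: `dirty_thresh_from_bytes 419430400 4294967296 1073741824` prints `104857600`.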
5-4. PID PID
---
The process number controller is used to allow a cgroup to stop any The process number controller is used to allow a cgroup to stop any
new tasks from being fork()'d or clone()'d after a specified limit is new tasks from being fork()'d or clone()'d after a specified limit is
...@@ -1221,17 +1221,16 @@ Note that PIDs used in this controller refer to TIDs, process IDs as ...@@ -1221,17 +1221,16 @@ Note that PIDs used in this controller refer to TIDs, process IDs as
used by the kernel. used by the kernel.
5-4-1. PID Interface Files PID Interface Files
~~~~~~~~~~~~~~~~~~~
pids.max pids.max
A read-write single value file which exists on non-root A read-write single value file which exists on non-root
cgroups. The default is "max". cgroups. The default is "max".
Hard limit of number of processes. Hard limit of number of processes.
pids.current pids.current
A read-only single value file which exists on all cgroups. A read-only single value file which exists on all cgroups.
The number of processes currently in the cgroup and its The number of processes currently in the cgroup and its
...@@ -1246,12 +1245,14 @@ through fork() or clone(). These will return -EAGAIN if the creation ...@@ -1246,12 +1245,14 @@ through fork() or clone(). These will return -EAGAIN if the creation
of a new process would cause a cgroup policy to be violated. of a new process would cause a cgroup policy to be violated.
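Since fork() and clone() fail with -EAGAIN once pids.max would be exceeded, the remaining headroom of a cgroup can be estimated from pids.max and pids.current. The following is a sketch under these assumptions: only this cgroup's own limit is considered, although limits on ancestor cgroups may be tighter, and the result is clamped at zero in case the count already exceeds the limit (e.g. after the limit was lowered). `pids_headroom` is a hypothetical helper.

```shell
# pids_headroom MAX CURRENT
# MAX is the content of pids.max ("max" or a number), CURRENT that of
# pids.current. Prints how many more tasks this cgroup's own limit allows.
pids_headroom() {
    if [ "$1" = max ]; then
        echo unlimited
    elif [ "$2" -ge "$1" ]; then
        echo 0                      # at or over the limit: new forks get EAGAIN
    else
        echo $(( $1 - $2 ))
    fi
}
```

A typical invocation would pass `"$(cat pids.max)" "$(cat pids.current)"` read from the cgroup directory.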
5-5. RDMA RDMA
----
The "rdma" controller regulates the distribution and accounting of The "rdma" controller regulates the distribution and accounting of
RDMA resources. RDMA resources.
5-5-1. RDMA Interface Files RDMA Interface Files
~~~~~~~~~~~~~~~~~~~~
rdma.max rdma.max
A read-write nested-keyed file that exists for all the cgroups A read-write nested-keyed file that exists for all the cgroups
...@@ -1264,10 +1265,12 @@ of RDMA resources. ...@@ -1264,10 +1265,12 @@ of RDMA resources.
The following nested keys are defined. The following nested keys are defined.
========== =============================
hca_handle Maximum number of HCA Handles hca_handle Maximum number of HCA Handles
hca_object Maximum number of HCA Objects hca_object Maximum number of HCA Objects
========== =============================
An example for mlx4 and ocrdma device follows. An example for mlx4 and ocrdma device follows::
mlx4_0 hca_handle=2 hca_object=2000 mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max ocrdma1 hca_handle=3 hca_object=max
...@@ -1276,15 +1279,17 @@ of RDMA resources. ...@@ -1276,15 +1279,17 @@ of RDMA resources.
A read-only file that describes current resource usage. A read-only file that describes current resource usage.
It exists for all the cgroups except root. It exists for all the cgroups except root.
An example for mlx4 and ocrdma device follows. An example for mlx4 and ocrdma device follows::
mlx4_0 hca_handle=1 hca_object=20 mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23 ocrdma1 hca_handle=1 hca_object=23
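rdma.max and rdma.current share the same nested-keyed layout, so the resources still available per device can be derived by pairing the two files. The sketch below assumes bash and awk. It also assumes that a device or key absent from rdma.max is unlimited, which is treated here as "max". `rdma_remaining` is a hypothetical helper.

```shell
# rdma_remaining MAX_FILE CURRENT_FILE
# For each device line in CURRENT_FILE, print the remaining count per key
# (limit minus current usage), or "max" when the key has no numeric limit.
rdma_remaining() {
    awk '
        FILENAME == ARGV[1] {                       # first file: rdma.max
            for (i = 2; i <= NF; i++) { split($i, kv, "="); lim[$1, kv[1]] = kv[2] }
            next
        }
        {                                           # second file: rdma.current
            out = $1
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                l = (($1, kv[1]) in lim) ? lim[$1, kv[1]] : "max"
                out = out " " kv[1] "=" (l == "max" ? "max" : l - kv[2])
            }
            print out
        }' "$1" "$2"
}
```

With the two example outputs above, mlx4_0 has `hca_handle=1 hca_object=1980` left, and ocrdma1 has `hca_handle=2` with `hca_object=max`.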
5-6. Misc Misc
----
5-6-1. perf_event perf_event
~~~~~~~~~~
perf_event controller, if not mounted on a legacy hierarchy, is perf_event controller, if not mounted on a legacy hierarchy, is
automatically enabled on the v2 hierarchy so that perf events can automatically enabled on the v2 hierarchy so that perf events can
...@@ -1292,9 +1297,11 @@ always be filtered by cgroup v2 path. The controller can still be ...@@ -1292,9 +1297,11 @@ always be filtered by cgroup v2 path. The controller can still be
moved to a legacy hierarchy after v2 hierarchy is populated. moved to a legacy hierarchy after v2 hierarchy is populated.
6. Namespace Namespace
=========
6-1. Basics Basics
------
cgroup namespace provides a mechanism to virtualize the view of the cgroup namespace provides a mechanism to virtualize the view of the
"/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone
...@@ -1308,7 +1315,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the ...@@ -1308,7 +1315,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the
complete path of the cgroup of a process. In a container setup where complete path of the cgroup of a process. In a container setup where
a set of cgroups and namespaces are intended to isolate processes the a set of cgroups and namespaces are intended to isolate processes the
"/proc/$PID/cgroup" file may leak potential system level information "/proc/$PID/cgroup" file may leak potential system level information
to the isolated processes. For example: to the isolated processes. For example::
# cat /proc/self/cgroup # cat /proc/self/cgroup
0::/batchjobs/container_id1 0::/batchjobs/container_id1
...@@ -1316,14 +1323,14 @@ to the isolated processes. For Example: ...@@ -1316,14 +1323,14 @@ to the isolated processes. For Example:
The path '/batchjobs/container_id1' can be considered as system-data The path '/batchjobs/container_id1' can be considered as system-data
and undesirable to expose to the isolated processes. cgroup namespace and undesirable to expose to the isolated processes. cgroup namespace
can be used to restrict visibility of this path. For example, before can be used to restrict visibility of this path. For example, before
creating a cgroup namespace, one would see: creating a cgroup namespace, one would see::
# ls -l /proc/self/ns/cgroup # ls -l /proc/self/ns/cgroup
lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
# cat /proc/self/cgroup # cat /proc/self/cgroup
0::/batchjobs/container_id1 0::/batchjobs/container_id1
After unsharing a new namespace, the view changes. After unsharing a new namespace, the view changes::
# ls -l /proc/self/ns/cgroup # ls -l /proc/self/ns/cgroup
lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
...@@ -1341,7 +1348,8 @@ namespace is destroyed. The cgroupns root and the actual cgroups ...@@ -1341,7 +1348,8 @@ namespace is destroyed. The cgroupns root and the actual cgroups
remain. remain.
6-2. The Root and Views The Root and Views
------------------
The 'cgroupns root' for a cgroup namespace is the cgroup in which the The 'cgroupns root' for a cgroup namespace is the cgroup in which the
process calling unshare(2) is running. For example, if a process in process calling unshare(2) is running. For example, if a process in
...@@ -1350,7 +1358,7 @@ process calling unshare(2) is running. For example, if a process in ...@@ -1350,7 +1358,7 @@ process calling unshare(2) is running. For example, if a process in
init_cgroup_ns, this is the real root ('/') cgroup. init_cgroup_ns, this is the real root ('/') cgroup.
The cgroupns root cgroup does not change even if the namespace creator The cgroupns root cgroup does not change even if the namespace creator
process later moves to a different cgroup. process later moves to a different cgroup::
# ~/unshare -c # unshare cgroupns in some cgroup # ~/unshare -c # unshare cgroupns in some cgroup
# cat /proc/self/cgroup # cat /proc/self/cgroup
...@@ -1364,7 +1372,7 @@ Each process gets its namespace-specific view of "/proc/$PID/cgroup" ...@@ -1364,7 +1372,7 @@ Each process gets its namespace-specific view of "/proc/$PID/cgroup"
Processes running inside the cgroup namespace will be able to see Processes running inside the cgroup namespace will be able to see
cgroup paths (in /proc/self/cgroup) only inside their root cgroup. cgroup paths (in /proc/self/cgroup) only inside their root cgroup.
From within an unshared cgroupns: From within an unshared cgroupns::
# sleep 100000 & # sleep 100000 &
[1] 7353 [1] 7353
...@@ -1373,7 +1381,7 @@ From within an unshared cgroupns: ...@@ -1373,7 +1381,7 @@ From within an unshared cgroupns:
0::/sub_cgrp_1 0::/sub_cgrp_1
From the initial cgroup namespace, the real cgroup path will be From the initial cgroup namespace, the real cgroup path will be
visible: visible::
$ cat /proc/7353/cgroup $ cat /proc/7353/cgroup
0::/batchjobs/container_id1/sub_cgrp_1 0::/batchjobs/container_id1/sub_cgrp_1
...@@ -1381,7 +1389,7 @@ visible: ...@@ -1381,7 +1389,7 @@ visible:
From a sibling cgroup namespace (that is, a namespace rooted at a From a sibling cgroup namespace (that is, a namespace rooted at a
different cgroup), the cgroup path relative to its own cgroup different cgroup), the cgroup path relative to its own cgroup
namespace root will be shown. For instance, if PID 7353's cgroup namespace root will be shown. For instance, if PID 7353's cgroup
namespace root is at '/batchjobs/container_id2', then it will see namespace root is at '/batchjobs/container_id2', then it will see::
# cat /proc/7353/cgroup # cat /proc/7353/cgroup
0::/../container_id2/sub_cgrp_1 0::/../container_id2/sub_cgrp_1
...@@ -1390,13 +1398,14 @@ Note that the relative path always starts with '/' to indicate that ...@@ -1390,13 +1398,14 @@ Note that the relative path always starts with '/' to indicate that
it's relative to the cgroup namespace root of the caller. it's relative to the cgroup namespace root of the caller.
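The path shown in "/proc/$PID/cgroup" can be modelled as the target's real cgroup path expressed relative to the viewer's cgroupns root, with one ".." component for each level the target lies outside that root. The following is a string-level sketch of that rule, assuming bash. It operates on paths only and does not read /proc, and `cgroupns_view` is a hypothetical name.

```shell
# cgroupns_view REAL_PATH NS_ROOT
# Print REAL_PATH as seen by a process whose cgroup namespace root is
# NS_ROOT. Both are absolute paths in the global hierarchy.
cgroupns_view() {
    local target=${1%/} root=${2%/} up="" rest
    # Climb from the namespace root until it is the target or an ancestor of it.
    while [ -n "$root" ] && [ "$target" != "$root" ] &&
          [ "${target#"$root"/}" = "$target" ]; do
        root=${root%/*}
        up="$up/.."
    done
    rest=${target#"$root"}
    if [ -z "$up$rest" ]; then echo /; else echo "$up$rest"; fi
}
```

For a task in /batchjobs/container_id1/sub_cgrp_1, a viewer rooted at /batchjobs/container_id1 gets `/sub_cgrp_1` and a viewer in the initial namespace (root `/`) gets the full path. A task in /batchjobs/container_id2/sub_cgrp_1 viewed from a namespace rooted at /batchjobs/container_id1 gives `/../container_id2/sub_cgrp_1`.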
6-3. Migration and setns(2) Migration and setns(2)
----------------------
Processes inside a cgroup namespace can move into and out of the Processes inside a cgroup namespace can move into and out of the
namespace root if they have proper access to external cgroups. For namespace root if they have proper access to external cgroups. For
example, from inside a namespace with cgroupns root at example, from inside a namespace with cgroupns root at
/batchjobs/container_id1, and assuming that the global hierarchy is /batchjobs/container_id1, and assuming that the global hierarchy is
still accessible inside cgroupns: still accessible inside cgroupns::
# cat /proc/7353/cgroup # cat /proc/7353/cgroup
0::/sub_cgrp_1 0::/sub_cgrp_1
...@@ -1418,10 +1427,11 @@ namespace. It is expected that the someone moves the attaching ...@@ -1418,10 +1427,11 @@ namespace. It is expected that the someone moves the attaching
process under the target cgroup namespace root. process under the target cgroup namespace root.
6-4. Interaction with Other Namespaces Interaction with Other Namespaces
---------------------------------
Namespace specific cgroup hierarchy can be mounted by a process Namespace specific cgroup hierarchy can be mounted by a process
running inside a non-init cgroup namespace. running inside a non-init cgroup namespace::
# mount -t cgroup2 none $MOUNT_POINT # mount -t cgroup2 none $MOUNT_POINT
...@@ -1434,27 +1444,27 @@ the view of cgroup hierarchy by namespace-private cgroupfs mount ...@@ -1434,27 +1444,27 @@ the view of cgroup hierarchy by namespace-private cgroupfs mount
provides a properly isolated cgroup view inside the container. provides a properly isolated cgroup view inside the container.
P. Information on Kernel Programming Information on Kernel Programming
=================================
This section contains kernel programming information in the areas This section contains kernel programming information in the areas
where interacting with cgroup is necessary. cgroup core and where interacting with cgroup is necessary. cgroup core and
controllers are not covered. controllers are not covered.
P-1. Filesystem Support for Writeback Filesystem Support for Writeback
--------------------------------
A filesystem can support cgroup writeback by updating A filesystem can support cgroup writeback by updating
address_space_operations->writepage[s]() to annotate bio's using the address_space_operations->writepage[s]() to annotate bio's using the
following two functions. following two functions.
wbc_init_bio(@wbc, @bio) wbc_init_bio(@wbc, @bio)
Should be called for each bio carrying writeback data and Should be called for each bio carrying writeback data and
associates the bio with the inode's owner cgroup. Can be associates the bio with the inode's owner cgroup. Can be
called anytime between bio allocation and submission. called anytime between bio allocation and submission.
wbc_account_io(@wbc, @page, @bytes) wbc_account_io(@wbc, @page, @bytes)
Should be called for each data segment being written out. Should be called for each data segment being written out.
While this function doesn't care exactly when it's called While this function doesn't care exactly when it's called
during the writeback session, it's the easiest and most during the writeback session, it's the easiest and most
...@@ -1475,7 +1485,8 @@ cases by skipping wbc_init_bio() or using bio_associate_blkcg() ...@@ -1475,7 +1485,8 @@ cases by skipping wbc_init_bio() or using bio_associate_blkcg()
directly. directly.
D. Deprecated v1 Core Features Deprecated v1 Core Features
===========================
- Multiple hierarchies including named ones are not supported. - Multiple hierarchies including named ones are not supported.
...@@ -1489,9 +1500,11 @@ D. Deprecated v1 Core Features ...@@ -1489,9 +1500,11 @@ D. Deprecated v1 Core Features
at the root instead. at the root instead.
R. Issues with v1 and Rationales for v2 Issues with v1 and Rationales for v2
====================================
R-1. Multiple Hierarchies Multiple Hierarchies
--------------------
cgroup v1 allowed an arbitrary number of hierarchies and each cgroup v1 allowed an arbitrary number of hierarchies and each
hierarchy could host any number of controllers. While this seemed to hierarchy could host any number of controllers. While this seemed to
...@@ -1543,7 +1556,8 @@ how memory is distributed beyond a certain level while still wanting ...@@ -1543,7 +1556,8 @@ how memory is distributed beyond a certain level while still wanting
to control how CPU cycles are distributed. to control how CPU cycles are distributed.
R-2. Thread Granularity Thread Granularity
------------------
cgroup v1 allowed threads of a process to belong to different cgroups. cgroup v1 allowed threads of a process to belong to different cgroups.
This didn't make sense for some controllers and those controllers This didn't make sense for some controllers and those controllers
...@@ -1586,7 +1600,8 @@ misbehaving and poorly abstracted interfaces and kernel exposing and ...@@ -1586,7 +1600,8 @@ misbehaving and poorly abstracted interfaces and kernel exposing and
locked into constructs inadvertently. locked into constructs inadvertently.
R-3. Competition Between Inner Nodes and Threads Competition Between Inner Nodes and Threads
-------------------------------------------
cgroup v1 allowed threads to be in any cgroups which created an cgroup v1 allowed threads to be in any cgroups which created an
interesting problem where threads belonging to a parent cgroup and its interesting problem where threads belonging to a parent cgroup and its
...@@ -1605,7 +1620,7 @@ simply weren't available for threads. ...@@ -1605,7 +1620,7 @@ simply weren't available for threads.
The io controller implicitly created a hidden leaf node for each The io controller implicitly created a hidden leaf node for each
cgroup to host the threads. The hidden leaf had its own copies of all cgroup to host the threads. The hidden leaf had its own copies of all
the knobs with "leaf_" prefixed. While this allowed equivalent the knobs with ``leaf_`` prefixed. While this allowed equivalent
control over internal threads, it came with serious drawbacks. It control over internal threads, it came with serious drawbacks. It
always added an extra layer of nesting which wouldn't be necessary always added an extra layer of nesting which wouldn't be necessary
otherwise, made the interface messy and significantly complicated the otherwise, made the interface messy and significantly complicated the
...@@ -1626,7 +1641,8 @@ This clearly is a problem which needs to be addressed from cgroup core ...@@ -1626,7 +1641,8 @@ This clearly is a problem which needs to be addressed from cgroup core
in a uniform way. in a uniform way.
R-4. Other Interface Issues Other Interface Issues
----------------------
cgroup v1 grew without oversight and developed a large number of cgroup v1 grew without oversight and developed a large number of
idiosyncrasies and inconsistencies. One issue on the cgroup core side idiosyncrasies and inconsistencies. One issue on the cgroup core side
...@@ -1654,9 +1670,11 @@ cgroup v2 establishes common conventions where appropriate and updates ...@@ -1654,9 +1670,11 @@ cgroup v2 establishes common conventions where appropriate and updates
controllers so that they expose minimal and consistent interfaces. controllers so that they expose minimal and consistent interfaces.
R-5. Controller Issues and Remedies Controller Issues and Remedies
------------------------------
R-5-1. Memory Memory
~~~~~~
The original lower boundary, the soft limit, is defined as a limit The original lower boundary, the soft limit, is defined as a limit
that is per default unset. As a result, the set of cgroups that that is per default unset. As a result, the set of cgroups that
......