Commit 47dfe403 authored by Linus Torvalds

Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup changes from Tejun Heo:
 "Mostly changes to get the v2 interface ready.  The core features are
  mostly ready now and I think it's reasonable to expect to drop the
  devel mask in one or two devel cycles at least for a subset of
  controllers.

   - cgroup added a controller dependency mechanism so that block cgroup
     can depend on memory cgroup.  This will be used to finally support
     IO provisioning on the writeback traffic, which is currently being
     implemented.

   - The v2 interface now uses a separate table so that the interface
     files for the new interface are explicitly declared in one place.
     Each controller will explicitly review and add the files for the
     new interface.

   - cpuset is getting ready for the hierarchical behavior which is in
     the similar style with other controllers so that an ancestor's
     configuration change doesn't change the descendants' configurations
     irreversibly and processes aren't silently migrated when a CPU or
     node goes down.

  All the changes are to the new interface and no behavior changed for
  the multiple hierarchies"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
  cpuset: fix the WARN_ON() in update_nodemasks_hier()
  cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
  cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
  cgroup: distinguish the default and legacy hierarchies when handling cftypes
  cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
  cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
  cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
  cpuset: export effective masks to userspace
  cpuset: allow writing offlined masks to cpuset.cpus/mems
  cpuset: enable onlined cpu/node in effective masks
  cpuset: refactor cpuset_hotplug_update_tasks()
  cpuset: make cs->{cpus, mems}_allowed as user-configured masks
  cpuset: apply cs->effective_{cpus,mems}
  cpuset: initialize top_cpuset's configured masks at mount
  cpuset: use effective cpumask to build sched domains
  cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
  cpuset: update cs->effective_{cpus, mems} when config changes
  cpuset: update cpuset->effective_{cpus,mems} at hotplug
  cpuset: add cs->effective_cpus and cs->effective_mems
  cgroup: clean up sane_behavior handling
  ...
parents f2a84170 a1381268
......@@ -599,6 +599,20 @@ fork. If this method returns 0 (success) then this should remain valid
while the caller holds cgroup_mutex and it is ensured that either
attach() or cancel_attach() will be called in future.
void css_reset(struct cgroup_subsys_state *css)
(cgroup_mutex held by caller)
An optional operation which should restore @css's configuration to the
initial state. This is currently only used on the unified hierarchy
when a subsystem is disabled on a cgroup through
"cgroup.subtree_control" but should remain enabled because other
subsystems depend on it. cgroup core makes such a css invisible by
removing the associated interface files and invokes this callback so
that the hidden subsystem can return to the initial neutral state.
This prevents unexpected resource control from a hidden css and
ensures that the configuration is in the initial state when it is made
visible again later.
void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
......
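The css_reset() description above can be read as a two-step sequence on the core side: hide the css, then ask the controller to return to its neutral state. The following is a minimal userspace sketch of that sequence, not kernel code. The `model_*` names, the `visible` field and the single `limit` field are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a css: one limit plus a visibility flag. */
struct model_css {
	unsigned long long limit;	/* ULLONG_MAX means "no limit" */
	int visible;			/* 1 while interface files exist */
};

/* Hypothetical stand-in for cgroup_subsys; css_reset is optional. */
struct model_subsys {
	void (*css_reset)(struct model_css *css);
};

/* Controller side: restore the initial, unrestricted configuration. */
static void model_limit_reset(struct model_css *css)
{
	css->limit = ULLONG_MAX;
}

/*
 * Core side: a css that stays alive only because another subsystem
 * depends on it is hidden first, then reset if the controller provides
 * the callback.
 */
static void model_hide_css(const struct model_subsys *ss,
			   struct model_css *css)
{
	assert(ss != NULL && css != NULL);

	css->visible = 0;	/* stands in for removing interface files */
	if (ss->css_reset)
		ss->css_reset(css);
}
```

In this model a controller without the callback keeps whatever configuration it had when it was hidden, which is the "unexpected resource control" case the documentation warns about.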
......@@ -94,12 +94,35 @@ change soon.
mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT
All controllers which are not bound to other hierarchies are
automatically bound to unified hierarchy and show up at the root of
it. Controllers which are enabled only in the root of unified
hierarchy can be bound to other hierarchies at any time. This allows
mixing unified hierarchy with the traditional multiple hierarchies in
a fully backward compatible way.
All controllers which support the unified hierarchy and are not bound
to other hierarchies are automatically bound to unified hierarchy and
show up at the root of it. Controllers which are enabled only in the
root of unified hierarchy can be bound to other hierarchies. This
allows mixing unified hierarchy with the traditional multiple
hierarchies in a fully backward compatible way.
For development purposes, the following boot parameter makes all
controllers appear on the unified hierarchy whether supported or
not.
cgroup__DEVEL__legacy_files_on_dfl
A controller can be moved across hierarchies only after the controller
is no longer referenced in its current hierarchy. Because per-cgroup
controller states are destroyed asynchronously and controllers may
have lingering references, a controller may not show up immediately on
the unified hierarchy after the final umount of the previous
hierarchy. Similarly, a controller should be fully disabled to be
moved out of the unified hierarchy and it may take some time for the
disabled controller to become available for other hierarchies;
furthermore, due to dependencies among controllers, other controllers
may need to be disabled too.
While useful for development and manual configurations, dynamically
moving controllers between the unified and other hierarchies is
strongly discouraged for production use. It is recommended to decide
the hierarchies and controller associations before starting using the
controllers.
2-2. cgroup.subtree_control
......
......@@ -928,7 +928,15 @@ struct cgroup_subsys blkio_cgrp_subsys = {
.css_offline = blkcg_css_offline,
.css_free = blkcg_css_free,
.can_attach = blkcg_can_attach,
.base_cftypes = blkcg_files,
.legacy_cftypes = blkcg_files,
#ifdef CONFIG_MEMCG
/*
* This ensures that, if available, memcg is automatically enabled
* together on the default hierarchy so that the owner cgroup can
* be retrieved from writeback pages.
*/
.depends_on = 1 << memory_cgrp_id,
#endif
};
EXPORT_SYMBOL_GPL(blkio_cgrp_subsys);
......@@ -1120,7 +1128,8 @@ int blkcg_policy_register(struct blkcg_policy *pol)
/* everything is in place, add intf files for the new policy */
if (pol->cftypes)
WARN_ON(cgroup_add_cftypes(&blkio_cgrp_subsys, pol->cftypes));
WARN_ON(cgroup_add_legacy_cftypes(&blkio_cgrp_subsys,
pol->cftypes));
ret = 0;
out_unlock:
mutex_unlock(&blkcg_pol_mutex);
......
......@@ -412,13 +412,13 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
int rw;
/*
* If sane_hierarchy is enabled, we switch to properly hierarchical
* If on the default hierarchy, we switch to properly hierarchical
* behavior where limits on a given throtl_grp are applied to the
* whole subtree rather than just the group itself. e.g. If 16M
* read_bps limit is set on the root group, the whole system can't
* exceed 16M for the device.
*
* If sane_hierarchy is not enabled, the broken flat hierarchy
* If not on the default hierarchy, the broken flat hierarchy
* behavior is retained where all throtl_grps are treated as if
* they're all separate root groups right below throtl_data.
* Limits of a group don't interact with limits of other groups
......@@ -426,7 +426,7 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
*/
parent_sq = &td->service_queue;
if (cgroup_sane_behavior(blkg->blkcg->css.cgroup) && blkg->parent)
if (cgroup_on_dfl(blkg->blkcg->css.cgroup) && blkg->parent)
parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
throtl_service_queue_init(&tg->service_queue, parent_sq);
......
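The comment in throtl_pd_init() contrasts two behaviors: on the default hierarchy a limit bounds the whole subtree, while on legacy hierarchies every group acts as its own root. A rough userspace model of the observable difference is sketched below. It is an assumption-laden simplification: the real code chains service queues so that bios pass through each ancestor, and it does not compute a minimum up front. `model_tg` and `model_effective_bps` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a throtl_grp with a single read limit. */
struct model_tg {
	const struct model_tg *parent;	/* NULL for the root group */
	unsigned long long read_bps;	/* 0 means "no limit" */
};

/*
 * Return the tightest read limit that applies to @tg.
 * Flat (legacy) mode only looks at the group itself; hierarchical
 * (default hierarchy) mode also honors every ancestor's limit.
 */
static unsigned long long model_effective_bps(const struct model_tg *tg,
					      bool on_dfl)
{
	unsigned long long eff;

	assert(tg != NULL);
	eff = tg->read_bps;

	if (!on_dfl)
		return eff;

	for (tg = tg->parent; tg; tg = tg->parent)
		if (tg->read_bps && (!eff || tg->read_bps < eff))
			eff = tg->read_bps;

	return eff;
}
```

With a 16M limit on the root and a 64M limit on a child, the model yields 16M for the child in hierarchical mode and 64M in flat mode, matching the example given in the comment.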
......@@ -203,7 +203,15 @@ struct cgroup {
struct kernfs_node *kn; /* cgroup kernfs entry */
struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
/* the bitmask of subsystems enabled on the child cgroups */
/*
* The bitmask of subsystems enabled on the child cgroups.
* ->subtree_control is the one configured through
* "cgroup.subtree_control" while ->child_subsys_mask is the
* effective one which may have more subsystems enabled.
* Controller knobs are made available iff it's enabled in
* ->subtree_control.
*/
unsigned int subtree_control;
unsigned int child_subsys_mask;
/* Private pointers for each registered subsystem */
......@@ -248,73 +256,9 @@ struct cgroup {
/* cgroup_root->flags */
enum {
/*
* Unfortunately, cgroup core and various controllers are riddled
* with idiosyncrasies and pointless options. The following flag,
* when set, will force sane behavior - some options are forced on,
* others are disallowed, and some controllers will change their
* hierarchical or other behaviors.
*
* The set of behaviors affected by this flag are still being
* determined and developed and the mount option for this flag is
* prefixed with __DEVEL__. The prefix will be dropped once we
* reach the point where all behaviors are compatible with the
* planned unified hierarchy, which will automatically turn on this
* flag.
*
* The followings are the behaviors currently affected this flag.
*
* - Mount options "noprefix", "xattr", "clone_children",
* "release_agent" and "name" are disallowed.
*
* - When mounting an existing superblock, mount options should
* match.
*
* - Remount is disallowed.
*
* - rename(2) is disallowed.
*
* - "tasks" is removed. Everything should be at process
* granularity. Use "cgroup.procs" instead.
*
* - "cgroup.procs" is not sorted. pids will be unique unless they
* got recycled inbetween reads.
*
* - "release_agent" and "notify_on_release" are removed.
* Replacement notification mechanism will be implemented.
*
* - "cgroup.clone_children" is removed.
*
* - "cgroup.subtree_populated" is available. Its value is 0 if
* the cgroup and its descendants contain no task; otherwise, 1.
* The file also generates kernfs notification which can be
* monitored through poll and [di]notify when the value of the
* file changes.
*
* - If mount is requested with sane_behavior but without any
* subsystem, the default unified hierarchy is mounted.
*
* - cpuset: tasks will be kept in empty cpusets when hotplug happens
* and take masks of ancestors with non-empty cpus/mems, instead of
* being moved to an ancestor.
*
* - cpuset: a task can be moved into an empty cpuset, and again it
* takes masks of ancestors.
*
* - memcg: use_hierarchy is on by default and the cgroup file for
* the flag is not created.
*
* - blkcg: blk-throttle becomes properly hierarchical.
*
* - debug: disallowed on the default hierarchy.
*/
CGRP_ROOT_SANE_BEHAVIOR = (1 << 0),
CGRP_ROOT_SANE_BEHAVIOR = (1 << 0), /* __DEVEL__sane_behavior specified */
CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named prefix */
CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */
/* mount options live below bit 16 */
CGRP_ROOT_OPTION_MASK = (1 << 16) - 1,
};
/*
......@@ -440,9 +384,11 @@ struct css_set {
enum {
CFTYPE_ONLY_ON_ROOT = (1 << 0), /* only create on root cgrp */
CFTYPE_NOT_ON_ROOT = (1 << 1), /* don't create on root cgrp */
CFTYPE_INSANE = (1 << 2), /* don't create if sane_behavior */
CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
CFTYPE_ONLY_ON_DFL = (1 << 4), /* only on default hierarchy */
/* internal flags, do not use outside cgroup core proper */
__CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */
__CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */
};
#define MAX_CFTYPE_NAME 64
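The new internal flags let the core decide per hierarchy whether an interface file is created. The sketch below models that decision in userspace. The `MODEL_*` constants mirror the bit positions shown in the hunk above but are renamed to avoid reserved identifiers, `model_should_create` is a hypothetical helper, and the assertion that the two hierarchy flags never appear together is an assumption of this model and not something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions follow the enum in the hunk above; names are model-only. */
enum {
	MODEL_ONLY_ON_ROOT	= (1 << 0),	/* CFTYPE_ONLY_ON_ROOT */
	MODEL_NOT_ON_ROOT	= (1 << 1),	/* CFTYPE_NOT_ON_ROOT */
	MODEL_ONLY_ON_DFL	= (1 << 16),	/* __CFTYPE_ONLY_ON_DFL */
	MODEL_NOT_ON_DFL	= (1 << 17),	/* __CFTYPE_NOT_ON_DFL */
};

/*
 * Decide whether a file with @flags should exist in a cgroup that is
 * (or is not) on the default hierarchy and is (or is not) a root cgroup.
 */
static bool model_should_create(unsigned int flags, bool on_dfl, bool is_root)
{
	/* Assumption: a file is never tagged for both hierarchy kinds. */
	assert(!((flags & MODEL_ONLY_ON_DFL) && (flags & MODEL_NOT_ON_DFL)));

	if ((flags & MODEL_ONLY_ON_DFL) && !on_dfl)
		return false;
	if ((flags & MODEL_NOT_ON_DFL) && on_dfl)
		return false;
	if ((flags & MODEL_ONLY_ON_ROOT) && !is_root)
		return false;
	if ((flags & MODEL_NOT_ON_ROOT) && is_root)
		return false;
	return true;
}
```

A file with no flags set is created everywhere, which corresponds to the case where a controller points both of its cftype tables at the same array.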
......@@ -526,20 +472,64 @@ struct cftype {
extern struct cgroup_root cgrp_dfl_root;
extern struct css_set init_css_set;
/**
* cgroup_on_dfl - test whether a cgroup is on the default hierarchy
* @cgrp: the cgroup of interest
*
* The default hierarchy is the v2 interface of cgroup and this function
* can be used to test whether a cgroup is on the default hierarchy for
* cases where a subsystem should behave differently depending on the
* interface version.
*
* The set of behaviors which change on the default hierarchy are still
* being determined and the mount option is prefixed with __DEVEL__.
*
* List of changed behaviors:
*
* - Mount options "noprefix", "xattr", "clone_children", "release_agent"
* and "name" are disallowed.
*
* - When mounting an existing superblock, mount options should match.
*
* - Remount is disallowed.
*
* - rename(2) is disallowed.
*
* - "tasks" is removed. Everything should be at process granularity. Use
* "cgroup.procs" instead.
*
* - "cgroup.procs" is not sorted. pids will be unique unless they got
* recycled inbetween reads.
*
* - "release_agent" and "notify_on_release" are removed. Replacement
* notification mechanism will be implemented.
*
* - "cgroup.clone_children" is removed.
*
* - "cgroup.subtree_populated" is available. Its value is 0 if the cgroup
* and its descendants contain no task; otherwise, 1. The file also
* generates kernfs notification which can be monitored through poll and
* [di]notify when the value of the file changes.
*
* - cpuset: tasks will be kept in empty cpusets when hotplug happens and
* take masks of ancestors with non-empty cpus/mems, instead of being
* moved to an ancestor.
*
* - cpuset: a task can be moved into an empty cpuset, and again it takes
* masks of ancestors.
*
* - memcg: use_hierarchy is on by default and the cgroup file for the flag
* is not created.
*
* - blkcg: blk-throttle becomes properly hierarchical.
*
* - debug: disallowed on the default hierarchy.
*/
static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
{
return cgrp->root == &cgrp_dfl_root;
}
/*
* See the comment above CGRP_ROOT_SANE_BEHAVIOR for details. This
* function can be called as long as @cgrp is accessible.
*/
static inline bool cgroup_sane_behavior(const struct cgroup *cgrp)
{
return cgrp->root->flags & CGRP_ROOT_SANE_BEHAVIOR;
}
/* no synchronization, the result can only be used as a hint */
static inline bool cgroup_has_tasks(struct cgroup *cgrp)
{
......@@ -602,7 +592,8 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_rm_cftypes(struct cftype *cfts);
bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);
......@@ -634,6 +625,7 @@ struct cgroup_subsys {
int (*css_online)(struct cgroup_subsys_state *css);
void (*css_offline)(struct cgroup_subsys_state *css);
void (*css_free)(struct cgroup_subsys_state *css);
void (*css_reset)(struct cgroup_subsys_state *css);
int (*can_attach)(struct cgroup_subsys_state *css,
struct cgroup_taskset *tset);
......@@ -682,8 +674,21 @@ struct cgroup_subsys {
*/
struct list_head cfts;
/* base cftypes, automatically registered with subsys itself */
struct cftype *base_cftypes;
/*
* Base cftypes which are automatically registered. The two can
* point to the same array.
*/
struct cftype *dfl_cftypes; /* for the default hierarchy */
struct cftype *legacy_cftypes; /* for the legacy hierarchies */
/*
* A subsystem may depend on other subsystems. When such subsystem
* is enabled on a cgroup, the depended-upon subsystems are enabled
* together if available. Subsystems enabled due to dependency are
* not visible to userland until explicitly enabled. The following
* specifies the mask of subsystems that this one depends on.
*/
unsigned int depends_on;
};
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
......
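The comments on ->subtree_control, ->child_subsys_mask and ->depends_on describe a closure computation: start from what the user enabled, add whatever those subsystems depend on, and repeat until the mask stops growing. The following userspace sketch illustrates that idea under stated assumptions. The subsystem ids, the `model_depends_on` table and both helper functions are hypothetical, and the kernel's own implementation may differ in detail.

```c
#include <assert.h>

/* Hypothetical subsystem ids for this model; the kernel generates its own. */
enum { MODEL_MEMORY_ID, MODEL_BLKIO_ID, MODEL_CPU_ID, MODEL_SUBSYS_COUNT };

/* Model of cgroup_subsys->depends_on: one dependency bitmask per id. */
static const unsigned int model_depends_on[MODEL_SUBSYS_COUNT] = {
	[MODEL_BLKIO_ID] = 1u << MODEL_MEMORY_ID,	/* blkio pulls in memory */
};

/*
 * Expand the user-configured mask (->subtree_control) into the effective
 * mask (->child_subsys_mask). Dependencies are only added "if available",
 * so the result is always limited to @available.
 */
static unsigned int model_effective_mask(unsigned int subtree_control,
					 unsigned int available)
{
	unsigned int cur = subtree_control & available;

	assert(MODEL_SUBSYS_COUNT <= 8 * sizeof(unsigned int));

	for (;;) {
		unsigned int next = cur;
		int id;

		for (id = 0; id < MODEL_SUBSYS_COUNT; id++)
			if (cur & (1u << id))
				next |= model_depends_on[id];

		next &= available;
		if (next == cur)
			return cur;
		cur = next;
	}
}

/* Controller knobs exist only for explicitly enabled subsystems. */
static int model_knobs_visible(unsigned int subtree_control, int id)
{
	return (subtree_control >> id) & 1u;
}
```

Enabling only blkio in this model produces an effective mask that also contains memory, while the memory knobs stay hidden because memory is absent from the user-configured mask.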
......@@ -480,5 +480,5 @@ struct cgroup_subsys freezer_cgrp_subsys = {
.css_free = freezer_css_free,
.attach = freezer_attach,
.fork = freezer_fork,
.base_cftypes = files,
.legacy_cftypes = files,
};
......@@ -8083,7 +8083,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
.can_attach = cpu_cgroup_can_attach,
.attach = cpu_cgroup_attach,
.exit = cpu_cgroup_exit,
.base_cftypes = cpu_files,
.legacy_cftypes = cpu_files,
.early_init = 1,
};
......
......@@ -278,6 +278,6 @@ void cpuacct_account_field(struct task_struct *p, int index, u64 val)
struct cgroup_subsys cpuacct_cgrp_subsys = {
.css_alloc = cpuacct_css_alloc,
.css_free = cpuacct_css_free,
.base_cftypes = files,
.legacy_cftypes = files,
.early_init = 1,
};
......@@ -358,9 +358,8 @@ static void __init __hugetlb_cgroup_file_init(int idx)
cft = &h->cgroup_files[4];
memset(cft, 0, sizeof(*cft));
WARN_ON(cgroup_add_cftypes(&hugetlb_cgrp_subsys, h->cgroup_files));
return;
WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
h->cgroup_files));
}
void __init hugetlb_cgroup_file_init(void)
......
......@@ -6007,7 +6007,6 @@ static struct cftype mem_cgroup_files[] = {
},
{
.name = "use_hierarchy",
.flags = CFTYPE_INSANE,
.write_u64 = mem_cgroup_hierarchy_write,
.read_u64 = mem_cgroup_hierarchy_read,
},
......@@ -6411,6 +6410,29 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
__mem_cgroup_free(memcg);
}
/**
* mem_cgroup_css_reset - reset the states of a mem_cgroup
* @css: the target css
*
* Reset the states of the mem_cgroup associated with @css. This is
* invoked when the userland requests disabling on the default hierarchy
* but the memcg is pinned through dependency. The memcg should stop
* applying policies and should revert to the vanilla state as it may be
* made visible again.
*
* The current implementation only resets the essential configurations.
* This needs to be expanded to cover all the visible parts.
*/
static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
mem_cgroup_resize_limit(memcg, ULLONG_MAX);
mem_cgroup_resize_memsw_limit(memcg, ULLONG_MAX);
memcg_update_kmem_limit(memcg, ULLONG_MAX);
res_counter_set_soft_limit(&memcg->res, ULLONG_MAX);
}
#ifdef CONFIG_MMU
/* Handlers for move charge at task migration. */
#define PRECHARGE_COUNT_AT_ONCE 256
......@@ -7005,16 +7027,17 @@ static void mem_cgroup_move_task(struct cgroup_subsys_state *css,
/*
* Cgroup retains root cgroups across [un]mount cycles making it necessary
* to verify sane_behavior flag on each mount attempt.
* to verify whether we're attached to the default hierarchy on each mount
* attempt.
*/
static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
{
/*
* use_hierarchy is forced with sane_behavior. cgroup core
* use_hierarchy is forced on the default hierarchy. cgroup core
* guarantees that @root doesn't have any children, so turning it
* on for the root memcg is enough.
*/
if (cgroup_sane_behavior(root_css->cgroup))
if (cgroup_on_dfl(root_css->cgroup))
mem_cgroup_from_css(root_css)->use_hierarchy = true;
}
......@@ -7023,11 +7046,12 @@ struct cgroup_subsys memory_cgrp_subsys = {
.css_online = mem_cgroup_css_online,
.css_offline = mem_cgroup_css_offline,
.css_free = mem_cgroup_css_free,
.css_reset = mem_cgroup_css_reset,
.can_attach = mem_cgroup_can_attach,
.cancel_attach = mem_cgroup_cancel_attach,
.attach = mem_cgroup_move_task,
.bind = mem_cgroup_bind,
.base_cftypes = mem_cgroup_files,
.legacy_cftypes = mem_cgroup_files,
.early_init = 0,
};
......@@ -7044,7 +7068,8 @@ __setup("swapaccount=", enable_swap_account);
static void __init memsw_file_init(void)
{
WARN_ON(cgroup_add_cftypes(&memory_cgrp_subsys, memsw_cgroup_files));
WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys,
memsw_cgroup_files));
}
static void __init enable_swap_cgroup(void)
......
......@@ -107,5 +107,5 @@ struct cgroup_subsys net_cls_cgrp_subsys = {
.css_online = cgrp_css_online,
.css_free = cgrp_css_free,
.attach = cgrp_attach,
.base_cftypes = ss_files,
.legacy_cftypes = ss_files,
};
......@@ -249,7 +249,7 @@ struct cgroup_subsys net_prio_cgrp_subsys = {
.css_online = cgrp_css_online,
.css_free = cgrp_css_free,
.attach = net_prio_attach,
.base_cftypes = ss_files,
.legacy_cftypes = ss_files,
};
static int netprio_device_event(struct notifier_block *unused,
......
......@@ -222,7 +222,7 @@ static struct cftype tcp_files[] = {
static int __init tcp_memcontrol_init(void)
{
WARN_ON(cgroup_add_cftypes(&memory_cgrp_subsys, tcp_files));
WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, tcp_files));
return 0;
}
__initcall(tcp_memcontrol_init);
......@@ -796,7 +796,7 @@ struct cgroup_subsys devices_cgrp_subsys = {
.css_free = devcgroup_css_free,
.css_online = devcgroup_online,
.css_offline = devcgroup_offline,
.base_cftypes = dev_cgroup_files,
.legacy_cftypes = dev_cgroup_files,
};
/**
......