1. 19 Aug, 2013 2 commits
    • cgroup: fix subsystem file accesses on the root cgroup · 0bfb4aa6
      Tejun Heo authored
      105347ba ("cgroup: make cgroup_file_open() rcu_read_lock() around
      cgroup_css() and add cfent->css") added cfent->css to cache the
      associated cgroup_subsys_state across file operations.
      
      A cfent is associated with a single css throughout its lifetime and the
      original commit initialized the cache pointer during cgroup_add_file()
      and verified that it matches the actual one in cgroup_file_open().
      While this works fine for !root cgroups, it's broken for root cgroups
      as files in a root cgroup are created before the css's are associated
      with the cgroup and thus cgroup_css() call in cgroup_add_file()
      returns NULL associating all cfents in the root cgroup with NULL css.
      This makes cgroup_file_open() trigger WARN and fail with -ENODEV for
      all !core subsystem files in the root cgroups.
      
      There's no reason to initialize cfent->css separately from
      cgroup_add_file().  As the association never changes,
      cgroup_file_open() can set it unconditionally every time and
      containing the logic in cgroup_file_open() makes more sense anyway as
      the only reason it's necessary is file->private_data being already
      occupied.
      
      Fix it by setting cfent->css unconditionally from cgroup_file_open().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
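The ordering problem described in this commit can be illustrated with a toy userspace model. This is only a sketch in Python, not kernel code: the classes and the helpers `add_file_old()`, `file_open_old()`, `file_open_fixed()` and `demo_root_cgroup()` are hypothetical stand-ins, and the string return values merely mimic the -ENODEV failure mentioned above.

```python
class Cgroup:
    def __init__(self):
        # subsystem name -> css; for a root cgroup this is filled in
        # only after the files have been created (model assumption)
        self.subsys = {}


class CFEnt:
    def __init__(self, cgrp, subsys_name):
        self.cgrp = cgrp
        self.subsys_name = subsys_name
        self.css = None  # cache, only meaningful while the file is open


def add_file_old(cgrp, subsys_name):
    """Old behaviour: cache the css at file creation time."""
    cfe = CFEnt(cgrp, subsys_name)
    cfe.css = cgrp.subsys.get(subsys_name)  # None if not yet associated
    return cfe


def file_open_old(cfe):
    """Old behaviour: verify the cached css against the actual one."""
    actual = cfe.cgrp.subsys.get(cfe.subsys_name)
    return "OK" if cfe.css is actual else "ENODEV"


def file_open_fixed(cfe):
    """Fixed behaviour: set the cache unconditionally on every open."""
    cfe.css = cfe.cgrp.subsys.get(cfe.subsys_name)
    return "OK" if cfe.css is not None else "ENODEV"


def demo_root_cgroup():
    root = Cgroup()
    old = add_file_old(root, "cpu")  # files are created first ...
    new = CFEnt(root, "cpu")
    root.subsys["cpu"] = object()    # ... the css is associated afterwards
    return file_open_old(old), file_open_fixed(new)
```

In this model the old path fails on the root cgroup because the cache was filled before the css existed, while the fixed path looks the css up at open time and succeeds.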
    • cgroup: change cgroup_from_id() to css_from_id() · 1cb650b9
      Li Zefan authored
      Now we want cgroup core to always provide the css to use to the
      subsystems, so change this API to css_from_id().
      
      Uninline css_from_id(), because it's getting bigger and cgroup_css()
      has been unexported.
      
      While at it, remove the #ifdef, and shuffle the order of the args.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 16 Aug, 2013 1 commit
    • cgroup: use css_get() in cgroup_create() to check CSS_ROOT · 930913a3
      Li Zhong authored
      It seems that the root css doesn't have a refcnt allocated (not needed?),
      which causes the boot error attached.
      
      This patch tries to use css_get() to not increase the refcnt if parent
      is root.
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810b37cc>] cgroup_mkdir+0x37c/0x740
        PGD 0
        Oops: 0002 [#1]
        Modules linked in:
        CPU: 0 PID: 1 Comm: systemd Not tainted 3.11.0-rc5-next-20130815+ #1
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
        task: ffff88007f868000 ti: ffff88007f864000 task.ti: ffff88007f864000
        RIP: 0010:[<ffffffff810b37cc>]  [<ffffffff810b37cc>] cgroup_mkdir+0x37c/0x740
        RSP: 0018:ffff88007f865df8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffffffff81a46ee0 RCX: 0000000000000001
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81a415c0
        RBP: ffff88007f865ec8 R08: 0000000000000001 R09: 0000000000000000
        R10: ffff88007ce6d060 R11: 0000000000000000 R12: ffff88007ce6d000
        R13: ffff88007ce6d060 R14: ffffffff81a46d80 R15: ffff88007c6e8018
        FS:  00007f13dbf6f840(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000007b7e5000 CR4: 00000000000006b0
        Stack:
         ffffffff810b380d 0000000000000002 ffff88007f865e18 ffffffff81167069
         ffff88007f865ed8 ffffffff8116a3f5 ffff880037454400 ffff88007c6e8018
         ffff88007c6e8028 ffff88007c6e8328 ffff88007c6e8000 ffff88007ce6d000
        Call Trace:
         [<ffffffff810b380d>] ? cgroup_mkdir+0x3bd/0x740
         [<ffffffff81167069>] ? lookup_hash+0x19/0x20
         [<ffffffff8116a3f5>] ? kern_path_create+0x95/0x170
         [<ffffffff8116ce3e>] vfs_mkdir+0x9e/0xf0
         [<ffffffff8116d7a0>] SyS_mkdirat+0x60/0xe0
         [<ffffffff8116d839>] SyS_mkdir+0x19/0x20
         [<ffffffff814c960d>] tracesys+0xcf/0xd4
        Code: ad 70 ff ff ff 48 89 9d 60 ff ff ff 4d 89 d5 4c 8b bd 68 ff ff ff 4c 8b 65 88 eb 50 0f 1f 00 48 8b 43 18 a8 03 0f 85 6c 03 00 00 <ff> 00 e8 1d 0a fb ff 85 c0 74 0d 80 3d f0 45 a1 00 00 0f 84 4c
        RIP  [<ffffffff810b37cc>] cgroup_mkdir+0x37c/0x740
         RSP <ffff88007f865df8>
        CR2: 0000000000000000
        ---[ end trace a4b14b49bc46fd60 ]---
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
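The difference between an open-coded increment and a flag-checking css_get() can be sketched with a toy Python model. Everything here is illustrative: the `Css` class, the numeric value of `CSS_ROOT`, and `raw_ref_get()` are assumptions made for the example, and a missing counter is modelled as `None` rather than as a NULL per-cpu pointer.

```python
CSS_ROOT = 1 << 0  # flag name from the commit title; the value is illustrative


class Css:
    def __init__(self, flags=0):
        self.flags = flags
        # Model assumption: a root css has no refcount allocated at all.
        self.refcnt = None if flags & CSS_ROOT else 1


def raw_ref_get(css):
    """Open-coded increment: fails on a root css that has no counter."""
    css.refcnt += 1  # raises TypeError when refcnt is None


def css_get(css):
    """Take a reference, skipping the increment for root css's."""
    if not (css.flags & CSS_ROOT):
        css.refcnt += 1
```

Calling `css_get()` on a root css is a no-op in this model, whereas `raw_ref_get()` on the same object raises, which loosely mirrors the NULL dereference in the trace above.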
  3. 14 Aug, 2013 8 commits
    • cpuset: remove an unnecessary forward declaration · ff58ac0d
      Li Zefan authored
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: RCU protect each cgroup_subsys_state release · 0c21ead1
      Tejun Heo authored
      With the planned unified hierarchy, individual css's will be created
      and destroyed dynamically across the lifetime of a cgroup.  To enable
      such usages, css destruction is being decoupled from cgroup
      destruction.  Most of the destruction path has been decoupled but the
      actual free of css still depends on cgroup free path.
      
      When all css refs are drained, css_release() kicks off
      css_free_work_fn() which puts the cgroup.  When the cgroup refcnt
      reaches zero, cgroup_diput() is invoked which in turn schedules RCU
      free of the cgroup.  After a grace period, all css's are freed along
      with the cgroup itself.
      
      This patch moves the RCU grace period and css freeing from cgroup
      release path to css release path.  css_release(), instead of kicking
      off css_free_work_fn() directly, schedules RCU callback
      css_free_rcu_fn() which in turn kicks off css_free_work_fn() after a
      RCU grace period.  css_free_work_fn() is updated to free the css
      directly.
      
      The five-way punting - percpu ref kill confirmation, a work item,
      percpu ref release, RCU grace period, and again a work item - is quite
      hairy but the work items are there only to provide process context and
      the actual sequence is kill confirm -> release -> RCU free, which
      isn't simple but not too crazy.
      
      This removes cgroup_css() usage after offline_css() allowing clearing
      cgroup->subsys[] from offline_css(), which makes it consistent with
      online_css() and brings it closer to proper lifetime management for
      individual css's.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
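The five-way punting described in this commit can be pictured as a chain of deferred callbacks. The following Python sketch only models the ordering: a single FIFO queue stands in for both workqueues and RCU callbacks, and all function names and log strings are hypothetical.

```python
from collections import deque


def run_css_release():
    """Return the order in which the modelled release stages run."""
    log = []
    pending = deque()  # stands in for workqueue and RCU deferral

    def kill_confirmed():
        log.append("percpu ref kill confirmed")
        pending.append(killed_work)  # bounce to a work item (process context)

    def killed_work():
        log.append("work item")
        pending.append(release)      # the last reference is dropped here

    def release():
        log.append("percpu ref release")
        pending.append(free_rcu)     # schedule an RCU callback

    def free_rcu():
        log.append("RCU grace period")
        pending.append(free_work)    # bounce to a work item again

    def free_work():
        log.append("work item")      # the css itself is freed here

    pending.append(kill_confirmed)
    while pending:
        pending.popleft()()
    return log
```

Dropping the two "work item" entries from the returned list leaves kill confirm, release, RCU free, which is the logical sequence the commit message singles out.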
    • cgroup: move subsys file removal to kill_css() · 3c14f8b4
      Tejun Heo authored
      With the planned unified hierarchy, individual css's will be created
      and destroyed dynamically across the lifetime of a cgroup.  To enable
      such usages, css destruction is being decoupled from cgroup
      destruction.  This patch moves subsys file removal from
      cgroup_destroy_locked() to kill_css().
      
      While this changes the order of destruction operations, the changes
      shouldn't be noticeable to cgroup subsystems or userland.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: factor out kill_css() · edae0c33
      Tejun Heo authored
      Factor out css ref killing from cgroup_destroy_locked() into
      kill_css().  We're gonna add more to the path and the factored out
      function will eventually be called from other places too.
      
      While at it, replace open coded percpu_ref_get() with css_get() for
      consistency.  This shouldn't cause any functional difference as the
      function is not used for root cgroups.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: decouple cgroup_subsys_state destruction from cgroup destruction · 09a503ea
      Tejun Heo authored
      Currently, css (cgroup_subsys_state) lifetime is tied to that of the
      associated cgroup.  css's are created when the associated cgroup is
      created and destroyed when it gets destroyed.  Also, individual css's
      aren't RCU protected but the whole cgroup is.  With the planned
      unified hierarchy, css's will need to be dynamically created and
      destroyed within the lifetime of a cgroup.
      
      To enable such usages, this patch decouples css destruction from
      cgroup destruction - offline_css() invocation and the final css_put()
      are moved from cgroup_destroy_css_killed() to css_killed_work_fn().
      Now each css is individually offlined and put as its reference count
      is killed instead of waiting for all css's attached to the cgroup to
      finish refcnt killing and then proceeding to offlining and putting
      them together.
      
      While this changes the order of destruction operations, the changes
      shouldn't be noticeable to cgroup subsystems or userland.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: replace cgroup->css_kill_cnt with ->nr_css · f20104de
      Tejun Heo authored
      Currently, css (cgroup_subsys_state) lifetime is tied to that of the
      associated cgroup.  With the planned unified hierarchy, css's will be
      dynamically created and destroyed within the lifetime of a cgroup.  To
      enable such usages, css's will be individually RCU protected instead
      of being tied to the cgroup.
      
      cgroup->css_kill_cnt is used during cgroup destruction to wait for css
      reference count disable; however, this model doesn't work once css's
      lifetimes are managed separately from cgroup's.  This patch replaces
      it with cgroup->nr_css which is a cgroup_mutex protected integer
      counting the number of attached css's.  The count is incremented from
      online_css() and decremented after refcnt kill is confirmed.  If the
      count reaches zero and the cgroup is marked dead, the second stage of
      cgroup destruction is kicked off.  If a cgroup doesn't have any css
      attached at the time of rmdir, cgroup_destroy_locked() now invokes the
      second stage directly as no css kill confirmation would happen.
      
      cgroup_offline_fn() - the second step of cgroup destruction - is
      renamed to cgroup_destroy_css_killed() and now expects to be called
      with cgroup_mutex held.
      
      While this patch changes how css destruction is punted to work items,
      it shouldn't change any visible behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
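The counting scheme described in this commit can be sketched as a small state machine. The Python below is a toy model under simplifying assumptions: locking is omitted, work items run synchronously, and the function names are shortened, hypothetical versions of the ones mentioned in the message.

```python
class Cgroup:
    def __init__(self):
        self.nr_css = 0                # number of attached css's
        self.dead = False              # set once rmdir has started
        self.second_stage_ran = False


def online_css(cgrp):
    cgrp.nr_css += 1


def destroy_css_killed(cgrp):
    """Second stage of cgroup destruction."""
    cgrp.second_stage_ran = True


def css_kill_confirmed(cgrp):
    """Called once per css after its refcnt kill is confirmed."""
    cgrp.nr_css -= 1
    if cgrp.nr_css == 0 and cgrp.dead:
        destroy_css_killed(cgrp)


def destroy_locked(cgrp):
    """First stage, triggered by rmdir."""
    cgrp.dead = True
    if cgrp.nr_css == 0:
        # No css attached, so no kill confirmation will ever arrive.
        destroy_css_killed(cgrp)
```

With two css's attached, the second stage in this model runs only after both kill confirmations have arrived; with none attached, `destroy_locked()` runs it directly.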
    • cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item · 223dbc38
      Tejun Heo authored
      css (cgroup_subsys_state) offlining, which requires process context,
      will be moved to ref kill confirmation.  In preparation, bounce
      css_killed handling through css->destroy_work.
      
      css_ref_killed_fn() is renamed to css_killed_ref_fn() so that it's
      consistent with the new css_killed_work_fn().
      
      This patch adds an additional work item bouncing but doesn't change
      the actual logic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: move cgroup->subsys[] assignment to online_css() · ae7f164a
      Tejun Heo authored
      Currently, css (cgroup_subsys_state) lifetime is tied to that of the
      associated cgroup.  With the planned unified hierarchy, css's will be
      dynamically created and destroyed within the lifetime of a cgroup.  To
      enable such usages, css's will be individually RCU protected instead
      of being tied to the cgroup.
      
      In preparation, this patch moves cgroup->subsys[] assignment from
      init_css() to online_css().  As this means that a newly initialized
      css should be remembered separately and that cgroup_css() returns NULL
      between init and online, cgroup_create() is updated so that it stores
      newly created css's in a local array css_ar[] and
      cgroup_init/load_subsys() are updated to use local variable @css
      instead of using cgroup_css().  This change also slightly simplifies
      error path of cgroup_create().
      
      While this patch changes when cgroup->subsys[] is initialized, this
      change isn't visible to subsystems or userland.
      
      v2: This patch wasn't updated accordingly after the previous "cgroup:
          reorganize css init / exit paths" was updated leading to missing a
          css_ar[] conversion in cgroup_create() and thus boot failure.  Fix
          it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
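Why a local array becomes necessary once publication moves to online_css() can be shown with a toy Python model. The names mirror those in the commit message, but the bodies are hypothetical and `css_ar` is modelled as a dict keyed by subsystem name rather than a C array.

```python
class Css:
    pass


class Cgroup:
    def __init__(self):
        self.subsys = {}  # subsystem name -> published css


def cgroup_css(cgrp, name):
    """Accessor: returns None until the css has been onlined."""
    return cgrp.subsys.get(name)


def init_css(css, cgrp, name):
    # After this change, init no longer publishes the css in cgrp.subsys.
    css.cgroup = cgrp
    css.name = name


def online_css(css):
    css.cgroup.subsys[css.name] = css  # publication happens here


def cgroup_create(names):
    cgrp = Cgroup()
    css_ar = {}  # newly created css's are remembered locally
    for name in names:
        css = Css()
        init_css(css, cgrp, name)
        css_ar[name] = css
    seen_before_online = [cgroup_css(cgrp, n) for n in names]
    for name in names:
        online_css(css_ar[name])
    return cgrp, css_ar, seen_before_online
```

Between init and online the accessor returns nothing in this model, so the creating code has to hold its own references in `css_ar` until each css is onlined.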
  4. 13 Aug, 2013 7 commits
    • cgroup: reorganize css init / exit paths · 623f926b
      Tejun Heo authored
      css (cgroup_subsys_state) lifetime management is about to be
      restructured.  In preparation, make the following mostly trivial
      changes.
      
      * init_cgroup_css() is renamed to init_css() so that it's consistent
        with other css handling functions.
      
      * alloc_css_id(), online_css() and offline_css() updated to take @css
        instead of cgroups and subsys IDs.
      
      This patch doesn't make any functional changes.
      
      v2: v1 merged two for_each_root_subsys() loops in cgroup_create() but
          Li Zefan pointed out that it breaks error path.  Dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: add __rcu modifier to cgroup->subsys[] · 73e80ed8
      Tejun Heo authored
      For the planned unified hierarchy, each css (cgroup_subsys_state) will
      be RCU protected so that it can be created and destroyed individually
      while allowing RCU accesses.  Previous changes ensured that all
      cgroup->subsys[] accesses use the cgroup_css() accessor.  This patch
      adds __rcu modifier to cgroup->subsys[], adds matching RCU dereference
      in cgroup_css() and converts all assignments to either
      rcu_assign_pointer() or RCU_INIT_POINTER().
      
      This change prepares for the actual RCUfication of css's and doesn't
      introduce any visible behavior change.  The conversion is verified
      with sparse and all accesses are properly RCU annotated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make cgroup_file_open() rcu_read_lock() around cgroup_css() and add cfent->css · 105347ba
      Tejun Heo authored
      For the planned unified hierarchy, each css (cgroup_subsys_state) will
      be RCU protected so that it can be created and destroyed individually
      while allowing RCU accesses, and cgroup_css() will soon require either
      holding cgroup_mutex or RCU read lock.
      
      This patch updates cgroup_file_open() such that it acquires the
      associated css under rcu_read_lock().  While cgroup_file_css() usages
      in other file operations are safe due to the reference from open,
      cgroup_css() wouldn't know that and will still trigger warnings.  It'd
      be cleanest to store the acquired css in file->private_data for
      further file operations but that's already used by seqfile.  This
      patch instead adds cfent->css to cache the associated css.  Note that
      while this field is initialized during cfe init, it should only be
      considered valid while the file is open.
      
      This patch doesn't change visible behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: cgroup_css_from_dir() now should be called with RCU read locked · b77d7b60
      Tejun Heo authored
      cgroup->subsys[] will become RCU protected and thus all cgroup_css()
      usages should either be under RCU read lock or cgroup_mutex.  This
      patch updates cgroup_css_from_dir() which returns the matching
      cgroup_subsys_state given a directory file and subsys_id so that it
      requires RCU read lock and updates its sole user
      perf_cgroup_connect().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
    • cgroup: add cgroup_subsys_state->parent · 0ae78e0b
      Tejun Heo authored
      With the planned unified hierarchy, css's (cgroup_subsys_state) will
      be RCU protected and allowed to be attached and detached dynamically
      over the course of a cgroup's lifetime.  This means that css's will
      stay accessible after being detached from its cgroup - the matching
      pointer in cgroup->subsys[] cleared - for ref draining and RCU grace
      period.
      
      cgroup core still wants to guarantee that the parent css is never
      destroyed before its children and css_parent() always returns the
      parent regardless of the state of the child css as long as it's
      accessible.
      
      This patch makes css's hold onto their parents and adds css->parent so
      that the parent css is never destroyed before its children and can be
      determined without consulting the cgroups.
      
      cgroup->dummy_css is also updated to point to the parent dummy_css;
      however, it doesn't need to worry about object lifetime as the parent
      cgroup is already pinned by the child.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
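The effect of a child holding onto its parent can be sketched with plain reference counts. This Python model is an illustration only: the `Css` class and the `freed` flag are hypothetical, and the real release path (percpu refs, RCU, work items) is collapsed into a synchronous `css_put()`.

```python
class Css:
    def __init__(self, parent=None):
        self.parent = parent  # reachable without consulting the cgroup
        self.refcnt = 1       # the base reference
        self.freed = False
        if parent is not None:
            css_get(parent)   # the child pins its parent


def css_get(css):
    css.refcnt += 1


def css_put(css):
    css.refcnt -= 1
    if css.refcnt == 0:
        css.freed = True
        if css.parent is not None:
            css_put(css.parent)  # drop the pin only when the child goes away
```

In this model, dropping the parent's base reference while a child still exists does not free the parent; it is freed only after the child's last reference is put.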
    • cgroup: rename cgroup_subsys_state->dput_work and its callback function · 35ef10da
      Tejun Heo authored
      css (cgroup_subsys_state) will become RCU protected and there will be
      two stages which require punting to work item during release.  To
      prepare for using the work item multiple times, rename
      css->dput_work to css->destroy_work and css_dput_fn() to
      css_free_work_fn() and move work item initialization from css init to
      right before the actual usage.
      
      This reorganization doesn't introduce any behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: always use cgroup_css() · 40e93b39
      Tejun Heo authored
      cgroup_css() is the accessor for cgroup->subsys[] but is not used
      consistently.  cgroup->subsys[] will become RCU protected and
      cgroup_css() will grow synchronization sanity checks.  In preparation,
      make all cgroup->subsys[] dereferences use cgroup_css() consistently.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  5. 09 Aug, 2013 22 commits
    • cgroup: make css_for_each_descendant() and friends include the origin css in the iteration · bd8815a6
      Tejun Heo authored
      Previously, all css descendant iterators didn't include the origin
      (root of subtree) css in the iteration.  The reasons were maintaining
      consistency with css_for_each_child() and that at the time of
      introduction more use cases needed skipping the origin anyway;
      however, given that css_is_descendant() considers self to be a
      descendant, omitting the origin css has become more confusing and
      looking at the accumulated use cases rather clearly indicates that
      including origin would result in simpler code overall.
      
      While this is a change which can easily lead to subtle bugs, cgroup
      API including the iterators has recently gone through major
      restructuring and no out-of-tree changes will be applicable without
      adjustments making this a relatively acceptable opportunity for this
      type of change.
      
      The conversions are mostly straight-forward.  If the iteration block
      had explicit origin handling before or after, it's moved inside the
      iteration.  If not, if (pos == origin) continue; is added.  Some
      conversions add extra reference get/put around origin handling by
      consolidating origin handling and the rest.  While the extra ref
      operations aren't strictly necessary, this shouldn't cause any
      noticeable difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
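The conversion pattern described in this commit can be illustrated with a pre-order walk over a small tree. This is a Python sketch, not the kernel iterator: `Css`, `for_each_descendant_pre()` and the two helper functions are hypothetical names chosen for the example.

```python
class Css:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def for_each_descendant_pre(origin):
    """Pre-order walk that includes the origin itself."""
    yield origin
    for child in origin.children:
        yield from for_each_descendant_pre(child)


def names_including_origin(origin):
    return [pos.name for pos in for_each_descendant_pre(origin)]


def names_skipping_origin(origin):
    """Caller that previously relied on the origin being omitted."""
    out = []
    for pos in for_each_descendant_pre(origin):
        if pos is origin:
            continue  # the explicit skip added by the conversion
        out.append(pos.name)
    return out
```

Callers that handled the origin separately can fold that handling into the loop body, while callers that want the old behaviour add the explicit skip shown in `names_skipping_origin()`.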
    • cgroup: unexport cgroup_css() · 95109b62
      Tejun Heo authored
      cgroup_css() no longer has any user left outside cgroup.c proper and
      we don't want subsystems to grow new usages of the function.  cgroup
      core should always provide the css to use to the subsystems, which
      will make dynamic creation and destruction of css's across the
      lifetime of a cgroup much more manageable than exposing the cgroup
      directly to subsystems and letting them dereference css's from it.
      
      Make cgroup_css() a static function in cgroup.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup · d99c8727
      Tejun Heo authored
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      cgroup_taskset which is used by the subsystem attach methods is the
      last cgroup subsystem API which isn't using css as the handle.  Update
      cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
      cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.
      
      The conversions are pretty mechanical.  One exception is
      cpuset::cgroup_cs(), which lost its last user and got removed.
      
      This patch shouldn't introduce any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • cgroup: make cftype->[un]register_event() deal with cgroup_subsys_state instead of cgroup · 81eeaf04
      Tejun Heo authored
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      cftype->[un]register_event() is among the remaining couple interfaces
      which still use struct cgroup.  Convert it to cgroup_subsys_state.
      The conversion is mostly mechanical and removes the last users of
      mem_cgroup_from_cont() and cg_to_vmpressure(), which are removed.
      
      v2: indentation update as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
    • cgroup: make task iterators deal with cgroup_subsys_state instead of cgroup · 72ec7029
      Tejun Heo authored
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      This patch converts task iterators to deal with css instead of cgroup.
      Note that under unified hierarchy, different sets of tasks will be
      considered belonging to a given cgroup depending on the subsystem in
      question and making the iterators deal with css instead of cgroup
      provides them with enough information about the iteration.
      
      While at it, fix several function comment formats in cpuset.c.
      
      This patch doesn't introduce any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
    • cgroup: remove struct cgroup_scanner · e535837b
      Tejun Heo authored
      cgroup_scan_tasks() takes a pointer to struct cgroup_scanner as its
      sole argument and the only function of that struct is packing the
      arguments of the function call, which consist of five fields.
      It's not too unusual to pack parameters into a struct when the number
      of arguments gets excessive or the whole set needs to be passed around
      a lot, but neither holds here making it just weird.
      
      Drop struct cgroup_scanner and pass the params directly to
      cgroup_scan_tasks().  Note that struct cpuset_change_nodemask_arg was
      added to cpuset.c to pass both ->cs and ->newmems pointer to
      cpuset_change_nodemask() using single data pointer.
      
      This doesn't make any functional differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make cgroup_task_iter remember the cgroup being iterated · c59cd3d8
      Tejun Heo authored
      Currently all cgroup_task_iter functions require @cgrp to be passed
      in, which is superfluous and increases the chance of usage error.  Make
      cgroup_task_iter remember the cgroup being iterated and drop @cgrp
      argument from next and end functions.
      
      This patch doesn't introduce any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
    • cgroup: rename cgroup_iter to cgroup_task_iter · 0942eeee
      Tejun Heo authored
      cgroup now has multiple iterators and it's quite confusing to have
      something which walks over tasks of a single cgroup named cgroup_iter.
      Let's rename it to cgroup_task_iter.
      
      While at it, reformat / update comments and replace the overview
      comment above the interface function decls with proper function
      comments.  Such overview can be useful but function comments should be
      more than enough here.
      
      This is pure rename and doesn't introduce any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
    • cgroup: relocate cgroup_advance_iter() · d515876e
      Tejun Heo authored
      For some reason, cgroup_advance_iter() is standing lonely, far away
      from its iter comrades.  Relocate it.
      
      This is cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup · 492eb21b
      Tejun Heo authored
      cgroup is currently in the process of transitioning to using css
      (cgroup_subsys_state) as the primary handle instead of cgroup in
      subsystem API.  For hierarchy iterators, this is beneficial because
      
      * In most cases, css is the only thing subsystems care about anyway.
      
      * On the planned unified hierarchy, iterations for different
        subsystems will need to skip over different subtrees of the
        hierarchy depending on which subsystems are enabled on each cgroup.
        Passing around css makes it unnecessary to explicitly specify the
        subsystem in question as css is the intersection between cgroup and
        subsystem.
      
      * For the planned unified hierarchy, css's would need to be created
        and destroyed dynamically independent from cgroup hierarchy.  Having
        cgroup core manage css iteration makes enforcing deref rules a lot
        easier.
      
      Most subsystem conversions are straight-forward.  Noteworthy changes
      are
      
      * blkio: cgroup_to_blkcg() is no longer used.  Removed.
      
      * freezer: cgroup_freezer() is no longer used.  Removed.
      
      * devices: cgroup_to_devcgroup() is no longer used.  Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
    • cgroup: always use cgroup_next_child() to walk the children list · f48e3924
      Tejun Heo authored
      There are several places where the children list is accessed directly.
      This patch converts those places to use cgroup_next_child().  This
      will help updating the hierarchy iterators to use @css instead of
      @cgrp.
      
      While cgroup_next_child() can be heavy in pathological cases - e.g. a
      lot of dead children - this shouldn't cause any noticeable behavior
      differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      f48e3924
    • Tejun Heo's avatar
      cgroup: convert cgroup_next_sibling() to cgroup_next_child() · 3b287a50
      Tejun Heo authored
      cgroup is transitioning to using css (cgroup_subsys_state) as the main
      subsys interface handle instead of cgroup and the iterators will be
      updated to use css too.  The iterators need to walk the cgroup
      hierarchy and return the css's matching the origin css, which is a bit
      cumbersome to open code.
      
      This patch converts cgroup_next_sibling() to cgroup_next_child() so
      that it can handle all steps of direct child iteration.  This will be
      used to update iterators to take @css instead of @cgrp.  In addition
      to the new iteration init handling, cgroup_next_child() is
      restructured so that the different branches share the end of iteration
      condition check.
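
The iteration described above can be sketched in userspace C. This is a simplified mock, not the kernel's code: the structs carry only the fields needed here, plain `first_child`/`next_sibling` pointers stand in for the kernel's RCU-protected lists, and `demo_next_child()`, `demo_css_next_child()` and `DEMO_SUBSYS_COUNT` are hypothetical names. It shows a NULL position starting the walk and each step returning the child css that belongs to the same subsystem as the origin css.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUBSYS_COUNT 2	/* hypothetical subsystem count for this mock */

struct cgroup;

struct cgroup_subsys {
	int subsys_id;
};

struct cgroup_subsys_state {
	struct cgroup *cgroup;		/* cgroup this css belongs to */
	struct cgroup_subsys *ss;	/* subsystem this css belongs to */
};

/* Mock cgroup: plain pointers replace the kernel's RCU-protected lists. */
struct cgroup {
	struct cgroup *first_child;
	struct cgroup *next_sibling;
	struct cgroup_subsys_state *subsys[DEMO_SUBSYS_COUNT];
};

/* pos == NULL starts the walk; a NULL return value ends it. */
static struct cgroup *demo_next_child(struct cgroup *pos, struct cgroup *parent)
{
	return pos ? pos->next_sibling : parent->first_child;
}

/* css-based wrapper: yields the child css of the same subsystem as @parent_css. */
static struct cgroup_subsys_state *
demo_css_next_child(struct cgroup_subsys_state *pos_css,
		    struct cgroup_subsys_state *parent_css)
{
	struct cgroup *pos = pos_css ? pos_css->cgroup : NULL;
	struct cgroup *next;

	assert(parent_css && parent_css->cgroup && parent_css->ss);
	next = demo_next_child(pos, parent_css->cgroup);
	return next ? next->subsys[parent_css->ss->subsys_id] : NULL;
}
```

A caller would loop with `for (pos = demo_css_next_child(NULL, parent); pos; pos = demo_css_next_child(pos, parent))`, never touching the cgroup pointers directly.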
      
      This patch doesn't change any behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      3b287a50
    • Tejun Heo's avatar
      cgroup: pass around cgroup_subsys_state instead of cgroup in file methods · 182446d0
      Tejun Heo authored
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup.
      Please see the previous commit which converts the subsystem methods
      for rationale.
      
      This patch converts all cftype file operations to take @css instead of
      @cgroup.  cftypes for the cgroup core files don't have their subsystem
      pointer set.  These will automatically use the dummy_css added by the
      previous patch and can be converted the same way.
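
As a rough illustration of how a file without a subsystem pointer can still be handed a css, here is a userspace mock. The struct layouts are trimmed to what the sketch needs, and `demo_cgroup_init()`, `demo_file_css()` and `DEMO_SUBSYS_COUNT` are hypothetical names, not kernel functions. A cftype with a NULL `ss` resolves to the cgroup's `dummy_css`; any other cftype resolves to the css of its subsystem.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUBSYS_COUNT 2	/* hypothetical subsystem count for this mock */

struct cgroup;

struct cgroup_subsys {
	int subsys_id;
};

struct cgroup_subsys_state {
	struct cgroup *cgroup;		/* back pointer to the owning cgroup */
	struct cgroup_subsys *ss;	/* NULL for the dummy css */
};

struct cgroup {
	struct cgroup_subsys_state dummy_css;	/* used by core "cgroup.*" files */
	struct cgroup_subsys_state *subsys[DEMO_SUBSYS_COUNT];
};

struct cftype {
	const char *name;
	struct cgroup_subsys *ss;	/* NULL for cgroup core files */
};

/* The dummy css only points back at its cgroup and has no subsystem. */
static void demo_cgroup_init(struct cgroup *cgrp)
{
	cgrp->dummy_css.cgroup = cgrp;
	cgrp->dummy_css.ss = NULL;
}

/* Pick the css a file operation on @cft in @cgrp should receive. */
static struct cgroup_subsys_state *
demo_file_css(struct cgroup *cgrp, const struct cftype *cft)
{
	assert(cgrp->dummy_css.cgroup == cgrp);	/* demo_cgroup_init() was called */
	if (cft->ss)
		return cgrp->subsys[cft->ss->subsys_id];
	return &cgrp->dummy_css;
}
```

With this shape, core files and subsystem files share one calling convention: the handler always receives a css and can reach the cgroup through `css->cgroup`.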
      
      Most subsystem conversions are straightforward but there are some
      interesting ones.
      
      * freezer: update_if_frozen() is also converted to take @css instead
        of @cgroup for consistency.  This will make the code look simpler
        too once iterators are converted to use css.
      
      * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
        vmpressure while mem_cgroup_from_cont() can be made static.
        Updated accordingly.
      
      * cpu: cgroup_tg() doesn't have any user left.  Removed.
      
      * cpuacct: cgroup_ca() doesn't have any user left.  Removed.
      
      * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
        Removed.
      
      * net_cls: cgrp_cls_state() doesn't have any user left.  Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      182446d0
    • Tejun Heo's avatar
      cgroup: add cgroup->dummy_css · 67f4c36f
      Tejun Heo authored
      cgroup subsystem API is being converted to use css
      (cgroup_subsys_state) as the main handle, which makes things a bit
      awkward for subsystem agnostic core features - the "cgroup.*"
      interface files and various iterations - as they don't have a css to
      use.
      
      This patch adds cgroup->dummy_css which has NULL ->ss and whose only
      role is pointing back to the cgroup.  This will be used to support
      subsystem agnostic features on the coming css based API.
      
      css_parent() is updated to handle dummy_css's.  Note that css will
      soon grow its own ->parent field and css_parent() will be made
      trivial.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      67f4c36f
    • Tejun Heo's avatar
      cgroup: pin cgroup_subsys_state when opening a cgroupfs file · f7d58818
      Tejun Heo authored
      Previously, each file read/write operation relied on the inode
      reference count pinning the cgroup and simply checked whether the
      cgroup was marked dead before proceeding to invoke the per-subsystem
      callback.  This was rather silly as it didn't have any synchronization
      or css pinning around the check and the cgroup may be removed and all
      css refs drained between the DEAD check and actual method invocation.
      
      This patch pins the css between open() and release() so that it is
      guaranteed to be alive for all file operations and removes the silly
      DEAD checks from cgroup_file_read/write().
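
The open/release pinning can be sketched with a toy reference count. This is a single-threaded userspace mock under stated assumptions: the plain `int` counter replaces the kernel's real reference counting, the `css_tryget()` and `css_put()` bodies here are simplified stand-ins that merely share names with the kernel helpers, and `struct demo_file`, `demo_open()` and `demo_release()` are hypothetical. The point is only that a reference taken at open time keeps the css alive until release, so read/write need no liveness check of their own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock css: refcnt == 0 means the css is dead and cannot be pinned. */
struct cgroup_subsys_state {
	int refcnt;
};

/* Hypothetical per-open state holding the pinned css. */
struct demo_file {
	struct cgroup_subsys_state *css;
};

/* Take a reference unless the css is already dead (not atomic in this mock). */
static bool css_tryget(struct cgroup_subsys_state *css)
{
	if (css->refcnt <= 0)
		return false;
	css->refcnt++;
	return true;
}

static void css_put(struct cgroup_subsys_state *css)
{
	assert(css->refcnt > 0);
	css->refcnt--;
}

/* open(): pin the css or fail; later operations can rely on f->css. */
static int demo_open(struct demo_file *f, struct cgroup_subsys_state *css)
{
	if (!css_tryget(css))
		return -ENODEV;
	f->css = css;
	return 0;
}

/* release(): drop the reference taken in demo_open(). */
static void demo_release(struct demo_file *f)
{
	assert(f->css != NULL);
	css_put(f->css);
	f->css = NULL;
}
```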
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      f7d58818
    • Tejun Heo's avatar
      cgroup: add subsys backlink pointer to cftype · 2bb566cb
      Tejun Heo authored
      cgroup is transitioning to using css (cgroup_subsys_state) instead of
      cgroup as the primary subsystem handle.  The cgroupfs file interface
      will be converted to use css's which requires finding out the
      subsystem from cftype so that the matching css can be determined from
      the cgroup.
      
      This patch adds cftype->ss which points to the subsystem the file
      belongs to.  The field is initialized while a cftype is being
      registered.  This makes it unnecessary to explicitly specify the
      subsystem for other cftype handling functions.  The @ss argument is dropped
      from various cftype handling functions.
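
A minimal sketch of the backlink idea, assuming a much reduced `struct cftype`: the registration helper stamps each entry with its subsystem once, so later helpers can read `cft->ss` instead of taking a separate subsystem argument. `demo_add_cftypes()` is a hypothetical name, and this mock terminates the array with a NULL `name`, which is a choice made for the sketch only.

```c
#include <assert.h>
#include <stddef.h>

struct cgroup_subsys {
	const char *name;
};

/* Reduced cftype; in this mock a NULL name terminates an array. */
struct cftype {
	const char *name;
	struct cgroup_subsys *ss;	/* backlink, filled in at registration */
};

/* Register @cfts for @ss: record the owning subsystem in every entry. */
static int demo_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
	int count = 0;

	assert(ss != NULL);
	for (struct cftype *cft = cfts; cft->name != NULL; cft++) {
		cft->ss = ss;
		count++;
	}
	return count;	/* number of files registered */
}
```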
      
      This patch shouldn't introduce any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      2bb566cb
    • Tejun Heo's avatar
      cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods · eb95419b
      Tejun Heo authored
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup *
      in subsystem implementations for the following reasons.
      
      * With unified hierarchy, subsystems will be dynamically bound and
        unbound from cgroups and thus css's (cgroup_subsys_state) may be
        created and destroyed dynamically over the lifetime of a cgroup,
        which is different from the current state where all css's are
        allocated and destroyed together with the associated cgroup.  This
        in turn means that cgroup_css() should be synchronized and may
        return NULL, making it more cumbersome to use.
      
      * Differing levels of per-subsystem granularity in the unified
        hierarchy means that the task and descendant iterators should behave
        differently depending on the specific subsystem the iteration is
        being performed for.
      
      * In the majority of cases, subsystems only care about their part in
        the cgroup hierarchy - i.e. the hierarchy of css's.  Subsystem methods
        often obtain the matching css pointer from the cgroup and don't
        bother with the cgroup pointer itself.  Passing around css fits
        much better.
      
      This patch converts all cgroup_subsys methods to take @css instead of
      @cgroup.  The conversions are mostly straight-forward.  A few
      noteworthy changes are
      
      * ->css_alloc() now takes css of the parent cgroup rather than the
        pointer to the new cgroup as the css for the new cgroup doesn't
        exist yet.  Knowing the parent css is enough for all the existing
        subsystems.
      
      * In kernel/cgroup.c::offline_css(), unnecessary open coded css
        dereference is replaced with local variable access.
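
The ->css_alloc() change noted above can be illustrated with a hypothetical controller. This is a userspace sketch with several assumptions: `struct demo_cgroup`, `demo_css_alloc()` and `DEMO_DEFAULT_LIMIT` are made up, a NULL parent css is taken to mean the root, and failure is reported with NULL rather than the kernel's error-pointer convention. It shows that the parent css alone is enough to set up the new css's state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_DEFAULT_LIMIT 100	/* hypothetical default used for the root */

/* Mock css with a single placeholder field. */
struct cgroup_subsys_state {
	int flags;
};

/* Hypothetical controller state; css is kept as the first member. */
struct demo_cgroup {
	struct cgroup_subsys_state css;
	int limit;
};

static_assert(offsetof(struct demo_cgroup, css) == 0, "css must be first");

/*
 * Allocate a css given only the parent's css. The new cgroup's own css
 * does not exist yet, so the child inherits what it needs from the parent.
 */
static struct cgroup_subsys_state *
demo_css_alloc(struct cgroup_subsys_state *parent_css)
{
	/* Valid because css is the first member of struct demo_cgroup. */
	struct demo_cgroup *parent = (struct demo_cgroup *)parent_css;
	struct demo_cgroup *dc = calloc(1, sizeof(*dc));

	if (!dc)
		return NULL;
	dc->limit = parent ? parent->limit : DEMO_DEFAULT_LIMIT;
	return &dc->css;
}
```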
      
      This patch shouldn't cause any behavior differences.
      
      v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
          with local variable @css as suggested by Li Zefan.
      
          Rebased on top of new for-3.12 which includes for-3.11-fixes so
          that ->css_free() invocation added by da0a12ca ("cgroup: fix a
          leak when percpu_ref_init() fails") is converted too.  Suggested
          by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eb95419b
    • Tejun Heo's avatar
      cgroup: add css_parent() · 63876986
      Tejun Heo authored
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guaranteed to return a valid non-NULL parent css as
      long as the target css is not at the top of the hierarchy.
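
A simplified userspace mock of the lookup described above, not the kernel source: the structs keep only the fields the sketch needs and `DEMO_SUBSYS_COUNT` is a hypothetical constant. The parent css is found by stepping to the parent cgroup and picking the css of the same subsystem, and the NULL test on the parent cgroup is what makes the function return NULL at the top of the hierarchy.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUBSYS_COUNT 2	/* hypothetical subsystem count for this mock */

struct cgroup;

struct cgroup_subsys {
	int subsys_id;
};

struct cgroup_subsys_state {
	struct cgroup *cgroup;		/* cgroup this css belongs to */
	struct cgroup_subsys *ss;	/* subsystem this css belongs to */
};

struct cgroup {
	struct cgroup *parent;		/* NULL for the root cgroup */
	struct cgroup_subsys_state *subsys[DEMO_SUBSYS_COUNT];
};

/* Return the parent css of @css, or NULL if @css sits at the top. */
static struct cgroup_subsys_state *
css_parent(struct cgroup_subsys_state *css)
{
	struct cgroup *parent_cgrp;

	assert(css && css->cgroup && css->ss);
	parent_cgrp = css->cgroup->parent;
	return parent_cgrp ? parent_cgrp->subsys[css->ss->subsys_id] : NULL;
}
```

Controllers can then climb the hierarchy with a loop such as `for (pos = css; pos; pos = css_parent(pos))` without reading `cgroup->parent` themselves.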
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      63876986
    • Tejun Heo's avatar
      cgroup: add/update accessors which obtain subsys specific data from css · a7c6d554
      Tejun Heo authored
      css (cgroup_subsys_state) is usually embedded in a subsys specific
      data structure.  Subsystems either use container_of() directly to cast
      from css to such data structure or have an accessor function wrapping
      such cast.  As cgroup as a whole is moving towards using css as the main
      interface handle, add and update such accessors to ease dealing with
      css's.
      
      All accessors explicitly handle NULL input and return NULL in those
      cases.  While this looks like an extra branch in the code, as all
      controller-specific data structures have css as the first field, the
      casting doesn't involve any offsetting and the compiler can trivially
      optimize out the branch.
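
The accessor pattern can be sketched in plain C. Everything here is a userspace stand-in: the `container_of()` macro is a simplified version without the kernel's type checking, and `struct demo_cgroup` and `css_demo()` are hypothetical names for a controller's data structure and its accessor. Because css is the first member, the cast applies a zero offset, so mapping NULL to NULL costs nothing extra in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); the kernel macro adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock css with a single placeholder field. */
struct cgroup_subsys_state {
	int refcnt;
};

/* Hypothetical controller data with css embedded as the first field. */
struct demo_cgroup {
	struct cgroup_subsys_state css;
	int weight;
};

/* The zero-offset property the commit message relies on. */
static_assert(offsetof(struct demo_cgroup, css) == 0, "css must be first");

/* Accessor that explicitly handles NULL input by returning NULL. */
static inline struct demo_cgroup *css_demo(struct cgroup_subsys_state *css)
{
	return css ? container_of(css, struct demo_cgroup, css) : NULL;
}
```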
      
      * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
        accessor.  Added.
      
      * memory, hugetlb and devices already had one but didn't explicitly
        handle NULL input.  Updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      a7c6d554
    • Tejun Heo's avatar
      cgroup: add subsystem pointer to cgroup_subsys_state · 72c97e54
      Tejun Heo authored
      Currently, given a cgroup_subsys_state, there's no way to find out
      which subsystem the css is for, which we'll need to convert the cgroup
      controller API to primarily use @css instead of @cgroup.  This patch
      adds cgroup_subsys_state->ss which points to the subsystem the @css
      belongs to.
      
      While at it, remove the comment about accessing @css->cgroup to
      determine the hierarchy.  cgroup core will provide API to traverse
      hierarchy of css'es and we don't want subsystems to directly walk
      cgroup hierarchies anymore.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      72c97e54
    • Tejun Heo's avatar
      hugetlb_cgroup: pass around @hugetlb_cgroup instead of @cgroup · 3f798518
      Tejun Heo authored
      cgroup controller API will be converted to primarily use struct
      cgroup_subsys_state instead of struct cgroup.  In preparation, make
      hugetlb_cgroup functions pass around struct hugetlb_cgroup instead of
      struct cgroup.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      3f798518
    • Tejun Heo's avatar
      netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state · 6d37b974
      Tejun Heo authored
      cgroup controller API will be converted to primarily use struct
      cgroup_subsys_state instead of struct cgroup.  In preparation, make
      the internal functions of netprio_cgroup pass around @css instead of
      @cgrp.
      
      While at it, kill struct cgroup_netprio_state which only contained
      struct cgroup_subsys_state without serving any purpose.  All functions
      are converted to deal with @css directly.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      6d37b974