1. 05 Nov, 2012 9 commits
    • Merge branch 'cgroup-rmdir-updates' into cgroup/for-3.8 · 1db1e31b
      Tejun Heo authored
      Pull rmdir updates into for-3.8 so that further callback updates can
      be put on top.  This pull created a trivial conflict between the
      following two commits.
      
        8c7f6edb ("cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them")
        ed957793 ("cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs")
      
      The former added a field to cgroup_subsys and the latter removed one
      from it.  They happen to be colocated causing the conflict.  Keeping
      what's added and removing what's removed resolves the conflict.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: make ->pre_destroy() return void · bcf6de1b
      Tejun Heo authored
      All ->pre_destroy() implementations return 0 now, which is the only
      allowed return value.  Make it return void.
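
A minimal user-space sketch of what the signature change means for a controller and its caller. The types and names (`cgroup_sketch`, `subsys_sketch`, `demo_pre_destroy`, `call_pre_destroy`) are hypothetical stand-ins, not the kernel's definitions; only the shape of the callback is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; only the callback shape matters here. */
struct cgroup_sketch {
	int charges;         /* resources still accounted to this group */
	int parent_charges;  /* resources accounted to its parent */
};

struct subsys_sketch {
	/* Previously: int (*pre_destroy)(...), with 0 the only allowed value. */
	void (*pre_destroy)(struct cgroup_sketch *cgrp);
};

/* A callback that cannot fail: it hands all charges to the parent. */
void demo_pre_destroy(struct cgroup_sketch *cgrp)
{
	assert(cgrp != NULL);
	cgrp->parent_charges += cgrp->charges;
	cgrp->charges = 0;
}

/* The caller has no return value to check and no error path to unwind. */
void call_pre_destroy(const struct subsys_sketch *ss, struct cgroup_sketch *cgrp)
{
	if (ss->pre_destroy != NULL)
		ss->pre_destroy(cgrp);
}
```

With a `void` callback the compiler rejects any attempt to propagate an error, which is the point of the change.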
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • hugetlb: do not fail in hugetlb_cgroup_pre_destroy · 9d093cb1
      Michal Hocko authored
      Now that pre_destroy callbacks are called from a context where no task
      can attach to the group and no child group can be added, there is no
      other way for hugetlb_cgroup_pre_destroy to fail.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Glauber Costa <glommer@parallels.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • memcg: make mem_cgroup_reparent_charges non failing · ab5196c2
      Michal Hocko authored
      Now that pre_destroy callbacks are called from a context where no task
      can attach to the group and no child group can be added, there is no
      other way for mem_cgroup_pre_destroy to fail.
      mem_cgroup_pre_destroy doesn't have to take a reference to memcg's css
      because all css' are marked dead already.
      
      tj: Remove now unused local variable @cgrp from
          mem_cgroup_reparent_charges().
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Glauber Costa <glommer@parallels.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() · b25ed609
      Tejun Heo authored
      CGRP_WAIT_ON_RMDIR is another kludge which was added to make cgroup
      destruction rollback somewhat working.  cgroup_rmdir() used to drain
      CSS references and CGRP_WAIT_ON_RMDIR and the associated waitqueue and
      helpers were used to allow the task performing rmdir to wait for the
      next relevant event.
      
      Unfortunately, the wait is visible to controllers too and the
      mechanism got exposed to memcg by 88703267 ("cgroup avoid permanent
      sleep at rmdir").
      
      Now that the draining and retries are gone, CGRP_WAIT_ON_RMDIR is
      unnecessary.  Remove it and all the mechanisms supporting it.  Note
      that memcontrol.c changes are essentially a revert of 88703267
      ("cgroup avoid permanent sleep at rmdir").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
    • cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() · 1a90dd50
      Tejun Heo authored
      Because ->pre_destroy() could fail and can't be called under
      cgroup_mutex, cgroup destruction did something very ugly.
      
        1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
      
        2. Release cgroup_mutex and call ->pre_destroy().
      
        3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
           otherwise.
      
        4. Continue destroying.
      
      In addition to being ugly, it has always been broken in various ways.
      For example, memcg ->pre_destroy() expects the cgroup to be inactive
      after it's done but tasks can be attached and detached between #2 and
      #3 and the conditions that memcg verified in ->pre_destroy() might no
      longer hold by the time control reaches #3.
      
      Now that ->pre_destroy() is no longer allowed to fail, we can switch
      to the following.
      
        1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
      
        2. Deactivate CSS's and mark the cgroup removed thus preventing any
           further operations which can invalidate the verification from #1.
      
        3. Release cgroup_mutex and call ->pre_destroy().
      
        4. Re-grab cgroup_mutex and continue destroying.
      
      After this change, controllers can safely assume that ->pre_destroy()
      will be called only once for a given cgroup and, once ->pre_destroy()
      is called, the cgroup will stay dormant till it's destroyed.
      
      This removes the only reason ->pre_destroy() can fail - new task being
      attached or child cgroup being created in between.  The error-out path is
      removed and ->pre_destroy() invocation is open coded in
      cgroup_rmdir().
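
The reordered four-step sequence can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: a pthread mutex stands in for cgroup_mutex, and `cgroup_sketch`, `attach_task_sketch`, `pre_destroy_sketch` and `rmdir_sketch` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for cgroup_mutex. */
static pthread_mutex_t cgroup_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

struct cgroup_sketch {
	int nr_tasks;
	int nr_children;
	bool removed;           /* once set, attaches are refused */
	int pre_destroy_calls;  /* how often the callback ran */
	bool destroyed;
};

/* Stand-in for a controller's ->pre_destroy(); it cannot fail. */
void pre_destroy_sketch(struct cgroup_sketch *cgrp)
{
	cgrp->pre_destroy_calls++;
}

/* Attaching a task fails once the group has been marked removed. */
int attach_task_sketch(struct cgroup_sketch *cgrp)
{
	int ret = -1;

	pthread_mutex_lock(&cgroup_mutex_sketch);
	if (!cgrp->removed) {
		cgrp->nr_tasks++;
		ret = 0;
	}
	pthread_mutex_unlock(&cgroup_mutex_sketch);
	return ret;
}

int rmdir_sketch(struct cgroup_sketch *cgrp)
{
	/* 1. Grab the lock and verify the group can be destroyed. */
	pthread_mutex_lock(&cgroup_mutex_sketch);
	if (cgrp->removed || cgrp->nr_tasks > 0 || cgrp->nr_children > 0) {
		pthread_mutex_unlock(&cgroup_mutex_sketch);
		return -1;
	}

	/* 2. Mark it removed so nothing can invalidate the check from step 1. */
	cgrp->removed = true;

	/* 3. Release the lock and run the callback, exactly once. */
	pthread_mutex_unlock(&cgroup_mutex_sketch);
	pre_destroy_sketch(cgrp);

	/* 4. Re-grab the lock and finish; there is nothing left to re-verify. */
	pthread_mutex_lock(&cgroup_mutex_sketch);
	assert(cgrp->removed);
	cgrp->destroyed = true;
	pthread_mutex_unlock(&cgroup_mutex_sketch);
	return 0;
}
```

Because the removed mark is set before the lock is dropped, the window in which the callback runs unlocked can no longer admit new tasks, so there is no second verification and no rollback.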
      
      v2: cgroup_call_pre_destroy() removal moved to this patch per Michal.
          Commit message updated per Glauber.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Glauber Costa <glommer@parallels.com>
    • cgroup: use cgroup_lock_live_group(parent) in cgroup_create() · 976c06bc
      Tejun Heo authored
      This patch makes cgroup_create() fail if @parent is marked removed.
      This is to prepare for further updates to cgroup_rmdir() path.
      
      Note that this change isn't strictly necessary.  cgroup can only be
      created via mkdir and the removed marking and dentry removal happen
      without releasing cgroup_mutex, so cgroup_create() can never race with
      cgroup_rmdir().  Even after the scheduled updates to cgroup_rmdir(),
      cgroup_mkdir() and cgroup_rmdir() are synchronized by i_mutex
      rendering the added liveness check unnecessary.
      
      Do it anyway such that locking is contained inside cgroup proper and
      we don't get nasty surprises if we ever grow another caller of
      cgroup_create().
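
A rough user-space sketch of the "lock only if still live" pattern described above. The names (`lock_live_group_sketch`, `create_child_sketch`) and the `-1` error value are assumptions for illustration; the real helper and its error code live in cgroup core.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for cgroup_mutex. */
static pthread_mutex_t cgroup_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

struct cgroup_sketch {
	bool removed;     /* set once removal has been committed */
	int nr_children;
};

/* Take the lock only if the group is live; on failure the lock is NOT held. */
bool lock_live_group_sketch(struct cgroup_sketch *cgrp)
{
	assert(cgrp != NULL);
	pthread_mutex_lock(&cgroup_mutex_sketch);
	if (cgrp->removed) {
		pthread_mutex_unlock(&cgroup_mutex_sketch);
		return false;
	}
	return true;
}

/* Creation fails (hypothetical -1) when the parent is marked removed. */
int create_child_sketch(struct cgroup_sketch *parent)
{
	if (!lock_live_group_sketch(parent))
		return -1;
	parent->nr_children++;
	pthread_mutex_unlock(&cgroup_mutex_sketch);
	return 0;
}
```

Folding the liveness test into the lock helper keeps the check inside the locking code, so a future caller of the creation path gets it without having to know about it.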
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: kill CSS_REMOVED · e9316080
      Tejun Heo authored
      CSS_REMOVED is one of the several contortions which were necessary to
      support css reference draining on cgroup removal.  All css->refcnts
      which needed draining had to be deactivated and verified to equal zero
      atomically w.r.t. css_tryget().  If any one wasn't zero, all refcnts
      needed to be re-activated and css_tryget() wasn't allowed to fail in
      the process.
      
      This was achieved by letting css_tryget() busy-loop until either the
      refcnt is reactivated (failed removal attempt) or CSS_REMOVED is set
      (committing to removal).
      
      Now that css refcnt draining is no longer used, there's no need for
      an atomic rollback mechanism.  css_tryget() can simply look at the
      reference count and fail if it's deactivated - it's never getting
      re-activated.
      
      This patch removes CSS_REMOVED and updates __css_tryget() to fail if
      the refcnt is deactivated.  As deactivation and removal are a single
      step now, they no longer need to be protected against css_tryget()
      happening from irq context.  Remove local_irq_disable/enable() from
      cgroup_rmdir().
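
A minimal sketch of a tryget that fails permanently once the count is deactivated, written with C11 atomics rather than the kernel's atomic_t API. The bias value and all names (`DEACT_BIAS_SKETCH`, `css_sketch`, `css_tryget_sketch`, `css_deactivate_sketch`) are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bias: adding it drives the counter negative, marking it deactivated. */
#define DEACT_BIAS_SKETCH INT_MIN

struct css_sketch {
	atomic_int refcnt;
};

/*
 * Try to take a reference. A negative count means "deactivated", and since
 * deactivation is never rolled back there is nothing to wait for: just fail.
 */
bool css_tryget_sketch(struct css_sketch *css)
{
	int v = atomic_load(&css->refcnt);

	while (v >= 0) {
		/* On failure v is refreshed with the current value and rechecked. */
		if (atomic_compare_exchange_weak(&css->refcnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Deactivate the counter; expected to happen once, on the removal path. */
void css_deactivate_sketch(struct css_sketch *css)
{
	int prev = atomic_fetch_add(&css->refcnt, DEACT_BIAS_SKETCH);

	assert(prev >= 0); /* must not already be deactivated */
}
```

The loop only retries when another thread changed the count concurrently; it never spins waiting for a state change, which is why no interrupt masking is needed around deactivation in this model.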
      
      Note that this removes css_is_removed() whose only user is VM_BUG_ON()
      in memcontrol.c.  We can replace it with a check on the refcnt but
      given that the only use case is a debug assert, I think it's better to
      simply unexport it.
      
      v2: Comment updated and explanation on local_irq_disable/enable()
          added per Michal Hocko.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
    • cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs · ed957793
      Tejun Heo authored
      2ef37d3f ("memcg: Simplify mem_cgroup_force_empty_list error
      handling") removed the last user of __DEPRECATED_clear_css_refs.  This
      patch removes __DEPRECATED_clear_css_refs and mechanisms to support
      it.
      
      * Conditionals dependent on __DEPRECATED_clear_css_refs removed.
      
      * cgroup_clear_css_refs() can no longer fail.  All that needs to be
        done are deactivating refcnts, setting CSS_REMOVED and putting the
        base reference on each css.  Remove cgroup_clear_css_refs() and the
        failure path, and open-code the loops into cgroup_rmdir().
      
      This patch keeps the two for_each_subsys() loops separate while open
      coding them.  They can be merged now but there are scheduled changes
      which need them to be separate, so keep them separate to reduce the
      amount of churn.
      
      local_irq_save/restore() from cgroup_clear_css_refs() are replaced
      with local_irq_disable/enable() for simplicity.  This is safe as
      cgroup_rmdir() is always called with IRQ enabled.  Note that this IRQ
      switching is necessary to ensure that css_tryget() isn't called from
      IRQ context on the same CPU while lower context is between CSS
      deactivation and setting CSS_REMOVED as css_tryget() would hang
      forever in such cases waiting for CSS to be re-activated or
      CSS_REMOVED set.  This will go away soon.
      
      v2: cgroup_call_pre_destroy() removal dropped per Michal.  Commit
          message updated to explain local_irq_disable/enable() conversion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 29 Oct, 2012 3 commits
    • memcg: Simplify mem_cgroup_force_empty_list error handling · 2ef37d3f
      Michal Hocko authored
      mem_cgroup_force_empty_list currently tries to remove all pages from
      the given LRU. To guard against temporary failures (EBUSY returned by
      mem_cgroup_move_parent) it uses a margin to the current LRU pages and
      returns true if there are still some pages left on the list.
      
      If we consider that mem_cgroup_move_parent fails only when it is racing
      with somebody else removing (uncharging) the page or when the page is
      migrated then it is obvious that all those failures are only temporary
      and so we can safely retry later.
      Let's get rid of the safety margin and make the loop really wait for
      the empty LRU. The caller should still make sure that all charges have
      been removed from the res_counter because mem_cgroup_replace_page_cache
      might add a page to the LRU after the list_empty check (it doesn't touch
      res_counter though).
      This catches most of the cases except for shmem which might call
      mem_cgroup_replace_page_cache with a page which is not charged and on
      the LRU yet but this was the case also without this patch. In order to
      fix this we need a guarantee that try_get_mem_cgroup_from_page falls
      back to the current mm's cgroup so it needs css_tryget to fail. This
      will be fixed up in a later patch because it needs help from the cgroup
      core (pre_destroy has to be called after css is cleared).
      
      Although mem_cgroup_pre_destroy can still fail (if a new task or a new
      sub-group appears) there is no reason to retry pre_destroy callback from
      the cgroup core. This means that __DEPRECATED_clear_css_refs has lost
      its meaning and it can be removed.
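
The "wait for the empty LRU" loop described above can be sketched as below. This is a simplified model, not the memcg code: an array stands in for the LRU list, `busy_retries` simulates the transient races, and every name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct page_sketch {
	int busy_retries;  /* how many more times a move will transiently fail */
};

struct lru_sketch {
	struct page_sketch *pages;  /* array standing in for the LRU list */
	size_t nr;                  /* pages still on the list */
	unsigned long attempts;     /* total move attempts, for illustration */
};

/* Returns 0 on success or -EBUSY when racing with an uncharge or migration. */
int move_parent_sketch(struct page_sketch *page)
{
	if (page->busy_retries > 0) {
		page->busy_retries--;
		return -EBUSY;
	}
	return 0;
}

/* No safety margin: keep going until the list is really empty. */
void force_empty_list_sketch(struct lru_sketch *lru)
{
	assert(lru != NULL);
	while (lru->nr > 0) {
		struct page_sketch *page = &lru->pages[lru->nr - 1];

		lru->attempts++;
		if (move_parent_sketch(page) == -EBUSY)
			continue;  /* transient failure: retry later */
		lru->nr--;         /* page has left this list */
	}
}
```

The loop terminates only because every failure is assumed to be temporary, which is exactly the argument the commit message makes for dropping the margin.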
      
      Changes since v2
      - remove __DEPRECATED_clear_css_refs
      
      Changes since v1
      - use kernel-doc
      - be more specific about mem_cgroup_move_parent possible failures
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Glauber Costa <glommer@parallels.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • memcg: root_cgroup cannot reach mem_cgroup_move_parent · d8423011
      Michal Hocko authored
      The root cgroup cannot be destroyed so we never hit it down the
      mem_cgroup_pre_destroy path and mem_cgroup_force_empty_write shouldn't
      even try to do anything if called for the root.
      
      This means that mem_cgroup_move_parent doesn't have to bother with the
      root cgroup and it can assume it can always move charges upwards.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Glauber Costa <glommer@parallels.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • memcg: split mem_cgroup_force_empty into reclaiming and reparenting parts · c26251f9
      Michal Hocko authored
      mem_cgroup_force_empty did two separate things depending on free_all
      parameter from the very beginning. It either reclaimed as many pages as
      possible and moved the rest to the parent or just moved charges to the
      parent. The first variant is used as the memory.force_empty callback
      while the latter is used from mem_cgroup_pre_destroy.
      
      The games played with gotos are far from nice and there is no
      reason to keep those two functions inside one. Let's split them and
      also move the responsibility for css reference counting to their callers
      to make the code simpler.
      
      This patch doesn't have any functional changes.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Glauber Costa <glommer@parallels.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 26 Oct, 2012 1 commit
    • freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() · 5d8f72b5
      Oleg Nesterov authored
      try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
      to ensure that a task doing STOPPED/TRACED -> RUNNING transition
      can't escape freezing. This mostly works, but ptrace_stop() does
      not necessarily call schedule(), it can change task->state back to
      RUNNING and check freezing() without any lock/barrier in between.
      
      We could add the necessary barrier, but this patch changes
      ptrace_stop() and do_signal_stop() to use freezable_schedule().
      This fixes the race: freezer_count() and freezer_should_skip()
      carefully avoid it.
      
      This also simplifies the code: try_to_freeze_tasks/update_if_frozen
      no longer need to use task_is_stopped_or_traced() checks with the
      non-trivial assumptions. We can rely on the mechanism which was
      specially designed to mark the sleeping task as "frozen enough".
      
      v2: As Tejun pointed out, we can also change get_signal_to_deliver()
      and move try_to_freeze() up before 'relock' label.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 20 Oct, 2012 3 commits
    • cgroup_freezer: don't use cgroup_lock_live_group() · ead5c473
      Tejun Heo authored
      freezer_read/write() used cgroup_lock_live_group() to synchronize
      against task migration into and out of the target cgroup.
      cgroup_lock_live_group() grabs the internal cgroup lock and using it
      from outside cgroup core leads to complex and fragile locking
      dependency issues which are difficult to resolve.
      
      Now that freezer_can_attach() is replaced with freezer_attach() and
      update_if_frozen() updated, nothing requires excluding migration
      against freezer state reads and changes.
      
      This patch removes cgroup_lock_live_group() and the matching
      cgroup_unlock() usages.  The prone-to-bitrot, already outdated and
      unnecessary global lock hierarchy documentation is replaced with
      documentation in local scope.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Li Zefan <lizefan@huawei.com>
    • cgroup_freezer: prepare update_if_frozen() for locking change · b4d18311
      Tejun Heo authored
      Locking will change such that migration can happen while
      freezer_read/write() is in progress.  This means that
      update_if_frozen() can no longer assume that all tasks in the cgroup
      conform to the current freezer state - newly migrated tasks which
      haven't finished freezer_attach() yet might be in any state.
      
      This patch updates update_if_frozen() such that it no longer verifies
      task states against freezer state.  It now simply decides whether
      FREEZING stage is complete.
      
      This removal of verification makes it meaningless to call from
      freezer_change_state().  Drop it and move the fast exit test from
      freezer_read() - the only left caller - to update_if_frozen().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Li Zefan <lizefan@huawei.com>
    • cgroup_freezer: allow moving tasks in and out of a frozen cgroup · 8755ade6
      Tejun Heo authored
      cgroup_freezer is one of the few users of cgroup_subsys->can_attach()
      and uses it to prevent tasks from being migrated into or out of a
      frozen cgroup.  This makes cgroup_freezer cumbersome to use especially
      when co-mounted with other controllers.
      
      ->can_attach() is problematic in general as it can make co-mounting
      multiple cgroups difficult - migrating tasks may fail for reasons
      completely irrelevant for other controllers.  freezer_can_attach() in
      particular is more problematic because it messes with cgroup internal
      locking to ensure that the state verification performed at
      freezer_can_attach() stays valid until migration is complete.
      
      This patch replaces freezer_can_attach() with freezer_attach() so that
      tasks are always allowed to migrate - they are nudged into the
      conforming state from freezer_attach().  This means that there can be
      tasks which are being migrated which don't conform to the current
      cgroup_freezer state until freezer_attach() is complete.  Under the
      current locking scheme, the only such place is freezer_fork() which is
      updated to handle such window.
      
      While this patch doesn't remove the use of internal cgroup locking
      from freezer_read/write() paths, it removes the requirement to keep
      the freezer state constant while migrating and enables such change.
      
      Note that this creates a userland visible behavior change - FROZEN
      cgroup can no longer be used to lock migrations in and out of the
      cgroup.  This behavior change is intended.  I don't think the feature
      is necessary - userland should coordinate accesses to cgroup fs anyway
      - and even if the feature is needed cgroup_freezer is the completely
      wrong place to implement it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1350426526-14254-1-git-send-email-tj@kernel.org>
      Cc: Matt Helsley <matthltc@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Li Zefan <lizefan@huawei.com>
  5. 16 Oct, 2012 4 commits
    • cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks · 3c426d5e
      Tejun Heo authored
      cgroup_freezer doesn't transition from FREEZING to FROZEN if the
      cgroup contains PF_NOFREEZE tasks or tasks sleeping with
      PF_FREEZER_SKIP set.
      
      Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's
      nothing cgroup_freezer or userland can do about or to it.  It's
      pointless to stall the transition for PF_NOFREEZE tasks.
      
      PF_FREEZER_SKIP indicates that the task can be skipped when
      determining whether frozen state is reached.  A task with
      PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it
      wakes up and can be considered frozen much like stopped or traced
      tasks.  Note that a vfork parent uses PF_FREEZER_SKIP while waiting
      for the child.
      
      This updates update_if_frozen() such that it only considers freezable
      tasks and treats %true freezer_should_skip() tasks as frozen.
      
      This allows cgroups w/ kthreads and vfork parents to successfully reach
      FROZEN state.
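
A small sketch of the completion test this commit describes: non-freezable tasks are ignored, and tasks for which the skip check is true count as frozen. The struct fields and the function name are hypothetical simplifications of the kernel's task flags and helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_sketch {
	bool nofreeze;  /* models PF_NOFREEZE: a kernel task that cannot freeze */
	bool skip;      /* models freezer_should_skip() returning true */
	bool frozen;    /* task has actually entered the frozen state */
};

/*
 * FREEZING is complete when every freezable task is either frozen or can
 * be skipped. Non-freezable tasks never hold up the transition.
 */
bool freezing_complete_sketch(const struct task_sketch *tasks, size_t n)
{
	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct task_sketch *t = &tasks[i];

		if (t->nofreeze)
			continue;  /* nothing anyone can do about it */
		if (!t->frozen && !t->skip)
			return false;  /* a freezable task is still running */
	}
	return true;
}
```

In this model a cgroup holding a kthread and a vfork parent reports completion as soon as its ordinary tasks are frozen, which matches the behavior the message describes.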
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
    • cgroup_freezer: make it official that writes to freezer.state don't fail · 51f246ed
      Tejun Heo authored
      try_to_freeze_cgroup() has condition checks which are intended to fail
      the write operation to freezer.state if there are tasks which can't be
      frozen.  The condition checks have been broken for quite some time
      now.  freeze_task() returns %false if the target task can't be frozen,
      so num_cant_freeze_now is never incremented.
      
      In addition, strangely, cgroup freezing proceeds even after the write
      has failed, which is rather broken.
      
      This patch rips out the non-working code intended to fail the write to
      freezer.state when the cgroup contains non-freezable tasks and makes
      it official that writes to freezer.state succeed whether there are
      non-freezable tasks in the cgroup or not.
      
      This leaves is_task_frozen_enough() with only one user -
      update_if_frozen().  Collapse it into the caller.  Note that this
      removes an extra call to freezing().
      
      This doesn't cause any userland behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
    • freezer: add missing mb's to freezer_count() and freezer_should_skip() · dd67d32d
      Tejun Heo authored
      A task is considered frozen enough between freezer_do_not_count() and
      freezer_count() and freezers use freezer_should_skip() to test this
      condition.  This supposedly works because freezer_count() always calls
      try_to_freeze() after clearing %PF_FREEZER_SKIP.
      
      However, there currently is nothing which guarantees that
      freezer_count() sees %true freezing() after clearing %PF_FREEZER_SKIP
      when freezing is in progress, and vice-versa.  A task can escape the
      freezing condition in effect by freezer_count() seeing !freezing() and
      freezer_should_skip() seeing %PF_FREEZER_SKIP.
      
      This patch adds smp_mb()'s to freezer_count() and
      freezer_should_skip() such that either %true freezing() is visible to
      freezer_count() or !PF_FREEZER_SKIP is visible to
      freezer_should_skip().
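
The pairing described above is the classic store-then-load pattern on two variables: each side writes its own flag, issues a full barrier, then reads the other's. A user-space sketch with C11 atomics follows; the sequentially consistent fences play the role of smp_mb(), and all names and types are assumptions rather than the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task_sketch {
	atomic_bool skip;     /* models PF_FREEZER_SKIP */
	int freeze_attempts;  /* counts calls standing in for try_to_freeze() */
};

static atomic_bool freezing_sketch;  /* models the freezing() condition */

/* Freezer side, first half: announce that freezing is in progress. */
void start_freezing_sketch(void)
{
	atomic_store_explicit(&freezing_sketch, true, memory_order_relaxed);
}

/* Task side: mark the task as skippable before it goes to sleep. */
void freezer_do_not_count_sketch(struct task_sketch *t)
{
	atomic_store_explicit(&t->skip, true, memory_order_relaxed);
}

/* Task side: clear the flag, full barrier, then look at freezing. */
void freezer_count_sketch(struct task_sketch *t)
{
	assert(t != NULL);
	atomic_store_explicit(&t->skip, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);  /* role of smp_mb() */
	if (atomic_load_explicit(&freezing_sketch, memory_order_relaxed))
		t->freeze_attempts++;  /* stands in for try_to_freeze() */
}

/* Freezer side, second half: full barrier, then look at the skip flag. */
bool freezer_should_skip_sketch(struct task_sketch *t)
{
	atomic_thread_fence(memory_order_seq_cst);  /* role of smp_mb() */
	return atomic_load_explicit(&t->skip, memory_order_relaxed);
}
```

With both fences in place, at least one side is guaranteed to observe the other's store, so the task cannot see "not freezing" while the freezer still sees the skip flag set.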
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@vger.kernel.org
    • cgroup: cgroup_subsys->fork() should be called after the task is added to css_set · 5edee61e
      Tejun Heo authored
      cgroup core has a bug which violates a basic rule about event
      notifications - when a new entity needs to be added, you add that to
      the notification list first and then make the new entity conform to
      the current state.  If done in the reverse order, an event happening
      in between will be lost.
      
      cgroup_subsys->fork() is invoked way before the new task is added to
      the css_set.  Currently, cgroup_freezer is the only user of ->fork()
      and uses it to make new tasks conform to the current state of the
      freezer.  If FROZEN state is requested while fork is in progress
      between cgroup_fork_callbacks() and cgroup_post_fork(), the child
      could escape freezing - the cgroup isn't frozen when ->fork() is
      called and the freezer couldn't see the new task on the css_set.
      
      This patch moves cgroup_subsys->fork() invocation to
      cgroup_post_fork() after the new task is added to the css_set.
      cgroup_fork_callbacks() is removed.
      
      Because now a task may be migrated during cgroup_subsys->fork(),
      freezer_fork() is updated so that it adheres to the usual RCU locking
      and the rather pointless comment on why locking can be different there
      is removed (if it doesn't make anything simpler, why even bother?).
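
The ordering rule from the first paragraph (register first, then conform) can be illustrated with a toy member list. This sketch is single-threaded and omits all locking; `freezer_sketch`, `add_member_sketch`, `conform_sketch` and `request_freeze_sketch` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS_SKETCH 8

struct task_sketch {
	bool frozen;
};

struct freezer_sketch {
	bool freezing;                                  /* current requested state */
	struct task_sketch *members[MAX_TASKS_SKETCH];  /* stands in for the css_set */
	size_t nr;
};

/* A freeze request reaches only the tasks already on the member list. */
void request_freeze_sketch(struct freezer_sketch *f)
{
	f->freezing = true;
	for (size_t i = 0; i < f->nr; i++)
		f->members[i]->frozen = true;
}

/* Step 1: make the new task visible to later events. */
void add_member_sketch(struct freezer_sketch *f, struct task_sketch *t)
{
	assert(f->nr < MAX_TASKS_SKETCH);
	f->members[f->nr++] = t;
}

/* Step 2: make the new task conform to the current state (the ->fork() role). */
void conform_sketch(const struct freezer_sketch *f, struct task_sketch *t)
{
	if (f->freezing)
		t->frozen = true;
}
```

If a freeze request arrives between the two steps, calling `add_member_sketch()` first means either the request or the later `conform_sketch()` catches the child. Reversing the order lets the child slip through both, which is the escape the commit fixes.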
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@vger.kernel.org
  6. 14 Oct, 2012 6 commits
    • Linux 3.7-rc1 · ddffeb8c
      Linus Torvalds authored
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · a5ef3f7d
      Linus Torvalds authored
      Pull MIPS update from Ralf Baechle:
       "Cleanups and fixes for breakage that occurred earlier during this merge
        phase.  Also a few patches that didn't make the first pull request.
        Of those is the Alchemy work that merges code for many of the SOCs and
        evaluation boards thus among other code shrinkage, reduces the number
        of MIPS defconfigs by 5."
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (22 commits)
        MIPS: SNI: Switch RM400 serial to SCCNXP driver
        MIPS: Remove unused empty_bad_pmd_table[] declaration.
        MIPS: MT: Remove kspd.
        MIPS: Malta: Fix section mismatch.
        MIPS: asm-offset.c: Delete unused irq_cpustat_t struct offsets.
        MIPS: Alchemy: Merge PB1100/1500 support into DB1000 code.
        MIPS: Alchemy: merge PB1550 support into DB1550 code
        MIPS: Alchemy: Single kernel for DB1200/1300/1550
        MIPS: Optimize TLB refill for RI/XI configurations.
        MIPS: proc: Cleanup printing of ASEs.
        MIPS: Hardwire detection of DSP ASE Rev 2 for systems, as required.
        MIPS: Add detection of DSP ASE Revision 2.
        MIPS: Optimize pgd_init and pmd_init
        MIPS: perf: Add perf functionality for BMIPS5000
        MIPS: perf: Split the Kconfig option CONFIG_MIPS_MT_SMP
        MIPS: perf: Remove unnecessary #ifdef
        MIPS: perf: Add cpu feature bit for PCI (performance counter interrupt)
        MIPS: perf: Change the "mips_perf_event" table unsupported indicator.
        MIPS: Align swapper_pg_dir to 64K for better TLB Refill code.
        vmlinux.lds.h: Allow architectures to add sections to the front of .bss
        ...
    • Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · d25282d1
      Linus Torvalds authored
      Pull module signing support from Rusty Russell:
       "module signing is the highlight, but it's an all-over David Howells frenzy..."
      
      Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.
      
      * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
        X.509: Fix indefinite length element skip error handling
        X.509: Convert some printk calls to pr_devel
        asymmetric keys: fix printk format warning
        MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
        MODSIGN: Make mrproper should remove generated files.
        MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
        MODSIGN: Use the same digest for the autogen key sig as for the module sig
        MODSIGN: Sign modules during the build process
        MODSIGN: Provide a script for generating a key ID from an X.509 cert
        MODSIGN: Implement module signature checking
        MODSIGN: Provide module signing public keys to the kernel
        MODSIGN: Automatically generate module signing keys if missing
        MODSIGN: Provide Kconfig options
        MODSIGN: Provide gitignore and make clean rules for extra files
        MODSIGN: Add FIPS policy
        module: signature checking hook
        X.509: Add a crypto key parser for binary (DER) X.509 certificates
        MPILIB: Provide a function to read raw data into an MPI
        X.509: Add an ASN.1 decoder
        X.509: Add simple ASN.1 grammar compiler
        ...
    • x86, boot: Explicitly include autoconf.h for hostprogs · b6eea87f
      Matt Fleming authored
      The hostprogs need access to the CONFIG_* symbols found in
      include/generated/autoconf.h.  But commit abbf1590 ("UAPI: Partition
      the header include path sets and add uapi/ header directories") replaced
      $(LINUXINCLUDE) with $(USERINCLUDE) which doesn't contain the necessary
      include paths.
      
      This has the undesirable effect of breaking the EFI boot stub because
      the #ifdef CONFIG_EFI_STUB code in arch/x86/boot/tools/build.c is
      never compiled.
      
      It should also be noted that because $(USERINCLUDE) isn't exported by
      the top-level Makefile it's actually empty in arch/x86/boot/Makefile.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • perf: Fix UAPI fallout · 7d380c8f
      Ingo Molnar authored
      The UAPI commits forgot to test tooling builds such as tools/perf/,
      and this fixes the fallout.
      
      Manual conversion.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'late-for-linus' of git://git.linaro.org/people/rmk/linux-arm · 3d6ee36d
      Linus Torvalds authored
      Pull ARM update from Russell King:
       "This is the final round of stuff for ARM, left until the end of the
        merge window to reduce the number of conflicts.  This set contains the
        ARM part of David Howells' UAPI changes, and a fix to the ordering of
        'select' statements in ARM Kconfig files (see the appropriate commit
        for why this happened - thanks to Andrew Morton for pointing out the
        problem.)
      
        I've left this as long as I dare for this window to avoid conflicts,
        and I regenerated the config patch yesterday, posting it to our
        mailing list for review and testing.  I have several acks which
        include successful test reports for it.
      
        However, today I notice we've got new conflicts with previously unseen
        code...  though that conflict should be trivial (it's my changes vs a
        one liner.)"
      
      * 'late-for-linus' of git://git.linaro.org/people/rmk/linux-arm:
        ARM: config: make sure that platforms are ordered by option string
        ARM: config: sort select statements alphanumerically
        UAPI: (Scripted) Disintegrate arch/arm/include/asm
      
      Fix up fairly trivial conflict in arch/arm/Kconfig (the select re-organization
      vs recent addition of GENERIC_KERNEL_EXECVE)
  7. 13 Oct, 2012 14 commits