• Michal Hocko's avatar
    Revert "memcg: get rid of percpu_charge_mutex lock" · 9f50fad6
    Michal Hocko authored
    This reverts commit 8521fc50.
    
    The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
    bit operations is sufficient but that is not true.  Johannes Weiner has
    reported a crash during parallel memory cgroup removal:
    
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
      Oops: 0000 [#1] PREEMPT SMP
      Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
      RIP: 0010:[<ffffffff81083b70>]  css_is_ancestor+0x20/0x70
      RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
      Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
      Call Trace:
       [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
       [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
       [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
       [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
       [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
       [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
       [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
       [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
       [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
    
    We are crashing because we try to dereference cached memcg when we are
    checking whether we should wait for draining on the cache.  The cache is
    already cleaned up, though.
    
    There is also a theoretical chance that the cached memcg gets freed
    between we test for the FLUSHING_CACHED_CHARGE and dereference it in
    mem_cgroup_same_or_subtree:
    
            CPU0                    CPU1                         CPU2
      mem=stock->cached
      stock->cached=NULL
                                  clear_bit
                                                            test_and_set_bit
      test_bit()                    ...
      <preempted>             mem_cgroup_destroy
      use after free
    
    The percpu_charge_mutex protected from this race because sync draining
    is exclusive.
    
    It is safer to revert now and come up with a more parallel
    implementation later.
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
    Reported-by: default avatarJohannes Weiner <jweiner@redhat.com>
    Acked-by: default avatarJohannes Weiner <jweiner@redhat.com>
    Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: stable@kernel.org
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    9f50fad6
memcontrol.c 145 KB