    oom, oom_reaper: do not enqueue same task twice · 9bcdeb51
    Tetsuo Handa authored
    Arkadiusz reported that enabling memcg's group oom killing causes
    strange memcg statistics where there is no task in a memcg even though
    the number of tasks in that memcg is not 0.  It turned out that there is
    a bug in wake_oom_reaper() which allows the same task to be enqueued
    twice, making it impossible to decrease the number of tasks in that
    memcg due to a refcount leak.
    
    This bug has existed since the OOM reaper became invokable from the
    task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
    
      T1@P1     |T2@P1     |T3@P1     |OOM reaper
      ----------+----------+----------+------------
                                       # Processing an OOM victim in a different memcg domain.
                            try_charge()
                              mem_cgroup_out_of_memory()
                                mutex_lock(&oom_lock)
                 try_charge()
                   mem_cgroup_out_of_memory()
                     mutex_lock(&oom_lock)
      try_charge()
        mem_cgroup_out_of_memory()
          mutex_lock(&oom_lock)
                                out_of_memory()
                                  oom_kill_process(P1)
                                    do_send_sig_info(SIGKILL, @P1)
                                    mark_oom_victim(T1@P1)
                                    wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                                mutex_unlock(&oom_lock)
                     out_of_memory()
                       mark_oom_victim(T2@P1)
                       wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                     mutex_unlock(&oom_lock)
          out_of_memory()
            mark_oom_victim(T1@P1)
            wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
          mutex_unlock(&oom_lock)
                                       # Completed processing an OOM victim in a different memcg domain.
                                       spin_lock(&oom_reaper_lock)
                                       # T1@P1 is dequeued.
                                       spin_unlock(&oom_reaper_lock)
    
    but memcg's group oom killing made it easier to trigger this bug by
    calling wake_oom_reaper() on the same task twice from a single
    out_of_memory() request.
    
    Fix this bug using the approach taken by commit 855b0183 ("oom,
    oom_reaper: disable oom_reaper for oom_kill_allocating_task").  As a
    side effect, this patch also avoids enqueuing multiple threads sharing
    memory via the task_will_free_mem(current) path.
    
    Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
    Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
    Fixes: af8e15cc ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
    Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Roman Gushchin <guro@fb.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Aleksa Sarai <asarai@suse.de>
    Cc: Jay Kamat <jgkamat@fb.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>