Commit ba535c1c authored by Suren Baghdasaryan's avatar Suren Baghdasaryan Committed by Linus Torvalds

mm/oom_kill: allow process_mrelease to run under mmap_lock protection

With exit_mmap holding mmap_write_lock during free_pgtables call,
process_mrelease does not need to elevate mm->mm_users in order to
prevent exit_mmap from destrying pagetables while __oom_reap_task_mm is
walking the VMA tree.  The change prevents process_mrelease from calling
the last mmput, which can lead to waiting for IO completion in exit_aio.

Link: https://lkml.kernel.org/r/20211209191325.3069345-3-surenb@google.comSigned-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
Acked-by: default avatarMichal Hocko <mhocko@suse.com>
Reviewed-by: default avatarJason Gunthorpe <jgg@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian@brauner.io>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent cc6dcfee
...@@ -1170,15 +1170,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags) ...@@ -1170,15 +1170,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
goto put_task; goto put_task;
} }
if (mmget_not_zero(p->mm)) { mm = p->mm;
mm = p->mm; mmgrab(mm);
if (task_will_free_mem(p))
reap = true; if (task_will_free_mem(p))
else { reap = true;
/* Error only if the work has not been done already */ else {
if (!test_bit(MMF_OOM_SKIP, &mm->flags)) /* Error only if the work has not been done already */
ret = -EINVAL; if (!test_bit(MMF_OOM_SKIP, &mm->flags))
} ret = -EINVAL;
} }
task_unlock(p); task_unlock(p);
...@@ -1189,13 +1189,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags) ...@@ -1189,13 +1189,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
ret = -EINTR; ret = -EINTR;
goto drop_mm; goto drop_mm;
} }
if (!__oom_reap_task_mm(mm)) /*
* Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
* possible change in exit_mmap is seen
*/
if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
ret = -EAGAIN; ret = -EAGAIN;
mmap_read_unlock(mm); mmap_read_unlock(mm);
drop_mm: drop_mm:
if (mm) mmdrop(mm);
mmput(mm);
put_task: put_task:
put_task_struct(task); put_task_struct(task);
return ret; return ret;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment