Commit a53b8315 authored by Oleg Nesterov, committed by Linus Torvalds

exit: pidns: fix/update the comments in zap_pid_ns_processes()

The comments in zap_pid_ns_processes() are not clear; we need to explain
how this code actually works.

1. "Ignore SIGCHLD" looks like an optimization, but it is not only
   that: we also need it for correctness.

2. The comment above sys_wait4() could tell more.

   EXIT_ZOMBIE child is only possible if it has exited before we
   ignored SIGCHLD. Or if it is traced from the parent namespace,
   but in this case it will be reaped by debugger after detach,
   sys_wait4() acts as a synchronization point.

3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is
   outdated. Contrary to what it says we do not need to make sure
   they all go away after 0a01f2cc "pidns: Make the pidns proc
   mount/umount logic obvious".

   At the same time, we do need to wait for nr_hashed==init_pids,
   but the reasons are quite different and not obvious: setns().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 24c037eb
@@ -190,7 +190,11 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	/* Don't allow any more processes into the pid namespace */
 	disable_pid_allocation(pid_ns);
 
-	/* Ignore SIGCHLD causing any terminated children to autoreap */
+	/*
+	 * Ignore SIGCHLD causing any terminated children to autoreap.
+	 * This speeds up the namespace shutdown, plus see the comment
+	 * below.
+	 */
 	spin_lock_irq(&me->sighand->siglock);
 	me->sighand->action[SIGCHLD - 1].sa.sa_handler = SIG_IGN;
 	spin_unlock_irq(&me->sighand->siglock);
@@ -223,15 +227,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	}
 	read_unlock(&tasklist_lock);
 
-	/* Firstly reap the EXIT_ZOMBIE children we may have. */
+	/*
+	 * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD.
+	 * sys_wait4() will also block until our children traced from the
+	 * parent namespace are detached and become EXIT_DEAD.
+	 */
 	do {
 		clear_thread_flag(TIF_SIGPENDING);
 		rc = sys_wait4(-1, NULL, __WALL, NULL);
 	} while (rc != -ECHILD);
 
 	/*
-	 * sys_wait4() above can't reap the TASK_DEAD children.
-	 * Make sure they all go away, see free_pid().
+	 * sys_wait4() above can't reap the EXIT_DEAD children but we do not
+	 * really care, we could reparent them to the global init. We could
+	 * exit and reap ->child_reaper even if it is not the last thread in
+	 * this pid_ns, free_pid(nr_hashed == 0) calls proc_cleanup_work(),
+	 * pid_ns can not go away until proc_kill_sb() drops the reference.
+	 *
+	 * But this ns can also have other tasks injected by setns()+fork().
+	 * Again, ignoring the user visible semantics we do not really need
+	 * to wait until they are all reaped, but they can be reparented to
+	 * us and thus we need to ensure that pid->child_reaper stays valid
+	 * until they all go away. See free_pid()->wake_up_process().
+	 *
+	 * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
+	 * if reparented.
 	 */
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
...
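
The behaviour described in points 1 and 2 of the commit message can be observed from userspace on Linux. The following is a minimal sketch, not kernel code: the helper names spawn_exiting_child() and demo_zombie_then_autoreap() are hypothetical, and it uses waitpid() with flags 0 where the kernel calls sys_wait4() with __WALL. It assumes the calling process has no other children. A child that exited before SIGCHLD was ignored stays a zombie and has to be reaped explicitly, a child that exits afterwards is autoreaped, and the wait loop ends with ECHILD once no children are left.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately. */
static pid_t spawn_exiting_child(void)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(0);
	return pid;	/* child pid in the parent, -1 on failure */
}

/*
 * Hypothetical demo: returns the number of children that had to be
 * reaped explicitly with waitpid(), or -1 on error.
 */
static int demo_zombie_then_autoreap(void)
{
	struct sigaction ign;
	siginfo_t info;
	pid_t early, late, rc;
	int reaped = 0;

	/* This child becomes a zombie before SIGCHLD is ignored. */
	early = spawn_exiting_child();
	if (early == -1)
		return -1;

	/* WNOWAIT: wait for the exit but leave the zombie in place. */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)early, &info, WEXITED | WNOWAIT) == -1)
		return -1;

	/* Userspace counterpart of setting sa_handler = SIG_IGN. */
	memset(&ign, 0, sizeof(ign));
	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	if (sigaction(SIGCHLD, &ign, NULL) == -1)
		return -1;

	/* This child exits with SIGCHLD ignored, so it is autoreaped. */
	late = spawn_exiting_child();
	if (late == -1)
		return -1;

	/* Same shape as the kernel loop: wait until ECHILD. */
	for (;;) {
		rc = waitpid(-1, NULL, 0);
		if (rc > 0) {
			reaped++;
			continue;
		}
		if (errno == EINTR)
			continue;
		break;
	}
	return errno == ECHILD ? reaped : -1;
}
```

Called once from a process without other children, demo_zombie_then_autoreap() is expected to return 1 on Linux: only the early zombie is reaped by waitpid(), while the late child never becomes visible to it.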