Commit 778e9a9c authored by Alexey Kuznetsov, committed by Linus Torvalds

pi-futex: fix exit races and locking problems

1. New entries can be added to tsk->pi_state_list after the task has
   completed exit_pi_state_list(). The result is memory leakage and deadlocks.

2. handle_mm_fault() is called under spinlock. The result is obvious.

3. A self-inflicted deadlock inside glibc.
   Sometimes futex_lock_pi() returns -ESRCH when it is not expected,
   and glibc enters a for(;;) sleep() loop to simulate deadlock. This problem
   is quite obvious and I think the patch is right. Though it looks like
   each "if" in futex_lock_pi() got some stupid special case "else if". :-)

4. Sometimes futex_lock_pi() returns -EDEADLK
   when nobody has the lock. The reason is also obvious (see the comment
   in the patch), but the correct fix is far beyond my comprehension.
   I guess someone already saw this; the chunk:

                        if (rt_mutex_trylock(&q.pi_state->pi_mutex))
                                ret = 0;

   is obviously aimed at the same problem. But it does not work, because the
   rtmutex is really taken at this point: wake_futex_pi() of the previous
   owner reassigned it to us. My fix works, but it looks very stupid.
   I would consider removing the shift of ownership in wake_futex_pi()
   and doing all the work in the context of the process taking the lock.

From: Thomas Gleixner <tglx@linutronix.de>

Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
    an additional state transition to the exit code.

    This also fixes the issue where a task with recursive segfaults
    is not able to release the futexes.

Fix 2) Clean up the lookup_pi_state() failure path and finally solve the
    -ESRCH problem.

Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
    in the lock protected section by using the in_atomic userspace access
    functions.

    This also removes the ugly lock drop / unqueue inside of fixup_pi_state().

Fix 4) Fix a stale lock in the error path of futex_wake_pi().

Added some error checks for verification.

The -EDEADLK problem is solved by the rtmutex fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 1a539a87
@@ -1162,6 +1162,7 @@ static inline void put_task_struct(struct task_struct *t)
 					/* Not implemented yet, only for 486*/
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
+#define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
 #define PF_FORKNOEXEC	0x00000040	/* forked but didn't exec */
 #define PF_SUPERPRIV	0x00000100	/* used super-user privileges */
 #define PF_DUMPCORE	0x00000200	/* dumped core */
......
@@ -892,13 +892,29 @@ fastcall NORET_TYPE void do_exit(long code)
 	if (unlikely(tsk->flags & PF_EXITING)) {
 		printk(KERN_ALERT
 			"Fixing recursive fault but reboot is needed!\n");
+		/*
+		 * We can do this unlocked here. The futex code uses
+		 * this flag just to verify whether the pi state
+		 * cleanup has been done or not. In the worst case it
+		 * loops once more. We pretend that the cleanup was
+		 * done as there is no way to return. Either the
+		 * OWNER_DIED bit is set by now or we push the blocked
+		 * task into the wait for ever nirwana as well.
+		 */
+		tsk->flags |= PF_EXITPIDONE;
 		if (tsk->io_context)
 			exit_io_context();
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
 	}
+	/*
+	 * tsk->flags are checked in the futex code to protect against
+	 * an exiting task cleaning up the robust pi futexes.
+	 */
+	spin_lock_irq(&tsk->pi_lock);
 	tsk->flags |= PF_EXITING;
+	spin_unlock_irq(&tsk->pi_lock);
 	if (unlikely(in_atomic()))
 		printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
@@ -965,6 +981,12 @@ fastcall NORET_TYPE void do_exit(long code)
 	 * Make sure we are holding no locks:
 	 */
 	debug_check_no_locks_held(tsk);
+	/*
+	 * We can do this unlocked here. The futex code uses this flag
+	 * just to verify whether the pi state cleanup has been done
+	 * or not. In the worst case it loops once more.
+	 */
+	tsk->flags |= PF_EXITPIDONE;
 	if (tsk->io_context)
 		exit_io_context();
......