• Yang Tao's avatar
    futex: Prevent robust futex exit race · ca16d5be
    Yang Tao authored
    Robust futexes utilize the robust_list mechanism to allow the kernel to
    release futexes which are held when a task exits. The exit can be voluntary
    or caused by a signal or fault. This prevents that waiters block forever.
    
    The futex operations in user space store a pointer to the futex they are
    either locking or unlocking in the op_pending member of the per task robust
    list.
    
    After a lock operation has succeeded the futex is queued in the robust list
    linked list and the op_pending pointer is cleared.
    
    After an unlock operation has succeeded the futex is removed from the
    robust list linked list and the op_pending pointer is cleared.
    
    The robust list exit code checks for the pending operation and any futex
    which is queued in the linked list. It carefully checks whether the futex
    value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
    tries to wake up a potential waiter.
    
    This is race free for the lock operation but unlock has two race scenarios
    where waiters might not be woken up. These issues can be observed with
    regular robust pthread mutexes. PI aware pthread mutexes are not affected.
    
    (1) Unlocking task is killed after unlocking the futex value in user space
        before being able to wake a waiter.
    
            pthread_mutex_unlock()
                    |
                    V
            atomic_exchange_rel (&mutex->__data.__lock, 0)
                            <------------------------killed
                lll_futex_wake ()                   |
                                                    |
                                                    |(__lock = 0)
                                                    |(enter kernel)
                                                    |
                                                    V
                                                do_exit()
                                                exit_mm()
                                              mm_release()
                                            exit_robust_list()
                                            handle_futex_death()
                                                    |
                                                    |(__lock = 0)
                                                    |(uval = 0)
                                                    |
                                                    V
            if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                    return 0;
    
        The sanity check which ensures that the user space futex is owned by
        the exiting task prevents the wakeup of waiters which in consequence
        block infinitely.
    
    (2) Waiting task is killed after a wakeup and before it can acquire the
        futex in user space.
    
            OWNER                         WAITER
    				futex_wait()      		
       pthread_mutex_unlock()               |
                    |                       |
                    |(__lock = 0)           |
                    |                       |
                    V                       |
             futex_wake() ------------>  wakeup()
                                            |
                                            |(return to userspace)
                                            |(__lock = 0)
                                            |
                                            V
                            oldval = mutex->__data.__lock
                                              <-----------------killed
        atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,  |
                            id | assume_other_futex_waiters, 0)      |
                                                                     |
                                                                     |
                                                       (enter kernel)|
                                                                     |
                                                                     V
                                                             do_exit()
                                                            |
                                                            |
                                                            V
                                            handle_futex_death()
                                            |
                                            |(__lock = 0)
                                            |(uval = 0)
                                            |
                                            V
            if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                    return 0;
    
        The sanity check which ensures that the user space futex is owned
        by the exiting task prevents the wakeup of waiters, which seems to
        be correct as the exiting task does not own the futex value, but
        the consequence is that other waiters wont be woken up and block
        infinitely.
    
    In both scenarios the following conditions are true:
    
       - task->robust_list->list_op_pending != NULL
       - user space futex value == 0
       - Regular futex (not PI)
    
    If these conditions are met then it is reasonably safe to wake up a
    potential waiter in order to prevent the above problems.
    
    As this might be a false positive it can cause spurious wakeups, but the
    waiter side has to handle other types of unrelated wakeups, e.g. signals
    gracefully anyway. So such a spurious wakeup will not affect the
    correctness of these operations.
    
    This workaround must not touch the user space futex value and cannot set
    the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
    OWNER_DIED in this case would result in inconsistent state and subsequently
    in malfunction of the owner died handling in user space.
    
    The rest of the user space state is still consistent as no other task can
    observe the list_op_pending entry in the exiting tasks robust list.
    
    The eventually woken up waiter will observe the uncontended lock value and
    take it over.
    
    [ tglx: Massaged changelog and comment. Made the return explicit and not
      	depend on the subsequent check and added constants to hand into
      	handle_futex_death() instead of plain numbers. Fixed a few coding
    	style issues. ]
    
    Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
    Signed-off-by: default avatarYang Tao <yang.tao172@zte.com.cn>
    Signed-off-by: default avatarYi Wang <wang.yi59@zte.com.cn>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn
    Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
    ca16d5be
futex.c 106 KB