    futex: Prevent the reuse of stale pi_state · e626cb02
    Sebastian Andrzej Siewior authored
    Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during
    a lock operation for a PI futex. It requires that the locking process is
    interrupted by a timeout or signal:
    
      T1 Owns the futex in user space.
    
      T2 Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a
         pi_state and attaches itself to it.
    
      T2 Times out and removes its rt_waiter from the rt_mutex. Drops the
         rtmutex lock and tries to acquire the hash bucket lock to remove
         the futex_q. The lock is contended and T2 schedules out.
    
      T1 Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no
         rt_waiter. Unlocks the futex (do_uncontended) and makes it available
         to user space.
    
      T3 Acquires the futex in user space.
    
      T4 Tries to acquire the futex in kernel (futex_lock_pi()). Finds the
         existing futex_q of T2 and tries to attach itself to the existing
         pi_state.  This (attach_to_pi_state()) fails with -EINVAL because uval
         contains the TID of T3 but pi_state points to T1.
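    The timeline above can be replayed as a single-threaded sketch. This is a
    hypothetical userspace model, not kernel code: the futex word holds only a
    TID (the FUTEX_WAITERS and FUTEX_OWNER_DIED bits are ignored), the structs
    keep only the fields the race needs, and model_attach_to_pi_state() and
    model_unlock_buggy() are invented stand-ins for the owner check described
    for attach_to_pi_state() and the uncontended path of futex_unlock_pi().
    The TIDs 101 and 103 are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; names are illustrative only. */
struct pi_state {
    int owner_tid;              /* TID the kernel believes owns the futex */
};

struct futex_q {
    bool queued;                /* still on the hash bucket list? */
    struct pi_state *pi_state;  /* kernel PI state this waiter attached to */
};

/* Models the consistency check: the TID in the user space futex word must
 * match the owner recorded in the kernel's pi_state. */
static int model_attach_to_pi_state(unsigned int uval, const struct pi_state *ps)
{
    assert(ps != NULL);
    if ((int)uval != ps->owner_tid)
        return -EINVAL;
    return 0;
}

/* Pre-fix unlock: with no rt_waiter the stale futex_q is ignored and the
 * futex is released to user space. The hand-over path is not modelled. */
static void model_unlock_buggy(unsigned int *uval, const struct futex_q *q,
                               bool has_rt_waiter)
{
    (void)q;                    /* the still-queued futex_q is not looked at */
    if (!has_rt_waiter)
        *uval = 0;              /* futex is now free in user space */
}

/* Replays T1..T4 and returns the result of T4's attach attempt. */
int model_race_buggy(void)
{
    const int t1 = 101, t3 = 103;
    unsigned int uval = (unsigned int)t1;            /* T1 owns the futex */
    struct pi_state ps = { .owner_tid = t1 };        /* allocated by T2 */
    struct futex_q q2 = { .queued = true, .pi_state = &ps };
    bool has_rt_waiter = false;  /* T2 timed out and removed its rt_waiter,
                                    but is blocked on the hash bucket lock */

    model_unlock_buggy(&uval, &q2, has_rt_waiter);   /* T1 unlocks */
    uval = (unsigned int)t3;                         /* T3 acquires in user space */

    /* T4 finds T2's still-queued futex_q and tries to attach to its pi_state */
    if (q2.queued)
        return model_attach_to_pi_state(uval, q2.pi_state);
    return 0;
}
```

    In this model, model_race_buggy() returns -EINVAL: the futex word names T3
    while the stale pi_state still names T1.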
    
    It's incorrect to unlock the futex and make it available for user space to
    acquire as long as there is still an existing state attached to it in the
    kernel.
    
    T1 cannot hand over the futex to T2 because T2 already gave up and started
    to clean up and is blocked on the hash bucket lock, so T2's futex_q with
    the pi_state pointing to T1 is still queued.
    
    T1 observes the futex_q, but ignores it as there is no waiter on the
    corresponding rt_mutex, and takes the uncontended path, which allows the
    subsequent caller of futex_lock_pi() (T4) to observe that stale state.
    
    To prevent this, the unlock path must dequeue all futex_q entries which
    point to the same pi_state when there is no waiter on the rt_mutex. This
    obviously requires making the dequeue in the locking path conditional to
    prevent a double dequeue. With that, it's guaranteed that user space cannot
    observe an uncontended futex which has kernel state attached.
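    The two rules described above can be sketched with the same kind of
    hypothetical single-threaded model (invented names, TID-only futex word,
    no locking). The unlock path dequeues every futex_q pointing at the
    pi_state before releasing the futex, and the locking path's cleanup only
    dequeues an entry that is still queued. The assert() in model_dequeue()
    stands in for the list corruption a double dequeue would cause. The loop
    is one way to express "dequeue all entries"; the actual patch may
    structure this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fix; names are illustrative, not the kernel's. */
struct pi_state {
    int owner_tid;
};

struct futex_q {
    bool queued;
    struct pi_state *pi_state;
};

/* Unconditional removal: dequeuing twice is a bug, so the model asserts. */
static void model_dequeue(struct futex_q *q)
{
    assert(q->queued);
    q->queued = false;
}

/* Fixed unlock: with no waiter on the rt_mutex, first dequeue every futex_q
 * that still points at this pi_state, then release the futex. */
static void model_unlock_fixed(unsigned int *uval, struct futex_q *qs, size_t n,
                               const struct pi_state *ps, bool has_rt_waiter)
{
    if (has_rt_waiter)
        return;                 /* hand-over path, not modelled here */
    for (size_t i = 0; i < n; i++) {
        if (qs[i].queued && qs[i].pi_state == ps)
            model_dequeue(&qs[i]);
    }
    *uval = 0;                  /* only now is the futex free in user space */
}

/* Locking path cleanup after a timeout or signal: the unlock path may have
 * dequeued q already, so the dequeue is conditional. */
static void model_lock_cleanup(struct futex_q *q)
{
    if (q->queued)
        model_dequeue(q);
}

/* Replays the reported sequence; returns how many futex_q entries T4 could
 * still find attached to the old pi_state. */
int model_race_fixed(void)
{
    const int t1 = 101, t3 = 103;
    unsigned int uval = (unsigned int)t1;
    struct pi_state ps = { .owner_tid = t1 };
    struct futex_q qs[] = { { .queued = true, .pi_state = &ps } }; /* T2's entry */
    const size_t n = sizeof qs / sizeof qs[0];

    model_unlock_fixed(&uval, qs, n, &ps, false);   /* T1 unlocks */
    uval = (unsigned int)t3;                        /* T3 acquires in user space */
    model_lock_cleanup(&qs[0]);                     /* T2 gets the hb lock */
    (void)uval;

    int stale = 0;
    for (size_t i = 0; i < n; i++) {
        if (qs[i].queued && qs[i].pi_state == &ps)
            stale++;
    }
    return stale;
}
```

    Replaying the reported sequence, model_race_fixed() returns 0: T4 finds no
    stale futex_q, and T2's later cleanup skips the dequeue instead of
    tripping the assertion.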
    
    Fixes: fbeb558b ("futex/pi: Fix recursive rt_mutex waiter state")
    Reported-by: Jiri Slaby <jirislaby@kernel.org>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jiri Slaby <jirislaby@kernel.org>
    Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de
    Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org