1. 01 Nov, 2012 1 commit
    • Thomas Gleixner's avatar
      futex: Handle futex_pi OWNER_DIED take over correctly · 59fa6245
      Thomas Gleixner authored
      Siddhesh analyzed a failure in the take over of pi futexes in case the
      owner died and provided a workaround.
      See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
      
      The detailed problem analysis shows:
      
      Futex F is initialized with PTHREAD_PRIO_INHERIT and
      PTHREAD_MUTEX_ROBUST_NP attributes.
      
      T1 lock_futex_pi(F);
      
      T2 lock_futex_pi(F);
         --> T2 blocks on the futex and creates pi_state which is associated
             to T1.
      
      T1 exits
         --> exit_robust_list() runs
             --> Futex F userspace value TID field is set to 0 and
                 FUTEX_OWNER_DIED bit is set.
      
      T3 lock_futex_pi(F);
         --> Succeeds due to the check for F's userspace TID field == 0
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      T1 --> exit_pi_state_list()
             --> Transfers pi_state to waiter T2 and wakes T2 via
             	   rt_mutex_unlock(&pi_state->mutex)
      
      T2 --> acquires pi_state->mutex and gains real ownership of the
             pi_state
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      T3 --> observes inconsistent state
      
      This problem is independent of UP/SMP, preemptible/non preemptible
      kernels, or process shared vs. private. The only difference is that
      certain configurations are more likely to expose it.
      
      So as Siddhesh correctly analyzed the following check in
      futex_lock_pi_atomic() is the culprit:
      
      	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
      
      We check the userspace value for a TID value of 0 and take over the
      futex unconditionally if that's true.
      
      AFAICT this check is there as it is correct for a different corner
      case of futexes: the WAITERS bit became stale.
      
      Now the proposed change
      
      -	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
      +       if (unlikely(ownerdied ||
      +                       !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
      
      solves the problem, but it's not obvious why and it wreckages the
      "stale WAITERS bit" case.
      
      What happens is, that due to the WAITERS bit being set (T2 is blocked
      on that futex) it enforces T3 to go through lookup_pi_state(), which
      in the above case returns an existing pi_state and therefor forces T3
      to legitimately fight with T2 over the ownership of the pi_state (via
      pi_state->mutex). Probelm solved!
      
      Though that does not work for the "WAITERS bit is stale" problem
      because if lookup_pi_state() does not find existing pi_state it
      returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
      return -ESRCH to user space because the OWNER_DIED bit is not set.
      
      Now there is a different solution to that problem. Do not look at the
      user space value at all and enforce a lookup of possibly available
      pi_state. If pi_state can be found, then the new incoming locker T3
      blocks on that pi_state and legitimately races with T2 to acquire the
      rt_mutex and the pi_state and therefor the proper ownership of the
      user space futex.
      
      lookup_pi_state() has the correct order of checks. It first tries to
      find a pi_state associated with the user space futex and only if that
      fails it checks for futex TID value = 0. If no pi_state is available
      nothing can create new state at that point because this happens with
      the hash bucket lock held.
      
      So the above scenario changes to:
      
      T1 lock_futex_pi(F);
      
      T2 lock_futex_pi(F);
         --> T2 blocks on the futex and creates pi_state which is associated
             to T1.
      
      T1 exits
         --> exit_robust_list() runs
             --> Futex F userspace value TID field is set to 0 and
                 FUTEX_OWNER_DIED bit is set.
      
      T3 lock_futex_pi(F);
         --> Finds pi_state and blocks on pi_state->rt_mutex
      
      T1 --> exit_pi_state_list()
             --> Transfers pi_state to waiter T2 and wakes it via
             	   rt_mutex_unlock(&pi_state->mutex)
      
      T2 --> acquires pi_state->mutex and gains ownership of the pi_state
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      This covers all gazillion points on which T3 might come in between
      T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
      also solves the "WAITERS bit stale" problem by forcing the take over.
      
      Another benefit of changing the code this way is that it makes it less
      dependent on untrusted user space values and therefor minimizes the
      possible wreckage which might be inflicted.
      
      As usual after staring for too long at the futex code my brain hurts
      so much that I really want to ditch that whole optimization of
      avoiding the syscall for the non contended case for PI futexes and rip
      out the maze of corner case handling code. Unfortunately we can't as
      user space relies on that existing behaviour, but at least thinking
      about it helps me to preserve my mental sanity. Maybe we should
      nevertheless :)
      Reported-and-tested-by: default avatarSiddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionosAcked-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      59fa6245
  2. 31 Oct, 2012 20 commits
  3. 30 Oct, 2012 19 commits