1. 04 Nov, 2015 2 commits
    • Linus Torvalds's avatar
      atomic: remove all traces of READ_ONCE_CTRL() and atomic*_read_ctrl() · 105ff3cb
      Linus Torvalds authored
      This seems to be a mis-reading of how alpha memory ordering works, and
      is not backed up by the alpha architecture manual.  The helper functions
      don't do anything special on any other architectures, and the arguments
      that support them being safe on other architectures also argue that they
      are safe on alpha.
      
      Basically, the "control dependency" is between a previous read and a
      subsequent write that is dependent on the value read.  Even if the
      subsequent write is actually done speculatively, there is no way that
      such a speculative write could be made visible to other cpu's until it
      has been committed, which requires validating the speculation.
      
      Note that most weakely ordered architectures (very much including alpha)
      do not guarantee any ordering relationship between two loads that depend
      on each other on a control dependency:
      
          read A
          if (val == 1)
              read B
      
      because the conditional may be predicted, and the "read B" may be
      speculatively moved up to before reading the value A.  So we require the
      user to insert a smp_rmb() between the two accesses to be correct:
      
          read A;
          if (A == 1)
              smp_rmb()
              read B
      
      Alpha is further special in that it can break that ordering even if the
      *address* of B depends on the read of A, because the cacheline that is
      read later may be stale unless you have a memory barrier in between the
      pointer read and the read of the value behind a pointer:
      
          read ptr
          read offset(ptr)
      
      whereas all other weakly ordered architectures guarantee that the data
      dependency (as opposed to just a control dependency) will order the two
      accesses.  As a result, alpha needs a "smp_read_barrier_depends()" in
      between those two reads for them to be ordered.
      
      The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
      had was a control dependency to a subsequent *write*, however, and
      nobody can finalize such a subsequent write without having actually done
      the read.  And were you to write such a value to a "stale" cacheline
      (the way the unordered reads came to be), that would seem to lose the
      write entirely.
      
      So the things that make alpha able to re-order reads even more
      aggressively than other weak architectures do not seem to be relevant
      for a subsequent write.  Alpha memory ordering may be strange, but
      there's no real indication that it is *that* strange.
      
      Also, the alpha architecture reference manual very explicitly talks
      about the definition of "Dependence Constraints" in section 5.6.1.7,
      where a preceding read dominates a subsequent write.
      
      Such a dependence constraint admittedly does not impose a BEFORE (alpha
      architecture term for globally visible ordering), but it does guarantee
      that there can be no "causal loop".  I don't see how you could avoid
      such a loop if another cpu could see the stored value and then impact
      the value of the first read.  Put another way: the read and the write
      could not be seen as being out of order wrt other cpus.
      
      So I do not see how these "x_ctrl()" functions can currently be necessary.
      
      I may have to eat my words at some point, but in the absense of clear
      proof that alpha actually needs this, or indeed even an explanation of
      how alpha could _possibly_ need it, I do not believe these functions are
      called for.
      
      And if it turns out that alpha really _does_ need a barrier for this
      case, that barrier still should not be "smp_read_barrier_depends()".
      We'd have to make up some new speciality barrier just for alpha, along
      with the documentation for why it really is necessary.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul E McKenney <paulmck@us.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      105ff3cb
    • Linus Torvalds's avatar
      Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d63a9788
      Linus Torvalds authored
      Pull locking changes from Ingo Molnar:
       "The main changes in this cycle were:
      
         - More gradual enhancements to atomic ops: new atomic*_read_ctrl()
           ops, synchronize atomic_{read,set}() ordering requirements between
           architectures, add atomic_long_t bitops.  (Peter Zijlstra)
      
         - Add _{relaxed|acquire|release}() variants for inc/dec atomics and
           use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
           This enables weakly ordered architectures (such as arm64) to make
           use of more locking related optimizations.  (Davidlohr Bueso)
      
         - Implement atomic[64]_{inc,dec}_relaxed() on ARM.  (Will Deacon)
      
         - Futex kernel data cache footprint micro-optimization.  (Rasmus
           Villemoes)
      
         - pvqspinlock runtime overhead micro-optimization.  (Waiman Long)
      
         - misc smaller fixlets"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
        locking/rwsem: Use acquire/release semantics
        locking/mcs: Use acquire/release semantics
        locking/rtmutex: Use acquire/release semantics
        locking/mutex: Use acquire/release semantics
        locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
        atomic: Implement atomic_read_ctrl()
        atomic, arch: Audit atomic_{read,set}()
        atomic: Add atomic_long_t bitops
        futex: Force hot variables into a single cache line
        locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
        locking/osq: Relax atomic semantics
        locking/qrwlock: Rename ->lock to ->wait_lock
        locking/Documentation/lockstat: Fix typo - lokcing -> locking
        locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
      d63a9788
  2. 03 Nov, 2015 38 commits