• Waiman Long's avatar
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures · 34d54f3d
    Waiman Long authored
    All the locking related cmpxchg's in the following functions are
    replaced with the _acquire variants:
    
     - pv_queued_spin_steal_lock()
     - trylock_clear_pending()
    
    This change should help performance on architectures that use LL/SC.
    
    The cmpxchg in pv_kick_node() is replaced with a relaxed version
    with explicit memory barrier to make sure that it is fully ordered
    in the writing of next->lock and the reading of pn->state whether
    the cmpxchg is a success or failure without affecting performance in
    non-LL/SC architectures.
    
    On a 2-socket 12-core 96-thread Power8 system with pvqspinlock
    explicitly enabled, the performance of a locking microbenchmark
    with and without this patch on a 4.13-rc4 kernel with Xinhui's PPC
    qspinlock patch were as follows:
    
      # of thread     w/o patch    with patch      % Change
      -----------     ---------    ----------      --------
           8         5054.8 Mop/s  5209.4 Mop/s     +3.1%
          16         3985.0 Mop/s  4015.0 Mop/s     +0.8%
          32         2378.2 Mop/s  2396.0 Mop/s     +0.7%
    Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
    Signed-off-by: default avatarWaiman Long <longman@redhat.com>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrea Parri <parri.andrea@gmail.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Link: http://lkml.kernel.org/r/1502741222-24360-1-git-send-email-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    34d54f3d
qspinlock_paravirt.h 15.6 KB