• Linus Torvalds's avatar
    Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e1928328
    Linus Torvalds authored
    Pull locking updates from Ingo Molnar:
     "The main changes in this cycle are:
    
       - rwsem scalability improvements, phase #2, by Waiman Long, which are
         rather impressive:
    
           "On a 2-socket 40-core 80-thread Skylake system with 40 reader
            and writer locking threads, the min/mean/max locking operations
            done in a 5-second testing window before the patchset were:
    
             40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
             40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
    
            After the patchset, they became:
    
             40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
             40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
    
         There's a lot of changes to the locking implementation that makes
         it similar to qrwlock, including owner handoff for more fair
         locking.
    
         Another microbenchmark shows how across the spectrum the
         improvements are:
    
           "With a locking microbenchmark running on 5.1 based kernel, the
            total locking rates (in kops/s) on a 2-socket Skylake system
            with equal numbers of readers and writers (mixed) before and
            after this patchset were:
    
            # of Threads   Before Patch      After Patch
            ------------   ------------      -----------
                 2            2,618             4,193
                 4            1,202             3,726
                 8              802             3,622
                16              729             3,359
                32              319             2,826
                64              102             2,744"
    
         The changes are extensive and the patch-set has been through
         several iterations addressing various locking workloads. There
         might be more regressions, but unless they are pathological I
         believe we want to use this new implementation as the baseline
         going forward.
    
       - jump-label optimizations by Daniel Bristot de Oliveira: the primary
         motivation was to remove IPI disturbance of isolated RT-workload
         CPUs, which resulted in the implementation of batched jump-label
         updates. Beyond the improvement of the real-time characteristics
         kernel, in one test this patchset improved static key update
         overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
         as well.
    
       - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
         ~10 years of atomic64_t existence the various types used by the
         APIs only had to be self-consistent within each architecture -
         which means they became wildly inconsistent across architectures.
         Mark puts and end to this by reworking all the atomic64
         implementations to use 's64' as the base type for atomic64_t, and
         to ensure that this type is consistently used for parameters and
         return values in the API, avoiding further problems in this area.
    
       - A large set of small improvements to lockdep by Yuyang Du: type
         cleanups, output cleanups, function return type and othr cleanups
         all around the place.
    
       - A set of percpu ops cleanups and fixes by Peter Zijlstra.
    
       - Misc other changes - please see the Git log for more details"
    
    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
      locking/lockdep: increase size of counters for lockdep statistics
      locking/atomics: Use sed(1) instead of non-standard head(1) option
      locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
      x86/jump_label: Make tp_vec_nr static
      x86/percpu: Optimize raw_cpu_xchg()
      x86/percpu, sched/fair: Avoid local_clock()
      x86/percpu, x86/irq: Relax {set,get}_irq_regs()
      x86/percpu: Relax smp_processor_id()
      x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
      locking/rwsem: Guard against making count negative
      locking/rwsem: Adaptive disabling of reader optimistic spinning
      locking/rwsem: Enable time-based spinning on reader-owned rwsem
      locking/rwsem: Make rwsem->owner an atomic_long_t
      locking/rwsem: Enable readers spinning on writer
      locking/rwsem: Clarify usage of owner's nonspinaable bit
      locking/rwsem: Wake up almost all readers in wait queue
      locking/rwsem: More optimal RT task handling of null owner
      locking/rwsem: Always release wait_lock before waking up tasks
      locking/rwsem: Implement lock handoff to prevent lock starvation
      locking/rwsem: Make rwsem_spin_on_owner() return owner state
      ...
    e1928328
atomic.h 13.1 KB