• Waiman Long's avatar
    locking/rwsem: Remove arch specific rwsem files · 46ad0840
    Waiman Long authored
    As the generic rwsem-xadd code is using the appropriate acquire and
    release versions of the atomic operations, the arch specific rwsem.h
    files will not be that much faster than the generic code as long as the
    atomic functions are properly implemented. So we can remove those arch
    specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
    effort.
    
    Currently, only x86, alpha and ia64 have implemented architecture
    specific fast paths. I don't have access to alpha and ia64 systems for
    testing, but they are legacy systems that are not likely to be updated
    to the latest kernel anyway.
    
    By using a rwsem microbenchmark, the total locking rates on a 4-socket
    56-core 112-thread x86-64 system before and after the patch were as
    follows (mixed means equal # of read and write locks):
    
                          Before Patch              After Patch
       # of Threads  wlock   rlock   mixed     wlock   rlock   mixed
       ------------  -----   -----   -----     -----   -----   -----
            1        29,201  30,143  29,458    28,615  30,172  29,201
            2         6,807  13,299   1,171     7,725  15,025   1,804
            4         6,504  12,755   1,520     7,127  14,286   1,345
            8         6,762  13,412     764     6,826  13,652     726
           16         6,693  15,408     662     6,599  15,938     626
           32         6,145  15,286     496     5,549  15,487     511
           64         5,812  15,495      60     5,858  15,572      60
    
    There were some run-to-run variations for the multi-thread tests. For
    x86-64, using the generic C code fast path seems to be a little bit
    faster than the assembly version with low lock contention.  Looking at
    the assembly version of the fast paths, there are assembly to/from C
    code wrappers that save and restore all the callee-clobbered registers
    (7 registers on x86-64). The assembly generated from the generic C
    code doesn't need to do that. That may explain the slight performance
    gain here.
    
    The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
    with no code change as no other code other than those under
    kernel/locking needs to access the internal rwsem macros and functions.
    Signed-off-by: default avatarWaiman Long <longman@redhat.com>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-c6x-dev@linux-c6x.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-riscv@lists.infradead.org
    Cc: linux-um@lists.infradead.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: nios2-dev@lists.rocketboards.org
    Cc: openrisc@lists.librecores.org
    Cc: uclinux-h8-devel@lists.sourceforge.jp
    Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    46ad0840
percpu-rwsem.c 5.14 KB