1. 14 Jan, 2017 8 commits
    • Peter Zijlstra's avatar
      locking/mutex: Fix mutex handoff · e274795e
      Peter Zijlstra authored
      While reviewing the ww_mutex patches, I noticed that it was still
      possible to (incorrectly) succeed for (incorrect) code like:
      
      	mutex_lock(&a);
      	mutex_lock(&a);
      
      This was possible if the second mutex_lock() would block (as expected)
      but then receive a spurious wakeup. At that point it would find itself
      at the front of the queue, request a handoff and instantly claim
      ownership and continue, since owner would point to itself.
      
      Avoid this scenario and simplify the code by introducing a third low
      bit to signal handoff pickup. So once we request handoff, unlock
      clears the handoff bit and sets the pickup bit along with the new
      owner.
      
      This also removes the need for the .handoff argument to
      __mutex_trylock(), since that becomes superfluous with PICKUP.
      
      In order to guarantee enough low bits, ensure task_struct alignment is
      at least L1_CACHE_BYTES (which seems a good ideal regardless).
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d659ae1 ("locking/mutex: Add lock handoff to avoid starvation")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e274795e
    • Davidlohr Bueso's avatar
      locking/percpu-rwsem: Replace waitqueue with rcuwait · 52b94129
      Davidlohr Bueso authored
      The use of any kind of wait queue is an overkill for pcpu-rwsems.
      While one option would be to use the less heavy simple (swait)
      flavor, this is still too much for what pcpu-rwsems needs. For one,
      we do not care about any sort of queuing in that the only (rare) time
      writers (and readers, for that matter) are queued is when trying to
      acquire the regular contended rw_sem. There cannot be any further
      queuing as writers are serialized by the rw_sem in the first place.
      
      Given that percpu_down_write() must not be called after exit_notify(),
      we can replace the bulky waitqueue with rcuwait such that a writer
      can wait for its turn to take the lock. As such, we can avoid the
      queue handling and locking overhead.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      52b94129
    • Davidlohr Bueso's avatar
      sched/wait, RCU: Introduce rcuwait machinery · 8f95c90c
      Davidlohr Bueso authored
      rcuwait provides support for (single) RCU-safe task wait/wake functionality,
      with the caveat that it must not be called after exit_notify(), such that
      we avoid racing with rcu delayed_put_task_struct callbacks, task_struct
      being rcu unaware in this context -- for which we similarly have
      task_rcu_dereference() magic, but with different return semantics, which
      can conflict with the wakeup side.
      
      The interfaces are quite straightforward:
      
        rcuwait_wait_event()
        rcuwait_wake_up()
      
      More details are in the comments, but it's perhaps worth mentioning at least,
      that users must provide proper serialization when waiting on a condition, and
      avoid corrupting a concurrent waiter. Also care must be taken between the task
      and the condition for when calling the wakeup -- we cannot miss wakeups. When
      porting users, this is for example, a given when using waitqueues in that
      everything is done under the q->lock. As such, it can remove sources of non
      preemptable unbounded work for realtime.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8f95c90c
    • Davidlohr Bueso's avatar
      sched/core: Remove set_task_state() · 642fa448
      Davidlohr Bueso authored
      This is a nasty interface and setting the state of a foreign task must
      not be done. As of the following commit:
      
        be628be0 ("bcache: Make gc wakeup sane, remove set_task_state()")
      
      ... everyone in the kernel calls set_task_state() with current, allowing
      the helper to be removed.
      
      However, as the comment indicates, it is still around for those archs
      where computing current is more expensive than using a pointer, at least
      in theory. An important arch that is affected is arm64, however this has
      been addressed now [1] and performance is up to par making no difference
      with either calls.
      
      Of all the callers, if any, it's the locking bits that would care most
      about this -- ie: we end up passing a tsk pointer to a lot of the lock
      slowpath, and setting ->state on that. The following numbers are based
      on two tests: a custom ad-hoc microbenchmark that just measures
      latencies (for ~65 million calls) between get_task_state() vs
      get_current_state().
      
      Secondly for a higher overview, an unlink microbenchmark was used,
      which pounds on a single file with open, close,unlink combos with
      increasing thread counts (up to 4x ncpus). While the workload is quite
      unrealistic, it does contend a lot on the inode mutex or now rwsem.
      
      [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():  938 msecs
      Avg runtime set_current_state: 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      
      x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
      and ppc64 (with paca) show similar improvements in the unlink microbenches.
      The small delta for ppc64 (2ms), does not represent the gains on the unlink
      runs. In the case of x86, there was a decent amount of variation in the
      latency runs, but always within a 20 to 50ms increase), ppc was more constant.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      642fa448
    • Davidlohr Bueso's avatar
      kernel/locking: Compute 'current' directly · d269a8b8
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64. On a microbenchmark that calls
      set_task_state() vs set_current_state() and an inode rwsem
      pounding benchmark doing unlink:
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():  938 msecs
      Avg runtime set_current_state: 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-4-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d269a8b8
    • Davidlohr Bueso's avatar
      drivers/tty: Compute 'current' directly · 5376f2e7
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-3-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5376f2e7
    • Davidlohr Bueso's avatar
      kernel/exit: Compute 'current' directly · 0039962a
      Davidlohr Bueso authored
      This patch effectively replaces the tsk pointer dereference (which is
      obviously == current), to directly use get_current() macro. In this
      case, do_exit() always passes current to exit_mm(), hence we can
      simply get rid of the argument. This is also a performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-2-git-send-email-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0039962a
    • Waiman Long's avatar
      locking/spinlocks/x86, paravirt: Remove paravirt_ticketlocks_enabled · aef591cd
      Waiman Long authored
      This is a follow-up of commit:
      
        cfd8983f ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")
      
      The static_key structure 'paravirt_ticketlocks_enabled' is now removed as it is
      no longer used.
      
      As a result, the init functions kvm_spinlock_init_jump() and
      xen_init_spinlocks_jump() are also removed.
      
      A simple build and boot test was done to verify it.
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1484252878-1962-1-git-send-email-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aef591cd
  2. 12 Jan, 2017 3 commits
  3. 11 Jan, 2017 29 commits