1. 03 Jun, 2016 5 commits
    • locking/rwsem: Optimize write lock by reducing operations in slowpath · c0fcb6c2
      Jason Low authored
      When acquiring the rwsem write lock in the slowpath, we first try
      to set count to RWSEM_WAITING_BIAS. When that is successful,
      we then atomically add the RWSEM_WAITING_BIAS in cases where
      there are other tasks on the wait list. This causes write lock
      operations to often issue multiple atomic operations.
      
      We can instead make the list_is_singular() check first, and then
      set the count accordingly, so that we issue at most 1 atomic
      operation when acquiring the write lock and reduce unnecessary
      cacheline contention.
      Signed-off-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Waiman Long <Waiman.Long@hpe.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1463445486-16078-2-git-send-email-jason.low2@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c0fcb6c2
    • locking/rwsem: Rework zeroing reader waiter->task · e3851390
      Davidlohr Bueso authored
      Readers that are awoken will expect a nil ->task indicating
      that a wakeup has occurred. Because of the way readers are
      implemented, there's a small chance that the waiter will never
      block in the slowpath (rwsem_down_read_failed), and therefore
      requires some form of reference counting to avoid the following
      scenario:
      
      rwsem_down_read_failed()		rwsem_wake()
        get_task_struct();
        spin_lock_irq(&wait_lock);
        list_add_tail(&waiter.list)
        spin_unlock_irq(&wait_lock);
      					  raw_spin_lock_irqsave(&wait_lock)
      					  __rwsem_do_wake()
        while (1) {
          set_task_state(TASK_UNINTERRUPTIBLE);
      					    waiter->task = NULL
          if (!waiter.task) // true
            break;
          schedule() // never reached
      
         __set_task_state(TASK_RUNNING);
       do_exit();
      					    wake_up_process(tsk); // boom
      
      ... and therefore race with do_exit() when the caller returns.
      
      There is also a mismatch between the smp_mb() and its documentation,
      in that the serialization is done between reading the task and the
      nil store. Furthermore, in addition to having the overlapping of
      loads and stores to waiter->task guaranteed to be ordered within
      that CPU, both wake_up_process() originally and now wake_q_add()
      already imply barriers upon successful calls, which serves the
      comment.
      
      Now, as an alternative to perhaps inverting the checks in the blocker
      side (which has its own penalty in that schedule is unavoidable),
      with lockless wakeups this situation is naturally addressed and we
      can just use the refcount held by wake_q_add(), instead of doing so
      explicitly. Of course, we must guarantee that the nil store is done
      as the _last_ operation in that the task must already be marked for
      deletion to not fall into the race above. Spurious wakeups are also
      handled transparently in that the task's reference is only removed
      when wake_up_q() is actually called _after_ the nil store.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hp.com
      Cc: peter@hurleysoftware.com
      Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e3851390
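The ordering rule in this commit (pin the task through the wake queue first, store nil to waiter->task last, drop the reference only when the queue is flushed) can be modelled in userspace C11. Every name ending in `_model`, the fixed-size queue and `mark_reader_woken()` are hypothetical; the real wake_q is an intrusive list and guards against double queuing, which this sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WAKE_Q_CAP 8

/* Hypothetical stand-in for task_struct: a refcount and a "woken" flag. */
struct task_model {
    atomic_int usage;
    bool woken;
};

struct waiter_model {
    _Atomic(struct task_model *) task;
};

struct wake_q_model {
    struct task_model *tasks[WAKE_Q_CAP];
    size_t n;
};

/* Queue a task for a later wakeup and take a reference on it. */
static bool wake_q_add_model(struct wake_q_model *q, struct task_model *t)
{
    if (q->n == WAKE_Q_CAP)
        return false;
    atomic_fetch_add(&t->usage, 1);
    q->tasks[q->n++] = t;
    return true;
}

/* Issue the deferred wakeups, dropping each reference afterwards. */
static void wake_up_q_model(struct wake_q_model *q)
{
    for (size_t i = 0; i < q->n; i++) {
        q->tasks[i]->woken = true;
        atomic_fetch_sub(&q->tasks[i]->usage, 1);
    }
    q->n = 0;
}

/*
 * Waker side: the reference is taken before the nil store, so a waiter
 * that sees task == NULL and exits cannot free the task under the waker.
 */
static bool mark_reader_woken(struct wake_q_model *q, struct waiter_model *w)
{
    struct task_model *t = atomic_load_explicit(&w->task, memory_order_relaxed);

    if (!wake_q_add_model(q, t))
        return false;
    /* The nil store is the last operation on the waiter. */
    atomic_store_explicit(&w->task, NULL, memory_order_release);
    return true;
}
```

Between `mark_reader_woken()` and `wake_up_q_model()` the task carries one extra reference. That window is what makes an early return by the waiter safe in this model.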
    • locking/rwsem: Enable lockless waiter wakeup(s) · 133e89ef
      Davidlohr Bueso authored
      As wake_qs gain users, we can teach rwsems about them such that
      waiters can be awoken without the wait_lock. This is for both
      readers and writers, the former being the most ideal candidate
      as we can batch the wakeups shortening the critical region that
      much more -- i.e. a writer task blocking a bunch of tasks waiting to
      service page-faults (mmap_sem readers).
      
      In general applying wake_qs to rwsem (xadd) is not difficult as
      the wait_lock is intended to be released soon _anyways_, with
      the exception of when a writer slowpath will proactively wakeup
      any queued readers if it sees that the lock is owned by a reader,
      in which we simply do the wakeups with the lock held (see comment
      in __rwsem_down_write_failed_common()).
      
      Similar to other locking primitives, delaying the waiter being
      awoken does allow, at least in theory, the lock to be stolen in
      the case of writers, however no harm was seen in this (in fact
      lock stealing tends to be a _good_ thing in most workloads), and
      this is a tiny window anyways.
      
      Some page-fault (pft) and mmap_sem intensive benchmarks show some
      pretty constant reduction in systime (by up to ~8 and ~10%) on a
      2-socket, 12 core AMD box. In addition, on an 8-core Westmere doing
      page allocations (page_test):
      
      aim9:
                             4.6-rc6               4.6-rc6
                                                   rwsemv2
      Min      page_test   378167.89 (  0.00%)   382613.33 (  1.18%)
      Min      exec_test      499.00 (  0.00%)      502.67 (  0.74%)
      Min      fork_test     3395.47 (  0.00%)     3537.64 (  4.19%)
      Hmean    page_test   395433.06 (  0.00%)   414693.68 (  4.87%)
      Hmean    exec_test      499.67 (  0.00%)      505.30 (  1.13%)
      Hmean    fork_test     3504.22 (  0.00%)     3594.95 (  2.59%)
      Stddev   page_test    17426.57 (  0.00%)    26649.92 (-52.93%)
      Stddev   exec_test        0.47 (  0.00%)        1.41 (-199.05%)
      Stddev   fork_test       63.74 (  0.00%)       32.59 ( 48.86%)
      Max      page_test   429873.33 (  0.00%)   456960.00 (  6.30%)
      Max      exec_test      500.33 (  0.00%)      507.66 (  1.47%)
      Max      fork_test     3653.33 (  0.00%)     3650.90 ( -0.07%)
      
                   4.6-rc6     4.6-rc6
                               rwsemv2
      User            1.12        0.04
      System          0.23        0.04
      Elapsed       727.27      721.98
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hp.com
      Cc: peter@hurleysoftware.com
      Link: http://lkml.kernel.org/r/1463165787-25937-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      133e89ef
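The batching pattern this commit describes (collect waiters while holding wait_lock, wake them after releasing it) can be illustrated with a single-threaded model. The `wait_lock_held` flag, the array-based wait list and every `_model` name are assumptions for illustration; the sketch only demonstrates where the wakeups happen relative to the lock, not real blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

struct waiter_model {
    bool is_reader;
    bool woken;
};

/* Hypothetical semaphore model: a flag stands in for wait_lock. */
struct sem_model {
    bool wait_lock_held;
    struct waiter_model *wait_list[MAX_WAITERS];
    size_t nr_waiting;
    size_t wakeups_under_lock;   /* stays 0 if wakeups are lockless */
};

struct wake_q_model {
    struct waiter_model *w[MAX_WAITERS];
    size_t n;
};

/* Called with wait_lock "held": move the leading readers to the wake queue. */
static void mark_wake_readers(struct sem_model *sem, struct wake_q_model *q)
{
    size_t taken = 0;

    while (taken < sem->nr_waiting && q->n < MAX_WAITERS &&
           sem->wait_list[taken]->is_reader)
        q->w[q->n++] = sem->wait_list[taken++];

    /* Close the gap left by the dequeued readers. */
    for (size_t i = taken; i < sem->nr_waiting; i++)
        sem->wait_list[i - taken] = sem->wait_list[i];
    sem->nr_waiting -= taken;
}

/* Deferred wakeups; counts any that happen while the lock is held. */
static void wake_up_q_model(struct sem_model *sem, struct wake_q_model *q)
{
    for (size_t i = 0; i < q->n; i++) {
        if (sem->wait_lock_held)
            sem->wakeups_under_lock++;
        q->w[i]->woken = true;
    }
    q->n = 0;
}

static void sem_wake_model(struct sem_model *sem)
{
    struct wake_q_model q = { .n = 0 };

    sem->wait_lock_held = true;      /* critical section: list surgery only */
    mark_wake_readers(sem, &q);
    sem->wait_lock_held = false;

    wake_up_q_model(sem, &q);        /* wakeups issued after the unlock */
}
```

In this model the critical section covers only the list manipulation, so its length no longer grows with the cost of waking each reader.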
    • locking/ww_mutex: Report recursive ww_mutex locking early · 0422e83d
      Chris Wilson authored
      Recursive locking for ww_mutexes was originally conceived as an
      exception. However, it is heavily used by the DRM atomic modesetting
      code. Currently, the recursive deadlock is checked after we have queued
      up for a busy-spin and as we never release the lock, we spin until
      kicked, whereupon the deadlock is discovered and reported.
      
      A simple solution for the now common problem is to move the recursive
      deadlock discovery to the first action when taking the ww_mutex.
      Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0422e83d
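A rough userspace sketch of the "check recursion first" idea follows. The types `ww_ctx_model` and `ww_mutex_model` and the function `ww_mutex_lock_model()` are hypothetical, and returning `-EBUSY` on contention is a simplification: the real code would spin, sleep or back off at that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for ww_acquire_ctx and ww_mutex. */
struct ww_ctx_model {
    int stamp;
};

struct ww_mutex_model {
    _Atomic(struct ww_ctx_model *) ctx;   /* NULL when unlocked */
};

static int ww_mutex_lock_model(struct ww_mutex_model *lock,
                               struct ww_ctx_model *ctx)
{
    struct ww_ctx_model *expected = NULL;

    if (ctx == NULL)
        return -EINVAL;   /* this model only covers the context case */

    /*
     * First action: if this context already holds the lock, report it
     * immediately instead of queueing up and spinning on ourselves.
     */
    if (atomic_load_explicit(&lock->ctx, memory_order_relaxed) == ctx)
        return -EALREADY;

    if (atomic_compare_exchange_strong(&lock->ctx, &expected, ctx))
        return 0;

    return -EBUSY;        /* held by another context; waiting is not modelled */
}
```

A caller that walks the same object twice, as the commit says DRM atomic modesetting commonly does, gets `-EALREADY` on the second attempt without any spinning.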
    • locking/seqcount: Re-fix raw_read_seqcount_latch() · 55eed755
      Peter Zijlstra authored
      Commit 50755bc1 ("seqlock: fix raw_read_seqcount_latch()") broke
      raw_read_seqcount_latch().
      
      If you look at the comment that was modified; the thing that changes is
      the seq count, not the latch pointer.
      
       * void latch_modify(struct latch_struct *latch, ...)
       * {
       *	smp_wmb();	<- Ensure that the last data[1] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[0], ...);
       *
       *	smp_wmb();	<- Ensure that the data[0] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[1], ...);
       * }
       *
       * The query will have a form like:
       *
       * struct entry *latch_query(struct latch_struct *latch, ...)
       * {
       *	struct entry *entry;
       *	unsigned seq, idx;
       *
       *	do {
       *		seq = lockless_dereference(latch->seq);
      
      So here we have:
      
      		seq = READ_ONCE(latch->seq);
      		smp_read_barrier_depends();
      
      Which is exactly what we want; the new code:
      
      		seq = ({ p = READ_ONCE(latch);
      			 smp_read_barrier_depends(); p })->seq;
      
      is just wrong; because it loses the volatile read on seq, which can now
      be torn or worse 'optimized'. And the read_depend barrier is also placed
      wrong, we want it after the load of seq, to match the above data[]
      up-to-date wmb()s.
      
      Such that when we dereference latch->data[] below, we're guaranteed to
      observe the right data.
      
       *
       *		idx = seq & 0x01;
       *		entry = data_query(latch->data[idx], ...);
       *
       *		smp_rmb();
       *	} while (seq != latch->seq);
       *
       *	return entry;
       * }
      
      So yes, not passing a pointer is not pretty, but the code was correct,
      and isn't anymore now.
      
      Change to explicit READ_ONCE()+smp_read_barrier_depends() to avoid
      confusion and allow strict lockless_dereference() checking.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 50755bc1 ("seqlock: fix raw_read_seqcount_latch()")
      Link: http://lkml.kernel.org/r/20160527111117.GL3192@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      55eed755
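The latch pattern quoted in this commit can be approximated in portable C11. This sketch is an assumption-laden model, not the kernel implementation: `latch_model` and its functions are hypothetical, the payload is a single `int`, the writer uses sequentially consistent atomics in place of `smp_wmb()`, and the reader uses an acquire load where the kernel relies on `READ_ONCE()` plus a dependency barrier. What it keeps from the commit is that the marked, untorn load is the load of the sequence counter itself, and that the ordering point comes after that load and before `data[]` is indexed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical latch: two copies of the payload plus a sequence counter. */
struct latch_model {
    atomic_uint seq;
    atomic_int data[2];
};

static void latch_init(struct latch_model *l, int value)
{
    atomic_init(&l->seq, 0);
    atomic_init(&l->data[0], value);
    atomic_init(&l->data[1], value);
}

/* Writer: bump seq, update one copy, bump seq again, update the other. */
static void latch_modify(struct latch_model *l, int value)
{
    atomic_fetch_add(&l->seq, 1);        /* odd: readers use data[1] */
    atomic_store(&l->data[0], value);
    atomic_fetch_add(&l->seq, 1);        /* even: readers use data[0] */
    atomic_store(&l->data[1], value);
}

/* Reader: load seq atomically, then order the data access after it. */
static int latch_query(struct latch_model *l)
{
    unsigned int seq;
    int value;

    do {
        /* The counter itself is what gets the marked load. */
        seq = atomic_load_explicit(&l->seq, memory_order_acquire);
        value = atomic_load_explicit(&l->data[seq & 1], memory_order_relaxed);
        /* Order the data read before re-checking the counter. */
        atomic_thread_fence(memory_order_acquire);
    } while (seq != atomic_load_explicit(&l->seq, memory_order_relaxed));

    return value;
}
```

A reader that races with `latch_modify()` either picks the copy that is not being written or retries, so it does not block the writer.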
  2. 01 Jun, 2016 5 commits
    • Merge tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 719af93a
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Here are three pin control fixes for v4.7.  Not much, and just driver
        fixes:
      
         - add device tree matches to MAINTAINERS
      
         - inversion bug in the Nomadik driver
      
         - dual edge handling bug in the mediatek driver"
      
      * tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: mediatek: fix dual-edge code defect
        MAINTAINERS: Add file patterns for pinctrl device tree bindings
        pinctrl: nomadik: fix inversion of gpio direction
      719af93a
    • Merge tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf · ebb8cb2b
      Linus Torvalds authored
      Pull dma-buf updates from Sumit Semwal:
      
       - use of vma_pages instead of explicit computation
      
       - DocBook and headerdoc updates for dma-buf
      
      * tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
        dma-buf: use vma_pages()
        fence: add missing descriptions for fence
        doc: update/fixup dma-buf related DocBook
        reservation: add headerdoc comments
        dma-buf: headerdoc fixes
      ebb8cb2b
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6b15d665
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
      
       2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
          properly.  From Ezequiel Garcia.
      
       3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
      
       4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.
      
       5) Fix deadlock on team->lock when propagating features, from Ivan
          Vecera.
      
       6) Work around buffer offset hw bug in alx chips, from Feng Tang.
      
       7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
          Long.
      
       8) Various statistics bug fixes in mlx4 from Eric Dumazet.
      
       9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
      
      10) All of l2tp was namespace aware, but the ipv6 support code was not
          doing so.  From Shmulik Ladkani.
      
      11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
      
      12) Propagate MAC changes properly through VLAN devices, from Mike
          Manning.
      
      13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
        sfc: Track RPS flow IDs per channel instead of per function
        usbnet: smsc95xx: fix link detection for disabled autonegotiation
        virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
        bnx2x: avoid leaking memory on bnx2x_init_one() failures
        fou: fix IPv6 Kconfig options
        openvswitch: update checksum in {push,pop}_mpls
        sctp: sctp_diag should dump sctp socket type
        net: fec: update dirty_tx even if no skb
        vlan: Propagate MAC address to VLANs
        atm: iphase: off by one in rx_pkt()
        atm: firestream: add more reserved strings
        vxlan: Accept user specified MTU value when create new vxlan link
        net: pktgen: Call destroy_hrtimer_on_stack()
        timer: Export destroy_hrtimer_on_stack()
        net: l2tp: Make l2tp_ip6 namespace aware
        Documentation: ip-sysctl.txt: clarify secure_redirects
        sfc: use flow dissector helpers for aRFS
        ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
        net: nps_enet: Disable interrupts before napi reschedule
        net/lapb: tuse %*ph to dump buffers
        ...
      6b15d665
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 58c1f995
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "sparc64 mmu context allocation and trap return bug fixes"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix return from trap window fill crashes.
        sparc: Harden signal return frame checks.
        sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
      58c1f995
    • sfc: Track RPS flow IDs per channel instead of per function · faf8dcc1
      Jon Cooper authored
      Otherwise we get confused when two flows on different channels get the
      same flow ID.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faf8dcc1
  3. 31 May, 2016 21 commits
  4. 30 May, 2016 9 commits