1. 03 Sep, 2013 1 commit
    • Linus Torvalds's avatar
      Merge branch 'lockref' (locked reference counts) · fc6d0b03
      Linus Torvalds authored
      Merge lockref infrastructure code by me and Waiman Long.
      
      I already merged some of the preparatory patches that didn't actually do
      any semantic changes earlier, but this merges the actual _reason_ for
      those preparatory patches.
      
      The "lockref" structure is a combination "spinlock and reference count"
      that allows optimized reference count accesses.  In particular, it
      guarantees that the reference count will be updated AS IF the spinlock
      was held, but using atomic accesses that cover both the reference count
      and the spinlock words, we can often do the update without actually
      having to take the lock.
      
      This allows us to avoid the nastiest cases of spinlock contention on
      large machines under heavy pathname lookup loads.  When updating the
      dentry reference counts on a large system, we'll still end up with the
      cache line bouncing around, but that's much less noticeable than
      actually having to spin waiting for the lock.
      
      * lockref:
        lockref: implement lockless reference count updates using cmpxchg()
        lockref: uninline lockref helper functions
        vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
        vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
        lockref: add 'lockref_get_or_lock() helper
      fc6d0b03
  2. 02 Sep, 2013 9 commits
    • Linus Torvalds's avatar
      Linux 3.11 · 6e466452
      Linus Torvalds authored
      6e466452
    • Linus Torvalds's avatar
      lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Linus Torvalds authored
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bc08b449
    • Linus Torvalds's avatar
      lockref: uninline lockref helper functions · 2f4f12e5
      Linus Torvalds authored
      They aren't very good to inline, since they already call external
      functions (the spinlock code), and we're going to create rather more
      complicated versions of them that can do the reference count updates
      locklessly.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f4f12e5
    • Linus Torvalds's avatar
      vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Linus Torvalds authored
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      possible.
      Acked-by: default avatarWaiman Long <waiman.long@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15570086
    • Waiman Long's avatar
      vfs: use lockref_get_not_zero() for optimistic lockless dget_parent() · df3d0bbc
      Waiman Long authored
      A valid parent pointer is always going to have a non-zero reference
      count, but if we look up the parent optimistically without locking, we
      have to protect against the (very unlikely) race against renaming
      changing the parent from under us.
      
      We do that by using lockref_get_not_zero(), and then re-checking the
      parent pointer after getting a valid reference.
      
      [ This is a re-implementation of a chunk from the original patch by
        Waiman Long: "dcache: Enable lockless update of dentry's refcount".
        I've completely rewritten the patch-series and split it up, but I'm
        attributing this part to Waiman as it's close enough to his earlier
        patch  - Linus ]
      Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      df3d0bbc
    • Linus Torvalds's avatar
      lockref: add 'lockref_get_or_lock() helper · b3abd802
      Linus Torvalds authored
      This behaves like "lockref_get_not_zero()", but instead of doing nothing
      if the count was zero, it returns with the lock held.
      
      This allows callers to revalidate the lockref-protected data structure
      if required even if the count was zero to begin with, and possibly
      increment the count if it passes muster.
      
      In particular, the dentry code wants this when it wants to turn an
      RCU-protected dentry into a stable refcounted one: if the dentry count
      it zero, but the sequence number still validates the dentry, we can take
      a reference to it.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3abd802
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 248d296d
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "This is a bug fix for the pm80xx driver.  It turns out that when the
        new hardware support was added in 3.10 the IO command size was kept at
        the old hard coded value.  This means that the driver attaches to some
        new cards and then simply hangs the system"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] pm80xx: fix Adaptec 71605H hang
      248d296d
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e09a1fa9
      Linus Torvalds authored
      Pull x86 boot fix from Peter Anvin:
       "A single very small boot fix for very large memory systems (> 0.5T)"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
      e09a1fa9
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · ac0bc789
      Linus Torvalds authored
      Pull slave-dma fix from Vinod Koul:
       "A fix for resolving TI_EDMA driver's build error in allmodconfig to
        have filter function built in""
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dma/Kconfig: TI_EDMA needs to be boolean
      ac0bc789
  3. 31 Aug, 2013 3 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a8787645
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) There was a simplification in the ipv6 ndisc packet sending
          attempted here, which avoided using memory accounting on the
          per-netns ndisc socket for sending NDISC packets.  It did fix some
          important issues, but it causes regressions so it gets reverted here
          too.  Specifically, the problem with this change is that the IPV6
          output path really depends upon there being a valid skb->sk
          attached.
      
          The reason we want to do this change in some form when we figure out
          how to do it right, is that if a device goes down the ndisc_sk
          socket send queue will fill up and block NDISC packets that we want
          to send to other devices too.  That's really bad behavior.
      
          Hopefully Thomas can come up with a better version of this change.
      
       2) Fix a severe TCP performance regression by reverting a change made
          to dev_pick_tx() quite some time ago.  From Eric Dumazet.
      
       3) TIPC returns wrongly signed error codes, fix from Erik Hugne.
      
       4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
          skb->sk too early.  Fix from Li Hongjun.
      
       5) RAW ipv4 sockets can use the wrong routing key during lookup, from
          Chris Clark.
      
       6) Similar to #1 revert an older change that tried to use plain
          alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
          mark which needs to see the skb->sk for such frames.  From Phil
          Oester.
      
       7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
          specifically in the handling of virtual functions.
      
       8) IPSEC path error propagations to sockets is not done properly when
          we have v4 in v6, and v6 in v4 type rules.  Fix from Hannes Frederic
          Sowa.
      
       9) Fix missing channel context release in mac80211, from Johannes Berg.
      
      10) Fix network namespace handing wrt.  SCM_RIGHTS, from Andy
          Lutomirski.
      
      11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
          drivers.  From Michal Schmidt.
      
      12) Hopefully a complete and correct fix for the genetlink dump locking
          and module reference counting.  From Pravin B Shelar.
      
      13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.
      
      14) Fix handling of timestamp offset when restoring a snapshotted TCP
          socket.  From Andrew Vagin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        net: fec: fix time stamping logic after napi conversion
        net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
        mISDN: return -EINVAL on error in dsp_control_req()
        net: revert 8728c544 ("net: dev_pick_tx() fix")
        Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
        ipv4 tunnels: fix an oops when using ipip/sit with IPsec
        tipc: set sk_err correctly when connection fails
        tcp: tcp_make_synack() should use sock_wmalloc
        bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
        ipv6: Don't depend on per socket memory for neighbour discovery messages
        ipv4: sendto/hdrincl: don't use destination address found in header
        tcp: don't apply tsoffset if rcv_tsecr is zero
        tcp: initialize rcv_tstamp for restored sockets
        net: xilinx: fix memleak
        net: usb: Add HP hs2434 device to ZLP exception table
        net: add cpu_relax to busy poll loop
        net: stmmac: fixed the pbl setting with DT
        genl: Hold reference on correct module while netlink-dump.
        genl: Fix genl dumpit() locking.
        xfrm: Fix potential null pointer dereference in xdst_queue_output
        ...
      a8787645
    • Ian Campbell's avatar
      MAINTAINERS: change my DT related maintainer address · de80963e
      Ian Campbell authored
      Filtering capabilities on my work email are pretty much non-existent and this
      has turned out to be something of a firehose...
      
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Acked-by: default avatarPawel Moll <pawel.moll@arm.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      de80963e
    • Linus Torvalds's avatar
      Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 936dbcc3
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup
        for an Acer laptop.  All marked as stable patches"
      
      * tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: opti9xx: Fix conflicting driver object name
        ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n
        ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
      936dbcc3
  4. 30 Aug, 2013 15 commits
  5. 29 Aug, 2013 12 commits