1. 09 Jul, 2014 18 commits
    • Paul E. McKenney
      Merge branches 'doc.2014.07.08a', 'fixes.2014.07.09a',... · 1823172a
      Paul E. McKenney authored
      Merge branches 'doc.2014.07.08a', 'fixes.2014.07.09a', 'maintainers.2014.07.08b', 'nocbs.2014.07.07a' and 'torture.2014.07.07a' into HEAD
      
      doc.2014.07.08a: Documentation updates.
      fixes.2014.07.09a: Miscellaneous fixes.
      maintainers.2014.07.08b: Maintainership updates.
      nocbs.2014.07.07a: Callback-offloading fixes.
      torture.2014.07.07a: Torture-test updates.
      1823172a
    • Pranith Kumar
      rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() · b41d1b92
      Pranith Kumar authored
      This commit annotates rcu_report_unblock_qs_rnp() in order to fix the
      following sparse warning:
      
      kernel/rcu/tree_plugin.h:990:13: warning: context imbalance in 'rcu_report_unblock_qs_rnp' - unexpected unlock
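      
      A minimal sketch of the annotation idiom involved, assuming the function
      releases the rcu_node's ->lock on its caller's behalf (the exact
      signature in kernel/rcu/tree_plugin.h may differ):
      
      	/* Tell sparse that this function exits with rnp->lock released. */
      	static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp,
      					      unsigned long flags)
      		__releases(rnp->lock)
      	{
      		/* ... quiescent-state reporting ... */
      		raw_spin_unlock_irqrestore(&rnp->lock, flags);
      	}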
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      b41d1b92
    • Pranith Kumar
      rcu: Fix a sparse warning in rcu_initiate_boost() · 615e41c6
      Pranith Kumar authored
      This commit annotates rcu_initiate_boost() in order to fix the following
      sparse warning:
      
      	kernel/rcu/tree_plugin.h:1494:13: warning: context imbalance in 'rcu_initiate_boost' - unexpected unlock
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      615e41c6
    • Paul E. McKenney
      rcu: Fix __rcu_reclaim() to use true/false for bool · 406e3e53
      Paul E. McKenney authored
      The __rcu_reclaim() function returned 0/1, which is not proper for a
      function of type bool.  This commit therefore converts it to return false/true.
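      
      A minimal sketch of the conversion (body abridged; the real function
      lives in kernel/rcu/rcu.h):
      
      	static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
      	{
      		if (__is_kfree_rcu_offset((unsigned long)head->func)) {
      			/* ... kfree the enclosing structure ... */
      			return true;	/* was: return 1; */
      		}
      		/* ... invoke the callback ... */
      		return false;		/* was: return 0; */
      	}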
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      406e3e53
    • Paul E. McKenney
      rcu: Remove CONFIG_PROVE_RCU_DELAY · 11992c70
      Paul E. McKenney authored
      The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
      effective at finding race conditions, so this commit removes it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      [ paulmck: Remove definition and uses as noted by Paul Bolle. ]
      11992c70
    • Shan Wei
      rcu: Use __this_cpu_read() instead of per_cpu_ptr() · d860d403
      Shan Wei authored
      The __this_cpu_read() function produces better code than does
      per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
      4.7.3-12ubuntu1) 4.7.3 produces the following:
      
      ARMv7 per_cpu_ptr():
      
      force_quiescent_state:
          mov    r3, sp    @,
          bic    r1, r3, #8128    @ tmp171,,
          ldr    r2, .L98    @ tmp169,
          bic    r1, r1, #63    @ tmp170, tmp171,
          ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
          ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
          mov    r6, r0    @ rsp, rsp
          ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
          add    r3, r3, r2    @ tmp175, __ptr, tmp173
          ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode
      
      ARMv7 __this_cpu_read():
      
      force_quiescent_state:
          ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
          mov    r6, r0    @ rsp, rsp
          add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
          ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13
      
      Using gcc 4.8.2:
      
      x86_64 per_cpu_ptr():
      
          movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
          movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
          movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
          movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old
      
      x86_64 __this_cpu_read():
      
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
          movq %gs:24(%rax),%r12    # _10->mynode, rnp_old
      
      Because this change produces significant benefits on these two very
      different architectures, this commit makes the change.
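      
      At the C level the change is a one-liner; a sketch of the
      force_quiescent_state() access in question, assuming rsp->rda is the
      per-CPU rcu_data pointer:
      
      	/* Before: look up this CPU's per-CPU offset, then dereference. */
      	rnp_old = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
      
      	/* After: a single %gs-relative load on x86_64. */
      	rnp_old = __this_cpu_read(rsp->rda->mynode);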
      Signed-off-by: Shan Wei <davidshan@tencent.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      d860d403
    • Paul E. McKenney
      rcu: Don't use NMIs to dump other CPUs' stacks · bc1dce51
      Paul E. McKenney authored
      Although NMI-based stack dumps are in principle more accurate, they are
      also more likely to trigger deadlocks.  This commit therefore replaces
      all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
      that the CPU detecting an RCU CPU stall does the stack dumping.
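      
      A sketch of the substitution (rcu_dump_cpu_stacks() walks the rcu_node
      tree and dumps each stalled CPU's stack from the detecting CPU):
      
      	/* Before: NMI-based, more accurate but deadlock-prone. */
      	trigger_all_cpu_backtrace();
      
      	/* After: the CPU that detected the stall does the dumping. */
      	rcu_dump_cpu_stacks(rsp);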
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      bc1dce51
    • Paul E. McKenney
      rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs · c0f489d2
      Paul E. McKenney authored
      Binding the grace-period kthreads to the timekeeping CPU resulted in
      significant performance decreases for some workloads.  For more detail,
      see:
      
      https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
      
      https://lkml.org/lkml/2014/6/4/218 for CPU statistics
      
      It turns out that it is necessary to bind the grace-period kthreads
      to the timekeeping CPU only when all CPUs except CPU 0 are nohz_full
      CPUs, on the one hand, or when CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
      In other cases, it suffices to bind the grace-period kthreads to the
      set of non-nohz_full CPUs.
      
      This commit therefore creates a tick_nohz_not_full_mask that is the
      complement of tick_nohz_full_mask, and then binds the grace-period
      kthread to the set of CPUs indicated by this new mask, which covers
      the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
      case still binds the grace-period kthreads to the timekeeping CPU.
      This commit also includes the tick_nohz_full_enabled() check suggested
      by Frederic Weisbecker.
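      
      A sketch of the resulting binding logic, using the housekeeping_affine()
      helper mentioned in the bracketed note below (guards abridged):
      
      	static void rcu_bind_gp_kthread(void)
      	{
      		if (!tick_nohz_full_enabled())
      			return;
      	#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
      		/* SYSIDLE still needs the kthread on the timekeeping CPU. */
      		set_cpus_allowed_ptr(current, cpumask_of(tick_do_timer_cpu));
      	#else
      		/* Otherwise any housekeeping (non-nohz_full) CPU will do. */
      		housekeeping_affine(current);
      	#endif
      	}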
      Reported-by: Jet Chen <jet.chen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Created housekeeping_affine() and housekeeping_mask per
        fweisbec feedback. ]
      c0f489d2
    • Paul E. McKenney
      rcu: Simplify priority boosting by putting rt_mutex in rcu_node · abaa93d9
      Paul E. McKenney authored
      RCU priority boosting currently checks for boosting via a pointer in
      task_struct.  However, this is not needed: As Oleg noted, if the
      rt_mutex is placed in the rcu_node instead of on the booster's stack,
      the boostee can simply check it to see whether it owns the lock.  This commit
      makes this change, shrinking task_struct by one pointer and the kernel
      by thirteen lines.
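      
      A sketch of the resulting check at the end of the outermost RCU
      read-side critical section, assuming the rcu_node's rt_mutex field is
      named ->boost_mtx:
      
      	/* Boostee: no task_struct pointer needed; ask the rt_mutex. */
      	if (rt_mutex_owner(&rnp->boost_mtx) == t)
      		rt_mutex_unlock(&rnp->boost_mtx);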
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      abaa93d9
    • Pranith Kumar
      rcu: Check both root and current rcu_node when setting up future grace period · 48bd8e9b
      Pranith Kumar authored
      The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
      and ->completed twice, once without ACCESS_ONCE() and once with it,
      which is pointless because we hold that rcu_node's ->lock at that point.
      The intent was to check the current rcu_node structure and the root
      rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
      commit therefore makes that change.
      
      The reason that it is safe to locklessly check the root rcu_node's
      ->gpnum and ->completed fields is that we hold the current rcu_node's
      ->lock, which constrains the root rcu_node's ability to change its
      ->gpnum and ->completed fields.  Of course, if there is a single rcu_node
      structure, then rnp_root==rnp, and holding the lock prevents all changes.
      If there is more than one rcu_node structure, then the code updates the
      fields in the following order:
      
      1.	Increment rnp_root->gpnum to start new grace period.
      2.	Increment rnp->gpnum to initialize the current rcu_node,
      	continuing initialization for the new grace period.
      3.	Increment rnp_root->completed to end the current grace period.
      4.	Increment rnp->completed to continue cleaning up after the
      	old grace period.
      
      So there are four possible combinations of relative values of these
      four fields, listed in the order rnp_root->gpnum, rnp_root->completed,
      rnp->gpnum, rnp->completed:
      
      N   N   N   N:  RCU idle, new grace period must be initiated.
      		Although rnp_root->gpnum might be incremented immediately
      		after we check, that will just result in unnecessary work:
      		the grace period will already have started, and our attempt
      		to start it will be redundant but harmless.
      
      N+1 N   N   N:  RCU grace period just started.  No further change is
      		possible because we hold rnp->lock, so the checks of
      		rnp_root->gpnum and rnp_root->completed are stable.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup.
      
      N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
      		different than rnp->completed, we won't even look at
      		rnp_root->gpnum and rnp_root->completed, so the possible
      		concurrent change to rnp_root->completed does not matter.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup, which cannot pass
      		this rcu_node because we hold its ->lock.
      
      N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
      		Because rnp->gpnum is different than rnp->completed, we
      		won't look at rnp_root->gpnum and rnp_root->completed, so
      		the possible concurrent change to rnp_root->completed does
      		not matter.  We know that our request for a future grace
      		period will be seen during grace-period cleanup, which
      		cannot pass this rcu_node because we hold its ->lock.
      
      Therefore, despite initial appearances, the lockless check is safe.
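      
      A sketch of the corrected check, with the current rcu_node examined
      under ->lock and the root examined locklessly:
      
      	if (rnp->gpnum != rnp->completed ||
      	    ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
      		/* A grace period is in progress or just ended. */
      		...
      	}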
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      [ paulmck: Update comment to say why the lockless check is safe. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      48bd8e9b
    • Paul E. McKenney
      rcu: Allow post-unlock reference for rt_mutex · dfeb9765
      Paul E. McKenney authored
      The current approach to RCU priority boosting uses an rt_mutex strictly
      for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
      function is used by the booster to initialize the lock as held by the
      boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
      which priority-boosts the boostee.  When the boostee reaches the end
      of its outermost RCU read-side critical section, it checks a field in
      its task structure to see whether it has been boosted, and, if so, uses
      rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
      to boost the next task that is blocking the current RCU grace period.
      
      But reasonable implementations of rt_mutex_unlock() might result in the
      boostee referencing the rt_mutex's data after releasing it.  Meanwhile, the
      booster might have re-initialized the rt_mutex between the time that the
      boostee released it and the time that it later referenced it.  This is
      clearly asking for trouble, so this commit introduces a completion that
      forces the booster to wait until the boostee has completely finished with
      the rt_mutex, thus avoiding the case where the booster is re-initializing
      the rt_mutex before the last boostee's last reference to that rt_mutex.
      
      This of course does introduce some overhead, but the priority-boosting
      code paths are miles from any possible fastpath, and the overhead of
      executing the completion will normally be quite small compared to the
      overhead of priority boosting and deboosting, so this should be OK.
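      
      A sketch of the handshake, assuming a completion added to the rcu_node
      structure for this purpose (field name ->boost_completion per this
      description):
      
      	/* Boostee: deboost, then announce it is done with the rt_mutex. */
      	rt_mutex_unlock(&rnp->boost_mtx);
      	complete(&rnp->boost_completion);
      
      	/* Booster: wait before re-initializing the rt_mutex for reuse. */
      	wait_for_completion(&rnp->boost_completion);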
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dfeb9765
    • Paul E. McKenney
      rcu: Loosen __call_rcu()'s rcu_head alignment constraint · 1146edcb
      Paul E. McKenney authored
      The m68k architecture aligns only to 16-bit boundaries, which can cause
      the align-to-32-bits check in __call_rcu() to trigger.  Because there is
      currently no known potential need for more than one low-order bit, this
      commit loosens the check to 16-bit boundaries.
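      
      A sketch of the loosened sanity check in __call_rcu(), which now insists
      only on 16-bit alignment of the rcu_head:
      
      	/* Was: WARN_ON_ONCE((unsigned long)head & 0x3); */
      	WARN_ON_ONCE((unsigned long)head & 0x1);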
      Reported-by: Greg Ungerer <gerg@uclinux.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      1146edcb
    • Paul E. McKenney
      rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b
      Paul E. McKenney authored
      RCU contains code of the following forms:
      
      	ACCESS_ONCE(x)++;
      	ACCESS_ONCE(x) += y;
      	ACCESS_ONCE(x) -= y;
      
      Now these constructs do operate correctly, but they really result in a
      pair of volatile accesses, one to do the load and another to do the store.
      This can be confusing, as the casual reader might well assume that (for
      example) gcc might generate a memory-to-memory add instruction for each
      of these three cases.  In fact, gcc will do no such thing.  Also, there
      is a good chance that the kernel will move to separate load and store
      variants of ACCESS_ONCE(), and constructs like the above could easily
      confuse both people and scripts attempting to make that sort of change.
      Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
      only need the store to be volatile, so that the read-modify-write form
      might be misleading.
      
      This commit therefore changes the above forms in RCU so that each instance
      of ACCESS_ONCE() either does a load or a store, but not both.  In a few
      cases, ACCESS_ONCE() was not critical, for example, for maintaining
      statistics.  In these cases, ACCESS_ONCE() has been dispensed with
      entirely.
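      
      A sketch of the two resulting idioms (x stands for whichever RCU field
      is involved):
      
      	/* Ordering still matters: keep only the store volatile. */
      	ACCESS_ONCE(x) = x + 1;		/* was: ACCESS_ONCE(x)++; */
      
      	/* Pure statistics: ACCESS_ONCE() dropped entirely. */
      	x++;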
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a792563b
    • Paul E. McKenney
      rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu · 4da117cf
      Paul E. McKenney authored
      In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant
      once boot completes.  Thus, there is no need to wrap it in ACCESS_ONCE()
      in code that is built only when CONFIG_NO_HZ_FULL=y.  This commit therefore
      removes the redundant ACCESS_ONCE().
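      
      A sketch of the simplification (the enclosing function and local
      variable are illustrative):
      
      	/* Before: cpu = ACCESS_ONCE(tick_do_timer_cpu); */
      	cpu = tick_do_timer_cpu;	/* constant once boot completes */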
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      4da117cf
    • Fabian Frederick
      rcu: Make rcu node arrays static const char * const · b4426b49
      Fabian Frederick authored
      Those two arrays are being passed to lockdep_init_map(), which expects
      const char *, and are stored in lockdep_map the same way.
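      
      A sketch of the resulting declarations, assuming these are the
      lock-class name tables handed to lockdep_init_map() (contents
      abridged):
      
      	static const char * const buf[] = { "rcu_node_0", "rcu_node_1", /* ... */ };
      	static const char * const fqs[] = { "rcu_node_fqs_0", "rcu_node_fqs_1", /* ... */ };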
      
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b4426b49
    • Paul E. McKenney
      signal: Explain local_irq_save() call · c41247e1
      Paul E. McKenney authored
      The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
      a potential deadlock condition, as noted in a841796f (signal:
      align __lock_task_sighand() irq disabling and RCU).  However, someone
      reading the code might be forgiven for concluding that this separate
      local_irq_save() was completely unnecessary.  This commit therefore adds
      a comment referencing the shiny new block comment on rcu_read_unlock().
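      
      For reference, a sketch of the pattern being documented (abridged from
      __lock_task_sighand(); the comment paraphrases the rationale):
      
      	/*
      	 * Disabling irqs before rcu_read_lock() guarantees that the
      	 * matching rcu_read_unlock() runs with irqs disabled, avoiding
      	 * the deboost-related deadlock described in the block comment
      	 * on rcu_read_unlock().
      	 */
      	local_irq_save(*flags);
      	rcu_read_lock();
      	sighand = rcu_dereference(tsk->sighand);
      	/* ... validate sighand, then: */
      	spin_lock(&sighand->siglock);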
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      c41247e1
  2. 08 Jul, 2014 7 commits
  3. 07 Jul, 2014 5 commits
  4. 26 Jun, 2014 2 commits
  5. 23 Jun, 2014 2 commits
    • Paul E. McKenney
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney authored
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
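      
      A sketch of the relocated check, assuming the per-CPU variable is named
      rcu_sched_qs_mask and using the rcu_momentary_dyntick_idle() helper
      mentioned in the bracketed note below:
      
      	/* In rcu_note_context_switch(): a cheap non-zero test; the
      	 * slowpath runs only after force-quiescent-state processing
      	 * sets the flag and sends a resched IPI. */
      	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
      		rcu_momentary_dyntick_idle();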
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
    • Paul E. McKenney
      rcu: Export debug_init_rcu_head() and debug_rcu_head_free() · 546a9d85
      Paul E. McKenney authored
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to
      allocate and pre-initialize the debug-objects information, so that
      there is no longer any need for call_rcu() to do that initialization,
      which in turn prevents the recursion into the memory allocators.
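      
      A sketch of the exported pair, using the names from this log (the
      wrapper bodies are illustrative; debug_object_init() and
      debug_object_free() are the underlying debug-objects primitives):
      
      	/* Called by the allocators when creating/destroying objects
      	 * that embed an rcu_head, so call_rcu() need not allocate. */
      	void debug_init_rcu_head(struct rcu_head *head)
      	{
      		debug_object_init(head, &rcuhead_debug_descr);
      	}
      
      	void debug_rcu_head_free(struct rcu_head *head)
      	{
      		debug_object_free(head, &rcuhead_debug_descr);
      	}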
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
      546a9d85
  6. 16 Jun, 2014 4 commits
    • Linus Torvalds
      Linux 3.16-rc1 · 7171511e
      Linus Torvalds authored
      7171511e
    • Linus Torvalds
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a9be2242
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix checksumming regressions, from Tom Herbert.
      
       2) Undo unintentional permissions changes for SCTP rto_alpha and
          rto_beta sysfs knobs, from Daniel Borkmann.
      
       3) VXLAN, like other IP tunnels, should advertise its encapsulation
          size using dev->needed_headroom instead of dev->hard_header_len.
          From Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: sctp: fix permissions for rto_alpha and rto_beta knobs
        vxlan: Checksum fixes
        net: add skb_pop_rcv_encapsulation
        udp: call __skb_checksum_complete when doing full checksum
        net: Fix save software checksum complete
        net: Fix GSO constants to match NETIF flags
        udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
        vxlan: use dev->needed_headroom instead of dev->hard_header_len
        MAINTAINERS: update cxgb4 maintainer
      a9be2242
    • Linus Torvalds
      Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux · dd1845af
      Linus Torvalds authored
      Pull more clock framework updates from Mike Turquette:
       "This contains the second half the of the clk changes for 3.16.
      
        They are simply fixes and code refactoring for the OMAP clock drivers.
        The sunxi clock driver changes include splitting out the one
        mega-driver into several smaller pieces and adding support for the A31
        SoC clocks"
      
      * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
        clk: sunxi: document PRCM clock compatible strings
        clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
        clk: sun6i: Protect SDRAM gating bit
        clk: sun6i: Protect CPU clock
        clk: sunxi: Rework clock protection code
        clk: sunxi: Move the GMAC clock to a file of its own
        clk: sunxi: Move the 24M oscillator to a file of its own
        clk: sunxi: Remove calls to clk_put
        clk: sunxi: document new A31 USB clock compatible
        clk: sunxi: Implement A31 USB clock
        ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
        CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
        ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
        CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
        CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
        dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
        ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
        CLK: TI: gate: add composite interface clock to OMAP2 only build
        ARM: OMAP2: clock: add DT boot support for cpufreq_ck
        CLK: TI: OMAP2: add clock init support
        ...
      dd1845af
    • Linus Torvalds
      Merge git://git.infradead.org/users/willy/linux-nvme · b55b3902
      Linus Torvalds authored
      Pull NVMe update from Matthew Wilcox:
       "Mostly bugfixes again for the NVMe driver.  I'd like to call out the
        exported tracepoint in the block layer; I believe Keith has cleared
        this with Jens.
      
        We've had a few reports from people who're really pounding on NVMe
        devices at scale, hence the timeout changes (and new module
        parameters), hotplug cpu deadlock, tracepoints, and minor performance
        tweaks"
      
      [ Jens hadn't seen that tracepoint thing, but is ok with it - it will
        end up going away when mq conversion happens ]
      
      * git://git.infradead.org/users/willy/linux-nvme: (22 commits)
        NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
        NVMe: Use Log Page constants in SCSI emulation
        NVMe: Define Log Page constants
        NVMe: Fix hot cpu notification dead lock
        NVMe: Rename io_timeout to nvme_io_timeout
        NVMe: Use last bytes of f/w rev SCSI Inquiry
        NVMe: Adhere to request queue block accounting enable/disable
        NVMe: Fix nvme get/put queue semantics
        NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
        NVMe: Make admin timeout a module parameter
        NVMe: Make iod bio timeout a parameter
        NVMe: Prevent possible NULL pointer dereference
        NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
        NVMe: Update data structures for NVMe 1.2
        NVMe: Enable BUILD_BUG_ON checks
        NVMe: Update namespace and controller identify structures to the 1.1a spec
        NVMe: Flush with data support
        NVMe: Configure support for block flush
        NVMe: Add tracepoints
        NVMe: Protect against badly formatted CQEs
        ...
      b55b3902
  7. 15 Jun, 2014 2 commits
    • Daniel Borkmann
      net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann authored
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking the rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly computing rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While adjusting rto_alpha and rto_beta is discouraged, and the
      RFC does not further specify how to adjust them, it also doesn't
      explicitly forbid it, but rather gives RECOMMENDED default values
      (rto_alpha=3, rto_beta=2, that is, the shift values encoding
      RTO.Alpha = 1/8 and RTO.Beta = 1/4).  We have a couple of users
      who were relying on the old permissions before they got changed.
      That said, if someone really has the urge to adjust them, we can
      allow it with a warning in the log.
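      
      A sketch of the kind of ctl_table change implied, restoring 0644 and
      warning on writes (the data pointer and handler name are illustrative):
      
      	{
      		.procname	= "rto_alpha_exp_divisor",
      		.data		= &init_net.sctp.rto_alpha,
      		.maxlen		= sizeof(int),
      		.mode		= 0644,	/* was 0444 */
      		.proc_handler	= proc_rto_alpha_beta_update,
      	},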
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b58537a1
    • David S. Miller
      Merge branch 'csum_fixes' · e4f7ae93
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      Fixes related to some recent checksum modifications.
      
      - Fix GSO constants to match NETIF flags
      - Fix logic in saving checksum complete in __skb_checksum_complete
      - Call __skb_checksum_complete from UDP if we are checksumming over
        whole packet in order to save checksum.
      - Fixes to VXLAN to work correctly with checksum complete
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4f7ae93