1. 03 Dec, 2009 4 commits
    • Paul E. McKenney's avatar
      rcu: Make RCU's CPU-stall detector be default · 8bfb2f8e
      Paul E. McKenney authored
      The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
      some bugs that might otherwise have been difficult to track
      down.  Make it be default for the TREE RCU implementations.
      
      The vmlinux size impact is limited (on 64-bit x86 defconfig):
      
         text	   data	    bss	    dec	    hex	filename
         8440248	1260076	 995588	10695912	 a334e8	vmlinux.before
         8440774	1260060	 995588	10696422	 a336e6	vmlinux.after
      
      +526 bytes - acceptable default cost.
      
      For RAM starved systems, TINY_RCU does not support CPU-stall detection
      and is much smaller, but then again it is a uniprocessor...
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846162906-git-send-email->
      [ v2: added image size calculations to the changelog ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8bfb2f8e
    • Paul E. McKenney's avatar
      rcu: Add expedited grace-period support for preemptible RCU · d9a3da06
      Paul E. McKenney authored
      Implement an synchronize_rcu_expedited() for preemptible RCU
      that actually is expedited.  This uses
      synchronize_sched_expedited() to force all threads currently
      running in a preemptible-RCU read-side critical section onto the
      appropriate ->blocked_tasks[] list, then takes a snapshot of all
      of these lists and waits for them to drain.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1259784616158-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d9a3da06
    • Paul E. McKenney's avatar
      rcu: Enable fourth level of TREE_RCU hierarchy · cf244dc0
      Paul E. McKenney authored
      Enable a fourth level of rcu_node hierarchy for TREE_RCU and
      TREE_PREEMPT_RCU.  This is for stress-testing and experiemental
      purposes only, although in theory this would enable 16,777,216
      CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
      systems. Normal experimental use of this fourth level will
      normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
      though the more adventurous (and more fortunate) experimenters
      may wish to chose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
      CONFIG_RCU_FANOUT=4 for 256-CPU systems.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846161257-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cf244dc0
    • Paul E. McKenney's avatar
      rcu: Rename "quiet" functions · d3f6bad3
      Paul E. McKenney authored
      The number of "quiet" functions has grown recently, and the
      names are no longer very descriptive.  The point of all of these
      functions is to do some portion of the task of reporting a
      quiescent state, so rename them accordingly:
      
      o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
      	quiescent state to the per-CPU rcu_data structure.  If this
      	turns out to be a new quiescent state for this grace period,
      	then rcu_report_qs_rnp() will be invoked to propagate the
      	quiescent state up the rcu_node hierarchy.
      
      o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
      	a quiescent state for a given CPU (or possibly a set of CPUs)
      	up the rcu_node hierarchy.
      
      o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
      	reports a full set of quiescent states to the global rcu_state
      	structure.
      
      o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
      	a quiescent state due to a task exiting an RCU read-side critical
      	section that had previously blocked in that same critical section.
      	As indicated by the new name, this type of quiescent state is
      	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
      	to do so).
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846163698-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d3f6bad3
  2. 22 Nov, 2009 3 commits
    • Paul E. McKenney's avatar
      rcu: Re-arrange code to reduce #ifdef pain · 6ebb237b
      Paul E. McKenney authored
      Remove #ifdefs from kernel/rcupdate.c and
      include/linux/rcupdate.h by moving code to
      include/linux/rcutiny.h, include/linux/rcutree.h, and
      kernel/rcutree.c.
      
      Also remove some definitions that are no longer used.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258908830885-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      6ebb237b
    • Paul E. McKenney's avatar
      rcu: Eliminate unneeded function wrapping · 9f680ab4
      Paul E. McKenney authored
      The functions rcu_init() is a wrapper for __rcu_init(), and also
      sets up the CPU-hotplug notifier for rcu_barrier_cpu_hotplug().
      But TINY_RCU doesn't need CPU-hotplug notification, and the
      rcu_barrier_cpu_hotplug() is a simple wrapper for
      rcu_cpu_notify().
      
      So push rcu_init() out to kernel/rcutree.c and kernel/rcutiny.c
      and get rid of the wrapper function rcu_barrier_cpu_hotplug().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088302320-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9f680ab4
    • Paul E. McKenney's avatar
      rcu: Fix grace-period-stall bug on large systems with CPU hotplug · b668c9cf
      Paul E. McKenney authored
      When the last CPU of a given leaf rcu_node structure goes
      offline, all of the tasks queued on that leaf rcu_node structure
      (due to having blocked in their current RCU read-side critical
      sections) are requeued onto the root rcu_node structure.  This
      requeuing is carried out by rcu_preempt_offline_tasks().
      However, it is possible that these queued tasks are the only
      thing preventing the leaf rcu_node structure from reporting a
      quiescent state up the rcu_node hierarchy.  Unfortunately, the
      old code would fail to do this reporting, resulting in a
      grace-period stall given the following sequence of events:
      
      1.	Kernel built for more than 32 CPUs on 32-bit systems or for more
      	than 64 CPUs on 64-bit systems, so that there is more than one
      	rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
      	to a number smaller than CONFIG_NR_CPUS.)
      
      2.	The kernel is built with CONFIG_TREE_PREEMPT_RCU.
      
      3.	A task running on a CPU associated with a given leaf rcu_node
      	structure blocks while in an RCU read-side critical section
      	-and- that CPU has not yet passed through a quiescent state
      	for the current RCU grace period.  This will cause the task
      	to be queued on the leaf rcu_node's blocked_tasks[] array, in
      	particular, on the element of this array corresponding to the
      	current grace period.
      
      4.	Each of the remaining CPUs corresponding to this same leaf rcu_node
      	structure pass through a quiescent state.  However, the task is
      	still in its RCU read-side critical section, so these quiescent
      	states cannot be reported further up the rcu_node hierarchy.
      	Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
      	field are now zero.
      
      5.	Each of the remaining CPUs go offline.  (The events in step
      	#4 and #5 can happen in any order as long as each CPU passes
      	through a quiescent state before going offline.)
      
      6.	When the last CPU goes offline, __rcu_offline_cpu() will invoke
      	rcu_preempt_offline_tasks(), which will move the task to the
      	root rcu_node structure, but without reporting a quiescent state
      	up the rcu_node hierarchy (and this failure to report a quiescent
      	state is the bug).
      
      	But because this leaf rcu_node structure's ->qsmask field is
      	already zero and its ->block_tasks[] entries are all empty,
      	force_quiescent_state() will skip this rcu_node structure.
      
      	Therefore, grace periods are now hung.
      
      This patch abstracts some code out of rcu_read_unlock_special(),
      calling the result task_quiet() by analogy with cpu_quiet(), and
      invokes task_quiet() from both rcu_read_lock_special() and
      __rcu_offline_cpu().  Invoking task_quiet() from
      __rcu_offline_cpu() reports the quiescent state up the rcu_node
      hierarchy, fixing the bug.  This ends up requiring a separate
      lock_class_key per level of the rcu_node hierarchy, which this
      patch also provides.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088301770-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b668c9cf
  3. 14 Nov, 2009 2 commits
    • Paul E. McKenney's avatar
      rcu: Eliminate __rcu_pending() false positives · 2f51f988
      Paul E. McKenney authored
      Now that there are both ->gpnum and ->completed fields in the
      rcu_node structure, __rcu_pending() should check rdp->gpnum and
      rdp->completed against rnp->gpnum and rdp->completed, respectively,
      instead of the prior comparison against the rcu_state fields
      rsp->gpnum and rsp->completed.
      
      Given the old comparison, __rcu_pending() could return 1, resulting
      in a needless raise_softirq(RCU_SOFTIRQ).  This useless work would
      happen if RCU responded to a scheduling-clock interrupt after the
      rcu_state fields had been updated, but before the rcu_node fields
      had been updated.
      
      Changing the comparison from the rcu_state fields to the rcu_node
      fields prevents this useless work from happening.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12581706991966-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2f51f988
    • Paul E. McKenney's avatar
      rcu: Further cleanups of use of lastcomp · 560d4bc0
      Paul E. McKenney authored
      Now that a copy of the rsp->completed flag is available in all
      rcu_node structures, make full use of it.  It is still
      legitimate to access rsp->completed while holding the root
      rcu_node structure's lock, however.
      
      Also, tighten up force_quiescent_state()'s checks for end of
      current grace period.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258170699933-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      560d4bc0
  4. 13 Nov, 2009 2 commits
    • Paul E. McKenney's avatar
      rcu: Simplify association of forced quiescent states with grace periods · 8e9aa8f0
      Paul E. McKenney authored
      The force_quiescent_state() function also took a snapshot
      of the ->completed field, which was as obnoxious as it was in
      rcu_sched_qs() and friends.  So snapshot ->gpnum-1.
      
      Also, since the dyntick_record_completed() and
      dyntick_recall_completed() functions are now simple assignments
      that are independent of CONFIG_NO_HZ, and since their names are
      now misleading, get rid of them.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12580941042308-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8e9aa8f0
    • Paul E. McKenney's avatar
      rcu: Accelerate callback processing on CPUs not detecting GP end · b32e9eb6
      Paul E. McKenney authored
      An earlier fix for a race resulted in a situation where the CPUs
      other than the CPU that detected the end of the grace period would
      not process their callbacks until the next grace period started.
      
      This means that these other CPUs would unnecessarily demand that an
      extra grace period be started.
      
      This patch eliminates this extra grace period and speeds callback
      processing by propagating rsp->completed to the rcu_node structures
      in the case where the CPU detecting the end of the grace period
      sees no reason to start a new grace period.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258094104417-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b32e9eb6
  5. 11 Nov, 2009 1 commit
  6. 10 Nov, 2009 8 commits
    • Paul E. McKenney's avatar
      rcu: Simplify association of quiescent states with grace periods · c64ac3ce
      Paul E. McKenney authored
      The rdp->passed_quiesc_completed fields are used to properly
      associate the recorded quiescent state with a grace period.  It
      is OK to wrongly associate a given quiescent state with a
      preceding grace period, but it is fatal to associate a given
      quiescent state with a grace period that begins after the
      quiescent state occurred.  Grace periods are numbered, and the
      following fields track them:
      
      o	->gpnum is the number of the grace period currently in
      	progress, or the number of the last grace period to
      	complete if no grace period is currently in progress.
      
      o	->completed is the number of the last grace period to
      	have completed.
      
      These two fields are equal if there is no grace period in
      progress, otherwise ->gpnum is one greater than ->completed.
      But the rdp->passed_quiesc_completed field compared against
      ->completed, and if equal, the quiescent state is presumed to
      count against the current grace period.
      
      The earlier code copied rdp->completed to
      rdp->passed_quiesc_completed, which has been made to work, but
      is error-prone.  In contrast, copying one less than rdp->gpnum
      is guaranteed safe, because rdp->gpnum is not incremented until
      after the start of the corresponding grace period. At the end of
      the grace period, when ->completed has incremented, then any
      quiescent periods recorded previously will be discarded.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890421011-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c64ac3ce
    • Paul E. McKenney's avatar
      rcu: Rename dynticks_completed to completed_fqs · 4bcfe055
      Paul E. McKenney authored
      This field is used whether or not CONFIG_NO_HZ is set, so the
      old name of ->dynticks_completed is quite misleading.
      
      Change to ->completed_fqs, given that it the value that
      force_quiescent_state() is trying to drive the ->completed field
      away from.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890423298-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      4bcfe055
    • Paul E. McKenney's avatar
      rcu: Enable synchronize_sched_expedited() fastpath · 956539b7
      Paul E. McKenney authored
      This patch adds a counter increment to enable tasks to actually
      take the synchronize_sched_expedited() function's fastpath.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1257889042435-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      956539b7
    • Paul E. McKenney's avatar
      rcu: Remove inline from forward-referenced functions · dbe01350
      Paul E. McKenney authored
      Some variants of gcc are reputed to dislike forward references
      to functions declared "inline".  Remove the "inline" keyword
      from such functions.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890422402-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      dbe01350
    • Paul E. McKenney's avatar
      rcu: Fix note_new_gpnum() uses of ->gpnum · 9160306e
      Paul E. McKenney authored
      Impose a clear locking design on the note_new_gpnum()
      function's use of the ->gpnum counter.  This is done by updating
      rdp->gpnum only from the corresponding leaf rcu_node structure's
      rnp->gpnum field, and even then only under the protection of
      that same rcu_node structure's ->lock field.  Performance and
      scalability are maintained using a form of double-checked
      locking, and excessive spinning is avoided by use of the
      spin_trylock() function.  The use of spin_trylock() is safe due
      to the fact that CPUs who fail to acquire this lock will try
      again later. The hierarchical nature of the rcu_node data
      structure limits contention (which could be limited further if
      need be using the RCU_FANOUT kernel parameter).
      
      Without this patch, obscure but quite possible races could
      result in a quiescent state that occurred during one grace
      period to be accounted to the following grace period, causing
      this following grace period to end prematurely.  Not good!
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987492350-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9160306e
    • Paul E. McKenney's avatar
      rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter · d09b62df
      Paul E. McKenney authored
      Impose a clear locking design on the rcu_process_gp_end()
      function's use of the ->completed counter.  This is done by
      creating a ->completed field in the rcu_node structure, which
      can safely be accessed under the protection of that structure's
      lock.  Performance and scalability are maintained by using a
      form of double-checked locking, so that rcu_process_gp_end()
      only acquires the leaf rcu_node structure's ->lock if a grace
      period has recently ended.
      
      This fix reduces rcutorture failure rate by at least two orders
      of magnitude under heavy stress with force_quiescent_state()
      being invoked artificially often.  Without this fix,
      unsynchronized access to the ->completed field can cause
      rcu_process_gp_end() to advance callbacks whose grace period has
      not yet expired.  (Bad idea!)
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987494069-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d09b62df
    • Paul E. McKenney's avatar
      rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter · 281d150c
      Paul E. McKenney authored
      Impose a clear locking design on non-NO_HZ handling of the
      ->completed counter.  This increases the distance between the
      RCU and the CPU-hotplug mechanisms.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987491353-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      281d150c
    • Ingo Molnar's avatar
      Merge branch 'core/urgent' into core/rcu · 7e1a2766
      Ingo Molnar authored
      Merge reason: Pick up RCU fixlet to base further commits on.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7e1a2766
  7. 02 Nov, 2009 3 commits
    • Lai Jiangshan's avatar
      rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls · c5e0cb3d
      Lai Jiangshan authored
      Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
      while rcu_irq_enter() is invoked unconditionally.  This patch
      moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
      calls are balanced.
      
      This patch has no effect on the behavior of the kernel because
      both rcu_irq_enter() and rcu_irq_exit() are empty for
      !CONFIG_NO_HZ, but the code is easier to understand if the calls
      are obviously balanced in all cases.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12567428891605-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c5e0cb3d
    • Paul E. McKenney's avatar
      rcu: Fix long-grace-period race between forcing and initialization · 83f5b01f
      Paul E. McKenney authored
      Very long RCU read-side critical sections (50 milliseconds or
      so) can cause a race between force_quiescent_state() and
      rcu_start_gp() as follows on kernel builds with multi-level
      rcu_node hierarchies:
      
      1.	CPU 0 calls force_quiescent_state(), sees that there is a
      	grace period in progress, and acquires ->fsqlock.
      
      2.	CPU 1 detects the end of the grace period, and so
      	cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
      	This operation is carried out under the root rnp->lock,
      	but CPU 0 has not yet acquired that lock.  Note that
      	rsp->signaled is still RCU_SAVE_DYNTICK from the last
      	grace period.
      
      3.	CPU 1 calls rcu_start_gp(), but no one wants a new grace
      	period, so it drops the root rnp->lock and returns.
      
      4.	CPU 0 acquires the root rnp->lock and picks up rsp->completed
      	and rsp->signaled, then drops rnp->lock.  It then enters the
      	RCU_SAVE_DYNTICK leg of the switch statement.
      
      5.	CPU 2 invokes call_rcu(), and now needs a new grace period.
      	It calls rcu_start_gp(), which acquires the root rnp->lock, sets
      	rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
      	the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
      	initializing the rcu_node hierarchy.  If there are multiple
      	levels to the hierarchy, it will drop the root rnp->lock and
      	initialize the lower levels of the hierarchy.
      
      6.	CPU 0 notes that rsp->completed has not changed, which permits
              both CPU 2 and CPU 0 to try updating it concurrently.  If CPU 0's
      	update prevails, later calls to force_quiescent_state() can
      	count old quiescent states against the new grace period, which
      	can in turn result in premature ending of grace periods.
      
      	Not good.
      
      This patch adds an RCU_GP_IDLE state for rsp->signaled that is
      set initially at boot time and any time a grace period ends.
      This prevents CPU 0 from getting into the workings of
      force_quiescent_state() in step 4.  Additional locking and
      checks prevent the concurrent update of rsp->signaled in step 6.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1256742889199-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      83f5b01f
    • Thomas Gleixner's avatar
      uids: Prevent tear down race · b00bc0b2
      Thomas Gleixner authored
      Ingo triggered the following warning:
      
      WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
      Hardware name: System Product Name
      ODEBUG: init active object type: timer_list
      Modules linked in:
      Pid: 2619, comm: dmesg Tainted: G        W  2.6.32-rc5-tip+ #5298
      Call Trace:
       [<81035443>] warn_slowpath_common+0x6a/0x81
       [<8120e483>] ? debug_print_object+0x42/0x50
       [<81035498>] warn_slowpath_fmt+0x29/0x2c
       [<8120e483>] debug_print_object+0x42/0x50
       [<8120ec2a>] __debug_object_init+0x279/0x2d7
       [<8120ecb3>] debug_object_init+0x13/0x18
       [<810409d2>] init_timer_key+0x17/0x6f
       [<81041526>] free_uid+0x50/0x6c
       [<8104ed2d>] put_cred_rcu+0x61/0x72
       [<81067fac>] rcu_do_batch+0x70/0x121
      
      debugobjects warns about an enqueued timer being initialized. If
      CONFIG_USER_SCHED=y the user management code uses delayed work to
      remove the user from the hash table and tear down the sysfs objects.
      
      free_uid is called from RCU and initializes/schedules delayed work if
      the usage count of the user_struct is 0. The init/schedule happens
      outside of the uidhash_lock protected region which allows a concurrent
      caller of find_user() to reference the about to be destroyed
      user_struct w/o preventing the work from being scheduled. If the next
      free_uid call happens before the work timer expired then the active
      timer is initialized and the work scheduled again.
      
      The race was introduced in commit 5cb350ba (sched: group scheduling,
      sysfs tunables) and made more prominent by commit 3959214f (sched:
      delayed cleanup of user_struct)
      
      Move the init/schedule_delayed_work inside of the uidhash_lock
      protected region to prevent the race.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarDhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: stable@kernel.org
      b00bc0b2
  8. 28 Oct, 2009 1 commit
    • Thomas Gleixner's avatar
      futex: Fix spurious wakeup for requeue_pi really · 11df6ddd
      Thomas Gleixner authored
      The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
      NULL test) nor does it use the wake_list of futex_wake() which where
      the reason for commit 41890f2 (futex: Handle spurious wake up)
      
      See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com>
      
      The changes in this fix to the wait_requeue_pi path were considered to
      be a likely unecessary, but harmless safety net. But it turns out that
      due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
      as EAGAIN we built an endless loop in the code path which returns
      correctly EWOULDBLOCK.
      
      Spurious wakeups in wait_requeue_pi code path are unlikely so we do
      the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
      it deal with the spurious wakeup.
      
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      LKML-Reference: <4AE23C74.1090502@us.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      11df6ddd
  9. 27 Oct, 2009 1 commit
    • Paul E. McKenney's avatar
      rcu: Fix TINY_RCU #elif condition · 2c28e245
      Paul E. McKenney authored
      Some compilers are happy with "#elif CONFIG_RCU_TINY", while
      others strongly prefer "#elif defined(CONFIG_RCU_TINY)".  Change
      to the latter to make more compilers happy.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12565906642768-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2c28e245
  10. 26 Oct, 2009 7 commits
    • Peter Zijlstra's avatar
      rcu: Simplify creating of lockdep class for root rcu_node · 88b91c7c
      Peter Zijlstra authored
      Use lockdep_set_class() to simplify the code and to avoid any
      additional overhead in the !LOCKDEP case.  Also move the
      definition of rcu_root_class into kernel/rcutree.c, as suggested
      by Lai Jiangshan.
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1256577871443-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      88b91c7c
    • Ingo Molnar's avatar
      rcu: Do tiny cleanups in rcutiny · 4ce5b903
      Ingo Molnar authored
      No change in functionality - just straighten out a few small
      stylistic details.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351355-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      4ce5b903
    • Paul E. McKenney's avatar
      rcu: Improve rcutorture diagnostics when bad torture_type specified · cf886c44
      Paul E. McKenney authored
      Make rcutorture list the available torture_type values when it
      doesn't like the one specified.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351868-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cf886c44
    • Paul E. McKenney's avatar
      rcu: Add synchronize_srcu_expedited() to the documentation · 64179861
      Paul E. McKenney authored
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226354176-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      64179861
    • Paul E. McKenney's avatar
      rcu: Add synchronize_srcu_expedited() to the rcutorture test suite · 804bb837
      Paul E. McKenney authored
      Adds the "srcu_expedited" torture type, and also renames
      sched_ops_sync to sched_sync_ops for consistency while we are in
      this file.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226353636-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      804bb837
    • Paul E. McKenney's avatar
      rcu: Add synchronize_srcu_expedited() · 0cd397d3
      Paul E. McKenney authored
      This patch creates a synchronize_srcu_expedited() that uses
      synchronize_sched_expedited() where synchronize_srcu()
      uses synchronize_sched().  The synchronize_srcu() and
      synchronize_srcu_expedited() functions become one-liners that
      pass synchronize_sched() or synchronize_sched_expedited(),
      repectively, to a new __synchronize_srcu() function.
      
      While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
      follow the corresponding functions.
      Requested-by: default avatarAvi Kivity <avi@redhat.com>
      Tested-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      LKML-Reference: <12565226354038-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0cd397d3
    • Paul E. McKenney's avatar
      rcu: "Tiny RCU", The Bloatwatch Edition · 9b1d82fa
      Paul E. McKenney authored
      This patch is a version of RCU designed for !SMP provided for a
      small-footprint RCU implementation.  In particular, the
      implementation of synchronize_rcu() is extremely lightweight and
      high performance. It passes rcutorture testing in each of the
      four relevant configurations (combinations of NO_HZ and PREEMPT)
      on x86.  This saves about 1K bytes compared to old Classic RCU
      (which is no longer in mainline), and more than three kilobytes
      compared to Hierarchical RCU (updated to 2.6.30):
      
      	CONFIG_TREE_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    183       4       0     187     kernel/rcupdate.o
      	   2783     520      36    3339     kernel/rcutree.o
      				   3526 Total (vs 4565 for v7)
      
      	CONFIG_TREE_PREEMPT_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    263       4       0     267     kernel/rcupdate.o
      	   4594     776      52    5422     kernel/rcutree.o
      	   			   5689 Total (6155 for v7)
      
      	CONFIG_TINY_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	     96       4       0     100     kernel/rcupdate.o
      	    734      24       0     758     kernel/rcutiny.o
      	    			    858 Total (vs 848 for v7)
      
      The above is for x86.  Your mileage may vary on other platforms.
      Further compression is possible, but is being procrastinated.
      
      Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
      
      o	Apply Lai Jiangshan's review comments (aside from
      might_sleep() 	in synchronize_sched(), which is covered by SMP builds).
      
      o	Fix up expedited primitives.
      
      Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
      
      o	Forward ported to put it into the 2.6.33 stream.
      
      o	Added lockdep support.
      
      o	Make lightweight rcu_barrier.
      
      Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
      
      o	Ported to latest pre-2.6.32 merge window kernel.
      
      	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
      	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
      	- Provided trivial rcu_cpu_notify().
      	- Provided trivial exit_rcu().
      	- Provided trivial rcu_needs_cpu().
      	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
      
      o	Removed the dependence on EMBEDDED, with a view to making
      	TINY_RCU default for !SMP at some time in the future.
      
      o	Added (trivial) support for expedited grace periods.
      
      Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
      
      o	Squeeze the size down a bit further by removing the
      	->completed field from struct rcu_ctrlblk.
      
      o	This permits synchronize_rcu() to become the empty function.
      	Previous concerns about rcutorture were unfounded, as
      	rcutorture correctly handles a constant value from
      	rcu_batches_completed() and rcu_batches_completed_bh().
      
      Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
      
      o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
      	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
      	rcu_nmi_exit(), to be static inlines, as suggested by David
      	Howells.  Doing this saves about 100 bytes from rcutiny.o.
      	(The numbers between v3 and this v4 of the patch are not directly
      	comparable, since they are against different versions of Linux.)
      
      Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
      
      o	Fix whitespace issues.
      
      o	Change short-circuit "||" operator to instead be "+" in order
      to 	fix performance bug noted by "kraai" on LWN.
      
      		(http://lwn.net/Articles/324348/)
      
      Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
      
      o	This version depends on EMBEDDED as well as !SMP, as suggested
      	by Ingo.
      
      o	Updated rcu_needs_cpu() to unconditionally return zero,
      	permitting the CPU to enter dynticks-idle mode at any time.
      	This works because callbacks can be invoked upon entry to
      	dynticks-idle mode.
      
      o	Paul is now OK with this being included, based on a poll at
      the 	Kernel Miniconf at linux.conf.au, where about ten people said
      	that they cared about saving 900 bytes on single-CPU systems.
      
      o	Applies to both mainline and tip/core/rcu.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351355-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9b1d82fa
  11. 16 Oct, 2009 1 commit
    • Darren Hart's avatar
      futex: Move drop_futex_key_refs out of spinlock'ed region · 89061d3d
      Darren Hart authored
      When requeuing tasks from one futex to another, the reference held
      by the requeued task to the original futex location needs to be
      dropped eventually.
      
      Dropping the reference may ultimately lead to a call to
      "iput_final" and subsequently call into filesystem- specific code -
      which may be non-atomic.
      
      It is therefore safer to defer this drop operation until after the
      futex_hash_bucket spinlock has been dropped.
      
      Originally-From: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: default avatarDarren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <4AD7A298.5040802@us.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      89061d3d
  12. 15 Oct, 2009 6 commits
    • Paul E. McKenney's avatar
      rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5
      Paul E. McKenney authored
      If the following sequence of events occurs, then
      TREE_PREEMPT_RCU will hang waiting for a grace period to
      complete, eventually OOMing the system:
      
      o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
      	with more than 64 physical CPUs present (32 on a 32-bit system).
      	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
      	with RCU_FANOUT set to a sufficiently small value that the
      	physical CPUs populate two or more leaf rcu_node structures.
      
      o	A task is preempted in an RCU read-side critical section
      	while running on a CPU corresponding to a given leaf rcu_node
      	structure.
      
      o	All CPUs corresponding to this same leaf rcu_node structure
      	record quiescent states for the current grace period.
      
      o	All of these same CPUs go offline (hence the need for enough
      	physical CPUs to populate more than one leaf rcu_node structure).
      	This causes the preempted task to be moved to the root rcu_node
      	structure.
      
      At this point, there is nothing left to cause the quiescent
      state to be propagated up the rcu_node tree, so the current
      grace period never completes.
      
      The simplest fix, especially after considering the deadlock
      possibilities, is to detect this situation when the last CPU is
      offlined, and to set that CPU's ->qsmask bit in its leaf
      rcu_node structure.  This will cause the next invocation of
      force_quiescent_state() to end the grace period.
      
      Without this fix, this hang can be triggered in an hour or so on
      some machines with rcutorture and random CPU onlining/offlining.
      With this fix, these same machines pass a full 10 hours of this
      sort of abuse.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      237c80c5
    • Paul E. McKenney's avatar
      rcu: Update trace.txt documentation for blocked-tasks lists · 0edf1a68
      Paul E. McKenney authored
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592804-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0edf1a68
    • Paul E. McKenney's avatar
      rcu: Update trace.txt documentation to reflect recent changes · bd58b430
      Paul E. McKenney authored
      o	Remove the CONFIG_PREEMPT_RCU documentation since this
      	config option has now been removed.
      
      o	Change the now-incorrect references to "rcu" labels to
      	instead be "rcu_sched".
      
      o	Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
      	have additional "rcu_preempt" output.
      
      o	Note the new "oqlen" field in the rcuhier output (for
      	RCU callbacks orphaned by an offlined CPU).
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <1255540559799-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      bd58b430
    • Paul E. McKenney's avatar
      rcu: Add rnp->blocked_tasks to tracing · 3397e040
      Paul E. McKenney authored
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      Cc: Josh Triplett <josh@joshtriplett.org>
      LKML-Reference: <20091014233638.GE6763@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
       kernel/rcutree_trace.c |    8 ++++++--
       1 file changed, 6 insertions(+), 2 deletions(-)
      3397e040
    • Paul E. McKenney's avatar
      rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5
      Paul E. McKenney authored
      For the short term, map synchronize_rcu_expedited() to
      synchronize_rcu() for TREE_PREEMPT_RCU and to
      synchronize_sched_expedited() for TREE_RCU.
      
      Longer term, there needs to be a real expedited grace period for
      TREE_PREEMPT_RCU, but candidate patches to date are considerably
      more complex and intrusive.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592331-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      019129d5
    • Paul E. McKenney's avatar
      rcu: Prevent RCU IPI storms in presence of high call_rcu() load · 37c72e56
      Paul E. McKenney authored
      As the number of callbacks on a given CPU rises, invoke
      force_quiescent_state() only every blimit number of callbacks
      (defaults to 10,000), and even then only if no other CPU has
      invoked force_quiescent_state() in the meantime.
      
      This should fix the performance regression reported by Nick.
      Reported-by: default avatarNick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592133-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      37c72e56
  13. 14 Oct, 2009 1 commit
    • Darren Hart's avatar
      futex: Check for NULL keys in match_futex · 2bc87203
      Darren Hart authored
      If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
      it will find the futex_q->requeue_pi_key to be NULL and OOPS.
      
      Check for NULL in match_futex() instead of doing explicit NULL pointer
      checks on all call sites.  While match_futex(NULL, NULL) returning
      false is a little odd, it's still correct as we expect valid key
      references.
      Signed-off-by: default avatarDarren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Dinakar Guniguntala <dino@in.ibm.com>
      CC: John Stultz <johnstul@us.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <4AD60687.10306@us.ibm.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      2bc87203