1. 17 Aug, 2017 26 commits
    • Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b',... · 656e7c0c
      Paul E. McKenney authored
      Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
      
      doc.2017.08.17a: Documentation updates.
      fixes.2017.08.17a: RCU fixes.
      hotplug.2017.07.25b: CPU-hotplug updates.
      misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
      spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
      srcu.2017.07.27c: SRCU updates.
      torture.2017.07.24c: Torture-test updates.
      656e7c0c
    • arch: Remove spin_unlock_wait() arch-specific definitions · 952111d7
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore removes the underlying arch-specific
      arch_spin_unlock_wait() for all architectures providing them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Boqun Feng <boqun.feng@gmail.com>
      952111d7
    • locking: Remove spin_unlock_wait() generic definitions · d3a024ab
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore removes spin_unlock_wait() and related
      definitions from core code.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      d3a024ab
    • drivers/ata: Replace spin_unlock_wait() with lock/unlock pair · a4f08141
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore eliminates the spin_unlock_wait() call and
      associated else-clause and hoists the then-clause's lock and unlock out of
      the "if" statement.  This should be safe from a performance perspective
      because according to Tejun there should be few if any drivers that don't
      set their own error handler.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <linux-ide@vger.kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      a4f08141
    • ipc: Replace spin_unlock_wait() with lock/unlock pair · e0892e08
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore replaces the spin_unlock_wait() call in
      exit_sem() with spin_lock() followed immediately by spin_unlock().
      This should be safe from a performance perspective because exit_sem()
      is rarely invoked in production.
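      
      For illustration, the shape of the replacement is as follows (a minimal
      sketch; "ulp->lock" stands in for the actual lock used in exit_sem()):
      
        /* Before: wait until any current holder has released the lock. */
        spin_unlock_wait(&ulp->lock);
      
        /* After: an acquire/release pair orders this code after any prior
         * critical section, with well-defined semantics. */
        spin_lock(&ulp->lock);
        spin_unlock(&ulp->lock);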
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      e0892e08
    • exit: Replace spin_unlock_wait() with lock/unlock pair · 8083f293
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
      it appears that all callers could do just as well with a lock/unlock pair.
      This commit therefore replaces the spin_unlock_wait() call in do_exit()
      with spin_lock() followed immediately by spin_unlock().  This should be
      safe from a performance perspective because the lock is a per-task lock,
      and this is happening only at task-exit time.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      8083f293
    • completion: Replace spin_unlock_wait() with lock/unlock pair · dec13c42
      Paul E. McKenney authored
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore replaces the spin_unlock_wait() call in
      completion_done() with spin_lock() followed immediately by spin_unlock().
      This should be safe from a performance perspective because the lock
      will be held only while the wakeup takes place, which happens really
      quickly.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      dec13c42
    • doc: Set down RCU's scheduling-clock-interrupt needs · 850bf6d5
      Paul E. McKenney authored
      This commit documents the situations in which RCU needs the
      scheduling-clock interrupt to be enabled, along with the consequences
      of failing to meet RCU's needs in this area.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      850bf6d5
    • doc: No longer allowed to use rcu_dereference on non-pointers · 8a597d63
      Paul E. McKenney authored
      There are too many ways for the compiler to optimize (that is, break)
      dependencies carried via integer values, so it is now permissible to
      carry dependencies only via pointers.  This commit catches up some of
      the documentation on this point.
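      
      As a quick illustration of the rule (the variables gp, table, nr_entries,
      r1, and r2 are illustrative, not from the documentation itself):
      
        /* Still fine: the dependency is carried by the pointer value. */
        p = rcu_dereference(gp);
        if (p)
                r1 = READ_ONCE(p->a);
      
        /* No longer sanctioned: carrying the dependency through an integer,
         * which the compiler is free to optimize in dependency-breaking ways. */
        i = (unsigned long)rcu_dereference(gp);
        r2 = READ_ONCE(table[i % nr_entries].a);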
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8a597d63
    • doc: Update memory-barriers.txt for read-to-write dependencies · 66ce3a4d
      Paul E. McKenney authored
      The memory-barriers.txt document contains an obsolete passage stating that
      smp_read_barrier_depends() is required to force ordering for read-to-write
      dependencies.  We now know that this is not required, even for DEC Alpha.
      This commit therefore updates this passage to state that read-to-write
      dependencies are respected even without smp_read_barrier_depends().
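      
      A minimal sketch of the read-to-write dependency in question (variable
      names are illustrative):
      
        /* CPU 0 */
        WRITE_ONCE(x, 1);
        smp_store_release(&gp, &x);
      
        /* CPU 1 */
        p = READ_ONCE(gp);
        if (p)
                WRITE_ONCE(*p, 2);      /* The store's address depends on the
                                         * prior load: ordered even on DEC Alpha,
                                         * no smp_read_barrier_depends() needed. */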
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Jade Alglave <j.alglave@ucl.ac.uk>
      Cc: Luc Maranget <luc.maranget@inria.fr>
      [ paulmck: Reference control-dependencies sections and use WRITE_ONCE()
        per Will Deacon.  Correctly place split-cache paragraph while there. ]
      Acked-by: Will Deacon <will.deacon@arm.com>
      66ce3a4d
    • 4de5f89e
    • membarrier: Provide expedited private command · 22e4ebb9
      Mathieu Desnoyers authored
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs, using a cpumask built
      from all runqueues whose current thread's mm is the same as that of the
      thread calling sys_membarrier(). It executes faster than the non-expedited
      variant (no blocking). It also works on NOHZ_FULL configurations.
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * Of all weakly ordered machines, only ARM64 and PPC can do RELEASE;
        the rest do indeed use smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call returning
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
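      
      For reference, a userspace invocation might look as follows (a hedged
      sketch assuming no libc wrapper, hence the raw syscall):
      
        #include <linux/membarrier.h>
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int membarrier(int cmd, int flags)
        {
                return syscall(__NR_membarrier, cmd, flags);
        }
      
        int main(void)
        {
                /* Expedited barrier across all threads of this process. */
                if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0))
                        perror("membarrier");   /* kernel may lack the command */
                return 0;
        }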
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
      22e4ebb9
    • rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() · 16c0b106
      Paul E. McKenney authored
      The rcu_idle_exit() and rcu_idle_enter() functions are exported because
      they were originally used by RCU_NONIDLE(), which was intended to
      be usable from modules.  However, RCU_NONIDLE() now instead uses
      rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
      exported, and there have been no complaints.
      
      This commit therefore removes the exports from rcu_idle_exit() and
      rcu_idle_enter().
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      16c0b106
    • rcu: Add warning to rcu_idle_enter() for irqs enabled · d4db30af
      Paul E. McKenney authored
      All current callers of rcu_idle_enter() have irqs disabled, and
      rcu_idle_enter() relies on this but does not check it.  This commit
      therefore adds an RCU_LOCKDEP_WARN() to verify that assumption.
      While we are there, pass "true" rather than "1" to rcu_eqs_enter().
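      
      The added check is along these lines (a sketch; the exact message text
      may differ):
      
        /* At the top of rcu_idle_enter(), roughly: */
        RCU_LOCKDEP_WARN(!irqs_disabled(),
                         "rcu_idle_enter() invoked with irqs enabled!");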
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d4db30af
    • rcu: Make rcu_idle_enter() rely on callers disabling irqs · 3a607992
      Peter Zijlstra (Intel) authored
      All callers to rcu_idle_enter() have irqs disabled, so there is no
      point in rcu_idle_enter() disabling them again.  This commit therefore
      replaces the irq disabling with an RCU_LOCKDEP_WARN().
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3a607992
    • rcu: Add assertions verifying blocked-tasks list · 2dee9404
      Paul E. McKenney authored
      This commit adds assertions verifying the consistency of the rcu_node
      structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
      ->boost_tasks pointers.  In particular, the ->blkd_tasks lists must be
      empty except for leaf rcu_node structures.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      2dee9404
    • rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() · 35fe723b
      Masami Hiramatsu authored
      Set disable_rcu_irq_enter not only in rcu_eqs_enter_common() but also in
      rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
      fixed for rcu_eqs_enter_common() by commit 03ecd3f4 ("rcu/tracing:
      Add rcu_disabled to denote when rcu_irq_enter() will not work").
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      35fe723b
    • rcu: Add TPS() protection for _rcu_barrier_trace strings · d8db2e86
      Paul E. McKenney authored
      The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
      which needs TPS() protection for strings passed through the second
      argument.  However, it has escaped prior TPS()-ification efforts because
      _rcu_barrier_trace() does not start with "trace_".  This commit
      therefore adds the needed TPS() protection.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d8db2e86
    • rcu: Use idle versions of swait to make idle-hack clear · d5374226
      Luis R. Rodriguez authored
      These RCU waits were set to use interruptible waits to avoid the kthreads
      contributing to the system load average, even though they are not
      interruptible, as they are spawned from a kthread. Use the new TASK_IDLE
      swaits, which make our goal clear and remove confusion about these paths
      possibly being interruptible -- they are not.
      
      When the system is idle the RCU grace-period kthread will spend all its time
      blocked inside the swait_event_interruptible(). If the interruptible form
      were not used, then this kthread would contribute to the load average. This means
      that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
      rather than the load average of 0 that almost fifty years of UNIX has
      conditioned sysadmins to expect.
      
      The same argument applies to swait_event_interruptible_timeout() use. The
      RCU grace-period kthread spends its time blocked inside this call while
      waiting for grace periods to complete. In particular, if there was only one
      busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
      grace-period kthread would spend almost all its time blocked inside the
      swait_event_interruptible_timeout(). This would mean that the load average
      would be 2 rather than the expected 1 for the single busy CPU.
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d5374226
    • swait: Add idle variants which don't contribute to load average · 352eee12
      Luis R. Rodriguez authored
      There are cases where folks are using an interruptible swait when
      using kthreads. This is rather confusing, given that you'd expect
      interruptible waits to be -- interruptible -- but kthreads are not
      interruptible! The reason for such practice, though, is to avoid
      having these kthreads contribute to the system load average.
      
      When systems are idle some kthreads may spend a lot of time blocking if
      using swait_event_timeout(). This would contribute to the system load
      average. On systems without preemption this would mean the load average
      of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
      this would mean the load average of an idle system is bumped to 3
      instead of 0.
      
      This adds a proper API using TASK_IDLE to make such goals explicit and
      avoid confusion.
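      
      A sketch of the intended difference for a kthread's main loop (the wait
      queue and condition are illustrative; swait_event_idle_timeout() is one
      of the new idle variants):
      
        /* Old: TASK_INTERRUPTIBLE, so an otherwise idle kthread inflates the
         * load average even though it never handles signals. */
        swait_event_interruptible_timeout(my_swq, work_ready(), timeout);
      
        /* New: TASK_IDLE, uninterruptible but excluded from the load-average
         * calculation. */
        swait_event_idle_timeout(my_swq, work_ready(), timeout);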
      Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      352eee12
    • rcu: Add event tracing to ->gp_tasks update at GP start · c5ebe66c
      Paul E. McKenney authored
      There is currently event tracing to track when a task is preempted
      within a preemptible RCU read-side critical section, and also when that
      task subsequently reaches its outermost rcu_read_unlock(), but none
      indicating when a new grace period starts when that grace period must
      wait on pre-existing readers that have been preempted at least once
      since the beginning of their current RCU read-side critical sections.
      
      This commit therefore adds an event trace at grace-period start in
      the case where there are such readers.  Note that only the first
      reader in the list is traced.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      c5ebe66c
    • rcu: Move rcu.h to new trivial-function style · 7414fac0
      Paul E. McKenney authored
      This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
      definitions for trivial functions, instead of the old style where the
      two curly braces each get their own line.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7414fac0
    • rcu: Add TPS() to event-traced strings · bedbb648
      Paul E. McKenney authored
      Strings used in event tracing need to be specially handled, for example,
      using the TPS() macro.  Without the TPS() macro, although output looks
      fine from within a running kernel, extracting traces from a crash dump
      produces garbage instead of strings.  This commit therefore adds the TPS()
      macro to some unadorned strings that were passed to event-tracing macros.
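      
      The pattern being applied, in sketch form (trace_rcu_example() and its
      arguments are illustrative, not an actual tracepoint):
      
        /* Without TPS(), only the string's address is recorded, which a
         * crash-dump trace reader cannot resolve back into text: */
        trace_rcu_example("Begin");
      
        /* With TPS() (tracepoint_string()), the literal is placed where
         * post-mortem tools can find it: */
        trace_rcu_example(TPS("Begin"));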
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      bedbb648
    • rcu: Create reasonable API for do_exit() TASKS_RCU processing · ccdd29ff
      Paul E. McKenney authored
      Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
      This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
      APIs for do_exit() use.  This has the benefit of confining the use of the
      tasks_rcu_exit_srcu variable to one file, allowing it to become static.
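      
      The resulting do_exit() usage is roughly as follows (a sketch; the exact
      extent of the bracketed region is not shown here):
      
        exit_tasks_rcu_start();         /* Tell TASKS_RCU this task is exiting. */
        /* ... exit-time processing that TASKS_RCU must track ... */
        exit_tasks_rcu_finish();        /* Exit-time processing is complete. */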
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      ccdd29ff
    • rcu: Drive TASKS_RCU directly off of PREEMPT · 7e42776d
      Paul E. McKenney authored
      TASKS_RCU is actually used only when PREEMPT is enabled; otherwise, RCU-sched
      is used instead.  This commit therefore makes synchronize_rcu_tasks()
      and call_rcu_tasks() available always, but mapped to synchronize_sched()
      and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
      allows some #ifdefs to be removed from rcutorture.
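      
      A hedged sketch of the resulting mapping for !PREEMPT builds (the exact
      header text may differ):
      
        #ifdef CONFIG_TASKS_RCU
        void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
        void synchronize_rcu_tasks(void);
        #else /* #ifdef CONFIG_TASKS_RCU */
        #define call_rcu_tasks          call_rcu_sched
        #define synchronize_rcu_tasks   synchronize_sched
        #endif /* #else #ifdef CONFIG_TASKS_RCU */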
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      7e42776d
  2. 11 Aug, 2017 1 commit
  3. 28 Jul, 2017 1 commit
    • sched: Allow migrating kthreads into online but inactive CPUs · 955dbdf4
      Tejun Heo authored
      Per-cpu workqueues have been tripping CPU affinity sanity checks while
      a CPU is being offlined.  A per-cpu kworker ends up running on a CPU
      which isn't its target CPU while the CPU is online but inactive.
      
      While the scheduler allows kthreads to wake up on an online but
      inactive CPU, it doesn't allow a running kthread to be migrated to
      such a CPU, which leads to an odd situation where setting affinity on
      a sleeping and running kthread leads to different results.
      
      Each mem-reclaim workqueue has one rescuer which guarantees forward
      progress and the rescuer needs to bind itself to the CPU which needs
      help in making forward progress; however, due to the above issue,
      while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on
      the correct CPU if the CPU is in the process of going offline,
      tripping the sanity check and executing the work item on the wrong
      CPU.
      
      This patch updates __migrate_task() so that kthreads can be migrated
      into an inactive but online CPU.
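      
      The resulting check is roughly as follows (a sketch of the idea, not the
      verbatim diff):
      
        /* In __migrate_task(): */
        if (p->flags & PF_KTHREAD) {
                /* Kthreads may run on any online CPU, even an inactive one. */
                if (unlikely(!cpu_online(dest_cpu)))
                        return rq;
        } else {
                /* Everything else still requires an active CPU. */
                if (unlikely(!cpu_active(dest_cpu)))
                        return rq;
        }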
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      955dbdf4
  4. 27 Jul, 2017 1 commit
    • srcu: Provide ordering for CPU not involved in grace period · 35732cf9
      Paul E. McKenney authored
      Tree RCU guarantees that every online CPU has a memory barrier between
      any given grace period and any of that CPU's RCU read-side sections that
      must be ordered against that grace period.  Since RCU doesn't always
      know where read-side critical sections are, the actual implementation
      guarantees order against prior and subsequent non-idle non-offline code,
      whether in an RCU read-side critical section or not.  As a result, there
      does not need to be a memory barrier at the end of synchronize_rcu()
      and friends because the ordering internal to the grace period has
      ordered every CPU's post-grace-period execution against each CPU's
      pre-grace-period execution, again for all non-idle online CPUs.
      
      In contrast, SRCU can have non-idle online CPUs that are completely
      uninvolved in a given SRCU grace period, for example, a CPU that
      never runs any SRCU read-side critical sections and took no part in
      the grace-period processing.  It is in theory possible for a given
      synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
      uninvolved in the prior SRCU grace period, which could mean that the
      code following that synchronize_srcu() would end up being unordered with
      respect to both the grace period and any pre-existing SRCU read-side
      critical sections.
      
      This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
      which prevents this scenario from occurring.
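      
      In sketch form (the signature shown and the elided body are assumptions,
      not the verbatim code):
      
        static void __synchronize_srcu(struct srcu_struct *sp, bool do_norm)
        {
                /* ... wait for the SRCU grace period to complete ... */
      
                /* Order subsequent accesses against the just-completed grace
                 * period, even on CPUs that took no part in it. */
                smp_mb();
        }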
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Lance Roy <ldr709@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.12.x
      35732cf9
  5. 25 Jul, 2017 11 commits
    • rcu: Move callback-list warning to irq-disable region · 09efeeee
      Paul E. McKenney authored
      After adopting callbacks from a newly offlined CPU, the adopting CPU
      checks to make sure that its callback list's count is zero only if the
      list has no callbacks and vice versa.  Unfortunately, it does so after
      enabling interrupts, which means that false positives are possible due to
      interrupt handlers invoking call_rcu().  Although these false positives
      are improbable, rcutorture did make it happen once.
      
      This commit therefore moves this check to an irq-disabled region of code,
      thus suppressing the false positive.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      09efeeee
    • rcu: Remove unused RCU list functions · aed4e046
      Paul E. McKenney authored
      Given changes to callback migration, rcu_cblist_head(),
      rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(),
      rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are
      no longer used.  This commit therefore removes them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      aed4e046
    • rcu: Localize rcu_state ->orphan_pend and ->orphan_done · f2dbe4a5
      Paul E. McKenney authored
      Given that the rcu_state structure's ->orphan_pend and ->orphan_done
      fields are used only during migration of callbacks from the recently
      offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
      rcu_adopt_orphan_cbs() are combined, these fields can become local
      variables in the combined function.  This commit therefore combines
      rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
      rcu_segcblist_merge() function and removes the ->orphan_pend and
      ->orphan_done fields.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f2dbe4a5
    • rcu: Advance callbacks after migration · 21cc2483
      Paul E. McKenney authored
      When migrating callbacks from a newly offlined CPU, we are already
      holding the root rcu_node structure's lock, so it costs almost nothing
      to advance and accelerate the newly migrated callbacks.  This patch
      therefore makes this advancing and acceleration happen.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      21cc2483
    • rcu: Eliminate rcu_state ->orphan_lock · 537b85c8
      Paul E. McKenney authored
      The ->orphan_lock is acquired and released only within the
      rcu_migrate_callbacks() function, which now acquires the root rcu_node
      structure's ->lock.  This commit therefore eliminates the ->orphan_lock
      in favor of the root rcu_node structure's ->lock.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      537b85c8
    • rcu: Advance outgoing CPU's callbacks before migrating them · 9fa46fb8
      Paul E. McKenney authored
      It is possible that the outgoing CPU is unaware of recent grace periods,
      and so it is also possible that some of its pending callbacks are actually
      ready to be invoked.  The current callback-migration code would needlessly
      force these callbacks to pass through another grace period.  This commit
      therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
      order to give them full credit for having passed through any recent
      grace periods.
      
      This also fixes an odd theoretical bug where there are no callbacks in
      the system except for those on the outgoing CPU, none of those callbacks
      have yet been associated with a grace-period number, there is never again
      another callback registered, and the surviving CPU never again takes a
      scheduling-clock interrupt, never goes idle, and never enters nohz_full
      userspace execution.  Yes, this is (just barely) possible.  It requires
      that the surviving CPU be a nohz_full CPU, that its scheduler-clock
      interrupt be shut off, and that it loop forever in the kernel.  You get
      bonus points if you can make this one happen!  ;-)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9fa46fb8
    • rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU · b1a2d79f
      Paul E. McKenney authored
      RCU's CPU-hotplug callback-migration code first moves the outgoing
      CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
      moves them to the NOCB callback list.  This commit avoids the
      extra step (and simplifies the code) by moving the callbacks directly
      from the outgoing CPU's callback list to the NOCB callback list.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b1a2d79f
    • rcu: Check for NOCB CPUs and empty lists earlier in CB migration · 95335c03
      Paul E. McKenney authored
      The current CPU-hotplug RCU-callback-migration code checks
      for the source (newly offlined) CPU being a NOCBs CPU down in
      rcu_send_cbs_to_orphanage().  This commit simplifies callback migration a
      bit by moving this check up to rcu_migrate_callbacks().  This commit also
      adds a check for the source CPU having no callbacks, which eases analysis
      of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      95335c03
    • rcu: Remove orphan/adopt event-tracing fields · c47e067a
      Paul E. McKenney authored
      The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
      are updated, but never read.  This commit therefore removes them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c47e067a
    • torture: Fix typo suppressing CPU-hotplug statistics · a2b2df20
      Paul E. McKenney authored
      The torture status line contains a series of values preceded by "onoff:".
      The last value in that line, the one preceding the "HZ=" string, is
      always zero.  The reason that it is always zero is that torture_offline()
      was incrementing the sum_offl pointer instead of the value that this
      pointer referenced.  This commit therefore makes this increment operate
      on the statistic rather than the pointer to the statistic.
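      
      In other words (a sketch; "delta" is an illustrative local variable):
      
        /* Before: advances the local pointer, so the statistic stays 0. */
        sum_offl += delta;
      
        /* After: accumulates into the statistic the pointer references. */
        *sum_offl += delta;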
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a2b2df20
    • rcu: Make expedited GPs correctly handle hardware CPU insertion · 313517fc
      Paul E. McKenney authored
      The updates of ->expmaskinitnext and of ->ncpus are unsynchronized,
      with the value of ->ncpus being incremented long before the corresponding
      ->expmaskinitnext mask is updated.  If an RCU expedited grace period
      sees ->ncpus change, it will update the ->expmaskinit masks from the new
      ->expmaskinitnext masks.  But it is possible that ->ncpus has already
      been updated, but the ->expmaskinitnext masks still have their old values.
      For the current expedited grace period, no harm done.  The CPU could not
      have been online before the grace period started, so there is no need to
      wait for its non-existent pre-existing readers.
      
      But the next RCU expedited grace period is in a world of hurt.  The value
      of ->ncpus has already been updated, so this grace period will assume
      that the ->expmaskinitnext masks have not changed.  But they have, and
      they won't be taken into account until the next never-been-online CPU
      comes online.  This means that RCU will be ignoring some CPUs that it
      should be paying attention to.
      
      The solution is to update ->ncpus and ->expmaskinitnext while holding
      the ->lock for the rcu_node structure containing the ->expmaskinitnext
      mask.  Because smp_store_release() is now used to update ->ncpus and
      smp_load_acquire() is now used to locklessly read it, if the expedited
      grace period sees ->ncpus change, then the updating CPU has to
      already be holding the corresponding ->lock.  Therefore, when the
      expedited grace period later acquires that ->lock, it is guaranteed
      to see the new value of ->expmaskinitnext.
      
      On the other hand, if the expedited grace period loads ->ncpus just
      before an update, earlier full memory barriers guarantee that
      the incoming CPU isn't far enough along to be running any RCU readers.
      
      This commit therefore makes the required change.
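      
      A hedged sketch of the resulting publish/consume pairing (field names
      follow the commit text; ->ncpus_snap is assumed to be the expedited grace
      period's cached copy, and locking details are simplified):
      
        /* CPU-online path, holding the leaf rcu_node structure's ->lock: */
        rnp->expmaskinitnext |= mask;
        smp_store_release(&rsp->ncpus, rsp->ncpus + 1); /* publish after the mask */
      
        /* Expedited grace period, lockless check: */
        if (smp_load_acquire(&rsp->ncpus) != rsp->ncpus_snap) {
                /* Any newly counted CPU either published its mask before the
                 * release above or still holds ->lock, which is acquired below
                 * before ->expmaskinitnext is read. */
                /* ... re-derive ->expmaskinit from ->expmaskinitnext ... */
        }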
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      313517fc