rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
It's not entirely obvious why rdp->qlen_last_fqs_check is updated before processing the queue only on offloaded rdp. There can be different effect to that, either in favour of triggering the force quiescent state path or not. For example: 1) If the number of callbacks has decreased since the last rdp->qlen_last_fqs_check update (because we recently called rcu_do_batch() and we executed below qhimark callbacks) and the number of processed callbacks on a subsequent do_batch() arranges for exceeding qhimark on non-offloaded but not on offloaded setup, then we may spare a later run to the force quiescent state slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded counterpart scenario. Here is such an offloaded scenario instance: qhimark = 1000 rdp->last_qlen_last_fqs_check = 3000 rcu_segcblist_n_cbs(rdp) = 2000 rcu_do_batch() { if (offloaded) rdp->last_qlen_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000 // run 1000 callback rcu_segcblist_n_cbs(rdp) = 1000 // Not updating rdp->qlen_last_fqs_check if (count < rdp->qlen_last_fqs_check - qhimark) rdp->qlen_last_fqs_check = count; } call_rcu() * 1001 { __call_rcu_nocb_wake() { // not taking the fqs slowpath: // rcu_segcblist_n_cbs(rdp) == 2001 // rdp->qlen_last_fqs_check == 2000 // qhimark == 1000 if (len > rdp->qlen_last_fqs_check + qhimark) ... } In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 1000 and the fqs slowpath would have executed. 2) If the number of callbacks has increased since the last rdp->qlen_last_fqs_check update (because we recently queued below qhimark callbacks) and the number of callbacks executed in rcu_do_batch() doesn't exceed qhimark for either offloaded or non-offloaded setup, then it's possible that the offloaded scenario later run the force quiescent state slow path on __call_rcu_nocb_wake() while the non-offloaded doesn't. qhimark = 1000 rdp->last_qlen_last_fqs_check = 3000 rcu_segcblist_n_cbs(rdp) = 2000 rcu_do_batch() { if (offloaded) rdp->last_qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000 // run 100 callbacks // concurrent queued 100 rcu_segcblist_n_cbs(rdp) = 2000 // Not updating rdp->qlen_last_fqs_check if (count < rdp->qlen_last_fqs_check - qhimark) rdp->qlen_last_fqs_check = count; } call_rcu() * 1001 { __call_rcu_nocb_wake() { // Taking the fqs slowpath: // rcu_segcblist_n_cbs(rdp) == 3001 // rdp->qlen_last_fqs_check == 2000 // qhimark == 1000 if (len > rdp->qlen_last_fqs_check + qhimark) ... } In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 3000 and the fqs slowpath would have executed. The reason for updating rdp->qlen_last_fqs_check when invoking callbacks for offloaded CPUs is that there is usually no point in waking up either the rcuog or rcuoc kthreads while in this state. After all, both threads are prohibited from indefinite sleeps. The exception is when some huge number of callbacks are enqueued while rcu_do_batch() is in the midst of invoking, in which case interrupting the rcuog kthread's timed sleep might get more callbacks set up for the next grace period. Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Original-patch-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Showing
Please register or sign in to comment