1. 24 Jan, 2024 1 commit
    • rcu: Defer RCU kthreads wakeup when CPU is dying · e787644c
      Frederic Weisbecker authored
      When the CPU goes idle for the last time during the CPU down hotplug
      process, RCU reports a final quiescent state for the current CPU. If
      this quiescent state propagates up to the top, some tasks may then be
      woken up to complete the grace period: the main grace period kthread
      and/or the expedited main workqueue (or kworker).
      
      If those kthreads have a SCHED_FIFO policy, the wakeup can indirectly
      arm the RT bandwidth timer on the local offline CPU. Since this happens
      after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage,
      the timer gets ignored. Therefore, if the RCU kthreads are waiting for
      RT bandwidth to become available, they may never actually be scheduled.
      
      This triggers TREE03 rcutorture hangs:
      
      	 rcu: INFO: rcu_preempt self-detected stall on CPU
      	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
      	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
      	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
      	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      	 rcu: RCU grace-period kthread stack dump:
      	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
      	 Call Trace:
      	  <TASK>
      	  __schedule+0x2eb/0xa80
      	  schedule+0x1f/0x90
      	  schedule_timeout+0x163/0x270
      	  ? __pfx_process_timeout+0x10/0x10
      	  rcu_gp_fqs_loop+0x37c/0x5b0
      	  ? __pfx_rcu_gp_kthread+0x10/0x10
      	  rcu_gp_kthread+0x17c/0x200
      	  kthread+0xde/0x110
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork+0x2b/0x40
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork_asm+0x1b/0x30
      	  </TASK>
      
      The situation can't be solved by just unpinning the timer. The hrtimer
      infrastructure and the nohz heuristics involved in finding the best
      remote target for an unpinned timer would then also need to handle
      enqueues from an offline CPU in the most horrendous way.
      
      So fix this on the RCU side instead and defer the wake up to an online
      CPU if it's too late for the local one.
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
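The deferral described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `struct cpu_model`, `pick_online_cpu()` and `wake_up_gp_waiter()` are hypothetical names invented here, and a counter stands in for the real wakeup. It shows the one decision the message describes: if the local CPU is already offline, run the wakeup on some online CPU instead.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 6

/* Hypothetical userspace model of a CPU; this is not kernel code. */
struct cpu_model {
	bool online;      /* false once the CPU is too far into the down path */
	int wakeups_run;  /* number of wakeups executed on this CPU */
};

static struct cpu_model cpus[NR_CPUS];

/* Return the lowest-numbered online CPU, or -1 if none is online. */
static int pick_online_cpu(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (cpus[i].online)
			return i;
	return -1;
}

/*
 * Wake the grace-period waiter on behalf of this_cpu.
 *
 * If this_cpu is still online, the wakeup runs locally. Otherwise it is
 * deferred to an online CPU, so that any timer armed as a side effect of
 * the wakeup lands on a CPU that will still service it.
 *
 * Returns the CPU on which the wakeup ran, or -1 if no CPU is online.
 */
static int wake_up_gp_waiter(int this_cpu)
{
	int target = this_cpu;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	if (!cpus[this_cpu].online)
		target = pick_online_cpu();
	if (target < 0)
		return -1;

	cpus[target].wakeups_run++;
	return target;
}
```

In this model, marking CPU 4 offline and calling `wake_up_gp_waiter(4)` runs the wakeup on the first online CPU rather than on CPU 4. How the real fix selects and signals the remote CPU is not stated in the commit message, so that part is left out of the sketch.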
  2. 21 Jan, 2024 39 commits