Commit 9cfb38a7 authored by Wanpeng Li, committed by Ingo Molnar

sched/fair: Fix sched domains NULL dereference in select_idle_sibling()

Commit:

  10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")

... improved select_idle_sibling(), but also triggered a regression (crash)
during CPU-hotplug:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
  IP: [<ffffffffb10cd332>] select_idle_sibling+0x1c2/0x4f0
  Call Trace:
   <IRQ>
    select_task_rq_fair+0x749/0x930
    ? select_task_rq_fair+0xb4/0x930
    ? __lock_is_held+0x54/0x70
    try_to_wake_up+0x19a/0x5b0
    default_wake_function+0x12/0x20
    autoremove_wake_function+0x12/0x40
    __wake_up_common+0x55/0x90
    __wake_up+0x39/0x50
    wake_up_klogd_work_func+0x40/0x60
    irq_work_run_list+0x57/0x80
    irq_work_run+0x2c/0x30
    smp_irq_work_interrupt+0x2e/0x40
    irq_work_interrupt+0x96/0xa0
   <EOI>
    ? _raw_spin_unlock_irqrestore+0x45/0x80
    try_to_wake_up+0x4a/0x5b0
    wake_up_state+0x10/0x20
    __kthread_unpark+0x67/0x70
    kthread_unpark+0x22/0x30
    cpuhp_online_idle+0x3e/0x70
    cpu_startup_entry+0x6a/0x450
    start_secondary+0x154/0x180

This can be reproduced by running the ftrace test case of kselftest: the
test case hot-unplugs a CPU, and that CPU is attached to the NULL
sched-domain during scheduler teardown.

Step 2 of the rewritten select_idle_sibling():

  | Step 2) tracks the average cost of the scan and compares this to the
  | average idle time guestimate for the CPU doing the wakeup.

If the CPU doing the wakeup is the CPU being hot-unplugged, then the NULL
sched domain is dereferenced to read the average cost of the scan.

This patch fixes it by failing the search for an idle CPU in the LLC
if this sched domain is NULL.

Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1475971443-3187-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent be6a2e4c
@@ -5471,13 +5471,18 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
  */
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
-	struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
-	u64 avg_idle = this_rq()->avg_idle;
-	u64 avg_cost = this_sd->avg_scan_cost;
+	struct sched_domain *this_sd;
+	u64 avg_cost, avg_idle = this_rq()->avg_idle;
 	u64 time, cost;
 	s64 delta;
 	int cpu, wrap;
 
+	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
+	if (!this_sd)
+		return -1;
+
+	avg_cost = this_sd->avg_scan_cost;
+
 	/*
 	 * Due to large variance we need a large fuzz factor; hackbench in
 	 * particularly is sensitive here.
......
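
The guard added by this patch can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: struct toy_domain, toy_should_scan() and TOY_FUZZ are hypothetical names introduced for illustration, and the fuzz factor of 512 is an assumption, since the real comparison lies outside the hunk shown above. It shows the pattern of checking the domain pointer before reading avg_scan_cost, then comparing the scan cost against the idle-time estimate as described in Step 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct sched_domain. */
struct toy_domain {
	uint64_t avg_scan_cost;	/* average cost of one idle-CPU scan */
};

/* Assumed fuzz factor; the real value is not visible in the hunk above. */
#define TOY_FUZZ 512

/*
 * Returns -1 when the scan should be skipped, 0 when it may proceed.
 *
 * The pointer is checked before any field is read, so a CPU that is
 * attached to the NULL domain (e.g. during hot-unplug) fails the
 * search instead of dereferencing NULL.
 */
static int toy_should_scan(const struct toy_domain *this_sd, uint64_t avg_idle)
{
	uint64_t avg_cost;

	if (!this_sd)
		return -1;

	/* Safe only after the NULL check above. */
	avg_cost = this_sd->avg_scan_cost;

	/* Skip the scan when it costs more than the (fuzzed) idle time. */
	if (avg_idle / TOY_FUZZ < avg_cost)
		return -1;

	return 0;
}
```

With a domain whose avg_scan_cost is 100, calling toy_should_scan() with a NULL domain returns -1 regardless of avg_idle, an avg_idle of 1000000 returns 0, and an avg_idle of 1000 returns -1. Moving the pointer read out of the declaration, as the patch does, is what makes room for the check before the first dereference.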