1. 19 Nov, 2020 3 commits
    • cpuset: fix race between hotplug work and later CPU offline · 406100f3
      Daniel Jordan authored
      One of our machines keeled over trying to rebuild the scheduler domains.
      Mainline produces the same splat:
      
        BUG: unable to handle page fault for address: 0000607f820054db
        CPU: 2 PID: 149 Comm: kworker/1:1 Not tainted 5.10.0-rc1-master+ #6
        Workqueue: events cpuset_hotplug_workfn
        RIP: build_sched_domains
        Call Trace:
         partition_sched_domains_locked
         rebuild_sched_domains_locked
         cpuset_hotplug_workfn
      
      It happens with cgroup2 and exclusive cpusets only.  This reproducer
      triggers it on an 8-cpu vm and works most effectively with no
      preexisting child cgroups:
      
        cd $UNIFIED_ROOT
        mkdir cg1
        echo 4-7 > cg1/cpuset.cpus
        echo root > cg1/cpuset.cpus.partition
      
        # with smt/control reading 'on',
        echo off > /sys/devices/system/cpu/smt/control
      
      RIP maps to
      
        sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
      
      from sd_init().  sd_id is calculated earlier in the same function:
      
        cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
        sd_id = cpumask_first(sched_domain_span(sd));
      
      tl->mask(cpu), which reads cpu_sibling_map on x86, returns an empty mask
      and so cpumask_first() returns >= nr_cpu_ids, which leads to the bogus
      value from per_cpu_ptr() above.
      
      The problem is a race between cpuset_hotplug_workfn() and a later
      offline of CPU N.  cpuset_hotplug_workfn() updates the effective masks
      when N is still online, the offline clears N from cpu_sibling_map, and
      then the worker uses the stale effective masks that still have N to
      generate the scheduling domains, leading the worker to read
      N's empty cpu_sibling_map in sd_init().
      
      rebuild_sched_domains_locked() prevented the race during the cgroup2
      cpuset series up until the Fixes commit changed its check.  Make the
      check more robust so that it can detect an offline CPU in any exclusive
      cpuset's effective mask, not just the top one.
      
      Fixes: 0ccea8fe ("cpuset: Make generate_sched_domains() work with partition")
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20201112171711.639541-1-daniel.m.jordan@oracle.com
    • sched: Fix migration_cpu_stop() WARN · 1293771e
      Peter Zijlstra authored
      Oleksandr reported hitting the WARN in the 'task_rq(p) != rq' branch
      of migration_cpu_stop(). Valentin noted that using cpu_of(rq) in that
      case is just plain wrong to begin with, since per the earlier branch
      that isn't the actual CPU of the task.
      
      Replace both instances of is_cpu_allowed() by a direct p->cpus_mask
      test using task_cpu().
      
      Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Debugged-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    • sched/core: Add missing completion for affine_move_task() waiters · d707faa6
      Valentin Schneider authored
      Qian reported that some fuzzer issuing sched_setaffinity() ends up stuck on
      a wait_for_completion(). The problematic pattern seems to be:
      
        affine_move_task()
            // task_running() case
            stop_one_cpu();
            wait_for_completion(&pending->done);
      
      Combined with, on the stopper side:
      
        migration_cpu_stop()
          // Task moved between unlocks and scheduling the stopper
          task_rq(p) != rq &&
          // task_running() case
          dest_cpu >= 0
      
          => no complete_all()
      
      This can happen with both PREEMPT and !PREEMPT, although !PREEMPT should
      be more likely to see this given the targeted task has a much bigger window
      to block and be woken up elsewhere before the stopper runs.
      
      Make migration_cpu_stop() always look at pending affinity requests; signal
      their completion if the stopper hits a rq mismatch but the task is
      still within its allowed mask. When Migrate-Disable isn't involved, this
      matches the previous set_cpus_allowed_ptr() vs migration_cpu_stop()
      behaviour.
      
      Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
      Reported-by: Qian Cai <cai@redhat.com>
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/lkml/8b62fd1ad1b18def27f18e2ee2df3ff5b36d0762.camel@redhat.com
  2. 10 Nov, 2020 24 commits
  3. 29 Oct, 2020 13 commits