• Srivatsa S. Bhat's avatar
    cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
    Srivatsa S. Bhat authored
    __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
    offline. But because we destroy the kobject towards the end of the CPU offline
    phase, there are certain race windows where a task can try to write to a
    cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
    that CPU offline, and this can bump up the kobject refcount, which in turn might
    hinder the CPU offline task from running to completion. (It can also cause
    other more serious problems such as trying to acquire a destroyed timer-mutex
    etc., depending on the exact stage of the cleanup at which the task managed to
    take a new refcount).
    
    To fix the race window, we will need to synchronize those store_*() call-sites
    with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
    in turn can cause a total deadlock because it can end up waiting for the
    CPU offline task to complete, with incremented refcount!
    
    Write to sysfs                            CPU offline task
    --------------                            ----------------
    kobj_refcnt++
    
                                              Acquire cpu_hotplug.lock
    
    get_online_cpus();
    
    					  Wait for kobj_refcnt to drop to zero
    
                         **DEADLOCK**
    
    A simple way to avoid this problem is to perform the kobject cleanup in the
    CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
    perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
    in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
    released. Doing this helps us avoid deadlocks due to holding kobject refcounts
    and waiting on each other on the cpu_hotplug.lock.
    
    (Note: We can't move all of the cpufreq CPU offline steps to the
    CPU_POST_DEAD stage, because certain things such as stopping the governors
    have to be done before the outgoing CPU is marked offline. So retain those
    parts in the CPU_DOWN_PREPARE stage itself).
    Reported-by: default avatarStephen Boyd <sboyd@codeaurora.org>
    Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    1aee40ac
cpufreq.c 54.9 KB