1. 10 Sep, 2013 5 commits
    • Srivatsa S. Bhat's avatar
      cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat authored
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
    • Srivatsa S. Bhat's avatar
      cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat authored
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure etc, and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
    • Andreas Schwab's avatar
      cpufreq: Fix wrong time unit conversion · a857c0b9
      Andreas Schwab authored
      The time spent by a CPU under a given frequency is stored in jiffies unit
      in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
      the frequency.
      
      This is what is displayed in the following file on the right column:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 19835820
           2300000 3172
           [...]
      
      Now cpufreq converts this jiffies unit delta to clock_t before returning it
      to the user as in the above file. And that conversion is achieved using the API
      cputime64_to_clock_t().
      
      Although it accidentally works on traditional tick based cputime accounting, where
      cputime_t maps directly to jiffies, it doesn't work with other types of cputime
      accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
      or any granularity preffered by the architecture.
      
      For example we get a buggy zero delta on full dyntick configurations:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 0
           2300000 0
           [...]
      
      Fix this with using the proper jiffies_64_t to clock_t conversion.
      Reported-and-tested-by: default avatarCarsten Emde <C.Emde@osadl.org>
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a857c0b9
    • Viresh Kumar's avatar
      cpufreq: serialize calls to __cpufreq_governor() · 19c76303
      Viresh Kumar authored
      We can't take a big lock around __cpufreq_governor() as this causes
      recursive locking for some cases. But calls to this routine must be
      serialized for every policy. Otherwise we can see some unpredictable
      events.
      
      For example, consider following scenario:
      
      __cpufreq_remove_dev()
       __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
         policy->governor->governor(policy, CPUFREQ_GOV_STOP);
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            mutex_destroy(&cpu_cdbs->timer_mutex)
            cpu_cdbs->cur_policy = NULL;
        <PREEMPT>
      store()
       __cpufreq_set_policy()
        __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
          policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
           case CPUFREQ_GOV_LIMITS:
            mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
             if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      And so store() will eventually result in a crash if cur_policy is
      NULL at this point.
      
      Introduce an additional variable which would guarantee serialization
      here.
      Reported-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      19c76303
    • Viresh Kumar's avatar
      cpufreq: don't allow governor limits to be changed when it is disabled · f73d3933
      Viresh Kumar authored
      __cpufreq_governor() returns with -EBUSY when governor is already
      stopped and we try to stop it again, but when it is stopped we must
      not allow calls to CPUFREQ_GOV_LIMITS event as well.
      
      This patch adds this check in __cpufreq_governor().
      Reported-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f73d3933
  2. 29 Aug, 2013 1 commit
    • Stephen Boyd's avatar
      cpufreq: Don't use smp_processor_id() in preemptible context · 69320783
      Stephen Boyd authored
      Workqueues are preemptible even if works are queued on them with
      queue_work_on(). Let's use raw_smp_processor_id() here to silence
      the warning.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kworker/3:2/674
      caller is gov_queue_work+0x28/0xb0
      CPU: 0 PID: 674 Comm: kworker/3:2 Tainted: G        W    3.10.0 #30
      Workqueue: events od_dbs_timer
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0)
      [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0) from [<c0635864>] (gov_queue_work+0x28/0xb0)
      [<c0635864>] (gov_queue_work+0x28/0xb0) from [<c0635618>] (od_dbs_timer+0x108/0x134)
      [<c0635618>] (od_dbs_timer+0x108/0x134) from [<c01aa8f8>] (process_one_work+0x25c/0x444)
      [<c01aa8f8>] (process_one_work+0x25c/0x444) from [<c01aaf88>] (worker_thread+0x200/0x344)
      [<c01aaf88>] (worker_thread+0x200/0x344) from [<c01b03bc>] (kthread+0xa0/0xb0)
      [<c01b03bc>] (kthread+0xa0/0xb0) from [<c01061b8>] (ret_from_fork+0x14/0x3c)
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69320783
  3. 28 Aug, 2013 3 commits
    • Stratos Karafotis's avatar
      cpufreq: governor: Fix typos in comments · c4afc410
      Stratos Karafotis authored
       - 'Governer' should be 'Governor'.
       - 'S' is used for Siemens (electrical conductance) in SI units,
         so use small 's' for seconds.
      Signed-off-by: default avatarStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4afc410
    • Stratos Karafotis's avatar
      cpufreq: governors: Remove duplicate check of target freq in supported range · 934dac1e
      Stratos Karafotis authored
      Function __cpufreq_driver_target() checks if target_freq is within
      policy->min and policy->max range. generic_powersave_bias_target() also
      checks if target_freq is valid via a cpufreq_frequency_table_target()
      call. So, drop the unnecessary duplicate check in *_check_cpu().
      Signed-off-by: default avatarStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      934dac1e
    • Stephen Boyd's avatar
      cpufreq: Fix timer/workqueue corruption due to double queueing · 3617f2ca
      Stephen Boyd authored
      When a CPU is hot removed we'll cancel all the delayed work items
      via gov_cancel_work(). Normally this will just cancels a delayed
      timer on each CPU that the policy is managing and the work won't
      run, but if the work is already running the workqueue code will
      wait for the work to finish before continuing to prevent the
      work items from re-queuing themselves like they normally do. This
      scheme will work most of the time, except for the case where the
      work function determines that it should adjust the delay for all
      other CPUs that the policy is managing. If this scenario occurs,
      the canceling CPU will cancel its own work but queue up the other
      CPUs works to run. For example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...
        __cpufreq_remove_dev()
         cpufreq_governor_dbs()
          case CPUFREQ_GOV_STOP:
           gov_cancel_work(dbs_data, policy);
            cpu0 work is canceled
             timer is canceled
             cpu1 work is canceled                    <work runs>
             <waits for cpu1>                         od_dbs_timer()
                                                       gov_queue_work(*, *, true);
       						  cpu0 work queued
       						  cpu1 work queued
      						  cpu2 work queued
      						  ...
             cpu1 work is canceled
             cpu2 work is canceled
             ...
      
      At the end of the GOV_STOP case cpu0 still has a work queued to
      run although the code is expecting all of the works to be
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize all the other CPUs works except for the CPU that is
      going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
      Modules linked in:
      CPU: 0 PID: 1491 Comm: sh Tainted: G        W    3.10.0 #19
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c01904cc>] (warn_slowpath_common+0x4c/0x6c)
      [<c01904cc>] (warn_slowpath_common+0x4c/0x6c) from [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c)
      [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0388a7c>] (debug_print_object+0x94/0xbc)
      [<c0388a7c>] (debug_print_object+0x94/0xbc) from [<c0388e34>] (__debug_object_init+0x2d0/0x340)
      [<c0388e34>] (__debug_object_init+0x2d0/0x340) from [<c019e3b0>] (init_timer_key+0x14/0xb0)
      [<c019e3b0>] (init_timer_key+0x14/0xb0) from [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8)
      [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8) from [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4)
      [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4) from [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434)
      [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80)
      [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80) from [<c08a43c0>] (notifier_call_chain+0x38/0x68)
      [<c08a43c0>] (notifier_call_chain+0x38/0x68) from [<c01938e0>] (__cpu_notify+0x28/0x40)
      [<c01938e0>] (__cpu_notify+0x28/0x40) from [<c0892ad4>] (_cpu_down+0x7c/0x2c0)
      [<c0892ad4>] (_cpu_down+0x7c/0x2c0) from [<c0892d3c>] (cpu_down+0x24/0x40)
      [<c0892d3c>] (cpu_down+0x24/0x40) from [<c0893ea8>] (store_online+0x2c/0x74)
      [<c0893ea8>] (store_online+0x2c/0x74) from [<c04519d8>] (dev_attr_store+0x18/0x24)
      [<c04519d8>] (dev_attr_store+0x18/0x24) from [<c02a69d4>] (sysfs_write_file+0x100/0x148)
      [<c02a69d4>] (sysfs_write_file+0x100/0x148) from [<c0255c18>] (vfs_write+0xcc/0x174)
      [<c0255c18>] (vfs_write+0xcc/0x174) from [<c0255f70>] (SyS_write+0x38/0x64)
      [<c0255f70>] (SyS_write+0x38/0x64) from [<c0106120>] (ret_fast_syscall+0x0/0x30)
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      3617f2ca
  4. 27 Aug, 2013 1 commit
  5. 26 Aug, 2013 1 commit
    • Sascha Hauer's avatar
      cpufreq: imx6q: Fix clock enable balance · fae19b84
      Sascha Hauer authored
      For changing the cpu frequency the i.MX6q has to be switched to some
      intermediate clock during the PLL reprogramming. The driver tries
      to be clever to keep the enable count correct but gets it wrong. If
      the cpufreq is increased it calls clk_disable_unprepare twice
      on pll2_pfd2_396m. This puts all other devices which get their clock
      from pll2_pfd2_396m into a nonworking state.
      
      Fix this by removing the clk enabling/disabling altogether since the
      clk core will do this automatically during a reparent.
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      fae19b84
  6. 23 Aug, 2013 1 commit
  7. 22 Aug, 2013 2 commits
    • Rafael J. Wysocki's avatar
      Merge branch 'cpu_of_node' of git://linux-arm.org/linux-skn into pm-cpufreq-next · 09198f8f
      Rafael J. Wysocki authored
      Pull DT/core/cpufreq cpu_ofnode updates for v3.12 from Sudeep KarkadaNagesha.
      
      * 'cpu_of_node' of git://linux-arm.org/linux-skn:
        cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: arm_big_little: remove device tree parsing for cpu nodes
        cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
        cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
        cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
        drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
        ARM: mvebu: remove device tree parsing for cpu nodes
        ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
        of/device: add helper to get cpu device node from logical cpu index
        driver/core: cpu: initialize of_node in cpu's device struture
        ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
        of: move of_get_cpu_node implementation to DT core library
        powerpc: refactor of_get_cpu_node to support other architectures
        openrisc: remove undefined of_get_cpu_node declaration
        microblaze: remove undefined of_get_cpu_node declaration
      09198f8f
    • Rafael J. Wysocki's avatar
      4eb5178c
  8. 21 Aug, 2013 20 commits
  9. 20 Aug, 2013 5 commits
  10. 18 Aug, 2013 1 commit