1. 11 May, 2016 7 commits
  2. 09 May, 2016 2 commits
  3. 06 May, 2016 3 commits
  4. 04 May, 2016 8 commits
  5. 02 May, 2016 1 commit
  6. 28 Apr, 2016 10 commits
  7. 27 Apr, 2016 6 commits
    • cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765
      Srinivas Pandruvada authored
      For platforms that are controlled via a remote node manager, enable _PPC by
      default. These platforms are mostly categorized as enterprise servers or
      performance servers. They need to go through certification tests, which
      exercise control via _PPC.
      The relative risk of enabling this by default is low, as it is less likely
      that these systems have a broken _PSS table.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Adjust policy->max · 3be9200d
      Srinivas Pandruvada authored
      When policy->max is changed via _PPC or sysfs and is more than the max non
      turbo frequency, it does not really change the resulting performance on some
      processors. When policy->max results in a P-State ratio above the
      turbo activation ratio, the processor can choose any P-State up to max
      turbo. So the user or _PPC setting has no value, but it can cause
      undesirable side effects like:
      - Showing a reduced max percentage in Intel P-State sysfs
      - Reduced max performance under certain boundary conditions:
      The requested max scaling frequency, either via _PPC or via cpufreq-sysfs,
      is converted into a fixed-point max percent scale. In the
      majority of cases this results in the correct max, but not 100% of the
      time. If the _PPC is requested at a point where the calculation leads to a
      lower max, this can result in a lower P-State than expected and it will
      impact performance.
      Example of this condition using a Broadwell laptop with config TDP.
      
      ACPI _PSS table from a Broadwell laptop
      2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
      1300000 1100000 1000000 900000 800000 600000 500000
      
      The actual results with config TDP disabled, so that we get what is
      requested at or below 2300000KHz.
      
      scaling_max_freq        Max Requested P-State (hex)   Resultant scaling max
      ------------------------------------------------------------------------------
      2400000                 18                            2900000 (max turbo)
      2300000                 17                            2300000 (max physical non turbo)
      2200000                 15                            2100000
      2100000                 15                            2100000
      2000000                 13                            1900000
      1900000                 13                            1900000
      1800000                 12                            1800000
      1700000                 11                            1700000
      1600000                 10                            1600000
      1500000                 f                             1500000
      1400000                 e                             1400000
      1300000                 d                             1300000
      1200000                 c                             1200000
      1100000                 a                             1000000
      1000000                 a                             1000000
      900000                  9                              900000
      800000                  8                              800000
      700000                  7                              700000
      600000                  6                              600000
      500000                  5                              500000
      ------------------------------------------------------------------------------
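
The rounding effect in the table can be illustrated with a short Python sketch. It is not the driver code. It assumes a round-up percentage of the max turbo frequency, 8 fractional bits for the fixed-point scale, and a max turbo ratio of 0x1d (2900000KHz at 100000KHz per ratio step); `FRAC_BITS`, `MAX_TURBO_RATIO` and `max_requested_pstate` are hypothetical names chosen for illustration. Under those assumptions the sketch yields the same values as the "Max Requested P-State" column above.

```python
FRAC_BITS = 8                # assumed number of fractional bits
MAX_TURBO_KHZ = 2_900_000    # max turbo frequency from the table
MAX_TURBO_RATIO = 0x1D       # assumed: 2900000KHz / 100000KHz per ratio step


def max_requested_pstate(scaling_max_khz: int) -> int:
    """Convert a requested max frequency into a max P-State ratio."""
    # Percentage of max turbo, rounded up to a whole percent (assumption).
    pct = (scaling_max_khz * 100 + MAX_TURBO_KHZ - 1) // MAX_TURBO_KHZ
    # Percent as a fixed-point fraction; the division truncates.
    frac = (pct << FRAC_BITS) // 100
    # Scale the max turbo ratio; the shift truncates again.
    return (MAX_TURBO_RATIO * frac) >> FRAC_BITS


if __name__ == "__main__":
    for khz in (2_300_000, 2_200_000, 1_200_000, 1_100_000):
        print(khz, hex(max_requested_pstate(khz)))
```

The two truncations are where a request such as 2200000KHz ends up one ratio lower than expected.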
      
      Now set the config TDP level 1 ratio to 0x0b (equivalent to 1100000KHz)
      in BIOS (not every system will let you adjust this).
      The turbo activation ratio will be set to one less than that, which is
      0x0a (so any request above 1000000KHz should result in the turbo region,
      assuming no thermal limits).
      Here _PPC will request a max of 1100000KHz (which should still result in
      turbo up to the max allowable turbo frequency, as this is more than the
      turbo activation ratio), but the actual calculation resulted in a max
      ceiling P-State of 0x0a. So under any load condition, this driver
      will not request turbo P-States. This will be a huge performance hit.
      
      When config TDP feature is ON, if the _PPC points to a frequency above
      turbo activation ratio, the performance can still reach max turbo. In this
      case we don't need to treat this as the reduced frequency in set_policy
      callback.
      
      In this change when config TDP is active (by checking if the physical max
      non turbo ratio is more than the current max non turbo ratio), any request
      above current max non turbo is treated as full performance.
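
A minimal Python sketch of the rule described above, written only as an illustration and not as the driver's set_policy code. The function name, the parameter names, and the choice of 1000000KHz as the current max non turbo frequency in the usage lines are assumptions made for the example.

```python
def effective_policy_max(requested_khz: int,
                         current_max_non_turbo_khz: int,
                         physical_max_non_turbo_khz: int,
                         max_turbo_khz: int) -> int:
    """Return the max frequency the policy should be treated as requesting."""
    # Config TDP is considered active when the physical max non turbo
    # frequency is higher than the current max non turbo frequency.
    config_tdp_active = physical_max_non_turbo_khz > current_max_non_turbo_khz
    if config_tdp_active and requested_khz > current_max_non_turbo_khz:
        # Any request above the current max non turbo can reach max turbo
        # anyway, so treat it as full performance.
        return max_turbo_khz
    return requested_khz


if __name__ == "__main__":
    # Config TDP active (hypothetical current max non turbo of 1000000KHz).
    print(effective_policy_max(1_100_000, 1_000_000, 2_300_000, 2_900_000))
    # Config TDP not active: the request is kept as is.
    print(effective_policy_max(2_200_000, 2_300_000, 2_300_000, 2_900_000))
```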
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw : Minor cleanups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Enforce _PPC limits · 9522a2ff
      Srinivas Pandruvada authored
      Use the ACPI _PPC notification to limit the max P-state the driver will
      request. An ACPI _PPC change notification is sent by BIOS to limit the max
      P-state in several cases:
      - To reduce the impact of a platform thermal condition
      - When the Config TDP feature is used, a changed _PPC is sent to
      follow the TDP change
      - Remote node managers in servers want to control platform power
      via the baseboard management controller (BMC)
      
      This change registers with the ACPI processor performance lib so that
      _PPC changes are notified to the cpufreq core, which in turn will
      result in a call to the .setpolicy() callback. Also, the way the _PSS
      table identifies a turbo frequency is not compatible with the max turbo
      frequency in intel_pstate, so the very first entry in _PSS needs
      to be adjusted.
      
      This feature can be turned on by using kernel parameters:
      intel_pstate=support_acpi_ppc
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Minor cleanups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: Ramp-down global pstate slower than local-pstate · eaa2c3ae
      Akshay Adiga authored
      The frequency transition latency from pmin to pmax is observed to be on
      the order of a few milliseconds, and it usually incurs a performance
      penalty during sudden frequency ramp-up requests.

      This patch set solves this problem by using an entity called "global
      pstates". The global pstate is a chip-level entity, so the global entity
      (voltage) is managed across the cores. The local pstate is a core-level
      entity, so the local entity (frequency) is managed across threads.

      This patch brings down the global pstate at a slower rate than the local
      pstate. Holding the global pstate higher than the local pstate makes
      the subsequent ramp-ups faster.
      
      A per-policy structure is maintained to keep track of the global and
      local pstate changes. The global pstate is brought down using a parabolic
      equation. The ramp-down time to pmin is set to ~5 seconds. To make sure
      that the global pstates are dropped at regular intervals, a timer is
      queued every 2 seconds during the ramp-down phase, which eventually brings
      the global pstate down to the local pstate.
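
The ramp-down can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the patch itself: the parabola `(t * t) >> 18` and the 5120 ms limit are assumed values chosen so that the curve reaches 100% at about 5 seconds, and pstates are modelled as abstract levels where a higher number means a higher operating point. All names are hypothetical.

```python
MAX_RAMP_DOWN_TIME_MS = 5120   # assumed limit, roughly the ~5 seconds above
TIMER_INTERVAL_MS = 2000       # timer period mentioned in the text


def ramp_down_percent(elapsed_ms: int) -> int:
    """Parabolic ramp-down: 0% at t=0, 100% at MAX_RAMP_DOWN_TIME_MS."""
    return (elapsed_ms * elapsed_ms) >> 18


def global_level(highest_level: int, local_level: int, elapsed_ms: int) -> int:
    """Global level after elapsed_ms of ramp-down, never below the local level."""
    pct = min(100, ramp_down_percent(elapsed_ms))
    return highest_level - (highest_level - local_level) * pct // 100


if __name__ == "__main__":
    # Sample the curve at each timer tick until the ramp-down completes.
    t = 0
    while t <= MAX_RAMP_DOWN_TIME_MS + TIMER_INTERVAL_MS:
        print(t, global_level(100, 20, t))
        t += TIMER_INTERVAL_MS
```

Because the curve is parabolic, the global level drops slowly at first and faster towards the end, so a ramp-up request shortly after a drop in load still finds the global level high.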
      
      Iozone results show a fairly consistent performance boost.
      YCSB on redis shows improved max latencies in most cases.

      Iozone write/rewrite tests were made with file sizes 200704Kb and 401408Kb
      with different record sizes. The following table shows IO operations/sec
      with and without the patch.
      
      Iozone results (in op/sec, mean over 3 iterations)
      ----------------------------------------------------------------------
      file size-                      with            without         %
      recordsize-IOtype               patch           patch           change
      ----------------------------------------------------------------------
      200704-1-SeqWrite               1616532         1615425         0.06
      200704-1-Rewrite                2423195         2303130         5.21
      200704-2-SeqWrite               1628577         1602620         1.61
      200704-2-Rewrite                2428264         2312154         5.02
      200704-4-SeqWrite               1617605         1617182         0.02
      200704-4-Rewrite                2430524         2351238         3.37
      200704-8-SeqWrite               1629478         1600436         1.81
      200704-8-Rewrite                2415308         2298136         5.09
      200704-16-SeqWrite              1619632         1618250         0.08
      200704-16-Rewrite               2396650         2352591         1.87
      200704-32-SeqWrite              1632544         1598083         2.15
      200704-32-Rewrite               2425119         2329743         4.09
      200704-64-SeqWrite              1617812         1617235         0.03
      200704-64-Rewrite               2402021         2321080         3.48
      200704-128-SeqWrite             1631998         1600256         1.98
      200704-128-Rewrite              2422389         2304954         5.09
      200704-256-SeqWrite             1617065         1616962         0.00
      200704-256-Rewrite              2432539         2301980         5.67
      200704-512-SeqWrite             1632599         1598656         2.12
      200704-512-Rewrite              2429270         2323676         4.54
      200704-1024-SeqWrite            1618758         1616156         0.16
      200704-1024-Rewrite             2431631         2315889         4.99
      401408-1-SeqWrite               1631479         1608132         1.45
      401408-1-Rewrite                2501550         2459409         1.71
      401408-2-SeqWrite               1617095         1626069         -0.55
      401408-2-Rewrite                2507557         2443621         2.61
      401408-4-SeqWrite               1629601         1611869         1.10
      401408-4-Rewrite                2505909         2462098         1.77
      401408-8-SeqWrite               1617110         1626968         -0.60
      401408-8-Rewrite                2512244         2456827         2.25
      401408-16-SeqWrite              1632609         1609603         1.42
      401408-16-Rewrite               2500792         2451405         2.01
      401408-32-SeqWrite              1619294         1628167         -0.54
      401408-32-Rewrite               2510115         2451292         2.39
      401408-64-SeqWrite              1632709         1603746         1.80
      401408-64-Rewrite               2506692         2433186         3.02
      401408-128-SeqWrite             1619284         1627461         -0.50
      401408-128-Rewrite              2518698         2453361         2.66
      401408-256-SeqWrite             1634022         1610681         1.44
      401408-256-Rewrite              2509987         2446328         2.60
      401408-512-SeqWrite             1617524         1628016         -0.64
      401408-512-Rewrite              2504409         2442899         2.51
      401408-1024-SeqWrite            1629812         1611566         1.13
      401408-1024-Rewrite             2507620         2442968         2.64
      
      Tested with a YCSB workload (50% update + 50% read) over redis for 1 million
      records and 1 million operations. Each test was carried out with target
      operations per second and persistence disabled.
      
      Max latency (in us, mean over 5 iterations)
      ---------------------------------------------------------------
      op/s    Operation       with patch      without patch   %change
      ---------------------------------------------------------------
      15000   Read            61480.6         50261.4         22.32
      15000   cleanup         215.2           293.6           -26.70
      15000   update          25666.2         25163.8         2.00
      
      25000   Read            32626.2         89525.4         -63.56
      25000   cleanup         292.2           263.0           11.10
      25000   update          32293.4         90255.0         -64.22
      
      35000   Read            34783.0         33119.0         5.02
      35000   cleanup         321.2           395.8           -18.8
      35000   update          36047.0         38747.8         -6.97
      
      40000   Read            38562.2         42357.4         -8.96
      40000   cleanup         371.8           384.6           -3.33
      40000   update          27861.4         41547.8         -32.94
      
      45000   Read            42271.0         88120.6         -52.03
      45000   cleanup         263.6           383.0           -31.17
      45000   update          29755.8         81359.0         -63.43
      
      (test without target op/s)
      47659   Read            83061.4         136440.6        -39.12
      47659   cleanup         195.8           193.8           1.03
      47659   update          73429.4         124971.8        -41.24
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: Remove flag use-case of policy->driver_data · 2920e9ce
      Shilpasri G Bhat authored
      Commit 1b028984 ("cpufreq: powernv: Add sysfs attributes to show
      throttle stats") used policy->driver_data as a flag for one-time creation
      of throttle sysfs files. Instead, use 'kernfs_find_and_get()' to
      check if the attribute already exists. This is required because
      policy->driver_data is used for other purposes in a later patch.
      Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: e_powersaver: Use IS_ENABLED() instead of checking for built-in or module · 6de0dc4b
      Javier Martinez Canillas authored
      The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
      built-in or as a module. Use that macro instead of open coding the same check.
      Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 25 Apr, 2016 3 commits
    • cpufreq: intel_pstate: Fix processing for turbo activation ratio · 1becf035
      Srinivas Pandruvada authored
      When the config TDP level is not nominal (nominal being level 0), the MSR
      values for reading the level 1 and level 2 ratios contain power in the low
      14 bits, and the actual ratio is in bits [23:16]. The current processing for
      level 1 and level 2 is wrong, as no shift is done to get the actual ratio.
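
The field layout described above can be sketched in Python as follows. This only illustrates the bit extraction; the function names and the sample MSR value are made up for the example.

```python
def tdp_level_ratio(msr_value: int) -> int:
    """Extract the ratio held in bits [23:16] of a config TDP level MSR value."""
    return (msr_value >> 16) & 0xFF


def tdp_level_power(msr_value: int) -> int:
    """Extract the power field held in the low 14 bits."""
    return msr_value & 0x3FFF


if __name__ == "__main__":
    # Hypothetical MSR value: ratio 0x0b in bits [23:16], power 0x123 in bits [13:0].
    sample = (0x0B << 16) | 0x0123
    print(hex(tdp_level_ratio(sample)), hex(tdp_level_power(sample)))
```

Reading the ratio without the shift would pick up the power field instead, which is the error the fix addresses.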
      
      Fixes: 6a35fc2d (cpufreq: intel_pstate: get P1 from TAR when available)
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() · ba1ca654
      Rafael J. Wysocki authored
      The way cpufreq_governor_start() initializes j_cdbs->prev_load is
      questionable.
      
      First off, j_cdbs->prev_cpu_wall, used as a denominator in the
      computation, may be zero.  This happens when
      get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy(),
      which is then used to obtain that number, is called exactly at the
      jiffies_64 wrap time.  It is rather hard to trigger that error, but it is
      not impossible, and it will just crash the kernel then.
      
      Second, j_cdbs->prev_load is computed as the average load during
      the entire time since the system started and it may not reflect the
      load in the previous sampling period (as it is expected to).
      That doesn't play well with the way dbs_update() uses that value.
      Namely, if the update time delta (wall_time) happens to be greater
      than twice the sampling rate on the first invocation of it, the
      initial value of j_cdbs->prev_load (which may be completely off) will
      be returned to the caller as the current load (unless it is equal to
      zero and unless another CPU sharing the same policy object has a
      greater load value).
      
      For this reason, notice that the prev_load field of struct cpu_dbs_info
      is only used by dbs_update() and only in that one place, so if
      cpufreq_governor_start() is modified to always initialize it to 0,
      it will make dbs_update() always compute the actual load first time
      it checks the update time delta against the doubled sampling rate
      (after initialization) and there won't be any side effects of it.
      
      Consequently, modify cpufreq_governor_start() as described.
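
The behavior described above can be modelled with a simplified Python sketch. It is not the governor code: `sample_load` is a hypothetical stand-in for the per-CPU part of dbs_update(), and it only shows why initializing prev_load to 0 forces the actual load to be computed on the first invocation.

```python
def sample_load(wall_time_us: int, idle_time_us: int,
                sampling_rate_us: int, prev_load: int) -> tuple[int, int]:
    """Return (load, new_prev_load) for one sampling step."""
    if wall_time_us > 2 * sampling_rate_us and prev_load:
        # Long gap since the last update: reuse the previous load once,
        # then clear it so the next step computes the load again.
        return prev_load, 0
    # Otherwise compute the load over the elapsed interval, in percent.
    load = 100 * (wall_time_us - idle_time_us) // wall_time_us
    return load, load


if __name__ == "__main__":
    # prev_load initialized to 0: the actual load is computed.
    print(sample_load(50_000, 40_000, 10_000, prev_load=0))
    # A stale non-zero initial prev_load would be returned as the current load.
    print(sample_load(50_000, 40_000, 10_000, prev_load=90))
```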
      
      Fixes: 18b46abd (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    • cpufreq: hisilicon: Use generic platdev driver · 3920be47
      Viresh Kumar authored
      The cpufreq-dt-platdev driver now supports creation of the cpufreq-dt
      platform device. Reuse that and remove similar code from the platform code.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>