  1. 06 May, 2024 2 commits
    • Merge branch 'thermal-core' · 9396b2a6
      Rafael J. Wysocki authored
      This includes a major rework of thermal governors and part of the
      thermal core interacting with them as well as some fixes and cleanups
      of the thermal debug code:
      
       - Redesign the thermal governor interface to allow the governors to
         work in a more straightforward way.
      
       - Make thermal governors take the current trip point thresholds into
         account in their computations which allows trip hysteresis to be
         observed more accurately.
      
       - Clean up thermal governors.
      
       - Make the thermal core manage passive polling for thermal zones and
         remove passive polling management from thermal governors.
      
       - Improve the handling of cooling device states and thermal mitigation
         episodes in progress in the thermal debug code.
      
       - Avoid excessive updates of trip point statistics and clean up the
         printing of thermal mitigation episode information.
      
      * thermal-core: (27 commits)
        thermal: core: Move passive polling management to the core
        thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
        thermal: trip: Add missing empty code line
        thermal/debugfs: Avoid printing zero duration for mitigation events in progress
        thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
        thermal/debugfs: Create records for cdev states as they get used
        thermal: core: Introduce thermal_governor_trip_crossed()
        thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
        thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
        thermal/debugfs: Clean up thermal_debug_update_temp()
        thermal/debugfs: Avoid excessive updates of trip point statistics
        thermal: core: Relocate critical and hot trip handling
        thermal: core: Drop the .throttle() governor callback
        thermal: gov_user_space: Use .trip_crossed() instead of .throttle()
        thermal: gov_fair_share: Eliminate unnecessary integer divisions
        thermal: gov_fair_share: Use trip thresholds instead of trip temperatures
        thermal: gov_fair_share: Use .manage() callback instead of .throttle()
        thermal: gov_step_wise: Clean up thermal_zone_trip_update()
        thermal: gov_step_wise: Use trip thresholds instead of trip temperatures
        thermal: gov_step_wise: Use .manage() callback instead of .throttle()
        ...
    • 00211025
  2. 02 May, 2024 1 commit
    • Merge tag 'thermal-v6.10-rc1' of... · e1242ff0
      Rafael J. Wysocki authored
      Merge tag 'thermal-v6.10-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into thermal
      
      Merge updates of thermal drivers for v6.10 from Daniel Lezcano:
      
      "- Add QCM2290 compatible DT bindings for Lmh and fix a null pointer
         dereference in the lmh driver in case the SCM is not present (Konrad
         Dybcio)
      
       - Use the strreplace() function instead of doing it manually in the
         Armada driver (Rasmus Villemoes)
      
       - Convert st,stih407-thermal to DT schema and fix up missing
         properties (Raphael Gallais-Pou)
      
       - Add suspend/resume by restoring the context of the tsens sensor
         (Priyansh Jain)
      
       - Support A1 SoC family Thermal Sensor controller and add the DT
         bindings (Dmitry Rokosov)
      
       - Improve the temperature approximation calculation and consolidate
         the Tj constant into a shared area of the structure instead of
         duplicating it on the Rcar Gen3 (Niklas Söderlund)
      
       - Fix the Mediatek LVTS sensor coefficient for the MT8192 in order to support
         it correctly (Hsin-Te Yuan)
      
       - Fix a null pointer dereference on the tsens driver when the function
         compute_intercept_slope() is called with a null parameter (Aleksandr
         Mishin)
      
       - Remove some unused fields in struct qpnp_tm_chip and k3_bandgap
         (Christophe Jaillet)
      
       - Fixup calibration efuse data decoding, consolidate the code by
         checking boundaries and refactor some part of the LVTS Mediatek
         driver. After setting the scene, add MT8186 and MT8188 along with
         the DT bindings (Nicolas Pitre)
      
       - Add Loongson-2K2000 support after some minor code adjustments and
         providing the DT bindings definition (Binbin Zhou)"
      
      * tag 'thermal-v6.10-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (28 commits)
        thermal/drivers/loongson2: Add Loongson-2K2000 support
        dt-bindings: thermal: loongson,ls2k-thermal: Fix incorrect compatible definition
        dt-bindings: thermal: loongson,ls2k-thermal: Add Loongson-2K0500 compatible
        thermal/drivers/loongson2: Trivial code style adjustment
        thermal/drivers/mediatek/lvts_thermal: Add MT8188 support
        dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8188
        thermal/drivers/mediatek/lvts_thermal: Allow early empty sensor slots
        thermal/drivers/mediatek/lvts_thermal: Provision for gt variable location
        thermal/drivers/mediatek/lvts_thermal: Add MT8186 support
        dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8186
        thermal/drivers/mediatek/lvts_thermal: Guard against efuse data buffer overflow
        thermal/drivers/mediatek/lvts_thermal: Use offsets for every calibration byte
        thermal/drivers/mediatek/lvts_thermal: Remove .hw_tshut_temp
        thermal/drivers/mediatek/lvts_thermal: Move comment
        thermal/drivers/mediatek/lvts_thermal: Retrieve all calibration bytes
        thermal/drivers/k3_bandgap: Remove some unused fields in struct k3_bandgap
        thermal/drivers/qcom: Remove some unused fields in struct qpnp_tm_chip
        thermal/drivers/tsens: Fix null pointer dereference
        thermal/drivers/mediatek/lvts_thermal: Add coeff for mt8192
        thermal/drivers/rcar_gen3: Update temperature approximation calculation
        ...
  3. 30 Apr, 2024 3 commits
    • thermal: core: Move passive polling management to the core · 042a3d80
      Rafael J. Wysocki authored
      Passive polling is enabled by setting the 'passive' field in
      struct thermal_zone_device to a positive value so long as the
      'passive_delay_jiffies' field is greater than zero.  It causes
      the thermal core to actively check the thermal zone temperature
      periodically which in theory should be done after crossing a
      passive trip point on the way up in order to allow governors to
      react more rapidly to temperature changes and adjust mitigation
      more precisely.
      
      However, the 'passive' field in struct thermal_zone_device is currently
      managed by governors which is quite problematic.  First of all, only
      two governors, Step-Wise and Power Allocator, update that field at
      all, so the other governors do not benefit from passive polling,
      although in principle they should.  Moreover, if the zone governor is
      changed from, say, Step-Wise to Fair-Share after 'passive' has been
      incremented by the former, it is not going to be reset back to zero by
      the latter even if the zone temperature falls down below all passive
      trip points.
      
      For this reason, make handle_thermal_trip() increment 'passive'
      to enable passive polling for the given thermal zone whenever a
      passive trip point is crossed on the way up and decrement it
      whenever a passive trip point is crossed on the way down.  Also
      remove the 'passive' field updates from governors and additionally
      clear it in thermal_zone_device_init() to prevent passive polling
      from being enabled after a system resume just because it was enabled
      before suspending the system.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
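The counting scheme described in this commit message can be illustrated with a small user-space sketch. Only the field names `passive` and `passive_delay_jiffies` come from the commit message; the struct `zone_sketch`, the helper names, and the underflow guard are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields of struct thermal_zone_device
 * that the commit message mentions. */
struct zone_sketch {
	int passive;                         /* passive trips currently exceeded */
	unsigned long passive_delay_jiffies; /* 0: passive polling not configured */
};

/* Core-side bookkeeping: a passive trip was crossed on the way up. */
static void passive_trip_crossed_up(struct zone_sketch *tz)
{
	tz->passive++;
}

/* A passive trip was crossed on the way down. The underflow guard is an
 * assumption of this sketch. */
static void passive_trip_crossed_down(struct zone_sketch *tz)
{
	if (tz->passive > 0)
		tz->passive--;
}

/* Models the reset that the commit adds to thermal_zone_device_init(). */
static void zone_init(struct zone_sketch *tz)
{
	tz->passive = 0;
}

/* Polling is active only if the counter is positive and a delay is set. */
static bool passive_polling_enabled(const struct zone_sketch *tz)
{
	return tz->passive > 0 && tz->passive_delay_jiffies > 0;
}
```

Because the counter is maintained in one place, switching governors in this model cannot leave it stuck at a nonzero value.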
    • thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid · 202aa0d4
      Rafael J. Wysocki authored
      Make __thermal_zone_device_update() bail out if update_temperature()
      fails to update the zone temperature because __thermal_zone_get_temp()
      has returned an error and the current zone temperature is
      THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
      thermal debug code and thermal governors may get confused otherwise).
      
      Fixes: 9ad18043 ("thermal: core: Send trip crossing notifications at init time if needed")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
    • thermal: trip: Add missing empty code line · 1502718a
      Rafael J. Wysocki authored
      Add missing empty line of code to thermal_zone_trip_id().
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
  4. 26 Apr, 2024 7 commits
    • thermal/debugfs: Avoid printing zero duration for mitigation events in progress · bd700ba9
      Rafael J. Wysocki authored
      If a thermal mitigation event is in progress, its duration value has
      not been updated yet, so 0 will be printed as the event duration by
      tze_seq_show() which is confusing.
      
      Avoid doing that by marking the beginning of the event with the
      KTIME_MIN duration value and making tze_seq_show() compute the current
      event duration on the fly, in which case '>' will be printed instead of
      '=' in the event duration value field.
      
      Similarly, for trip points that have been crossed on the way down, mark
      the end of mitigation with the KTIME_MAX timestamp value and make
      tze_seq_show() compute the current duration on the fly for the trip
      points still involved in the mitigation, in which cases the duration
      value printed by it will be prepended with a '>' character.
      
      Fixes: 7ef01f22 ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
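The sentinel approach described in this commit message can be sketched in user space as follows. `INT64_MIN` plays the role of KTIME_MIN here; the struct `episode_sketch`, the helper names, and the `duration=`/`duration>` output format are hypothetical and do not reproduce what tze_seq_show() actually prints.

```c
#include <stdint.h>
#include <stdio.h>

/* Sentinel meaning "event still in progress" (stands in for KTIME_MIN). */
#define DURATION_IN_PROGRESS INT64_MIN

/* Hypothetical record of one mitigation event. */
struct episode_sketch {
	int64_t start_ns;
	int64_t duration_ns;
};

/* Mark the beginning of an event with the sentinel duration. */
static void episode_begin(struct episode_sketch *e, int64_t now_ns)
{
	e->start_ns = now_ns;
	e->duration_ns = DURATION_IN_PROGRESS;
}

/* Store the final duration once the event is over. */
static void episode_end(struct episode_sketch *e, int64_t now_ns)
{
	e->duration_ns = now_ns - e->start_ns;
}

/* Print '=' for a finished event and '>' with a duration computed on the
 * fly for an event that is still in progress. */
static int episode_format(const struct episode_sketch *e, int64_t now_ns,
			  char *buf, size_t len)
{
	int64_t d = e->duration_ns;
	char sign = '=';

	if (d == DURATION_IN_PROGRESS) {
		d = now_ns - e->start_ns;
		sign = '>';
	}
	return snprintf(buf, len, "duration%c%lld", sign, (long long)d);
}
```

Using a sentinel rather than 0 lets the printing code tell an event that has not finished apart from one that finished with a very short duration.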
    • thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add() · 31a0fa00
      Rafael J. Wysocki authored
      If cdev_dt_seq_show() runs before the first state transition of a cooling
      device, it will not print any state residency information for it, even
      though it might be reasonably expected to print residency information for
      the initial state of the cooling device.
      
      For this reason, rearrange the code to get the initial state of a cooling
      device at the registration time and pass it to thermal_debug_cdev_add(),
      so that the latter can create a duration record for that state which will
      allow cdev_dt_seq_show() to print its residency information.
      
      Fixes: 755113d7 ("thermal/debugfs: Add thermal cooling device debugfs information")
      Reported-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
    • thermal/debugfs: Create records for cdev states as they get used · f4ae18fc
      Rafael J. Wysocki authored
      Because thermal_debug_cdev_state_update() only creates a duration record
      for the old state of a cooling device, if its new state is used for the
      first time, there will be no record for it and cdev_dt_seq_show() will
      not print the duration information for it even though it contains code
      to compute the duration value in that case.
      
      Address this by making thermal_debug_cdev_state_update() create a
      duration record for the new state if there is none.
      
      Fixes: 755113d7 ("thermal/debugfs: Add thermal cooling device debugfs information")
      Reported-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
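The fix described in this commit message amounts to a find-or-create step for the new state as well as the old one. The sketch below shows that pattern in user space, assuming a simple linked list of records. The struct `duration_rec` and all function names are hypothetical and do not mirror the data structures used in the kernel's thermal debugfs code.

```c
#include <stdlib.h>

/* Hypothetical per-state residency record. */
struct duration_rec {
	int state;
	long long total_ns;
	struct duration_rec *next;
};

/* Return the record for a state, or NULL if the state was never seen. */
static struct duration_rec *rec_find(struct duration_rec *head, int state)
{
	for (; head; head = head->next)
		if (head->state == state)
			return head;
	return NULL;
}

/* Return the record for a state, creating an empty one if needed. */
static struct duration_rec *rec_find_or_create(struct duration_rec **head,
					       int state)
{
	struct duration_rec *rec = rec_find(*head, state);

	if (rec)
		return rec;
	rec = calloc(1, sizeof(*rec));
	if (!rec)
		return NULL;
	rec->state = state;
	rec->next = *head;
	*head = rec;
	return rec;
}

/* Account the time spent in the old state and make sure the new state has
 * a record too, so that it can be reported before the next transition. */
static int state_update(struct duration_rec **head, int old_state,
			int new_state, long long elapsed_ns)
{
	struct duration_rec *old_rec = rec_find_or_create(head, old_state);

	if (!old_rec)
		return -1;
	old_rec->total_ns += elapsed_ns;
	return rec_find_or_create(head, new_state) ? 0 : -1;
}

/* Release every record in the list. */
static void rec_free_all(struct duration_rec *head)
{
	while (head) {
		struct duration_rec *next = head->next;

		free(head);
		head = next;
	}
}
```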
    • 8c882f17
    • thermal/debugfs: Prevent use-after-free from occurring after cdev removal · d351eb0a
      Rafael J. Wysocki authored
      Since thermal_debug_cdev_remove() does not run under cdev->lock, it can
      run in parallel with thermal_debug_cdev_state_update() and it may free
      the struct thermal_debugfs object used by the latter after it has been
      checked against NULL.
      
      If that happens, thermal_debug_cdev_state_update() will access memory
      that has been freed already causing the kernel to crash.
      
      Address this by using cdev->lock in thermal_debug_cdev_remove() around
      the cdev->debugfs value check (in case the same cdev is removed at the
      same time in two different threads) and its reset to NULL.
      
      Fixes: 755113d7 ("thermal/debugfs: Add thermal cooling device debugfs information")
      Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
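The locking pattern described in this commit message (check the pointer and reset it to NULL under the same lock that the update path holds) can be sketched in user space with a POSIX mutex. This is a simplified model, not the kernel code: the structs, the function names, taking the lock inside the update helper, and freeing the object after dropping the lock are all assumptions of this sketch.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct thermal_debugfs. */
struct debug_sketch {
	int updates;
};

/* Hypothetical stand-in for a cooling device with its lock. */
struct cdev_sketch {
	pthread_mutex_t lock;
	struct debug_sketch *debugfs;
};

/* Update path: the NULL check and the access happen under the lock, so the
 * object cannot be detached between them. */
static int cdev_state_update(struct cdev_sketch *cdev)
{
	int ret = -1;

	pthread_mutex_lock(&cdev->lock);
	if (cdev->debugfs) {
		cdev->debugfs->updates++;
		ret = 0;
	}
	pthread_mutex_unlock(&cdev->lock);
	return ret;
}

/* Removal path: check and reset the pointer under the lock. If two threads
 * race to remove the same device, only one of them sees a non-NULL pointer
 * and frees the object. Returns 1 if this call freed it, 0 otherwise. */
static int cdev_debug_remove(struct cdev_sketch *cdev)
{
	struct debug_sketch *dbg;

	pthread_mutex_lock(&cdev->lock);
	dbg = cdev->debugfs;
	cdev->debugfs = NULL;
	pthread_mutex_unlock(&cdev->lock);

	if (!dbg)
		return 0;
	free(dbg);
	return 1;
}
```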
    • thermal/debugfs: Fix two locking issues with thermal zone debug · c7f7c372
      Rafael J. Wysocki authored
      With the current thermal zone locking arrangement in the debugfs code,
      user space can open the "mitigations" file for a thermal zone before
      the zone's debugfs pointer is set which will result in a NULL pointer
      dereference in tze_seq_start().
      
      Moreover, thermal_debug_tz_remove() is not called under the thermal
      zone lock, so it can run in parallel with the other functions accessing
      the thermal zone's struct thermal_debugfs object.  Then, it may clear
      tz->debugfs after one of those functions has checked it and the
      struct thermal_debugfs object may be freed prematurely.
      
      To address the first problem, pass a pointer to the thermal zone's
      struct thermal_debugfs object to debugfs_create_file() in
      thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
      tze_seq_stop(), and tze_seq_show() retrieve it from s->private
      instead of a pointer to the thermal zone object.  This will ensure
      that tz_debugfs will be valid across the "mitigations" file accesses
      until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
      removes that file.
      
      To address the second problem, use tz->lock in thermal_debug_tz_remove()
      around the tz->debugfs value check (in case the same thermal zone is
      removed at the same time in two different threads) and its reset to NULL.
      
      Fixes: 7ef01f22 ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
      Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
    • thermal/debugfs: Free all thermal zone debug memory on zone removal · 72c1afff
      Rafael J. Wysocki authored
      Because thermal_debug_tz_remove() does not free all memory allocated for
      thermal zone diagnostics, some of that memory becomes unreachable after
      freeing the thermal zone's struct thermal_debugfs object.
      
      Address this by making thermal_debug_tz_remove() free all of the memory
      in question.
      
      Fixes: 7ef01f22 ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
      Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
  5. 24 Apr, 2024 14 commits
  6. 23 Apr, 2024 13 commits