Commit 57726ebe authored by Vadim Pasternak's avatar Vadim Pasternak Committed by Jakub Kicinski

mlxsw: core: Add validation of transceiver temperature thresholds

Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.

This is a rare scenario, but it was observed at a customer site.

Fixes: 6a79507c ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: default avatarVadim Pasternak <vadimp@nvidia.com>
Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parent b7741344
...@@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core, ...@@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
if (err) if (err)
return err; return err;
if (crit_temp > emerg_temp) {
dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n",
tz->tzdev->type, crit_temp, emerg_temp);
return 0;
}
/* According to the system thermal requirements, the thermal zones are /* According to the system thermal requirements, the thermal zones are
* defined with four trip points. The critical and emergency * defined with four trip points. The critical and emergency
* temperature thresholds, provided by QSFP module are set as "active" * temperature thresholds, provided by QSFP module are set as "active"
...@@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core, ...@@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp; tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp; tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp; tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp;
if (emerg_temp > crit_temp)
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp + tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
MLXSW_THERMAL_MODULE_TEMP_SHIFT; MLXSW_THERMAL_MODULE_TEMP_SHIFT;
else
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp;
return 0; return 0;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment