Commit a6633e11 authored by Moshe Shemesh, committed by Saeed Mahameed

net/mlx5: Fix delay in fw fatal report handling due to fw report

When a fw fatal error occurs, poll_health() first detects and reports a
fw error. Only afterwards does it detect and report the fw fatal error
itself.

That can cause a long delay in fw fatal error handling, which waits in a
queue for the fw error handling to finish. The fw error handler tries to
issue a fw core dump command, but fw in a fatal state may not respond,
so the driver waits for the command to time out.
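
The queueing delay described above can be illustrated with a small model. This is only a sketch: it assumes the two handlers run strictly one after another in FIFO order, and `struct work_item`, `wait_before_start_ms()` and the durations in the usage note are hypothetical, not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: how long its handler blocks, in milliseconds. */
struct work_item {
	const char *name;
	unsigned int run_ms;
};

/*
 * Return how long the item at index `target` waits before it starts,
 * assuming the queue runs items strictly one at a time in FIFO order.
 */
static unsigned int wait_before_start_ms(const struct work_item *q,
					 size_t n, size_t target)
{
	unsigned int waited = 0;
	size_t i;

	assert(q != NULL && target < n);

	/* Every item queued ahead of the target must finish first. */
	for (i = 0; i < target; i++)
		waited += q[i].run_ms;

	return waited;
}
```

With a report handler that blocks for an illustrative 60000 ms queued ahead of the fatal handler, the fatal handler waits the full 60000 ms. If the fatal handler is queued alone, it waits 0 ms.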

Change the flow to detect and handle fw fatal errors first, and look for
a fw error to handle only if no fatal error was detected.
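
The reordered flow can be sketched as a standalone model. This is a simplified illustration, not the driver code: `struct health_model`, `enum poll_action` and `poll_once()` are hypothetical names, and the health counter, timer and device state handling are left out.

```c
#include <assert.h>
#include <stddef.h>

enum poll_action { POLL_NONE, POLL_QUEUE_REPORT, POLL_TRIGGER_FATAL };

/* Hypothetical, simplified stand-in for the driver's health state. */
struct health_model {
	unsigned int fatal_sensor; /* nonzero when a fatal sensor fires */
	unsigned int fatal_error;  /* fatal error already recorded, 0 if none */
	unsigned int synd;         /* current fw syndrome, 0 if none */
	unsigned int prev_synd;    /* syndrome seen on the previous poll */
};

/* One poll iteration: fatal errors are checked before fw errors. */
static enum poll_action poll_once(struct health_model *h)
{
	assert(h != NULL);

	/* A new fatal error is recorded and handled; synd is not examined. */
	if (h->fatal_sensor && !h->fatal_error) {
		h->fatal_error = h->fatal_sensor;
		return POLL_TRIGGER_FATAL;
	}

	/* Only without a new fatal error is a changed syndrome reported. */
	if (h->synd && h->synd != h->prev_synd) {
		h->prev_synd = h->synd;
		return POLL_QUEUE_REPORT;
	}

	return POLL_NONE;
}
```

In this model, a poll that sees both a fatal sensor and a new syndrome returns `POLL_TRIGGER_FATAL` without queuing a report, so no report work sits ahead of the fatal handling.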

Fixes: d1bf0e2c ("net/mlx5: Report devlink health on FW issues")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
parent 8465df40
@@ -701,6 +701,16 @@ static void poll_health(struct timer_list *t)
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		goto out;
+	fatal_error = check_fatal_sensors(dev);
+	if (fatal_error && !health->fatal_error) {
+		mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error);
+		dev->priv.health.fatal_error = fatal_error;
+		print_health_info(dev);
+		mlx5_trigger_health_work(dev);
+		goto out;
+	}
 	count = ioread32be(health->health_counter);
 	if (count == health->prev)
 		++health->miss_counter;
@@ -719,15 +729,6 @@ static void poll_health(struct timer_list *t)
 	if (health->synd && health->synd != prev_synd)
 		queue_work(health->wq, &health->report_work);
-	fatal_error = check_fatal_sensors(dev);
-	if (fatal_error && !health->fatal_error) {
-		mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error);
-		dev->priv.health.fatal_error = fatal_error;
-		print_health_info(dev);
-		mlx5_trigger_health_work(dev);
-	}
 out:
 	mod_timer(&health->timer, get_next_poll_jiffies());
 }