Commit 6377ed0b authored by Moshe Shemesh, committed by Saeed Mahameed

net/mlx5: Fix health work queue spin lock to IRQ safe

spin_lock/unlock of health->wq_lock should be IRQ safe.
It has been taken with spin_lock_irqsave since commit 0179720d
("net/mlx5: Introduce trigger_health_work function"), which takes
this lock from asynchronous event (IRQ) context.
Thus, every spin_lock/unlock of health->wq_lock should have been
converted to the IRQ-safe variants.
However, one occurrence in newer code using this lock missed that
change, resulting in a possible deadlock:
  kernel: Possible unsafe locking scenario:
  kernel:       CPU0
  kernel:       ----
  kernel:  lock(&(&health->wq_lock)->rlock);
  kernel:  <Interrupt>
  kernel:    lock(&(&health->wq_lock)->rlock);
  kernel:  *** DEADLOCK ***
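To make the lockdep report concrete, here is a minimal single-CPU userspace model. It assumes the local interrupt state can be represented by a boolean enable flag plus a "pending" bit. Every name in it (sim_spin_lock_irqsave, sim_irq_handler, sim_drain_recovery, and so on) is a hypothetical stand-in, not a kernel API. Where a real spinlock would spin forever, the model only records deadlock_detected.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model of the lockdep scenario above.
 * None of these names are kernel APIs; "IRQs" are simulated flags.
 */
struct sim_cpu {
	bool irqs_enabled;  /* models the local interrupt-enable flag */
	bool irq_pending;   /* an interrupt raised while masked */
};

struct sim_lock {
	bool held;
};

static struct sim_cpu cpu;
static struct sim_lock wq_lock;
static bool deadlock_detected;

/* Models the asynchronous-event path that also takes wq_lock. */
static void sim_irq_handler(void)
{
	if (wq_lock.held) {
		/* A real spinlock would spin here forever: the holder
		 * cannot resume until this handler returns. */
		deadlock_detected = true;
		return;
	}
	wq_lock.held = true;
	/* ... queue health work ... */
	wq_lock.held = false;
}

/* Deliver an interrupt now if enabled, otherwise defer it. */
static void sim_raise_irq(void)
{
	if (cpu.irqs_enabled)
		sim_irq_handler();
	else
		cpu.irq_pending = true;
}

static void sim_spin_lock(struct sim_lock *l)
{
	l->held = true;  /* interrupts stay enabled */
}

static void sim_spin_unlock(struct sim_lock *l)
{
	assert(l->held);
	l->held = false;
}

static bool sim_spin_lock_irqsave(struct sim_lock *l)
{
	bool flags = cpu.irqs_enabled;  /* save previous IRQ state */
	cpu.irqs_enabled = false;       /* mask local interrupts */
	l->held = true;
	return flags;
}

static void sim_spin_unlock_irqrestore(struct sim_lock *l, bool flags)
{
	assert(l->held);
	l->held = false;
	cpu.irqs_enabled = flags;       /* restore previous IRQ state */
	if (cpu.irqs_enabled && cpu.irq_pending) {
		cpu.irq_pending = false;
		sim_irq_handler();      /* deferred interrupt runs now */
	}
}

/* Runs the CPU0 sequence; returns true if the IRQ would deadlock. */
static bool sim_drain_recovery(bool use_irqsave)
{
	cpu.irqs_enabled = true;
	cpu.irq_pending = false;
	wq_lock.held = false;
	deadlock_detected = false;

	if (use_irqsave) {
		bool flags = sim_spin_lock_irqsave(&wq_lock);
		sim_raise_irq();  /* arrives while the lock is held */
		sim_spin_unlock_irqrestore(&wq_lock, flags);
	} else {
		sim_spin_lock(&wq_lock);
		sim_raise_irq();  /* arrives while the lock is held */
		sim_spin_unlock(&wq_lock);
	}
	return deadlock_detected;
}
```

sim_drain_recovery(false) reproduces the CPU0 sequence from the report and returns true. With sim_drain_recovery(true), the interrupt stays pending while the lock is held, then runs without contention once the unlock restores the saved state.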

Fixes: 2a0165a0 ("net/mlx5: Cancel delayed recovery work when unloading the driver")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
parent 5c25f65f
@@ -356,10 +356,11 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
 void mlx5_drain_health_recovery(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
+	unsigned long flags;
-	spin_lock(&health->wq_lock);
+	spin_lock_irqsave(&health->wq_lock, flags);
 	set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
-	spin_unlock(&health->wq_lock);
+	spin_unlock_irqrestore(&health->wq_lock, flags);
 	cancel_delayed_work_sync(&dev->priv.health.recover_work);
 }
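The commit itself does not explain why it uses the irqsave/irqrestore pair rather than spin_lock_irq/spin_unlock_irq. One general property of the pair is that the restore puts back whatever interrupt state the caller had, instead of unconditionally re-enabling interrupts. The sketch below models only that flag. All names (sim_irq_save, inner_irq, caller_stays_masked, etc.) are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model: only the local IRQ-enable flag is tracked. */
static bool irqs_enabled = true;

/* Like spin_lock_irqsave(): remember the old state, then mask. */
static unsigned long sim_irq_save(void)
{
	unsigned long flags = irqs_enabled;
	irqs_enabled = false;
	return flags;
}

/* Like spin_unlock_irqrestore(): put back whatever was saved. */
static void sim_irq_restore(unsigned long flags)
{
	irqs_enabled = flags != 0;
}

/* Like spin_lock_irq()/spin_unlock_irq(): unconditional. */
static void sim_irq_disable(void) { irqs_enabled = false; }
static void sim_irq_enable(void) { irqs_enabled = true; }

/* Inner helper using the save/restore pattern. */
static void inner_irqsave(void)
{
	unsigned long flags = sim_irq_save();
	/* ... critical section ... */
	sim_irq_restore(flags);
}

/* Same helper written with the unconditional pattern. */
static void inner_irq(void)
{
	sim_irq_disable();
	/* ... critical section ... */
	sim_irq_enable();  /* re-enables even if the caller had masked */
}

/* Calls inner() from a context that already masked IRQs and reports
 * whether IRQs are still masked afterwards, as the caller expects. */
static bool caller_stays_masked(void (*inner)(void))
{
	irqs_enabled = true;
	unsigned long outer = sim_irq_save();
	inner();
	bool still_masked = !irqs_enabled;
	sim_irq_restore(outer);
	return still_masked;
}
```

caller_stays_masked(inner_irqsave) returns true, while caller_stays_masked(inner_irq) returns false. The save/restore form is correct regardless of the interrupt state at the call site.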