]> git.itanic.dy.fi Git - linux-stable/commit
net/mlx5: Drain fw_reset when removing device
authorShay Drory <shayd@nvidia.com>
Mon, 4 Apr 2022 07:47:36 +0000 (10:47 +0300)
committerSaeed Mahameed <saeedm@nvidia.com>
Wed, 18 May 2022 06:03:57 +0000 (23:03 -0700)
commit16d42d313350946f4b9a8b74a13c99f0461a6572
tree92f4be6fb52ef0fae981e152a47a2b3a8275b8f7
parent04c551bad3713661ac85902163b3a567864c9a7c
net/mlx5: Drain fw_reset when removing device

In case fw sync reset is called in parallel to device removal, device
might stuck in the following deadlock:
         CPU 0                        CPU 1
         -----                        -----
                                  remove_one
                                   uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
 mlx5_enter_error_state()
  mutex_lock (intf_state_mutex)
                                   cleanup_once
                                    fw_reset_cleanup()
                                     destroy_workqueue(fw_reset->wq)

Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The Drain is done before devlink_unregister() since fw_reset, in some
flows, is using devlink API devlink_remote_reload_actions_performed().

Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
drivers/net/ethernet/mellanox/mlx5/core/main.c