    devlink: change per-devlink netdev notifier to static one · e93c9378
    Jiri Pirko authored
    The commit 565b4824 ("devlink: change port event netdev notifier
    from per-net to global") changed the original per-net notifier to be
    per-devlink instance. That fixed the issue of netdev uninit events not
    being received when the netdev had moved to a different namespace.
    That worked fine in the -net tree.
    
    However, later on when commit ee75f1fc ("net/mlx5e: Create
    separate devlink instance for ethernet auxiliary device") and
    commit 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in
    case of PCI device suspend") were merged, a deadlock was introduced
    when removing a namespace containing a devlink instance that has
    another nested instance.
    
    Here is an example of the bad flow resulting in a deadlock with mlx5:
    net_cleanup_work -> cleanup_net() (takes down_read(&pernet_ops_rwsem)) ->
    devlink_pernet_pre_exit() -> devlink_reload() ->
    mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
    mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
    mlx5e_destroy_devlink() -> devlink_free() ->
    unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem))
    The down_write() never succeeds because the same task still holds
    the read side of pernet_ops_rwsem.
    
    Steps to reproduce:
    $ modprobe mlx5_core
    $ ip netns add ns1
    $ devlink dev reload pci/0000:08:00.0 netns ns1
    $ ip netns del ns1
    
    Resolve this by converting the notifier from per-devlink instance to
    a static one that is registered during the init phase and left
    registered forever. Use this notifier for all devlink port instances
    created later on.
    
    Note that a tree needs this fix only if all of the cited Fixes
    commits are present.

    Reported-by: Moshe Shemesh <moshe@nvidia.com>
    Fixes: 565b4824 ("devlink: change port event netdev notifier from per-net to global")
    Fixes: ee75f1fc ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
    Fixes: 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>