• Leon Romanovsky's avatar
    RDMA/mlx5: Release locks during notifier unregister · 23eaf3b5
    Leon Romanovsky authored
    The below kernel panic was observed when created bond mode LACP
    with GRE tunnel on top. The reason to it was not released spinlock
    during mlx5 notify unregsiter sequence.
    
    [  234.562007] BUG: scheduling while atomic: sh/10900/0x00000002
    [  234.563005] Preemption disabled at:
    [  234.566864] ------------[ cut here ]------------
    [  234.567120] DEBUG_LOCKS_WARN_ON(val > preempt_count())
    [  234.567139] WARNING: CPU: 16 PID: 10900 at kernel/sched/core.c:3203 preempt_count_sub+0xca/0x170
    [  234.569550] CPU: 16 PID: 10900 Comm: sh Tainted: G        W 5.2.0-rc1-for-linust-dbg-2019-05-25_04-57-33-60 #1
    [  234.569886] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.6.1 02/12/2018
    [  234.570183] RIP: 0010:preempt_count_sub+0xca/0x170
    [  234.570404] Code: 03 38
    d0 7c 08 84 d2 0f 85 b0 00 00 00 8b 15 dd 02 03 04 85 d2 75 ba 48 c7 c6
    00 e1 88 83 48 c7 c7 40 e1 88 83 e8 76 11 f7 ff <0f> 0b 5b c3 65 8b 05
    d3 1f d8 7e 84 c0 75 82 e8 62 c3 c3 00 85 c0
    [  234.570911] RSP: 0018:ffff888b94477b08 EFLAGS: 00010286
    [  234.571133] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
    [  234.571391] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000246
    [  234.571648] RBP: ffff888ba5560000 R08: fffffbfff08962d5 R09: fffffbfff08962d5
    [  234.571902] R10: 0000000000000001 R11: fffffbfff08962d4 R12: ffff888bac6e9548
    [  234.572157] R13: ffff888babfaf728 R14: ffff888bac6e9568 R15: ffff888babfaf750
    [  234.572412] FS: 00007fcafa59b740(0000) GS:ffff888bed200000(0000) knlGS:0000000000000000
    [  234.572686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  234.572914] CR2: 00007f984f16b140 CR3: 0000000b2bf0a001 CR4: 00000000001606e0
    [  234.573172] Call Trace:
    [  234.573336] _raw_spin_unlock+0x2e/0x50
    [  234.573542] mlx5_ib_unbind_slave_port+0x1bc/0x690 [mlx5_ib]
    [  234.573793] mlx5_ib_cleanup_multiport_master+0x1d3/0x660 [mlx5_ib]
    [  234.574039] mlx5_ib_stage_init_cleanup+0x4c/0x360 [mlx5_ib]
    [  234.574271]  ? kfree+0xf5/0x2f0
    [  234.574465] __mlx5_ib_remove+0x61/0xd0 [mlx5_ib]
    [  234.574688]  ? __mlx5_ib_remove+0xd0/0xd0 [mlx5_ib]
    [  234.574951] mlx5_remove_device+0x234/0x300 [mlx5_core]
    [  234.575224] mlx5_unregister_device+0x4d/0x1e0 [mlx5_core]
    [  234.575493] remove_one+0x4f/0x160 [mlx5_core]
    [  234.575704] pci_device_remove+0xef/0x2a0
    [  234.581407]  ? pcibios_free_irq+0x10/0x10
    [  234.587143]  ? up_read+0xc1/0x260
    [  234.592785] device_release_driver_internal+0x1ab/0x430
    [  234.598442] unbind_store+0x152/0x200
    [  234.604064]  ? sysfs_kf_write+0x3b/0x180
    [  234.609441]  ? sysfs_file_ops+0x160/0x160
    [  234.615021] kernfs_fop_write+0x277/0x440
    [  234.620288]  ? __sb_start_write+0x1ef/0x2c0
    [  234.625512] vfs_write+0x15e/0x460
    [  234.630786] ksys_write+0x156/0x1e0
    [  234.635988]  ? __ia32_sys_read+0xb0/0xb0
    [  234.641120]  ? trace_hardirqs_off_thunk+0x1a/0x1c
    [  234.646163] do_syscall_64+0x95/0x470
    [  234.651106] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  234.656004] RIP: 0033:0x7fcaf9c9cfd0
    [  234.660686] Code: 73 01
    c3 48 8b 0d c0 6e 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00
    83 3d cd cf 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73
    31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
    [  234.670128] RSP: 002b:00007ffd3b01ddd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [  234.674811] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fcaf9c9cfd0
    [  234.679387] RDX: 000000000000000d RSI: 00007fcafa5c1000 RDI: 0000000000000001
    [  234.683848] RBP: 00007fcafa5c1000 R08: 000000000000000a R09: 00007fcafa59b740
    [  234.688167] R10: 00007ffd3b01d8e0 R11: 0000000000000246 R12: 00007fcaf9f75400
    [  234.692386] R13: 000000000000000d R14: 0000000000000001 R15: 0000000000000000
    [  234.696495] irq event stamp: 153067
    [  234.700525] hardirqs last enabled at (153067): [<ffffffff83258c39>] _raw_spin_unlock_irqrestore+0x59/0x70
    [  234.704665] hardirqs last disabled at (153066): [<ffffffff83259382>] _raw_spin_lock_irqsave+0x22/0x90
    [  234.708722] softirqs last enabled at (153058): [<ffffffff836006c5>] __do_softirq+0x6c5/0xb4e
    [  234.712673] softirqs last disabled at (153051): [<ffffffff81227c1d>] irq_exit+0x17d/0x1d0
    [  234.716601] ---[ end trace 5dbf096843ee9ce6 ]---
    
    Fixes: df097a27 ("IB/mlx5: Use the new mlx5 core notifier API")
    Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
    Link: https://lore.kernel.org/r/20190731083852.584-1-leon@kernel.orgSigned-off-by: default avatarDoug Ledford <dledford@redhat.com>
    23eaf3b5
main.c 186 KB