• Karl Heiss's avatar
    sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · 635682a1
    Karl Heiss authored
    A case can occur when sctp_accept() is called by the user during
    a heartbeat timeout event after the 4-way handshake.  Since
    sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
    bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
    the listening socket but released with the new association socket.
    The result is a deadlock on any future attempts to take the listening
    socket lock.
    
    Note that this race can occur with other SCTP timeouts that take
    the bh_lock_sock() in the event sctp_accept() is called.
    
     BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
     ...
     RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
     RSP: 0018:ffff880028323b20  EFLAGS: 00000206
     RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
     RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
     RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
     R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
     R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
     FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
     CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
     Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
     Stack:
     ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
     <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
     <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
     Call Trace:
     <IRQ>
     [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
     [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
     [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
     [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
     [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
     [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
     [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
     [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
     [<ffffffff81497255>] ? ip_rcv+0x275/0x350
     [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
     ...
    
    With lockdep debugging:
    
     =====================================
     [ BUG: bad unlock balance detected! ]
     -------------------------------------
     CslRx/12087 is trying to release lock (slock-AF_INET) at:
     [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
     but there are no more locks to release!
    
     other info that might help us debug this:
     2 locks held by CslRx/12087:
     #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
     #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
    
    Ensure the socket taken is also the same one that is released by
    saving a copy of the socket before entering the timeout event
    critical section.
    Signed-off-by: default avatarKarl Heiss <kheiss@gmail.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    635682a1
sm_sideeffect.c 48.8 KB