• Eric Dumazet's avatar
    net/sched: Fix mirred deadlock on device recursion · 0f022d32
    Eric Dumazet authored
    When the mirred action is used on a classful egress qdisc and a packet is
    mirrored or redirected to self we hit a qdisc lock deadlock.
    See trace below.
    
    [..... other info removed for brevity....]
    [   82.890906]
    [   82.890906] ============================================
    [   82.890906] WARNING: possible recursive locking detected
    [   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
    [   82.890906] --------------------------------------------
    [   82.890906] ping/418 is trying to acquire lock:
    [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
    __dev_queue_xmit+0x1778/0x3550
    [   82.890906]
    [   82.890906] but task is already holding lock:
    [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
    __dev_queue_xmit+0x1778/0x3550
    [   82.890906]
    [   82.890906] other info that might help us debug this:
    [   82.890906]  Possible unsafe locking scenario:
    [   82.890906]
    [   82.890906]        CPU0
    [   82.890906]        ----
    [   82.890906]   lock(&sch->q.lock);
    [   82.890906]   lock(&sch->q.lock);
    [   82.890906]
    [   82.890906]  *** DEADLOCK ***
    [   82.890906]
    [..... other info removed for brevity....]
    
    Example setup (eth0->eth0) to recreate
    tc qdisc add dev eth0 root handle 1: htb default 30
    tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
         action mirred egress redirect dev eth0
    
    Another example(eth0->eth1->eth0) to recreate
    tc qdisc add dev eth0 root handle 1: htb default 30
    tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
         action mirred egress redirect dev eth1
    
    tc qdisc add dev eth1 root handle 1: htb default 30
    tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
         action mirred egress redirect dev eth0
    
    We fix this by adding an owner field (CPU id) to struct Qdisc set after
    root qdisc is entered. When the softirq enters it a second time, if the
    qdisc owner is the same CPU, the packet is dropped to break the loop.
    Reported-by: default avatarMingshuai Ren <renmingshuai@huawei.com>
    Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
    Fixes: 3bcb846c ("net: get rid of spin_trylock() in net_tx_action()")
    Fixes: e578d9c0 ("net: sched: use counter to break reclassify loops")
    Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
    Reviewed-by: default avatarVictor Nogueira <victor@mojatatu.com>
    Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
    Tested-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
    Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20240415210728.36949-1-victor@mojatatu.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    0f022d32
dev.c 299 KB