• Jon Maxwell's avatar
    dccp/tcp: fix routing redirect race · 29c4bf40
    Jon Maxwell authored
    commit 45caeaa5 upstream.
    
    As Eric Dumazet pointed out this also needs to be fixed in IPv6.
    v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
    
    We have seen a few incidents lately where a dst_enty has been freed
    with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
    dst_entry. If the conditions/timings are right a crash then ensues when the
    freed dst_entry is referenced later on. A Common crashing back trace is:
    
     #8 [] page_fault at ffffffff8163e648
        [exception RIP: __tcp_ack_snd_check+74]
    .
    .
     #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8
    
    Of course it may happen with other NIC drivers as well.
    
    It's found the freed dst_entry here:
    
     224 static bool tcp_in_quickack_mode(struct sock *sk)↩
     225 {↩
     226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
     227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
     228 ↩
     229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
     230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
     231 }↩
    
    But there are other backtraces attributed to the same freed dst_entry in
    netfilter code as well.
    
    All the vmcores showed 2 significant clues:
    
    - Remote hosts behind the default gateway had always been redirected to a
    different gateway. A rtable/dst_entry will be added for that host. Making
    more dst_entrys with lower reference counts. Making this more probable.
    
    - All vmcores showed a postitive LockDroppedIcmps value, e.g:
    
    LockDroppedIcmps                  267
    
    A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
    regardless of whether user space has the socket locked. This can result in a
    race condition where the same dst_entry cached in sk->sk_dst_entry can be
    decremented twice for the same socket via:
    
    do_redirect()->__sk_dst_check()-> dst_release().
    
    Which leads to the dst_entry being prematurely freed with another socket
    pointing to it via sk->sk_dst_cache and a subsequent crash.
    
    To fix this skip do_redirect() if usespace has the socket locked. Instead let
    the redirect take place later when user space does not have the socket
    locked.
    
    The dccp/IPv6 code is very similar in this respect, so fixing it there too.
    
    As Eric Garver pointed out the following commit now invalidates routes. Which
    can set the dst->obsolete flag so that ipv4_dst_check() returns null and
    triggers the dst_release().
    
    Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
    Cc: Eric Garver <egarver@redhat.com>
    Cc: Hannes Sowa <hsowa@redhat.com>
    Signed-off-by: default avatarJon Maxwell <jmaxwell37@gmail.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
    29c4bf40
ipv4.c 28.6 KB