• Neal Cardwell's avatar
    tcp: fix tcp_mark_head_lost to check skb len before fragmenting · 627f3abe
    Neal Cardwell authored
    commit d88270ee upstream.
    
    This commit fixes a corner case in tcp_mark_head_lost() which was
    causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
    
    tcp_mark_head_lost() was assuming that if a packet has
    tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
    M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
    maintained, this is not always true.
    
    For example, suppose the sender sends 4 1-byte packets and have the
    last 3 packet sacked. It will merge the last 3 packets in the write
    queue into an skb with pcount = 3 and len = 3 bytes. If another
    recovery happens after a sack reneging event, tcp_mark_head_lost()
    may attempt to split the skb assuming it has more than 2*MSS bytes.
    
    This sounds very counterintuitive, but as the commit description for
    the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
    tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
    coalesces adjacent regions of SACKed skbs, and when doing this it
    preserves the sum of their packet counts in order to reflect the
    real-world dynamics on the wire. The c0638c24 commit tried to
    avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
    where the non-proportionality between pcount and skb->len/mss is known
    to be possible. However, that commit did not handle the case where
    during a reneging event one of these weird SACKed skbs becomes an
    un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
    
    The fix is to simply mark the entire skb lost when this happens.
    This makes the recovery slightly more aggressive in such corner
    cases before we detect reordering. But once we detect reordering
    this code path is by-passed because FACK is disabled.
    Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
    Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
    Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    Cc: Vinson Lee <vlee@freedesktop.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    627f3abe
tcp_input.c 180 KB