• Neal Cardwell's avatar
    tcp: reduce spurious retransmits due to transient SACK reneging · 5ae344c9
    Neal Cardwell authored
    This commit reduces spurious retransmits due to apparent SACK reneging
    by only reacting to SACK reneging that persists for a short delay.
    
    When a sequence space hole at snd_una is filled, some TCP receivers
    send a series of ACKs as they apparently scan their out-of-order queue
    and cumulatively ACK all the packets that have now been consecutiveyly
    received. This is essentially misbehavior B in "Misbehaviors in TCP
    SACK generation" ACM SIGCOMM Computer Communication Review, April
    2011, so we suspect that this is from several common OSes (Windows
    2000, Windows Server 2003, Windows XP). However, this issue has also
    been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
    into spurious retransmissions by lack of timestamps?" from March 2014,
    where the receiver was thought to be a BSD box.
    
    Since snd_una would temporarily be adjacent to a previously SACKed
    range in these scenarios, this receiver behavior triggered the Linux
    SACK reneging code path in the sender. This led the sender to clear
    the SACK scoreboard, enter CA_Loss, and spuriously retransmit
    (potentially) every packet from the entire write queue at line rate
    just a few milliseconds before the ACK for each packet arrives at the
    sender.
    
    To avoid such situations, now when a sender sees apparent reneging it
    does not yet retransmit, but rather adjusts the RTO timer to give the
    receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
    that will restore sanity to the SACK scoreboard. If the reneging
    persists until this RTO then, as before, we clear the SACK scoreboard
    and enter CA_Loss.
    
    A 10ms delay tolerates a receiver sending such a stream of ACKs at
    56Kbit/sec. And to allow for receivers with slower or more congested
    paths, we wait for at least RTT/2.
    
    We validated the resulting max(RTT/2, 10ms) delay formula with a mix
    of North American and South American Google web server traffic, and
    found that for ACKs displaying transient reneging:
    
     (1) 90% of inter-ACK delays were less than 10ms
     (2) 99% of inter-ACK delays were less than RTT/2
    
    In tests on Google web servers this commit reduced reneging events by
    75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
    any measurable impact on latency for user HTTP and SPDY requests.
    Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
    Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    5ae344c9
tcp.h 49.5 KB