    tcp: refine tcp_prune_ofo_queue() logic · b0e01253
    Eric Dumazet authored
    After commits 36a6503f ("tcp: refine tcp_prune_ofo_queue()
    to not drop all packets") and 72cd43ba
    ("tcp: free batches of packets in tcp_prune_ofo_queue()"),
    tcp_prune_ofo_queue() drops a fraction of the ooo queue
    to make room for the incoming packet.
    
    However, it makes no sense to drop packets that are
    before the incoming packet in sequence space.
    
    In order to recover from packet losses faster,
    it makes more sense to only drop ooo packets
    which are after the incoming packet.
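
The rule above can be illustrated with a small user-space sketch. It is not the kernel code: struct ooo_seg, seq_after() and prune_tail_after() are hypothetical names, the queue is a plain sorted array rather than the kernel's data structure, and the max_drop limit is a simplification standing in for the memory goal used when freeing batches of packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe test: true if sequence number a is after b. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical stand-in for one queued out-of-order segment. */
struct ooo_seg {
	uint32_t seq;     /* first sequence number of the segment */
	uint32_t end_seq; /* one past the last byte of the segment */
};

/*
 * segs[0..*count) is sorted by ascending seq. Walk from the tail and
 * drop segments until one starts before in_seq, or until max_drop
 * segments have been dropped. Segments before the incoming packet are
 * never touched. Returns the number of segments dropped.
 */
static size_t prune_tail_after(const struct ooo_seg *segs, size_t *count,
			       uint32_t in_seq, size_t max_drop)
{
	size_t dropped = 0;

	while (*count > 0 && dropped < max_drop) {
		const struct ooo_seg *last = &segs[*count - 1];

		/* Incoming packet would land after this one: stop pruning. */
		if (seq_after(in_seq, last->seq))
			break;
		(*count)--;
		dropped++;
	}
	return dropped;
}
```

With a queue holding 200:300 through 1200:1300, an incoming packet starting at 1400 prunes nothing in this sketch, since every queued segment is before it, while an incoming packet starting at 700 may prune only the segments starting at 800, 1000 and 1200.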
    
    Tested:
    packetdrill test:
       0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
       +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
       +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [3800], 4) = 0
       +0 bind(3, ..., ...) = 0
       +0 listen(3, 1) = 0
    
       +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
       +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 0>
      +.1 < . 1:1(0) ack 1 win 1024
       +0 accept(3, ..., ...) = 4
    
     +.01 < . 200:300(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 200:300>
    
     +.01 < . 400:500(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 400:500 200:300>
    
     +.01 < . 600:700(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 600:700 400:500 200:300>
    
     +.01 < . 800:900(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 800:900 600:700 400:500 200:300>
    
     +.01 < . 1000:1100(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 1000:1100 800:900 600:700 400:500>
    
     +.01 < . 1200:1300(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
    
    // this packet is dropped because we have no room left.
     +.01 < . 1400:1500(100) ack 1 win 1024
    
     +.01 < . 1:200(199) ack 1 win 1024
    // Make sure kernel did not drop 200:300 sequence
       +0 > . 1:1(0) ack 300 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
    // Make room, since our RCVBUF is very small
       +0 read(4, ..., 299) = 299
    
     +.01 < . 300:400(100) ack 1 win 1024
       +0 > . 1:1(0) ack 500 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
    
     +.01 < . 500:600(100) ack 1 win 1024
       +0 > . 1:1(0) ack 700 <nop,nop, sack 1200:1300 1000:1100 800:900>
    
       +0 read(4, ..., 400) = 400
    
     +.01 < . 700:800(100) ack 1 win 1024
       +0 > . 1:1(0) ack 900 <nop,nop, sack 1200:1300 1000:1100>
    
     +.01 < . 900:1000(100) ack 1 win 1024
       +0 > . 1:1(0) ack 1100 <nop,nop, sack 1200:1300>
    
     +.01 < . 1100:1200(100) ack 1 win 1024
    // This checks that 1200:1300 has not been removed from ooo queue
       +0 > . 1:1(0) ack 1300

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20221101035234.3910189-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>