• Wei Wang's avatar
    net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
    Wei Wang authored
    Middlebox firewall issues can potentially cause server's data being
    blackholed after a successful 3WHS using TFO. Following are the related
    reports from Apple:
    https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
    Slide 31 identifies an issue where the client ACK to the server's data
    sent during a TFO'd handshake is dropped.
    C ---> syn-data ---> S
    C <--- syn/ack ----- S
    C (accept & write)
    C <---- data ------- S
    C ----- ACK -> X     S
    		[retry and timeout]
    
    https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
    Slide 5 shows a similar situation that the server's data gets dropped
    after 3WHS.
    C ---- syn-data ---> S
    C <--- syn/ack ----- S
    C ---- ack --------> S
    S (accept & write)
    C?  X <- data ------ S
    		[retry and timeout]
    
    This is the worst failure b/c the client can not detect such behavior to
    mitigate the situation (such as disabling TFO). Failing to proceed, the
    application (e.g., SSL library) may simply timeout and retry with TFO
    again, and the process repeats indefinitely.
    
    The proposed solution is to disable active TFO globally under the
    following circumstances:
    1. client side TFO socket detects out of order FIN
    2. client side TFO socket receives out of order RST
    
    We disable active side TFO globally for 1hr at first. Then if it
    happens again, we disable it for 2h, then 4h, 8h, ...
    And we reset the timeout to 1hr if a client side TFO sockets not opened
    on loopback has successfully received data segs from server.
    And we examine this condition during close().
    
    The rational behind it is that when such firewall issue happens,
    application running on the client should eventually close the socket as
    it is not able to get the data it is expecting. Or application running
    on the server should close the socket as it is not able to receive any
    response from client.
    In both cases, out of order FIN or RST will get received on the client
    given that the firewall will not block them as no data are in those
    frames.
    And we want to disable active TFO globally as it helps if the middle box
    is very close to the client and most of the connections are likely to
    fail.
    
    Also, add a debug sysctl:
      tcp_fastopen_blackhole_detect_timeout_sec:
        the initial timeout to use when firewall blackhole issue happens.
        This can be set and read.
        When setting it to 0, it means to disable the active disable logic.
    Signed-off-by: default avatarWei Wang <weiwan@google.com>
    Acked-by: default avatarYuchung Cheng <ycheng@google.com>
    Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    cf1ef3f0
sysctl_net_ipv4.c 28.5 KB