    tcp: get rid of sysctl_tcp_adv_win_scale · dfa2f048
    Eric Dumazet authored
    With modern NIC drivers shifting to full page allocations per
    received frame, we face the following issue:
    
    TCP has one per-netns sysctl used to tweak how to translate
    memory usage into an expected payload (RWIN) in the RX path.
    
    The tcp_win_from_space() implementation is limited to a few cases.
    
    For hosts dealing with various MSS values, we either underestimate
    or overestimate the RWIN we send to remote peers.
    
    For instance, with the default sysctl_tcp_adv_win_scale value,
    we expect 50% of each allocated chunk of memory to hold payload.
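
    A small userspace model of the pre-patch arithmetic, as we
    understand it, shows why only a few ratios are expressible: a
    positive scale n keeps space - (space >> n), a non-positive one
    keeps space >> -n. The function name and its parameters are
    hypothetical (the real code reads the value from the per-netns
    sysctl), and we assume the default value is 1, which matches the
    50% figure above.

```c
#include <assert.h>

/*
 * Hypothetical model of the legacy window computation: the value of
 * sysctl_tcp_adv_win_scale is passed in instead of read from the netns.
 */
static int win_from_space_legacy(int space, int adv_win_scale)
{
	/* Keep shift counts valid for a 32-bit int and avoid shifting
	 * negative values, which is implementation-defined in C. */
	assert(space >= 0);
	assert(adv_win_scale >= -31 && adv_win_scale <= 31);

	if (adv_win_scale <= 0)
		return space >> -adv_win_scale;    /* 1, 1/2, 1/4, ... */
	return space - (space >> adv_win_scale);   /* 1/2, 3/4, 7/8, ... */
}
```

    With a scale of 1, win_from_space_legacy(4096, 1) returns 2048;
    no setting can express a ratio such as 35%.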
    
    For typical MTU=1500 traffic and order-0 page allocations
    by NIC drivers, we advertise an RWIN that is too big, leading to
    potential TCP collapse operations, which are extremely expensive
    and a source of latency spikes.
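
    To put numbers on this, assume (hypothetically) a 1448-byte MSS,
    i.e. MTU 1500 with TCP timestamps, and that each frame is charged
    exactly one 4096-byte order-0 page, ignoring sk_buff metadata.
    Only about 35% of the memory then holds payload, so a 50% estimate
    advertises roughly 41% more than the buffer can absorb. The helper
    names below are illustrative only.

```c
#include <assert.h>

/* Hypothetical illustration: payload that fits in `space` bytes of
 * receive memory when each frame carries `payload` bytes but is
 * charged `truesize` bytes. */
static long payload_that_fits(long space, long payload, long truesize)
{
	assert(space >= 0 && truesize > 0);
	assert(payload >= 0 && payload <= truesize);
	return (space / truesize) * payload;
}

/* Percentage by which a fixed 50% estimate overstates that payload
 * (a negative result means the estimate is too small instead). */
static long overcommit_percent(long space, long payload, long truesize)
{
	long fits = payload_that_fits(space, payload, truesize);
	long advertised = space - (space >> 1); /* default: 50% of memory */

	assert(fits > 0);
	return (advertised - fits) * 100 / fits;
}
```

    For a 4 MiB buffer, payload_that_fits(4194304, 1448, 4096) is
    1482752 bytes against an advertised 2097152, so
    overcommit_percent() returns 41.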
    
    This patch makes sysctl_tcp_adv_win_scale obsolete, and instead
    uses a per-socket scaling factor, so that we can precisely
    adjust the RWIN based on the effective skb->len/skb->truesize ratio.
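
    One way to implement such a per-socket factor is a fixed-point
    ratio of skb->len to skb->truesize, refreshed from received
    buffers and applied when converting memory to a window. The sketch
    below assumes an 8-bit fractional encoding (units of 1/256); the
    struct, function, and macro names are hypothetical and not taken
    from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define WIN_SCALE_SHIFT 8 /* hypothetical: ratio stored in 1/256 units */

/* Hypothetical per-socket state holding only the scaling factor. */
struct rx_sock_model {
	uint8_t scaling_ratio; /* payload/truesize, 128 == 50% */
};

/* Refresh the factor from one received buffer's len and truesize. */
static void update_scaling_ratio(struct rx_sock_model *sk,
				 uint32_t len, uint32_t truesize)
{
	uint64_t val;

	assert(truesize > 0);
	val = ((uint64_t)len << WIN_SCALE_SHIFT) / truesize;
	if (val > UINT8_MAX)
		val = UINT8_MAX;                   /* saturate if len >= truesize */
	sk->scaling_ratio = val ? (uint8_t)val : 1; /* never drop to zero */
}

/* Convert receive memory into an advertised window using the factor. */
static int win_from_space(const struct rx_sock_model *sk, int space)
{
	int64_t scaled;

	assert(space >= 0);
	scaled = (int64_t)space * sk->scaling_ratio;
	return (int)(scaled >> WIN_SCALE_SHIFT);
}
```

    Starting from 128 (the old 50% behaviour), a 1448-byte segment
    charged 4096 bytes moves the ratio to 90, and 4 MiB of memory then
    maps to a 1474560-byte window instead of 2097152. The 1/256 unit
    keeps the per-packet arithmetic in integers, as kernel code
    generally avoids floating point.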
    
    This patch alone can double TCP receive performance, either by
    avoiding collapses when receivers are too slow to drain their
    receive queue, or by allowing a bigger RWIN when MSS is close to
    PAGE_SIZE.
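
    The second case can be illustrated with hypothetical numbers: if a
    4000-byte segment is charged one 4096-byte page (metadata ignored),
    about 98% of the memory is payload, so scaling by the measured ratio
    advertises nearly twice the window of the fixed 50% rule. The helper
    names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: window under the old fixed 50% assumption. */
static int64_t win_fixed_half(int64_t space)
{
	assert(space >= 0);
	return space - (space >> 1);
}

/* Hypothetical helper: window when scaling by a measured
 * len/truesize ratio. */
static int64_t win_measured(int64_t space, int64_t len, int64_t truesize)
{
	assert(space >= 0 && truesize > 0);
	assert(len >= 0 && len <= truesize);
	return space * len / truesize;
}
```

    For 4 MiB of memory, win_measured(4194304, 4000, 4096) is 4096000
    bytes versus 2097152 from win_fixed_half(), about 195% of the old
    window.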

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Link: https://lore.kernel.org/r/20230717152917.751987-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>