    tcp: make tcp_sendmsg() aware of socket backlog · d41a69f1
    Eric Dumazet authored
    Large sendmsg()/write() calls hold the socket lock for the duration of
    the call, unless the sk->sk_sndbuf limit is hit. This is bad because
    incoming packets are parked in the socket backlog for a long time.
    Critical decisions like fast retransmit might be delayed.
    Receivers have to maintain a big out-of-order queue with additional CPU
    overhead, and TX can stall once windows are full.
    
    Bidirectional flows are particularly hurt since the backlog can become
    quite big if the copy from user space triggers IO (page faults).
    
    Some applications learnt to use sendmsg() (or sendmmsg()) with small
    chunks to avoid this issue.
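
The small-chunk workaround mentioned above might look roughly like the following userspace sketch. The helper name send_chunked() and its chunk parameter are hypothetical and only illustrate the idea; send() is the only real API used. Each short call lets the kernel release the socket lock between chunks, so the backlog gets processed sooner.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of any library): send len bytes from buf
 * on a blocking socket, never passing more than chunk bytes to a single
 * send() call. Returns len on success, -1 on error with errno set.
 */
ssize_t send_chunked(int fd, const void *buf, size_t len, size_t chunk)
{
    const char *p = buf;
    size_t left = len;

    if (chunk == 0) {
        errno = EINVAL;
        return -1;
    }

    while (left > 0) {
        size_t want = left < chunk ? left : chunk;
        ssize_t n = send(fd, p, want, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before sending: retry */
            return -1;
        }
        /* send() may accept fewer bytes than requested; advance by n. */
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

The cost of this approach is one system call per chunk, which is the overhead the patch aims to make unnecessary.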
    
    The kernel should know better, right?
    
    Add a generic sk_flush_backlog() helper and use it right
    before a new skb is allocated. Typically we put 64KB of payload
    per skb (unless MSG_EOR is requested) and checking socket backlog
    every 64KB gives good results.
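
To illustrate why a check every 64KB keeps the backlog small, here is a toy userspace model. It is not the kernel implementation: struct toy_sock, toy_flush_backlog() and toy_sendmsg() are made-up names, and the assumption that a fixed number of packets arrives while each skb is being filled is purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SKB_PAYLOAD (64 * 1024)  /* typical payload per skb, per the text */

/* Toy stand-in for a socket; all fields are illustrative assumptions. */
struct toy_sock {
    size_t backlog_len;  /* packets parked while the sender owns the socket */
    size_t processed;    /* packets handled so far */
    size_t flushes;      /* how many times a non-empty backlog was drained */
};

/* Drain the backlog if it is non-empty; report whether anything was done. */
bool toy_flush_backlog(struct toy_sock *sk)
{
    if (sk->backlog_len == 0)
        return false;
    sk->processed += sk->backlog_len;
    sk->backlog_len = 0;
    sk->flushes++;
    return true;
}

/*
 * Model a large send: before each new "skb" is started, check the backlog.
 * arrivals_per_skb packets are assumed to arrive while one skb is filled.
 * Returns the largest backlog length observed during the call.
 */
size_t toy_sendmsg(struct toy_sock *sk, size_t len, size_t arrivals_per_skb)
{
    size_t max_backlog = 0;
    size_t sent = 0;

    while (sent < len) {
        size_t chunk;

        toy_flush_backlog(sk);  /* check point: right before a new skb */

        chunk = len - sent < TOY_SKB_PAYLOAD ? len - sent : TOY_SKB_PAYLOAD;
        sent += chunk;          /* "copy from user space" for this skb */

        sk->backlog_len += arrivals_per_skb;
        if (sk->backlog_len > max_backlog)
            max_backlog = sk->backlog_len;
    }
    return max_backlog;
}
```

In this model the backlog never grows beyond what arrives during one skb, however large len is; without the check it would grow with the size of the whole call.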
    
    As a matter of fact, tests with TSO/GSO disabled give very nice
    results, as we manage to keep a small write queue and smaller
    perceived rtt.
    
    Note that sk_flush_backlog() maintains socket ownership, so it is
    not equivalent to a {release_sock(sk); lock_sock(sk);} pair. This
    preserves the implicit atomicity rules that sendmsg() was
    giving to (possibly buggy) applications.
    
    In this simple implementation, I chose not to call tcp_release_cb(),
    but we might consider this later.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Alexei Starovoitov <ast@fb.com>
    Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>