    tcp: fix child sockets to use system default congestion control if not set · 9f950415
    Neal Cardwell authored
    Linux 3.17 and earlier are explicitly engineered so that if the app
    doesn't specifically request a congestion control (CC) module on a
    listener before the SYN arrives, then the child gets the system
    default CC when the connection is established. See
    tcp_init_congestion_control() in 3.17 or earlier, which says "if no
    choice made yet assign the current value set as default". Commit
    55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
    altered these semantics, so that children got their parent
    listener's congestion control even if the system default had changed
    after the listener was created.
    
    This commit restores the semantics of 3.17 and earlier, which date
    back to 2007 in 4d4d3d1e ("[TCP]: Congestion control
    initialization."), because some Linux congestion control workflows
    depend on them.
    
    In summary, if a listener socket specifically sets TCP_CONGESTION to
    "x", or the route locks the CC module to "x", then the child gets
    "x". Otherwise the child gets the current system default from
    net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
    earlier, and this commit restores it.
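
The rule summarized above can be observed from userspace. The following is a minimal sketch, not part of the patch: it assumes a Linux host with a usable loopback interface and a Python build that exposes socket.TCP_CONGESTION. The helper name child_congestion_control is hypothetical and exists only for illustration.

```python
import socket


def child_congestion_control(listener_cc=None):
    """Return the congestion control name of an accepted (child) socket.

    If listener_cc is given, it is set on the listener with TCP_CONGESTION
    before any connection arrives. If it is None, the listener makes no
    explicit choice.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        if listener_cc is not None:
            # Explicit choice on the listener: the child should inherit it.
            listener.setsockopt(
                socket.IPPROTO_TCP, socket.TCP_CONGESTION, listener_cc.encode()
            )
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
        listener.listen(1)

        # The loopback handshake completes in the kernel, so connecting and
        # then accepting from the same thread does not deadlock.
        with socket.create_connection(listener.getsockname()) as client:
            child, _ = listener.accept()
            with child:
                # The kernel returns a NUL-padded name of up to 16 bytes.
                raw = child.getsockopt(
                    socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16
                )
    return raw.split(b"\0", 1)[0].decode()


if __name__ == "__main__":
    print("listener set to reno ->", child_congestion_control("reno"))
    print("listener left unset  ->", child_congestion_control())
```

On a kernel with this fix, the second call should report whatever net.ipv4.tcp_congestion_control holds at the time the connection is established. Showing the difference from the pre-fix behavior would additionally require changing that sysctl between listen() and the connection, which needs privileges and is left out of the sketch.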
    
    Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
    Cc: Florian Westphal <fw@strlen.de>
    Cc: Daniel Borkmann <dborkman@redhat.com>
    Cc: Glenn Judd <glenn.judd@morganstanley.com>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_minisocks.c 25.6 KB