    net: inet: Retire port only listening_hash · cae3873c
    Martin KaFai Lau authored
    The listen sk is currently stored in two hash tables,
    listening_hash (hashed by port) and lhash2 (hashed by port and address).
    
    After commit 0ee58dad ("net: tcp6: prefer listeners bound to an address")
    and commit d9fbc7f6 ("net: tcp: prefer listeners bound to an address"),
    the TCP-SYN lookup fast path does not use listening_hash.
    
    Commit 05c0b357 ("tcp: seq_file: Replace listening_hash with lhash2")
    also moved the seq_file (/proc/net/tcp) iteration usage from
    listening_hash to lhash2.
    
    There are still a few listening_hash usages left.
    One of them is inet_reuseport_add_sock(), which uses listening_hash
    to search for a listen sk during the listen() system call.  This turns
    out to be very slow for use cases that listen on many different
    VIPs at a popular port (e.g. 443), on top of the slowness of
    adding to the tail in the IPv6 case.  A later patch in this series
    has a selftest to demonstrate this case.
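
    To illustrate why a port-only hash degrades in this use case, here is a
    minimal user-space sketch. It is not kernel code: NBUCKETS, hash_port(),
    hash_portaddr() and longest_chain() are hypothetical names, and the hash
    functions are toy stand-ins for the real listening_hash and lhash2
    hashes. It counts the longest bucket chain when many addresses listen
    on the same port.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u	/* toy table size (assumption), power of two */

/* Toy stand-in for a port-only hash (listening_hash style). */
static unsigned int hash_port(uint16_t port)
{
	return port & (NBUCKETS - 1);
}

/* Toy stand-in for a port-and-address hash (lhash2 style). */
static unsigned int hash_portaddr(uint32_t addr, uint16_t port)
{
	uint32_t h = (addr ^ port) * 2654435761u;	/* multiplicative mix */

	return h >> 26;		/* keep the top 6 bits: 0..63 */
}

/*
 * Insert n listeners on the same port, one per consecutive address,
 * and return the length of the longest bucket chain.
 */
static size_t longest_chain(size_t n, uint16_t port, int use_addr)
{
	size_t count[NBUCKETS] = { 0 };
	size_t max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t addr = 0x0a000001u + (uint32_t)i;	/* 10.0.0.1 + i */
		unsigned int b = use_addr ? hash_portaddr(addr, port)
					  : hash_port(port);

		if (++count[b] > max)
			max = count[b];
	}
	return max;
}
```

    With the port-only hash, longest_chain(1000, 443, 0) puts all 1000
    listeners in a single chain, so each search walks every one of them.
    With the port-and-address hash, the same listeners are spread across
    buckets.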
    
    This patch takes this chance to move all remaining listening_hash
    usages to lhash2 and then retire listening_hash.
    
    Since most changes need to be done together, it is hard to cut
    the listening_hash to lhash2 switch into small patches.  The
    changes in this patch are highlighted here for review.
    
    1. Because of the listening_hash removal, lhash2 can use the
       sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node.
       This also keeps the sk_unhashed() check working as is
       once sk is no longer added to listening_hash.
    
       The union is removed from inet_listen_hashbucket because
       only nulls_head is needed.
    
    2. icsk->icsk_listen_portaddr_node and its helpers are removed.
    
    3. The current lhash2 users need to iterate with sk_nulls_node
       instead of icsk_listen_portaddr_node.
    
       One case is in the inet[6]_lhash2_lookup().
    
       Another case is the seq_file iterator in tcp_ipv4.c.
       One thing to note is sk_nulls_next() is needed
       because the old inet_lhash2_for_each_icsk_continue()
       does a "next" first before iterating.
    
    4. Move the remaining listening_hash usages to lhash2.

       One is inet_reuseport_add_sock(), which this series is
       trying to improve.

       inet_diag.c and mptcp_diag.c are the final two
       remaining use cases and are now moved to lhash2 as well.
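
    The note in item 3 about sk_nulls_next() can be sketched with a toy
    singly linked list. This is an illustration only: struct node and the
    toy_for_each_*() macros are hypothetical and merely mimic the difference
    between a "continue" style iterator, which steps past the current
    entry first, and an iterator that starts at the entry it is given.

```c
#include <stddef.h>

struct node {
	int val;
	struct node *next;
};

/* "continue" style: advance past pos first, then iterate. */
#define toy_for_each_continue(pos) \
	for ((pos) = (pos)->next; (pos); (pos) = (pos)->next)

/* "from" style: iterate starting at pos itself. */
#define toy_for_each_from(pos) \
	for (; (pos); (pos) = (pos)->next)

/* Sum the entries after cur using the "continue" style iterator. */
static int sum_after_continue(struct node *cur)
{
	struct node *pos = cur;
	int sum = 0;

	toy_for_each_continue(pos)
		sum += pos->val;
	return sum;
}

/*
 * Same result with the "from" style iterator: the caller must step to
 * the next entry explicitly, or cur itself would be visited again.
 */
static int sum_after_from(struct node *cur)
{
	struct node *pos = cur->next;	/* explicit "next" before iterating */
	int sum = 0;

	toy_for_each_from(pos)
		sum += pos->val;
	return sum;
}
```

    Both helpers skip the current entry and visit only what follows it; the
    second one does so only because of the explicit step before the loop.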

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>