    selftest: Don't reuse port for SO_INCOMING_CPU test. · 97de5a15
    Kuniyuki Iwashima authored
    Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to
    fire somewhat randomly.
    
      # #  RUN           so_incoming_cpu.before_reuseport.test3 ...
      # # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0)
      # # test3: Test terminated by assertion
      # #          FAIL  so_incoming_cpu.before_reuseport.test3
      # not ok 3 so_incoming_cpu.before_reuseport.test3
    
    When the test failed, not-yet-accepted CLOSE_WAIT sockets had received
    a SYN with a "challenging" SEQ number, sent from an unexpected CPU
    that did not create the receiver.
    
    The test basically does:
    
      1. for each cpu:
        1-1. create a server
        1-2. set SO_INCOMING_CPU
    
      2. for each cpu:
        2-1. set cpu affinity
        2-2. create some clients
        2-3. let clients connect() to the server on the same cpu
        2-4. close() clients
    
      3. for each server:
        3-1. accept() all child sockets
        3-2. check if all children have the same SO_INCOMING_CPU as the server
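
As an illustration of steps 1-1, 1-2 and 3-2 above, here is a minimal sketch of setting SO_INCOMING_CPU on a socket and reading it back. It is not the code from so_incoming_cpu.c; the helper names create_pinned_server() and get_incoming_cpu() are hypothetical, and the fallback value 49 for SO_INCOMING_CPU is an assumption that holds only on architectures using the generic socket option numbers.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Assumption: generic asm/socket.h value; normally provided by the headers. */
#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49
#endif

/* Hypothetical helper: create a TCP socket and set SO_INCOMING_CPU to `cpu`.
 * Returns the fd, or -1 if the socket or the option could not be set up. */
static int create_pinned_server(int cpu)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Hypothetical helper: read SO_INCOMING_CPU back from a socket.
 * Returns the CPU ID stored in the socket, or -1 on error. */
static int get_incoming_cpu(int fd)
{
	int cpu = -1;
	socklen_t len = sizeof(cpu);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
		return -1;

	return cpu;
}
```

In the test flow described above, the value read from each accepted child would be compared with the value set on its listener; bind(), listen() and accept() are omitted here to keep the sketch short.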
    
    The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse.
    
    In one iteration of step 2., close() changed the client state to
    FIN_WAIT_2, and the peer transitioned to CLOSE_WAIT.
    
    In a later iteration of step 2., connect() happened to select the same
    port as the FIN_WAIT_2 socket, and the port was reused because the
    default value of net.ipv4.tcp_tw_reuse is 2.
    
    As a result, the new client sent a SYN to the CLOSE_WAIT socket from
    a different CPU, and the receiver's sk_incoming_cpu was overwritten
    with an unexpected CPU ID.
    
    Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket
    responded with a Challenge ACK.  The new client properly returned an
    RST and effectively killed the CLOSE_WAIT socket.
    
    This way, all clients were created successfully, but the error was
    detected later by 3-2., ASSERT_EQ(cpu, i).
    
    To avoid the failure, let's make sure that (i) the number of clients
    is less than the number of available ports and (ii) such reuse never
    happens.
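
Condition (i) can be checked with simple arithmetic on the local port range. The sketch below is one possible way to express that check and is not taken from the actual patch; the helper names are hypothetical, and the input is assumed to be the two-number text found in /proc/sys/net/ipv4/ip_local_port_range.

```c
#include <stdio.h>

/* Hypothetical helper: parse "low high" (whitespace separated) and
 * return the number of ports in the inclusive range, or -1 if the
 * text is malformed or the range is inverted. */
static int port_range_size(const char *text)
{
	int low, high;

	if (sscanf(text, "%d %d", &low, &high) != 2 || low > high)
		return -1;

	return high - low + 1;
}

/* Hypothetical helper: return 1 if the total number of clients is
 * strictly less than the number of available ports, else 0. */
static int clients_fit(const char *range_text, int nr_servers,
		       int clients_per_server)
{
	int available = port_range_size(range_text);

	if (available < 0)
		return 0;

	return (long)nr_servers * clients_per_server < available;
}
```

For example, with the range "32768 60999" there are 28232 ports, so 32 servers with 100 clients each (3200 clients) would fit, while 4 servers with 32 clients each would not fit in a 100-port range. Condition (ii) still has to be enforced separately, since fitting within the range does not by itself prevent a port from being picked twice.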
    
    Fixes: 6df96146 ("selftest: Add test for SO_INCOMING_CPU.")
    Reported-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Tested-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
so_incoming_cpu.c 5.92 KB