    tipc: fix race condition at topology server receive · e88f2be8
    Jon Maloy authored
    We have identified a race condition during reception of socket
    events and messages in the topology server.
    
    - tipc_close_conn() releases the corresponding struct
      tipc_subscriber instance without considering that there may
      still be items in the receive work queue. When those items are
      scheduled, tipc_receive_from_work() uses the subscriber pointer
      stored in struct tipc_conn without first checking whether it is
      still valid. This sometimes leads to crashes, as the next call
      to tipc_conn_recvmsg() accesses the now deleted item.

      We fix this by making use of the pointer conditional on
      whether the connection is active. I.e., we check the condition
      test_bit(CF_CONNECTED) before calling tipc_conn_recvmsg().
    
    - Since the two functions may be running on different cores, the
      condition test described above is not enough. tipc_close_conn()
      may come in between and delete the subscriber item after the condition
      test is done, but before tipc_conn_recvmsg() is finished. This
      happens less frequently than the problem described above, but leads
      to the same symptoms.
    
      We fix this by using the existing sk_callback_lock for mutual
      exclusion in the two functions. In addition, we have to move
      a call to tipc_conn_terminate() outside the mentioned lock to
      avoid deadlock.
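
    The two fixes above can be sketched as a small userspace model.
    This is not the kernel patch: struct conn, struct subscriber,
    conn_init(), conn_close() and conn_receive() are hypothetical
    stand-ins, a pthread mutex plays the role of sk_callback_lock,
    and a bool plays the role of the CF_CONNECTED bit. It only shows
    the pattern: test the flag and use the pointer under one lock,
    and run the terminate path after the lock is dropped.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tipc_subscriber. */
struct subscriber {
    int events_seen;
};

/* Hypothetical stand-in for struct tipc_conn. */
struct conn {
    pthread_mutex_t lock;    /* models sk_callback_lock */
    bool connected;          /* models the CF_CONNECTED flag */
    struct subscriber *sub;  /* models the subscriber pointer */
};

/* Returns 0 on success, non-zero on failure. */
static int conn_init(struct conn *c)
{
    c->sub = calloc(1, sizeof(*c->sub));
    if (!c->sub)
        return -1;
    c->connected = true;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Models the close path: the flag is cleared and the subscriber is
 * freed while holding the lock, so a receiver can never observe
 * "connected" together with a freed pointer. Safe to call twice. */
static void conn_close(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->connected) {
        c->connected = false;
        free(c->sub);
        c->sub = NULL;
    }
    pthread_mutex_unlock(&c->lock);
}

/* Models the receive work item. Returns 1 if the event was delivered,
 * 0 if it was skipped because the connection is already closed, and
 * -1 if the handler failed and the connection was terminated.
 * A negative event is this sketch's way of simulating a failure. */
static int conn_receive(struct conn *c, int event)
{
    int ret = 0;

    pthread_mutex_lock(&c->lock);
    if (c->connected) {          /* check and use under the same lock */
        if (event < 0) {
            ret = -1;
        } else {
            c->sub->events_seen++;
            ret = 1;
        }
    }
    pthread_mutex_unlock(&c->lock);

    /* Terminate only after unlocking: conn_close() takes the same
     * non-recursive lock, so calling it above would deadlock. */
    if (ret < 0)
        conn_close(c);
    return ret;
}
```

    In this model a receive that runs after conn_close() returns 0
    instead of dereferencing freed memory, and a failing receive
    closes the connection without ever holding the lock twice.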
    Acked-by: Ying Xue <ying.xue@windriver.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
subscr.c 11.7 KB