1. 22 Nov, 2012 5 commits
    • Jon Maloy's avatar
      tipc: introduce message to synchronize broadcast link · c64f7a6a
      Jon Maloy authored
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowleding of broadcast
      packets should start.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledge
         from B, since B is not yet in its broadcast receiver's list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      c64f7a6a
    • Ying Xue's avatar
      tipc: rename supported flag to recv_permitted · 389dd9bc
      Ying Xue authored
      Rename the "supported" flag in bclink structure to "recv_permitted"
      to better reflect what it is used for. When this flag is set for a
      given node, we are permitted to receive and acknowledge broadcast
      messages from that node.  Convert it to a bool at the same time,
      since it is not used to store any numerical values.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      389dd9bc
    • Ying Xue's avatar
      tipc: remove supportable flag from bclink structure · 818f4da5
      Ying Xue authored
      The "supportable" flag in bclink structure is a compatibility flag
      indicating whether a peer node is capable of receiving TIPC broadcast
      messages. However, all TIPC versions since tipc-1.5, and after the
      inclusion in the upstream Linux kernel in 2006, support this capability.
      It is highly unlikely that anybody is still using such an old
      version of TIPC, let alone that they want to mix it with TIPC-2.0
      nodes. Therefore, we now remove the "supportable" flag.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      818f4da5
    • Ying Xue's avatar
      tipc: remove the bearer congestion mechanism · 3c294cb3
      Ying Xue authored
      Currently at the TIPC bearer layer there is the following congestion
      mechanism:
      
      Once sending packets has failed via that bearer, the bearer will be
      flagged as being in congested state at once. During bearer congestion,
      all packets arriving at link will be queued on the link's outgoing
      buffer.  When we detect that the state of bearer congestion has
      relaxed (e.g. some packets are received from the bearer) we will try
      our best to push all packets in the link's outgoing buffer until the
      buffer is empty, or until the bearer is congested again.
      
      However, in fact the TIPC bearer never receives any feedback from the
      device layer whether a send was successful or not, so it must always
      assume it was successful. Therefore, the bearer congestion mechanism
      as it exists currently is of no value.
      
      But the bearer blocking state is still useful for us. For example,
      when the physical media goes down/up, we need to change the state of
      the links bound to the bearer.  So the code maintaing the state
      information is not removed.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      3c294cb3
    • Ying Xue's avatar
      tipc: wake up all waiting threads at socket shutdown · 75031151
      Ying Xue authored
      When a socket is shut down, we should wake up all thread sleeping on
      it, instead of just one of them. Otherwise, when several threads are
      polling the same socket, and one of them does shutdown(), the
      remaining threads may end up sleeping forever.
      
      Also, to align socket usage with common practice in other stacks, we
      use one of the common socket callback handlers, sk_state_change(),
      to wake up pending users. This is similar to the usage in e.g.
      inet_shutdown(). [net/ipv4/af_inet.c].
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      75031151
  2. 21 Nov, 2012 2 commits
    • Erik Hugne's avatar
      tipc: return POLLOUT for sockets in an unconnected state · c4fc298a
      Erik Hugne authored
      If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
      socket during link congestion, the connect message will be discarded
      and sendmsg will return EAGAIN. This is normal behavior, and the
      application is expected to poll the socket until POLLOUT is set,
      after which the connection attempt can be retried.
      However, the POLLOUT flag is never set for unconnected sockets and
      poll() always returns a zero mask. The application is then left without
      a trigger for when it can make another attempt at sending the message.
      
      The solution is to check if we're polling on an unconnected socket
      and set the POLLOUT flag if the TIPC port owned by this socket
      is not congested. The TIPC ports waiting on a specific link will be
      marked as 'not congested' when the link congestion have abated.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      c4fc298a
    • Ying Xue's avatar
      tipc: fix race/inefficiencies in poll/wait behaviour · f288bef4
      Ying Xue authored
      When an application blocks at poll/select on a TIPC socket
      while requesting a specific event mask, both the filter_rcv() and
      wakeupdispatch() case will wake it up unconditionally whenever
      the state changes (i.e an incoming message arrives, or congestion
      has subsided).  No mask is used.
      
      To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
      with tipc_data_ready and tipc_write_space respectively, which makes
      tipc more in alignment with the rest of the networking code.  These
      pass the exact set of possible events to the waker in fs/select.c
      hence avoiding waking up blocked processes unnecessarily.
      
      In doing so, we uncover another issue -- that there needs to be a
      memory barrier in these poll/receive callbacks, otherwise we are
      subject to the the same race as documented above wq_has_sleeper()
      [in commit a57de0b4 "net: adding memory barrier to the poll and
      receive callbacks"].  So we need to replace poll_wait() with
      sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
      in these two new functions.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      f288bef4
  3. 20 Nov, 2012 11 commits
  4. 19 Nov, 2012 22 commits