1. 18 Sep, 2011 1 commit
    • tipc: Ensure both nodes recognize loss of contact between them · b4b56102
      Allan Stephens authored
      Enhances TIPC to ensure that a node that loses contact with a
      neighboring node does not allow contact to be re-established until
      it sees that its peer has also recognized the loss of contact.
      
      Previously, nodes that were connected by two or more links could
      encounter a situation in which node A would lose contact with node B
      on all of its links, purge its name table of names published by B,
      and then fail to repopulate those names once contact with B was restored.
      This would happen because B was able to re-establish one or more links
      so quickly that it never reached a point where it had no links to A --
      meaning that B never saw a loss of contact with A, and consequently
      didn't re-publish its names to A.
      
      This problem is now prevented by enhancing the cleanup done by TIPC
      following a loss of contact with a neighboring node to ensure that
      node A ignores all messages sent by B until it receives a LINK_PROTOCOL
      message that indicates B has lost contact with A, thereby preventing
      the (re)establishment of links between the nodes. The loss of contact
      is recognized when a RESET or ACTIVATE message is received that has
      a "redundant link exists" field of 0, indicating that B's sending link
      endpoint is in a reset state and that B has no other working links.
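
The receive-side rule described above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not the code from the commit: `struct peer_state`, `struct proto_msg`, `peer_lost_contact()` and `peer_accepts()` are hypothetical names, and the real cleanup may block link setup for additional reasons that are left out here.

```c
#include <stdbool.h>

/* Hypothetical message kinds; the real TIPC header encodes these differently. */
enum msg_kind { MSG_DATA, MSG_STATE, MSG_RESET, MSG_ACTIVATE };

struct proto_msg {
    enum msg_kind kind;
    bool redundant_link;   /* models the "redundant link exists" field */
};

struct peer_state {
    bool wait_peer_down;   /* true until the peer confirms it lost contact too */
};

/* Called when this node loses its last link to the peer. */
static void peer_lost_contact(struct peer_state *p)
{
    p->wait_peer_down = true;
}

/*
 * Returns true if a message from the peer may be processed.
 * While waiting, only a RESET or ACTIVATE whose redundant-link field
 * is 0 lifts the block; every other message is ignored.
 */
static bool peer_accepts(struct peer_state *p, const struct proto_msg *m)
{
    if (p->wait_peer_down &&
        (m->kind == MSG_RESET || m->kind == MSG_ACTIVATE) &&
        !m->redundant_link)
        p->wait_peer_down = false;

    return !p->wait_peer_down;
}
```

In this model an ACTIVATE with the redundant-link field set is still ignored, because it shows that the peer kept at least one working link and therefore never saw the loss of contact.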
      
      Additionally, TIPC now suppresses the sending of (most) link protocol
      messages to a neighboring node while it is cleaning up after an earlier
      loss of contact with that node. This stops the peer node from prematurely
      activating its link endpoint, which would prevent TIPC from later
      activating its own end. TIPC still allows outgoing RESET messages to
      occur during cleanup, to avoid problems if its own node recognizes
      the loss of contact first and tries to notify the peer of the situation.
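
A minimal sketch of the send-side suppression follows. The message says "(most)" link protocol messages are suppressed and names RESET as still allowed; this model assumes RESET is the only exception. `enum proto_type` and `may_send_proto()` are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical names for the link protocol message types. */
enum proto_type { PROTO_STATE, PROTO_RESET, PROTO_ACTIVATE };

/*
 * Decide whether a link protocol message may be sent to the peer.
 * During cleanup only RESET goes out, so the peer cannot activate
 * its endpoint early, but can still be told about the loss of contact.
 */
static bool may_send_proto(bool cleanup_in_progress, enum proto_type type)
{
    if (!cleanup_in_progress)
        return true;

    return type == PROTO_RESET;
}
```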
      
      Finally, TIPC now recognizes an impending loss of contact with a peer node
      as soon as it receives a RESET message on a working link that is the
      peer's only link to the node, and ensures that the link protocol
      suppression mentioned above goes into effect right away -- that is,
      even before its own link endpoints have failed. This is necessary to
      ensure correct operation when there are redundant links between the nodes,
      since otherwise TIPC would send an ACTIVATE message upon receiving a RESET
      on its first link and only begin suppressing when a RESET on its second
      link was received, instead of initiating suppression with the first RESET
      message as it needs to.
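
The redundant-link case can be illustrated with a small model. It is a sketch only: `struct node_model` and `on_reset_received()` are hypothetical, and the model assumes that a working link endpoint resets when it receives a RESET and that the normal reply is an ACTIVATE.

```c
#include <stdbool.h>

enum reply { REPLY_NONE, REPLY_ACTIVATE };

struct node_model {
    int  working_links;    /* our working link endpoints toward the peer */
    bool suppress_proto;   /* link protocol suppression is in effect */
};

/*
 * Handle a RESET that arrives on one of our working links.
 * peer_has_other_link models the "redundant link exists" field.
 */
static enum reply on_reset_received(struct node_model *n,
                                    bool peer_has_other_link)
{
    /* Peer's only link: start suppressing now, not after our last link fails. */
    if (!peer_has_other_link)
        n->suppress_proto = true;

    /* Assumption: the receiving endpoint itself goes into reset. */
    if (n->working_links > 0)
        n->working_links--;

    return n->suppress_proto ? REPLY_NONE : REPLY_ACTIVATE;
}
```

With two working links and a RESET whose redundant-link field is 0, the model withholds the ACTIVATE on the first RESET even though one of its own links is still up, which is the behavior the paragraph above calls for.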
      
      Note: The reworked cleanup code also eliminates a check that prevented
      a link endpoint's discovery object from responding to incoming messages
      while stale name table entries are being purged. This check is now
      unnecessary and would have slowed down re-establishment of communication
      between the nodes in some situations.

      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  2. 01 Sep, 2011 17 commits
  3. 30 Aug, 2011 21 commits
  4. 29 Aug, 2011 1 commit