1. 15 Oct, 2017 19 commits
  2. 14 Oct, 2017 17 commits
  3. 13 Oct, 2017 4 commits
    • Merge branch 'tipc-comm-groups' · a00344bd
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: Introduce Communication Group feature
      
      With this commit series we introduce a 'Group Communication' feature in
      order to resolve the datagram and multicast flow control problem. This
      new feature makes it possible for a user to instantiate multiple private
      virtual brokerless message buses by just creating and joining member
      sockets.
      
      The main features are as follows:
      ---------------------------------
      - Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
        If it is the first socket of the group this implies creation of the
        group. This call takes four parameters: 'type' serves as group
        identifier, 'instance' serves as member identifier, and 'scope'
        indicates the visibility of the group (node/cluster/zone). Finally,
        'flags' indicates different options for the socket joining the group.
        For the time being, there are only two such flags: 1) 'LOOPBACK'
        indicates whether the creator of the socket wants to receive a copy of
        broadcast or multicast messages it sends to the group, 2) 'EVENTS'
        indicates whether it wants to receive membership (JOINED/LEFT) events
        for the other members of the group.
      
      - Groups are closed, i.e., sockets which have not joined a group will
        not be able to send messages to or receive messages from members of
        the group, and vice versa. A socket can only be a member of one group
        at a time.
      
      - There are four transmission modes.
        1: Unicast. The sender transmits a message using the port identity
           (node:port tuple) of the receiving socket.
        2: Anycast. The sender transmits a message using a port name (type:
           instance:scope) of one of the receiving sockets. If more than
           one member socket matches the given address a destination is
           selected according to a round-robin algorithm, which also considers
           the destination load (advertised window size) as an additional
           criterion.
        3: Multicast. The sender transmits a message using a port name
           (type:instance:scope) of one or more of the receiving sockets.
           All sockets in the group matching the given address will receive
           a copy of the message.
        4: Broadcast. The sender transmits a message using the primitive
           send(). All members of the group, irrespective of their member
           identity (instance) number, receive a copy of the message.
      
      - TIPC broadcast is used for carrying messages in mode 3 or 4 when
        this is deemed more efficient, i.e., depending on number of actual
        destinations.
      
      - All transmission modes are flow controlled, so that messages are
        never dropped or rejected, just as in connection-oriented
        communication. A special algorithm guarantees that this holds even
        for multipoint-to-point communication, i.e., when many source
        sockets decide to send simultaneously towards the same destination
        socket.
      
      - Sequence order is always guaranteed, even between the different
        transmission modes.
      
      - Member join/leave events are received in all other member sockets
        in guaranteed order. I.e., a 'JOINED' (an empty message with the OOB
        bit set) will always be received before the first data message from
        a new member, and a 'LEAVE' (like 'JOINED', but with EOR bit set) will
        always arrive after the last data message from a leaving member.
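      The TIPC_GROUP_JOIN call described in the first item above can be
      sketched from user space as follows. This is a minimal sketch, not the
      implementation from this series: the numeric constants and the layout
      of the request (four unsigned 32-bit fields in the order type,
      instance, scope, flags) are assumptions recalled from <linux/tipc.h>
      and should be checked against the installed kernel headers. The helper
      names pack_group_req() and join_group() are hypothetical.

```python
import struct

# Assumed values, recalled from the Linux UAPI header <linux/tipc.h>.
# Verify them against your kernel headers before relying on them.
SOL_TIPC = 271
TIPC_GROUP_JOIN = 135
TIPC_GROUP_LOOPBACK = 0x1      # 'LOOPBACK': receive a copy of own bcast/mcast
TIPC_GROUP_MEMBER_EVTS = 0x2   # 'EVENTS': receive membership events
TIPC_CLUSTER_SCOPE = 2


def pack_group_req(group_type, instance, scope, flags):
    """Pack the four join parameters as four unsigned 32-bit integers."""
    return struct.pack("=IIII", group_type, instance, scope, flags)


def join_group(sock, group_type, instance,
               scope=TIPC_CLUSTER_SCOPE, flags=0):
    """Join a group on an open TIPC socket; the first joiner creates it."""
    req = pack_group_req(group_type, instance, scope, flags)
    sock.setsockopt(SOL_TIPC, TIPC_GROUP_JOIN, req)
```

      On a host with TIPC enabled, join_group() would be called on a socket
      created with socket.socket(socket.AF_TIPC, socket.SOCK_RDM), for
      example join_group(sock, 4711, 1, flags=TIPC_GROUP_MEMBER_EVTS), where
      4711 is an arbitrary group identifier chosen for illustration.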
      
      -----
      v2: Reordered variable declarations in descending length order, as per
          feedback from David Miller. This was done as far as permitted by
          the initialization order.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add multipoint-to-point flow control · 04d7b574
      Jon Maloy authored
      We already have point-to-multipoint flow control within a group. But
      we also need the opposite: a scheme that can handle potentially
      hundreds of sources trying to send messages to the same destination
      simultaneously, without causing buffer overflow at the recipient. This
      commit adds such a mechanism.
      
      The algorithm works as follows:
      
      - When a member detects a new, joining member, it initially sets that
        member's state to JOINED and advertises a minimum window to it.
        This window is chosen so that the new member can send exactly one
        maximum sized message, or several smaller ones, to the recipient
        before it must stop and wait for an additional advertisement. This
        minimum window, ADV_IDLE, is set to 65 blocks of 1 kB each.
      
      - When a member receives the first data message from a JOINED member,
        it changes the state of the latter to ACTIVE, and advertises a larger
        window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can
        continue sending with minimal disturbances to the data flow.
      
      - The active members are kept in a dedicated linked list. Each time a
        message is received from an active member, it will be moved to the
        tail of that list. This way, we keep a record of which members have
        been most (tail) and least (head) recently active.
      
      - There is a maximum number (16) of permitted simultaneous active
        senders per receiver. When this limit is reached, the receiver will
        not advertise anything immediately to a new sender, but instead put
        it in a PENDING state, and add it to a corresponding queue. At the
        same time, it will pick the least recently active member, send it an
        advertisement RECLAIM message, and set this member to state
        RECLAIMING.
      
      - The reclaimee member has to respond with a REMIT message, meaning that
        it goes back to a send window of ADV_IDLE, and returns its unused
        advertised blocks beyond that value to the reclaiming member.
      
      - When the reclaiming member receives the REMIT message, it unlinks
        the reclaimee from its active list, resets its state to JOINED, and
        notes that it is now back at ADV_IDLE advertised blocks to that
        member. If there are still unread data messages sent out by the
        reclaimee before the REMIT, the member goes into an intermediate
        state REMITTED, where it stays until the said messages have been
        consumed.
      
      - The returned advertised blocks can now be re-advertised to the
        pending member, which is now set to state ACTIVE and added to
        the active member list.
      
      - To be proactive, i.e., to minimize the risk that any member will
        end up in the pending queue, we start reclaiming resources already
        when the number of active members exceeds 3/4 of the permitted
        maximum.
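
      The receiver-side bookkeeping described above can be sketched as a
      small state machine. This is a simplified model under stated
      assumptions, not the kernel code: the class and method names are
      hypothetical, windows are tracked as plain block counts, and the
      RECLAIM and REMIT messages are represented only by state changes and
      the on_remit() call.

```python
from collections import OrderedDict, deque

ADV_IDLE = 65                             # minimum window, in 1 kB blocks
ADV_ACTIVE = 12 * ADV_IDLE                # window granted to an active sender
MAX_ACTIVE = 16                           # simultaneous active senders
RECLAIM_THRESHOLD = MAX_ACTIVE * 3 // 4   # start reclaiming above this


class Receiver:
    """State one receiving member keeps about each sending peer."""

    def __init__(self):
        self.state = {}              # peer -> state name
        self.window = {}             # peer -> blocks currently advertised
        self.active = OrderedDict()  # head = least recently active
        self.pending = deque()       # peers waiting for an ACTIVE slot

    def on_join(self, peer):
        self.state[peer] = "JOINED"
        self.window[peer] = ADV_IDLE

    def on_data(self, peer):
        if self.state[peer] == "ACTIVE":
            self.active.move_to_end(peer)   # most recently active at tail
        elif self.state[peer] == "JOINED":
            if len(self.active) < MAX_ACTIVE:
                self._activate(peer)
                if len(self.active) > RECLAIM_THRESHOLD:
                    self._reclaim_one()     # proactive reclaim
            else:
                self.state[peer] = "PENDING"
                self.pending.append(peer)
                self._reclaim_one()

    def on_remit(self, peer, unread=0):
        """The reclaimee returned its unused blocks beyond ADV_IDLE."""
        del self.active[peer]
        self.window[peer] = ADV_IDLE
        self.state[peer] = "REMITTED" if unread else "JOINED"
        if self.pending:
            self._activate(self.pending.popleft())

    def on_consumed(self, peer):
        """All data sent before the REMIT has now been read."""
        if self.state[peer] == "REMITTED":
            self.state[peer] = "JOINED"

    def _activate(self, peer):
        self.state[peer] = "ACTIVE"
        self.window[peer] = ADV_ACTIVE
        self.active[peer] = True

    def _reclaim_one(self):
        # Ask the least recently active member to give its window back.
        for peer in self.active:
            if self.state[peer] == "ACTIVE":
                self.state[peer] = "RECLAIMING"
                return peer
        return None
```

      With 17 peers that each send one message and no REMIT arriving in
      between, this model starts reclaiming when the 13th sender becomes
      active and puts the 17th in the PENDING state until a REMIT frees a
      slot.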
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: guarantee delivery of last broadcast before DOWN event · a3bada70
      Jon Maloy authored
      The following scenario is possible:
      - A user sends a broadcast message, and thereafter immediately leaves
        the group.
      - The LEAVE message, following a different path than the broadcast,
        arrives ahead of the broadcast, and the sending member is removed
        from the receiver's list.
      - The broadcast message arrives, but is dropped because the sender
        is now unknown to the recipient.
      
      We fix this by sequence numbering membership events, just like ordinary
      unicast messages. Currently, when a JOIN is sent to a peer, it contains
      a synchronization point, i.e., the sequence number of the next sent
      broadcast, in order to give the receiver a start synchronization point.
      We now let LEAVE messages also contain such an "end synchronization"
      point, so that the recipient can delay the removal of the sending member
      until it knows that all messages have been received.
      
      The received synchronization points are added as sequence numbers to the
      generated membership events, making it possible to handle them almost
      the same way as regular unicasts in the receiving filter function. In
      particular, a DOWN event with a too high sequence number will be kept
      in the reordering queue until the missing broadcast(s) arrive and have
      been delivered.
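
      The deferral of a DOWN event can be illustrated with a small reordering
      filter. This is a sketch under assumptions, not the kernel's filter
      function: the class name and message tuples are hypothetical, sequence
      number wraparound is ignored, and the end synchronization point is
      taken to be the sequence number the sender's next broadcast would have
      had.

```python
import heapq
import itertools


class BroadcastFilter:
    """Orders one sender's broadcasts and its DOWN event by sequence number."""

    def __init__(self, start_sync):
        self.next_seq = start_sync     # from the JOIN: next expected broadcast
        self.deferred = []             # min-heap acting as reordering queue
        self.delivered = []            # what the user would read, in order
        self._tie = itertools.count()  # keeps heap entries comparable

    def receive(self, seqno, kind, payload=None):
        rank = 1 if kind == "DOWN" else 0   # data sorts ahead of DOWN
        heapq.heappush(self.deferred,
                       (seqno, rank, next(self._tie), kind, payload))
        self._drain()

    def _drain(self):
        while self.deferred:
            seqno, _, _, kind, payload = self.deferred[0]
            if seqno > self.next_seq:
                break                       # gap: a broadcast is still missing
            heapq.heappop(self.deferred)
            if seqno < self.next_seq:
                continue                    # stale duplicate, drop it
            self.delivered.append((kind, payload))
            if kind == "DATA":
                self.next_seq += 1          # DOWN does not consume a number
```

      A DOWN event carrying sequence number 7 that arrives while broadcasts
      5 and 6 are still in flight stays in the queue, and is delivered only
      after both broadcasts have been delivered.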
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: guarantee delivery of UP event before first broadcast · 399574d4
      Jon Maloy authored
      The following scenario is possible:
      - A user joins a group, and immediately sends out a broadcast message
        to its members.
      - The broadcast message, following a different data path than the
        initial JOIN message sent out during the joining procedure, arrives
        at a receiver before the latter.
      - The receiver drops the message, since it is not ready to accept any
        messages until the JOIN has arrived.
      
      We avoid this by treating group protocol JOIN messages like unicast
      messages.
      - We let them pass through the recipient's multicast input queue, just
        like ordinary unicasts.
      - We force the first following broadcast to be sent as replicated
        unicast and to be acknowledged by the recipient before accepting
        any more broadcast transmissions.
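
      The sender-side rule in the second item can be sketched as follows.
      This is an illustrative model only: the class, its methods and the log
      format are hypothetical, and acknowledgements are delivered by calling
      on_ack() directly instead of arriving over the network.

```python
class JoiningSender:
    """Models a newly joined member's first and later broadcasts."""

    def __init__(self, peers):
        self.peers = sorted(peers)
        self.first_bcast_sent = False
        self.unacked = set()     # peers that have not acked the first bcast
        self.log = []            # (transport, destination, message)

    def join(self):
        # JOIN messages travel as unicasts, one per existing member.
        for peer in self.peers:
            self.log.append(("unicast", peer, "JOIN"))

    def broadcast(self, msg):
        """Return True if the message was accepted for transmission."""
        if self.unacked:
            return False         # still waiting for acks of the first bcast
        if not self.first_bcast_sent:
            # Replicated unicast follows the same path as the JOIN,
            # so it cannot overtake it.
            for peer in self.peers:
                self.log.append(("unicast", peer, msg))
            self.unacked = set(self.peers)
            self.first_bcast_sent = True
        else:
            self.log.append(("broadcast", None, msg))
        return True

    def on_ack(self, peer):
        self.unacked.discard(peer)
```

      In this model a second broadcast is refused until every peer has
      acknowledged the first one, after which true broadcast is used.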
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>