    fs: dlm: add reliable connection if reconnect · 489d8e55
    Alexander Aring authored
    This patch makes a TCP lowcomms connection reliable even when
    reconnects occur. This is done with application-layer retransmission
    handling and sequence numbers in the DLM protocol. There are three new
    DLM commands:
    
    DLM_OPTS:
    
    This encapsulates an existing dlm message (and rcom messages that
    don't have their own application-side retransmission handling).
    Optionally, additional TLVs (type-length-value fields) can be appended,
    for example a sequence number field. However, because the lockspace
    field is unused in DLM_OPTS and the sequence number is mandatory, it
    isn't implemented as a TLV; instead we put the sequence number inside
    the lockspace id. The ability to add optional TLVs remains available
    for future purposes.
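
    The encapsulation described above can be sketched roughly as follows.
    This is a minimal illustration, not the layout this patch adds: the
    struct names, field names and command values are hypothetical, the
    fields are kept in host byte order, and no optional TLVs are emitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command values, chosen only for this sketch. */
enum { SKETCH_CMD_MSG = 1, SKETCH_CMD_OPTS = 4 };

/* Simplified common header; an assumption, not the kernel's definition. */
struct sketch_header {
    uint32_t version;
    uint32_t lockspace_or_seq; /* lockspace id, or sequence number in OPTS */
    uint32_t nodeid;
    uint16_t length;           /* total length including this header */
    uint8_t  cmd;
    uint8_t  pad;
};

/* Wrapper header; optional TLVs would sit between it and the inner message. */
struct sketch_opts {
    struct sketch_header hdr;
    uint8_t  next_cmd;         /* command of the encapsulated message */
    uint8_t  pad;
    uint16_t optlen;           /* bytes of optional TLVs, always 0 here */
    uint32_t pad2;
};

/*
 * Wrap `inner` (which must start with a sketch_header) into `out`,
 * storing `seq` in the otherwise unused lockspace slot.
 * Returns the number of bytes written, or 0 if the result does not fit.
 */
size_t sketch_wrap(uint8_t *out, size_t out_len,
                   const uint8_t *inner, size_t inner_len, uint32_t seq)
{
    struct sketch_header in_hdr;
    struct sketch_opts opts;
    size_t total = sizeof(opts) + inner_len;

    if (inner_len < sizeof(in_hdr) || total > out_len || total > UINT16_MAX)
        return 0;

    memcpy(&in_hdr, inner, sizeof(in_hdr));

    memset(&opts, 0, sizeof(opts));
    opts.hdr.version = in_hdr.version;
    opts.hdr.lockspace_or_seq = seq;   /* mandatory field, so not a TLV */
    opts.hdr.nodeid = in_hdr.nodeid;
    opts.hdr.length = (uint16_t)total;
    opts.hdr.cmd = SKETCH_CMD_OPTS;
    opts.next_cmd = in_hdr.cmd;
    opts.optlen = 0;

    memcpy(out, &opts, sizeof(opts));
    memcpy(out + sizeof(opts), inner, inner_len);
    return total;
}
```

    The inner message is copied unchanged, so its own lockspace id is
    preserved while the wrapper carries the sequence number.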
    
    DLM_ACK:
    
    Just a dlm header that acknowledges receipt of a DLM_OPTS message to
    its sender.
    
    DLM_FIN:
    
    This provides a 4-way handshake for connection termination, including
    support for half-closed connections. It is provided at the application
    layer because SCTP doesn't support half-closed sockets, the shutdown()
    call itself can be interrupted by e.g. TCP resets, and the logic is
    hard to implement because of the othercon paradigm in lowcomms. The
    4-way termination handshake also solves the problem of synchronizing
    peer EOF arrival with the cluster manager removing the peer in the
    node membership handling of DLM. In some cases messages can still be
    transmitted during this time, so we need to wait for the node
    membership event.
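
    As a rough illustration of a 4-way termination handshake with
    half-closed support, the sketch below models the per-peer state as a
    small state machine in the style of TCP's teardown. The state and
    event names are assumptions made for this example and are not taken
    from the patch; simultaneous close and the wait for the node
    membership event are left out.

```c
#include <stdbool.h>

/* Hypothetical per-peer states, modelled on TCP's teardown. */
enum sketch_state {
    SK_ESTABLISHED,
    SK_FIN_WAIT1,   /* we sent FIN, waiting for its ACK */
    SK_FIN_WAIT2,   /* our FIN was acked, peer may still send */
    SK_CLOSE_WAIT,  /* peer sent FIN, we may still send */
    SK_LAST_ACK,    /* we sent our FIN after the peer's, waiting for ACK */
    SK_CLOSED
};

enum sketch_event {
    SK_EV_LOCAL_CLOSE,  /* local side is done sending, FIN goes out */
    SK_EV_RECV_FIN,     /* peer's FIN arrived (we answer with an ACK) */
    SK_EV_RECV_FIN_ACK  /* peer acknowledged our FIN */
};

/* Return the next state; events that do not apply leave the state as is. */
enum sketch_state sketch_fin_step(enum sketch_state s, enum sketch_event ev)
{
    switch (s) {
    case SK_ESTABLISHED:
        if (ev == SK_EV_LOCAL_CLOSE)
            return SK_FIN_WAIT1;
        if (ev == SK_EV_RECV_FIN)
            return SK_CLOSE_WAIT;
        break;
    case SK_FIN_WAIT1:
        if (ev == SK_EV_RECV_FIN_ACK)
            return SK_FIN_WAIT2;
        break;
    case SK_FIN_WAIT2:
        if (ev == SK_EV_RECV_FIN)
            return SK_CLOSED;
        break;
    case SK_CLOSE_WAIT:
        if (ev == SK_EV_LOCAL_CLOSE)
            return SK_LAST_ACK;
        break;
    case SK_LAST_ACK:
        if (ev == SK_EV_RECV_FIN_ACK)
            return SK_CLOSED;
        break;
    case SK_CLOSED:
        break;
    }
    return s;
}

/* Half-closed: a side that has not sent its own FIN may keep sending. */
bool sketch_can_send(enum sketch_state s)
{
    return s == SK_ESTABLISHED || s == SK_CLOSE_WAIT;
}
```

    In this model the side that closes first walks ESTABLISHED, FIN_WAIT1,
    FIN_WAIT2, CLOSED, while its peer walks ESTABLISHED, CLOSE_WAIT,
    LAST_ACK, CLOSED and can keep sending while in CLOSE_WAIT.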
    
    To provide a reliable connection, the node retransmits all
    unacknowledged messages to its peer on reconnect. The receiver then
    filters for the next expected message and drops all messages which are
    duplicates.
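
    The retransmit-and-filter behaviour can be sketched as below. This is
    a simplified model under stated assumptions, not the code added by
    this patch: all names are hypothetical, the unacknowledged queue is a
    small fixed array, acknowledgements are assumed to be cumulative, and
    the receiver simply drops anything that is not the next expected
    sequence number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_UNACKED 8

/* Sender side: remembers the sequence numbers not yet acknowledged. */
struct sketch_sender {
    uint32_t next_seq;
    uint32_t unacked[SKETCH_MAX_UNACKED];
    size_t n_unacked;
};

/* Receiver side: only the next expected sequence number is accepted. */
struct sketch_receiver {
    uint32_t expected_seq;
};

/* Assign a sequence number to a new message and queue it as unacked. */
bool sketch_send(struct sketch_sender *s, uint32_t *seq_out)
{
    if (s->n_unacked == SKETCH_MAX_UNACKED)
        return false;
    *seq_out = s->next_seq++;
    s->unacked[s->n_unacked++] = *seq_out;
    return true;
}

/* Cumulative ack (an assumption): forget everything up to `acked`. */
void sketch_ack(struct sketch_sender *s, uint32_t acked)
{
    size_t kept = 0;

    for (size_t i = 0; i < s->n_unacked; i++) {
        /* signed difference keeps the comparison valid across wraparound */
        if ((int32_t)(s->unacked[i] - acked) > 0)
            s->unacked[kept++] = s->unacked[i];
    }
    s->n_unacked = kept;
}

/* On reconnect: copy out every sequence number that must be resent. */
size_t sketch_pending(const struct sketch_sender *s, uint32_t *out)
{
    for (size_t i = 0; i < s->n_unacked; i++)
        out[i] = s->unacked[i];
    return s->n_unacked;
}

/* Returns true if the message should be delivered, false if dropped. */
bool sketch_receive(struct sketch_receiver *r, uint32_t seq)
{
    if (seq != r->expected_seq)
        return false;
    r->expected_seq++;
    return true;
}
```

    If the receiver already processed a message whose acknowledgement was
    lost in the reconnect, the retransmitted copy no longer matches
    expected_seq and is dropped, while the following message is delivered
    as usual.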
    
    As RCOM_STATUS and RCOM_NAMES messages are the first messages
    exchanged and they have their own retransmission handling, there is
    logic requiring that these messages come first. When these messages
    arrive, we store the dlm version field. This handling is the same on
    DLM 3.1 and, after this patch, 3.2. Backwards compatibility handling
    has been added, which seems to work in tests without tcpkill; however,
    it's not recommended to use DLM 3.1 and 3.2 at the same time, because
    DLM 3.2 tries to fix long-standing bugs in the DLM protocol.

    Signed-off-by: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: David Teigland <teigland@redhat.com>