    [DLM] don't accept replies to old recovery messages · 98f176fb
    David Teigland authored
    We often abort a recovery after sending a status request to a remote node.
    We want to ignore any potential status reply we get from the remote node.
    If we get one of these unwanted replies, we've often moved on to the next
    recovery message and incremented the message sequence counter, so the
    reply will be ignored due to the seq number.  In some cases, however,
    we've not moved on to the next message, so the seq number of the reply
    we want to ignore still matches, causing the reply to be accepted.  The
    next recovery message will then mistake this old reply for a new one.
    
    To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
    new reply.  We clear this flag if we abort recovery while waiting for a
    reply.  Before the flag is set again (to allow new replies) we know that
    any old replies will be rejected due to their sequence number.  We also
    initialize the recovery-message sequence number to a random value when a
    lockspace is first created.  This makes it clear when messages are being
    rejected from an old instance of a lockspace that has since been
    recreated.
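
As a rough illustration of the scheme described above, here is a minimal userspace sketch. It is not the kernel code: the struct, the function names, and the starting value 1000 (standing in for the random initial sequence number) are hypothetical, and the `waiting` field only models the role of RCOM_WAIT. Clearing the flag once a reply is accepted is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-lockspace recovery-message state. */
struct recovery_state {
    uint64_t seq;     /* sequence number of the most recent request */
    bool     waiting; /* models RCOM_WAIT: a reply may be accepted now */
};

/* Start a new request: bump the sequence number, then allow a reply. */
static uint64_t begin_request(struct recovery_state *rs)
{
    rs->seq++;
    rs->waiting = true;
    return rs->seq;
}

/* Abort while waiting: stop accepting replies; seq is left unchanged. */
static void abort_request(struct recovery_state *rs)
{
    rs->waiting = false;
}

/* Accept a reply only if one is expected and its seq matches.
 * Accepting clears the flag (an assumption of this sketch). */
static bool accept_reply(struct recovery_state *rs, uint64_t reply_seq)
{
    if (!rs->waiting || reply_seq != rs->seq)
        return false;
    rs->waiting = false;
    return true;
}

/* Walk through the stale-reply case from the commit message. */
static void run_example(void)
{
    /* 1000 stands in for a random initial value. */
    struct recovery_state rs = { .seq = 1000, .waiting = false };

    uint64_t old_seq = begin_request(&rs);
    abort_request(&rs);

    /* Seq still matches, but the cleared flag rejects the old reply. */
    assert(!accept_reply(&rs, old_seq));

    uint64_t new_seq = begin_request(&rs);

    /* Flag is set again, but now the old seq no longer matches. */
    assert(!accept_reply(&rs, old_seq));
    assert(accept_reply(&rs, new_seq));
}
```

Calling `run_example()` exercises both rejection paths: the flag covers the window in which the sequence number has not yet advanced, and the sequence number covers everything after the next request has been sent.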
    Signed-off-by: David Teigland <teigland@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
dlm_internal.h 14.3 KB