• [DLM] do full recover_locks barrier · 4b77f2c9
    David Teigland authored
    Red Hat BZ 211914
    
    The previous patch "[DLM] fix aborted recovery during node removal" was
    incomplete, as further testing showed.  It set the bit for the RS_LOCKS
    barrier but did not then wait for the barrier.  This is often ok, but
    sometimes it causes yet another recovery hang.  If the node that skips
    the barrier wait is a new node that also has the lowest nodeid, it
    misses the important step of collecting and reporting the barrier
    status from the other nodes (which is the job of the low nodeid in the
    barrier wait routine).

    Signed-off-by: David Teigland <teigland@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
recoverd.c 6.8 KB
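
The hang described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code in recoverd.c: `struct node`, `set_barrier`, `barrier_wait_poll`, `low_node`, and the numeric values of `RS_LOCKS` and `RS_LOCKS_ALL` are hypothetical names chosen for illustration. The model assumes that the member with the lowest nodeid collects the barrier bit from every member and publishes an "all reached" bit, and that every other member waits for that published bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits, loosely modelled on the RS_LOCKS barrier. */
#define RS_LOCKS     0x1u /* this node has reached the barrier */
#define RS_LOCKS_ALL 0x2u /* low nodeid saw RS_LOCKS on every member */

/* Hypothetical, simplified view of one lockspace member. */
struct node {
    int nodeid;
    unsigned status;
};

/* Return the index of the member with the lowest nodeid. */
size_t low_node(const struct node *m, size_t n)
{
    size_t low = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (m[i].nodeid < m[low].nodeid)
            low = i;
    return low;
}

/* Step 1: announce that this node has reached the barrier. */
void set_barrier(struct node *self)
{
    self->status |= RS_LOCKS;
}

/*
 * Step 2: one polling pass of the barrier wait for member `self`.
 * The low nodeid collects the status of all members and, once every
 * member has set RS_LOCKS, reports that by setting RS_LOCKS_ALL.
 * Every other member only checks whether the low nodeid has reported.
 * Returns true when the barrier is complete from `self`'s point of view.
 */
bool barrier_wait_poll(struct node *m, size_t n, size_t self)
{
    size_t low = low_node(m, n);

    if (self == low) {
        for (size_t i = 0; i < n; i++)
            if (!(m[i].status & RS_LOCKS))
                return false;
        m[low].status |= RS_LOCKS_ALL;
        return true;
    }
    return (m[low].status & RS_LOCKS_ALL) != 0;
}
```

In this model, if the low nodeid calls only `set_barrier` and never `barrier_wait_poll`, the `RS_LOCKS_ALL` bit is never published and the other members keep polling forever. That corresponds to the recovery hang the message describes when a new node with the lowest nodeid skips the wait.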