    ocfs2/dlm: Clear joining_node on heartbeat node down · 2d4b1cbb
    Tao Ma authored
    Currently the dlm join process has 2 steps: query join and assert join.
    After query join, the joined node sets its joining_node. If the joining
    node panics before the 2nd step, the joined node fails to clear its
    joining_node flag because that node isn't in the domain map yet. This
    causes at least 2 problems:
    1. All new join requests fail, so no new node can mount the volume.
    2. The joined node can't umount the volume, because the umount process
       has to wait for joining_node to become unknown. So the umount hangs.
    
    The solution is to clear the joining_node before we check the domain map.
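
    The ordering described above can be illustrated with a small userspace
    model. This is only a sketch under stated assumptions, not the kernel
    code: struct domain_state, NODE_UNKNOWN, MAX_NODES, and every function
    below are hypothetical names chosen for illustration. Only joining_node
    and the idea of a domain map come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES    32    /* hypothetical cluster size limit */
#define NODE_UNKNOWN (-1)  /* hypothetical marker: no node is currently joining */

/* Hypothetical, simplified model of one joined node's view of the domain. */
struct domain_state {
    bool domain_map[MAX_NODES]; /* nodes that have completed assert join */
    int  joining_node;          /* set by query join, cleared by assert join */
};

static void domain_init(struct domain_state *d)
{
    for (int i = 0; i < MAX_NODES; i++)
        d->domain_map[i] = false;
    d->joining_node = NODE_UNKNOWN;
}

/* Step 1, query join: refused while another join is still in progress. */
static bool query_join(struct domain_state *d, int idx)
{
    assert(idx >= 0 && idx < MAX_NODES);
    if (d->joining_node != NODE_UNKNOWN)
        return false;
    d->joining_node = idx;
    return true;
}

/* Step 2, assert join: the node enters the domain map and the join state is cleared. */
static void assert_join(struct domain_state *d, int idx)
{
    assert(idx >= 0 && idx < MAX_NODES);
    if (d->joining_node != idx)
        return;
    d->domain_map[idx] = true;
    d->joining_node = NODE_UNKNOWN;
}

/* Heartbeat node-down handling in this model. */
static void node_down(struct domain_state *d, int idx)
{
    assert(idx >= 0 && idx < MAX_NODES);

    /*
     * Clear the join state first. A node that died between query join
     * and assert join is not in the domain map, so doing this after the
     * domain map check below would leave joining_node set forever.
     */
    if (d->joining_node == idx)
        d->joining_node = NODE_UNKNOWN;

    /* Nodes outside the domain need no further cleanup. */
    if (!d->domain_map[idx])
        return;

    d->domain_map[idx] = false;
}

/* In this model, umount may proceed only when no join is in progress. */
static bool can_umount(const struct domain_state *d)
{
    return d->joining_node == NODE_UNKNOWN;
}
```

    In this model, if node 3 passes query_join() and then dies, node_down()
    resets joining_node even though node 3 never reached the domain map, so
    later join requests succeed and can_umount() returns true again.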
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
dlmrecovery.c 81.8 KB