    ocfs2: o2hb: add negotiate timer · e0cbb798
    Junxiao Bi authored
    This patch series fixes an issue where, when storage goes down, all
    nodes fence themselves because of the write timeout.
    
    With this patch set, all nodes keep running until storage comes back
    online, unless one of the following happens, in which case the nodes
    fence themselves as before:
    
    1. An I/O error is returned.
    2. The network between nodes goes down.
    3. Nodes panic.
    
    This patch (of 6):
    
    When storage goes down, all nodes fence themselves because of the write
    timeout.  The negotiate timer is designed to avoid this: with it, a node
    waits until storage comes back up.
    
    The negotiate timer works as follows:
    
    1. The timer expires before the write timeout timer; its timeout is
       currently half of the write timeout.  It is re-queued along with the
       write timeout timer.  When it expires, the node sends a NEGO_TIMEOUT
       message to the master node (the node with the lowest node number).
       This message does nothing but set a bit in a bitmap on the master
       node that records which nodes are negotiating the timeout.
    
    2. If storage is down, nodes send this message to the master node.
       When the master node finds that its bitmap includes all online
       nodes, it sends a NEGO_APPROVL message to each node in turn; this
       message re-queues the write timeout timer and the negotiate timer.
       Any node that does not receive this message, or that hits a problem
       while handling it, is fenced.  If storage comes back up at any time,
       o2hb_thread runs and re-queues all the timers, so nothing is
       affected by these two steps.
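
    The master-side bookkeeping described in the two steps above can be
    sketched roughly as follows.  This is a minimal user-space illustration,
    not the code in heartbeat.c: struct nego_master, MAX_NODES and every
    nego_*() helper are hypothetical names made up for the sketch, and
    clearing the bitmap on approval is an assumption about how a new
    negotiation round would start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 32	/* hypothetical limit chosen for this sketch */

/* Hypothetical master-side state: one bit per node number. */
struct nego_master {
	uint32_t online;	/* bit n set: node n is online */
	uint32_t negotiating;	/* bit n set: node n sent NEGO_TIMEOUT */
};

/* The negotiate timer fires at half of the write timeout. */
static unsigned int nego_timeout_ms(unsigned int write_timeout_ms)
{
	return write_timeout_ms / 2;
}

/* The master is the online node with the lowest number; -1 if none. */
static int nego_master_node(uint32_t online)
{
	for (int n = 0; n < MAX_NODES; n++)
		if (online & (UINT32_C(1) << n))
			return n;
	return -1;
}

/* NEGO_TIMEOUT handler: does nothing but record the sending node. */
static void nego_handle_timeout(struct nego_master *m, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	m->negotiating |= UINT32_C(1) << node;
}

/* True once the bitmap includes every online node. */
static bool nego_should_approve(const struct nego_master *m)
{
	return m->online != 0 &&
	       (m->negotiating & m->online) == m->online;
}

/*
 * Returns the set of nodes that should each be sent NEGO_APPROVL.
 * Assumption: the bitmap is cleared here so the next round starts empty.
 */
static uint32_t nego_approve(struct nego_master *m)
{
	uint32_t targets = m->online;

	m->negotiating = 0;
	return targets;
}
```

    In this sketch the master would call nego_should_approve() after each
    NEGO_TIMEOUT it records, and call nego_approve() only when that returns
    true.  A node missing from the bitmap keeps the master from approving,
    so the write timeout timers are not re-queued.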
    
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Gang He <ghe@suse.com>
    Cc: rwxybh <rwxybh@126.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>