• Junxiao Bi's avatar
    ocfs2: dlm: fix lock migration crash · 09400fe4
    Junxiao Bi authored
    commit 34aa8dac upstream.
    
    This issue was introduced by commit 800deef3 ("ocfs2: use
    list_for_each_entry where benefical") in 2007 where it replaced
    list_for_each with list_for_each_entry.  The variable "lock" will point
    to invalid data if "tmpq" list is empty and a panic will be triggered
    due to this.  Sunil advised reverting it back, but the old version was
    also not right.  At the end of the outer for loop, that
    list_for_each_entry will also set "lock" to an invalid data, then in the
    next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
    data and cause the panic.  So reverting the list_for_each back and reset
    "lock" to NULL to fix this issue.
    
    Another concern is that this seemes can not happen because the "tmpq"
    list should not be empty.  Let me describe how.
    
    old lock resource owner(node 1):                                  migratation target(node 2):
    image there's lockres with a EX lock from node 2 in
    granted list, a NR lock from node x with convert_type
    EX in converting list.
    dlm_empty_lockres() {
     dlm_pick_migration_target() {
       pick node 2 as target as its lock is the first one
       in granted list.
     }
     dlm_migrate_lockres() {
       dlm_mark_lockres_migrating() {
         res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
         wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
    	 //after the above code, we can not dirty lockres any more,
         // so dlm_thread shuffle list will not run
                                                                       downconvert lock from EX to NR
                                                                       upconvert lock from NR to EX
    <<< migration may schedule out here, then
    <<< node 2 send down convert request to convert type from EX to
    <<< NR, then send up convert request to convert type from NR to
    <<< EX, at this time, lockres granted list is empty, and two locks
    <<< in the converting list, node x up convert lock followed by
    <<< node 2 up convert lock.
    
    	 // will set lockres RES_MIGRATING flag, the following
    	 // lock/unlock can not run
         dlm_lockres_release_ast(dlm, res);
       }
    
       dlm_send_one_lockres()
                                                                     dlm_process_recovery_data()
                                                                       for (i=0; i<mres->num_locks; i++)
                                                                         if (ml->node == dlm->node_num)
                                                                           for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                            list_for_each_entry(lock, tmpq, list)
                                                                            if (lock) break; <<< lock is invalid as grant list is empty.
                                                                           }
                                                                           if (lock->ml.node != ml->node)
                                                                             BUG() >>> crash here
     }
    
    I see the above locks status from a vmcore of our internal bug.
    Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
    Cc: Sunil Mushran <sunil.mushran@gmail.com>
    Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    09400fe4
dlmrecovery.c 85.3 KB