    md: refactor idle/frozen_sync_thread() to fix deadlock · 130443d6
    Yu Kuai authored
    Our test found the following deadlock in raid10:
    
    1) Issue a normal write, and the write fails:
    
      raid10_end_write_request
       set_bit(R10BIO_WriteError, &r10_bio->state)
       one_write_done
        reschedule_retry
    
      // later from md thread
      raid10d
       handle_write_completed
        list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
    
      // later from md thread
      raid10d
       if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
        list_move(conf->bio_end_io_list.prev, &tmp)
        r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
        raid_end_bio_io(r10_bio)
    
    Dependency chain 1: normal io is waiting for the superblock update
    
    2) Trigger a recovery:
    
      raid10_sync_request
       raise_barrier
    
    Dependency chain 2: sync thread is waiting for normal io
    
    3) echo idle/frozen to sync_action:
    
      action_store
       mddev_lock
        md_unregister_thread
         kthread_stop
    
    Dependency chain 3: dropping 'reconfig_mutex' is waiting for sync thread
    
    4) md thread can't update superblock:
    
      raid10d
       md_check_recovery
        if (mddev_trylock(mddev))
         md_update_sb
    
    Dependency chain 4: the superblock update is waiting for 'reconfig_mutex'
    
    Hence a cyclic dependency exists. To fix the problem, we must break one
    of the links. Dependencies 1 and 2 can't be broken because they are
    fundamental to the design. Breaking dependency 4 may be possible if it
    can be guaranteed that no io is inflight; however, this requires a new
    mechanism, which seems complex. Dependency 3 is a good choice, because
    idle/frozen only requires the sync thread to finish, which can be done
    asynchronously (this is already implemented), and 'reconfig_mutex' is
    no longer needed while waiting.
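    
    The four chains can be checked mechanically as a wait-for graph. The
    following is a minimal user-space sketch, not kernel code: the node
    names, 'struct wait_graph', 'raid10_chains()' and 'has_deadlock()' are
    hypothetical names chosen for illustration. It only shows that the four
    edges form a cycle and that dropping edge 3 removes it.

```c
#include <stdbool.h>

/* Hypothetical node names for the four waiters in the chains above. */
enum node { NORMAL_IO, SB_UPDATE, RECONFIG_MUTEX, SYNC_THREAD, NR_NODES };

/* edge[a][b] is true when a cannot make progress until b does. */
struct wait_graph {
	bool edge[NR_NODES][NR_NODES];
};

/* Depth-first search: returns true if a cycle is reachable from node n. */
static bool reaches_cycle(const struct wait_graph *g, int n,
			  bool *on_stack, bool *done)
{
	if (on_stack[n])
		return true;	/* back edge: n is already on the current path */
	if (done[n])
		return false;	/* fully explored earlier, no cycle through n */

	on_stack[n] = true;
	for (int m = 0; m < NR_NODES; m++)
		if (g->edge[n][m] && reaches_cycle(g, m, on_stack, done))
			return true;
	on_stack[n] = false;
	done[n] = true;
	return false;
}

static bool has_deadlock(const struct wait_graph *g)
{
	bool on_stack[NR_NODES] = { false };
	bool done[NR_NODES] = { false };

	for (int n = 0; n < NR_NODES; n++)
		if (reaches_cycle(g, n, on_stack, done))
			return true;
	return false;
}

/*
 * Build the graph from the commit message. The flag models whether
 * idle/frozen waits for the sync thread while holding 'reconfig_mutex'
 * (chain 3 present) or waits without it (chain 3 removed).
 */
static struct wait_graph raid10_chains(bool wait_under_mutex)
{
	struct wait_graph g = { 0 };

	g.edge[NORMAL_IO][SB_UPDATE] = true;			/* chain 1 */
	g.edge[SYNC_THREAD][NORMAL_IO] = true;			/* chain 2 */
	g.edge[RECONFIG_MUTEX][SYNC_THREAD] = wait_under_mutex;	/* chain 3 */
	g.edge[SB_UPDATE][RECONFIG_MUTEX] = true;		/* chain 4 */
	return g;
}
```

    With chain 3 present, 'has_deadlock()' reports a cycle; with it removed,
    the remaining three edges form a simple path and no cycle is found.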
    
    This patch switches 'idle' and 'frozen' to wait asynchronously for the
    sync thread to finish. It also adds a sequence counter that records how
    many times the sync thread has finished, so that 'idle' won't keep
    waiting on a newly started sync thread.
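    
    The role of the sequence counter can be illustrated with a small
    user-space model. This is a sketch under assumptions, not the patch
    itself: 'struct sync_state', its fields and all helper names below are
    hypothetical, and C11 atomics stand in for the kernel primitives. The
    point is that a waiter that snapshots the counter can return once the
    sync thread it asked about has finished, even if a new one has already
    started.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-array sync state; not struct mddev. */
struct sync_state {
	atomic_int sync_seq;	/* incremented each time a sync thread finishes */
	atomic_bool running;	/* true while some sync thread is running */
};

static void sync_thread_start(struct sync_state *s)
{
	atomic_store(&s->running, true);
}

static void sync_thread_finish(struct sync_state *s)
{
	atomic_store(&s->running, false);
	atomic_fetch_add(&s->sync_seq, 1);
}

/*
 * Condition an 'idle' request would wait on: either no sync thread is
 * running, or at least one has finished since the snapshot was taken.
 */
static bool idle_may_return(struct sync_state *s, int seq_at_request)
{
	return atomic_load(&s->sync_seq) != seq_at_request ||
	       !atomic_load(&s->running);
}

/*
 * Deterministic walk-through: the old sync thread finishes and a new one
 * starts before the waiter rechecks its condition.
 */
static bool idle_returns_despite_restart(void)
{
	struct sync_state s;
	int seq;

	atomic_init(&s.sync_seq, 0);
	atomic_init(&s.running, false);

	sync_thread_start(&s);
	seq = atomic_load(&s.sync_seq);	/* snapshot taken by 'idle' */
	if (idle_may_return(&s, seq))
		return false;		/* must still wait at this point */

	sync_thread_finish(&s);		/* the thread 'idle' asked about ends */
	sync_thread_start(&s);		/* a new sync thread starts right away */

	/* 'running' is true again, but the counter has moved on. */
	return idle_may_return(&s, seq);
}
```

    Checking only the 'running' flag in this walk-through would leave the
    waiter blocked on the second sync thread; comparing against the
    snapshot lets it return.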
    
    Note that raid456 has a similar deadlock[1], and it has been verified[2]
    that this patch fixes that deadlock as well.
    
    [1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
    [2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com