    md: Don't ignore suspended array in md_check_recovery() · 1baae052
    Yu Kuai authored
    mddev_suspend() never stops sync_thread, hence it doesn't make sense to
    ignore a suspended array in md_check_recovery(); doing so might prevent
    sync_thread from being unregistered.
    
    After commit f52f5c71 ("md: fix stopping sync thread"), the following
    hang can be triggered by test shell/integrity-caching.sh:
    
    1) suspend the array:
    raid_postsuspend
     mddev_suspend
    
    2) stop the array:
    raid_dtr
     md_stop
      __md_stop_writes
       stop_sync_thread
        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
        md_wakeup_thread_directly(mddev->sync_thread);
        wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
    
    3) sync thread done:
    md_do_sync
     set_bit(MD_RECOVERY_DONE, &mddev->recovery);
     md_wakeup_thread(mddev->thread);
    
    4) daemon thread can't unregister sync thread:
    md_check_recovery
     if (mddev->suspended)
       return; -> return directly
     md_reap_sync_thread
     clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
     -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hangs;
    
    This problem is not just related to dm-raid. Fix it by no longer
    ignoring suspended arrays in md_check_recovery(). Follow-up patches
    will improve dm-raid to freeze the sync thread during suspend.

    Reported-by: Mikulas Patocka <mpatocka@redhat.com>
    Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/
    Fixes: 68866e42 ("MD: no sync IO while suspended")
    Fixes: f52f5c71 ("md: fix stopping sync thread")
    Cc: stable@vger.kernel.org # v6.7+
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240201092559.910982-2-yukuai1@huaweicloud.com