    md: Don't register sync_thread for reshape directly · ad39c081
    Yu Kuai authored
    Currently, if a reshape is interrupted, reassembling the array
    registers sync_thread directly from pers->run(). In this case
    'MD_RECOVERY_RUNNING' is set directly, but there is no guarantee
    that md_do_sync() will be executed, so stop_sync_thread() will hang
    because 'MD_RECOVERY_RUNNING' can't be cleared.
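
The hang described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `toy_` name is hypothetical, the bit position is illustrative rather than the kernel's value, and the model assumes that the sync function is the only path that ever clears the RUNNING bit (completion and cleanup are collapsed into one step). The wait is bounded so the example terminates, whereas the real stop_sync_thread() would keep waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit position only; not the kernel's actual value. */
enum { TOY_RECOVERY_RUNNING = 1u << 0 };

struct toy_mddev {
    unsigned int recovery; /* stand-in for mddev->recovery */
};

/* Models pers->run() registering the sync thread and setting RUNNING. */
static void toy_run_registers_sync_thread(struct toy_mddev *m)
{
    m->recovery |= TOY_RECOVERY_RUNNING;
}

/* Models md_do_sync(): in this toy it is the only code that clears RUNNING. */
static void toy_do_sync(struct toy_mddev *m)
{
    m->recovery &= ~(unsigned int)TOY_RECOVERY_RUNNING;
}

/*
 * Models stop_sync_thread() with a bounded wait.
 * Returns true if RUNNING was observed clear within max_polls polls.
 */
static bool toy_stop_sync_thread(const struct toy_mddev *m, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!(m->recovery & TOY_RECOVERY_RUNNING))
            return true;
    }
    return false; /* the real code would still be waiting here */
}

/* Runs both scenarios; returns 0 when the model behaves as described. */
static int toy_hang_demo(void)
{
    struct toy_mddev never_ran = { 0 };
    struct toy_mddev ran = { 0 };

    /* Thread registered, but the sync function never executes. */
    toy_run_registers_sync_thread(&never_ran);
    assert(!toy_stop_sync_thread(&never_ran, 1000));

    /* Thread registered and the sync function does execute. */
    toy_run_registers_sync_thread(&ran);
    toy_do_sync(&ran);
    assert(toy_stop_sync_thread(&ran, 1000));

    return 0;
}
```

Calling `toy_hang_demo()` from a `main()` exercises both cases: the stop succeeds only when the toy sync function has actually run.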
    
    The previous patch makes sure that md_do_sync() will set
    MD_RECOVERY_DONE; however, the following hang can still be triggered
    occasionally by the dm-raid test shell/lvconvert-raid-reshape.sh:
    
    [root@fedora ~]# cat /proc/1982/stack
    [<0>] stop_sync_thread+0x1ab/0x270 [md_mod]
    [<0>] md_frozen_sync_thread+0x5c/0xa0 [md_mod]
    [<0>] raid_presuspend+0x1e/0x70 [dm_raid]
    [<0>] dm_table_presuspend_targets+0x40/0xb0 [dm_mod]
    [<0>] __dm_destroy+0x2a5/0x310 [dm_mod]
    [<0>] dm_destroy+0x16/0x30 [dm_mod]
    [<0>] dev_remove+0x165/0x290 [dm_mod]
    [<0>] ctl_ioctl+0x4bb/0x7b0 [dm_mod]
    [<0>] dm_ctl_ioctl+0x11/0x20 [dm_mod]
    [<0>] vfs_ioctl+0x21/0x60
    [<0>] __x64_sys_ioctl+0xb9/0xe0
    [<0>] do_syscall_64+0xc6/0x230
    [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
    
    Meanwhile mddev->recovery is:
    MD_RECOVERY_RUNNING |
    MD_RECOVERY_INTR |
    MD_RECOVERY_RESHAPE |
    MD_RECOVERY_FROZEN
    
    Fix this problem by removing the code that registers sync_thread
    directly from raid10 and raid5, and let md_check_recovery() register
    sync_thread instead.
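
The shape of the fix, a single registration point, can be sketched in user space as follows. This is a hedged model, not the patch itself: all `sketch_` names, the `reshape_pending` field, and the bit positions are hypothetical, and the rule that a frozen array never gets a sync thread registered is an assumption of the model rather than something stated in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; not the kernel's actual values. */
enum {
    SKETCH_RECOVERY_RUNNING = 1u << 0,
    SKETCH_RECOVERY_RESHAPE = 1u << 1,
    SKETCH_RECOVERY_FROZEN  = 1u << 2,
};

struct sketch_mddev {
    unsigned int recovery;       /* stand-in for mddev->recovery */
    bool reshape_pending;        /* hypothetical: an interrupted reshape exists */
    bool sync_thread_registered; /* hypothetical: a sync thread was registered */
};

/* After the fix (toy): run() no longer registers a thread or sets RUNNING. */
static void sketch_run(struct sketch_mddev *m, bool reshape_interrupted)
{
    m->reshape_pending = reshape_interrupted;
}

/* Toy md_check_recovery(): the only place that registers the sync thread. */
static void sketch_check_recovery(struct sketch_mddev *m)
{
    if (m->recovery & SKETCH_RECOVERY_FROZEN)
        return; /* model assumption: frozen arrays start nothing */
    if (m->recovery & SKETCH_RECOVERY_RUNNING)
        return; /* a sync thread is already registered */
    if (!m->reshape_pending)
        return; /* nothing to resume */

    m->recovery |= SKETCH_RECOVERY_RESHAPE | SKETCH_RECOVERY_RUNNING;
    m->sync_thread_registered = true;
    m->reshape_pending = false;
}
```

In this model, RUNNING is never set while the array is frozen, so a stop request issued at that point has nothing to wait for; the thread is registered only on a later `sketch_check_recovery()` call once the frozen bit is cleared.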
    
    Fixes: f6705578 ("[PATCH] md: Checkpoint and allow restart of raid5 reshape")
    Fixes: f52f5c71 ("md: fix stopping sync thread")
    Cc: stable@vger.kernel.org # v6.7+
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240201092559.910982-5-yukuai1@huaweicloud.com
raid5.c 252 KB