    md/raid10: fix task hung in raid10d · 72c215ed
    Li Nan authored
    Commit fe630de0 ("md/raid10: avoid deadlock on recovery.") allowed
    normal io and sync io to exist at the same time. As a result, a task
    can hang as follows:
    
    T1                      T2		T3		T4
    raid10d
     handle_read_error
      allow_barrier
       conf->nr_pending--
        -> 0
                            //submit sync io
                            raid10_sync_request
                             raise_barrier
    			  ->will not be blocked
    			  ...
    			//submit to drivers
      raid10_read_request
       wait_barrier
        conf->nr_pending++
         -> 1
    					//retry read fail
    					raid10_end_read_request
    					 reschedule_retry
    					  add to retry_list
    					  conf->nr_queued++
    					   -> 1
    							//sync io fail
    							end_sync_read
    							 __end_sync_read
    							  reschedule_retry
    							   add to retry_list
    					                    conf->nr_queued++
    							     -> 2
     ...
     handle_read_error
     get from retry_list
     conf->nr_queued--
      freeze_array
       wait nr_pending == nr_queued+1
            ->1	      ->2
       //task hung
    
    If the retry read and the sync io both fail, both are added to
    retry_list (nr_queued -> 2). raid10d() then calls handle_read_error()
    and hangs in freeze_array(): nr_queued will not decrease because raid10d
    is blocked, and nr_pending will not increase because conf->barrier is
    not released.
    
    Fix it by moving allow_barrier() after raid10_read_request().
    raise_barrier() will wait for nr_waiting to become 0. Therefore, sync io
    and regular io will not be issued at the same time.
    
    Also remove the nr_queued check in stop_waiting_barrier(): nr_queued
    can be 0 in cases where there is no need to block. Remove the check for
    MD_RECOVERY_RUNNING as well, since it is redundant.
    
    Fixes: fe630de0 ("md/raid10: avoid deadlock on recovery.")
    Signed-off-by: Li Nan <linan122@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20230222041000.3341651-2-linan666@huaweicloud.com