    md: fix deadlock in md/raid1 and md/raid10 when handling a read error · a35e63ef
    NeilBrown authored
    When handling a read error, we freeze the array to stop any other IO while
    attempting to overwrite with correct data.
    
    This is done in the raid1d (raid10d) thread, which must wait for all
    submitted IO to complete (except for requests that failed and are sitting
    in the retry queue - these are counted in ->nr_queued and will stay there
    during a freeze).
    
    However, write requests need attention from raid1d, as bitmap updates might
    be required.  This can cause a deadlock, as raid1d is waiting for requests
    to finish that themselves need attention from raid1d.
    
    So we create a new function 'flush_pending_writes' to give that attention, and
    call it in freeze_array to be sure that we aren't waiting on raid1d.
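
The deadlock and the fix can be illustrated with a small user-space toy model.
This is only a sketch, not the kernel code: `struct toy_conf`, its fields, the
`flush_while_waiting` flag and `run_demo` are hypothetical names chosen for
illustration, writes are assumed to complete as soon as they are flushed, and
the wait condition is simplified to "pending IO equals queued retries".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: this is NOT the kernel's conf structure. */
struct toy_conf {
    int nr_pending;     /* IO submitted and not yet completed */
    int nr_queued;      /* failed requests parked on the retry queue */
    int pending_writes; /* writes waiting for the daemon (e.g. bitmap update) */
};

/* Give queued writes the attention they need. In this toy they complete
 * immediately once flushed. Returns true if any work was done. */
static bool flush_pending_writes(struct toy_conf *conf)
{
    if (conf->pending_writes == 0)
        return false;
    conf->nr_pending -= conf->pending_writes;
    conf->pending_writes = 0;
    return true;
}

/* Wait until only requests on the retry queue remain outstanding.
 * Returns false where a real daemon would sleep forever (deadlock). */
static bool freeze_array(struct toy_conf *conf, bool flush_while_waiting)
{
    while (conf->nr_pending != conf->nr_queued) {
        bool progressed = flush_while_waiting && flush_pending_writes(conf);
        if (!progressed)
            return false; /* nothing can complete: we are waiting on ourselves */
    }
    return true;
}

/* Usage: 3 requests outstanding, 1 on the retry queue, 2 writes that
 * need the daemon before they can finish. */
static void run_demo(void)
{
    struct toy_conf without_fix = { 3, 1, 2 };
    struct toy_conf with_fix = { 3, 1, 2 };

    assert(!freeze_array(&without_fix, false)); /* stuck: writes never flushed */
    assert(freeze_array(&with_fix, true));      /* flushing lets the freeze finish */
}
```

In the toy, the freeze only completes when the waiting loop itself flushes the
pending writes, which mirrors the idea of calling flush_pending_writes from
freeze_array.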
    
    Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
    problem.
    
    Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
raid10.c 58.9 KB