    raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang · 11266d29
    Nate Dailey authored
    commit ccfc7bf1 upstream.
    
    If raid1d is handling a mix of read and write errors, handle_read_error's
    call to freeze_array can get stuck.
    
    This can happen because, though the bio_end_io_list is initially drained,
    writes can be added to it via handle_write_finished as the retry_list
    is processed. These writes contribute to nr_pending but are not included
    in nr_queued.
    
    If a later entry on the retry_list triggers a call to handle_read_error,
    freeze_array hangs waiting for nr_pending == nr_queued+extra. The writes
    on the bio_end_io_list aren't included in nr_queued so the condition will
    never be satisfied.
    
    To prevent the hang, include bio_end_io_list writes in nr_queued.
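
The counter mismatch described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct model_conf`, the helper names, and the two-request scenario are hypothetical, and the assumptions that raid1d decrements nr_queued as it takes each retry_list entry and that handle_read_error passes extra = 1 are not stated in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters discussed above;
 * this is not the kernel's struct r1conf. */
struct model_conf {
    int nr_pending; /* requests submitted and not yet completed */
    int nr_queued;  /* requests parked on a list for raid1d */
};

/* The condition freeze_array waits for, as quoted in the commit message. */
static bool freeze_condition_met(const struct model_conf *c, int extra)
{
    return c->nr_pending == c->nr_queued + extra;
}

/* A failed request lands on the retry_list: pending and queued. */
static void queue_for_retry(struct model_conf *c)
{
    c->nr_pending++;
    c->nr_queued++;
}

/* Assumption: raid1d un-queues an entry when it takes it off the
 * retry_list; the request itself is still pending. */
static void take_from_retry(struct model_conf *c)
{
    assert(c->nr_queued > 0);
    c->nr_queued--;
}

/* handle_write_finished parks a write on the bio_end_io_list.
 * count_in_queued selects the behaviour with the fix (true)
 * or without it (false). */
static void move_to_end_io_list(struct model_conf *c, bool count_in_queued)
{
    if (count_in_queued)
        c->nr_queued++;
}

/* One write error followed by one read error on the retry_list.
 * Returns whether freeze_array's condition could ever become true. */
static bool scenario_can_freeze(bool count_in_queued)
{
    struct model_conf c = { 0, 0 };

    queue_for_retry(&c);                      /* failed write */
    queue_for_retry(&c);                      /* failed read */

    take_from_retry(&c);                      /* raid1d handles the write */
    move_to_end_io_list(&c, count_in_queued); /* write waits on bio_end_io_list */

    take_from_retry(&c);                      /* raid1d handles the read */

    /* Assumption: handle_read_error calls freeze_array with extra = 1
     * to account for the read it is currently holding. */
    return freeze_condition_met(&c, 1);
}
```

In this model, `scenario_can_freeze(false)` leaves nr_pending at 2 against nr_queued + extra of 1, so the wait can never finish, while `scenario_can_freeze(true)` balances the two sides.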
    
    There's probably a better way to handle decrementing nr_queued, but this
    seemed like the safest way to avoid breaking surrounding code.
    
    I'm happy to supply the script I used to repro this hang.
    
    Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
    Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
raid1.c 76.4 KB