    dm raid1: fix EIO after log failure · b80aa7a0
    Jonathan Brassow authored
    This patch adds the ability to requeue write I/O to
    core device-mapper when there is a log device failure.
    
    If a write to the log produces an error, the pending writes are
    put on the "failures" list.  Since the log is marked as failed,
    they will stay on the failures list until a suspend happens.
    
    Suspends come in two phases, presuspend and postsuspend.  We must
    make sure that all the writes on the failures list are requeued
    in the presuspend phase (a requirement of dm core).  This means
    that recovery must be complete (because writes may be delayed
    behind it) and the failures list must be requeued before we
    return from presuspend.
    
    The mechanisms to ensure recovery is complete (or stopped) were
    already in place, but needed to be moved from postsuspend to
    presuspend.  We rely on 'flush_workqueue' to ensure that the
    mirror thread is complete and has therefore requeued all writes
    on the failures list.
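The ordering described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (mirror_model, write_io, worker_pass, model_presuspend) is hypothetical and is not the dm-raid1 code, and the synchronous call to worker_pass() merely stands in for what 'flush_workqueue' guarantees.

```c
#include <stddef.h>

/* Hypothetical user-space model; these are not dm-raid1 symbols. */
struct write_io {
    int id;
    struct write_io *next;
};

struct mirror_model {
    int log_failed;             /* set once a log write has errored */
    int suspending;             /* set at the start of presuspend */
    struct write_io *failures;  /* writes held after a log failure */
    struct write_io *requeued;  /* writes handed back to the core */
};

static void push(struct write_io **list, struct write_io *io)
{
    io->next = *list;
    *list = io;
}

static size_t list_len(const struct write_io *io)
{
    size_t n = 0;

    for (; io; io = io->next)
        n++;
    return n;
}

/* A log write failed: mark the log as failed and hold the write. */
static void log_write_failed(struct mirror_model *m, struct write_io *io)
{
    m->log_failed = 1;
    push(&m->failures, io);
}

/*
 * One pass of the worker.  This model covers only the log-failed case:
 * held writes stay on the failures list until a suspend begins, and
 * are then all moved to the requeued list.
 */
static void worker_pass(struct mirror_model *m)
{
    if (!m->suspending)
        return;

    while (m->failures) {
        struct write_io *io = m->failures;

        m->failures = io->next;
        push(&m->requeued, io);
    }
}

/*
 * Model of presuspend: flag the suspend, then run the worker to
 * completion before returning (the stand-in for flush_workqueue).
 * Stopping recovery is assumed to have happened and is not modelled.
 */
static void model_presuspend(struct mirror_model *m)
{
    m->suspending = 1;
    worker_pass(m);
}
```

In this model, calling worker_pass() before model_presuspend() leaves the held writes where they are; once model_presuspend() returns, the failures list is empty and every held write is on the requeued list.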
    
    Because we are using flush_workqueue, we must ensure that no
    additional 'queue_work' calls will produce additional I/O
    that we need to requeue (because once we return from
    presuspend, we are unable to do anything about it).  'queue_work'
    is called in response to the following functions:
    - complete_resync_work = NA, recovery is stopped
    - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                               is ready to recover the region
                               (recovery is stopped) or it needs
                               to clear the region in the log**
                               ** this doesn't get called while
                               suspending **
    - rh_recovery_end = NA, recovery is stopped
    - rh_recovery_start = NA, recovery is stopped
    - write_callback = 1) Writes w/o failures simply call
                       bio_endio -> mirror_end_io -> rh_dec
                       (see rh_dec above)
                       2) Writes with failures are put on
                       the failures list and queue_work is
                       called**
                       ** write_callbacks don't happen
                       during suspend **
    - do_failures = NA, 'queue_work' not called if suspending
    - add_mirror (initialization) = NA, only done on mirror creation
    - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
                  is called.  2) No more I/Os are being issued.
                  3) Re-attempted READs can still be handled.
                  (Write completions are handled through rh_dec/
                  write_callback - mentioned above - and do not
                  use queue_bio.)
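The reason the list above matters can be shown with a second small model: a flush only covers work queued before it, so any caller that might run during a suspend has to skip the wakeup. This is a sketch, not kernel code. The names worker_model, model_queue_work, model_flush and wake_unless_suspending are hypothetical, and the gate is modelled on the do_failures entry ('queue_work' not called if suspending).

```c
/* Hypothetical model; the names below are not kernel symbols. */
struct worker_model {
    int suspending;  /* nonzero once presuspend has begun */
    int pending;     /* work items queued but not yet run */
};

/* Stand-in for queue_work(): records one more pending item. */
static void model_queue_work(struct worker_model *w)
{
    w->pending++;
}

/* Stand-in for flush_workqueue(): runs everything queued so far. */
static void model_flush(struct worker_model *w)
{
    w->pending = 0;
}

/* Gated wakeup: schedule the worker only when not suspending. */
static void wake_unless_suspending(struct worker_model *w)
{
    if (!w->suspending)
        model_queue_work(w);
}
```

With the gate in place, a wakeup attempted after the flush leaves pending at zero, so nothing is left behind once presuspend returns. An ungated model_queue_work() call at the same point would leave one item pending.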
    Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>