- 05 Mar, 2012 1 commit
-
NeilBrown authored
commit 56a2559b (md/raid10: recognise replacements ...) changed 'run' to set ->replacement or ->rdev depending on the device's 'Replacement' status, but it didn't remove the old unconditional setting of 'rdev', so the change was largely ineffective. Remove that now. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 14 Feb, 2012 1 commit
-
NeilBrown authored
If we get a read error on the last working device in a RAID10 which contains the target block, then we don't fail the device (which is good) but we don't abort retries, which is wrong. We end up in an infinite loop retrying the read on the one device.
This patch fixes the problem in two places:
1/ in raid10_end_read_request we don't even ask for a retry if this was the last usable device. This is efficient but a little racy and will sometimes retry when it should not.
2/ in handle_read_error we are careful to exclude any device from retry which we tried to mark as faulty (that might have failed if it was the last device). This is race-free but less efficient.
Signed-off-by: NeilBrown <neilb@suse.de>
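A sketch of the first fix, assuming raid10's existing enough() helper that reports whether the array stays usable without a given device (illustrative, not the literal patch):

```c
/* In raid10_end_read_request(), on a failed read: only schedule a
 * retry if some other usable device could still service the block. */
if (!uptodate && !enough(conf, rdev->raid_disk))
	uptodate = 1;	/* last usable device - retrying would loop forever */
if (uptodate)
	raid_end_bio_io(r10_bio);
else
	set_bit(R10BIO_ReadError, &r10_bio->state);	/* retry elsewhere */
```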
-
- 13 Feb, 2012 1 commit
-
NeilBrown authored
Since we added the 'replacement' capability, RAID1 can have twice as many devices as ->raid_disks indicates. So md_raid1_congested needs to check up to twice ->raid_disks possible devices, not just ->raid_disks of them. Signed-off-by: NeilBrown <neilb@suse.de>
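A sketch of the resulting check, modelled on md_raid1_congested() in raid1.c (abbreviated; the real function has a few more conditions):

```c
static int md_raid1_congested(struct mddev *mddev, int bits)
{
	struct r1conf *conf = mddev->private;
	int i, ret = 0;

	rcu_read_lock();
	/* with replacements, mirrors[] can hold 2 * raid_disks rdevs */
	for (i = 0; i < conf->raid_disks * 2; i++) {
		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);

		if (rdev && !test_bit(Faulty, &rdev->flags)) {
			struct request_queue *q = bdev_get_queue(rdev->bdev);

			ret |= bdi_congested(&q->backing_dev_info, bits);
		}
	}
	rcu_read_unlock();
	return ret;
}
```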
-
- 07 Feb, 2012 1 commit
-
NeilBrown authored
1/ If a resync is aborted we should record how far we got (recovery_cp) as the last request that we know has completed (->curr_resync_completed), rather than the last request that was submitted (->curr_resync).
2/ When a resync aborts we still want to update the metadata with any changes, so set MD_CHANGE_DEVS even if we 'skip'.
Signed-off-by: NeilBrown <neilb@suse.de>
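A sketch of both changes in md_do_sync()'s exit path (abbreviated; the surrounding conditions in md.c are more involved):

```c
/* 1/ on abort, record the last sector known to have completed,
 *    not the last sector that was submitted */
if (mddev->curr_resync > 2 &&
    mddev->curr_resync >= mddev->recovery_cp)
	mddev->recovery_cp = mddev->curr_resync_completed;

/* 2/ make sure changed metadata is written out even when we 'skip' */
set_bit(MD_CHANGE_DEVS, &mddev->flags);
```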
-
- 30 Jan, 2012 1 commit
-
Jonathan Brassow authored
The life cycle of a device-mapper target is:
1) create
2) resume
3) suspend
*) possibly repeat from 2
4) destroy
The dm-raid target is unconditionally calling MD's bitmap_load function upon every resume. If steps 2 & 3 above are repeated, bitmap_load is called multiple times. It is only written to be called once; otherwise, it allocates new memory for the bitmap (without freeing the old) and increments the number of pages it thinks it has without zeroing the new memory first. This ultimately leads to access beyond the allocated memory and to leaked memory. Simply avoiding the bitmap_load call upon resume is not sufficient: if the target was suspended while the initial recovery was only partially complete, recovery needs to be restarted when the target is resumed. This is why 'md_wakeup_thread' is called before issuing 'mddev_resume'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
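A sketch of the guarded resume, assuming a 'bitmap_loaded' flag in struct raid_set as described above (illustrative, not the literal patch):

```c
static void raid_resume(struct dm_target *ti)
{
	struct raid_set *rs = ti->private;

	if (!rs->bitmap_loaded) {
		/* first resume after create: load the bitmap exactly once */
		bitmap_load(&rs->md);
		rs->bitmap_loaded = 1;
	} else
		/* later resumes: restart any partially-complete recovery */
		md_wakeup_thread(rs->md.thread);

	mddev_resume(&rs->md);
}
```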
-
- 10 Jan, 2012 2 commits
-
NeilBrown authored
We normally try to avoid reading from write-mostly devices, but when we do we really have to check for bad blocks and be sure not to try reading them. With the current code, best_good_sectors might not get set and that causes zero-length read requests to be sent down, which is very confusing. This bug was introduced in commit d2eb35ac, so the patch is suitable for 3.1.x and 3.2.x. Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org
-
NeilBrown authored
We currently only 'notify' changes to the 'degraded' attribute when it decreases, not when it increases. Notifying on failure is a little awkward as it happens in interrupt context. So instead, notify when we remove the failed device from the array, which happens very soon afterwards. Reported-and-tested-by: Mikhail Balabin <mbalabin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 22 Dec, 2011 33 commits
-
NeilBrown authored
Now that WantReplacement drives are replaced cleanly, mark a drive as WantReplacement when we see a write error. It might get failed soon, in which case the WantReplacement flag is irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available. Signed-off-by: NeilBrown <neilb@suse.de>
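The mechanism itself is small; a sketch (the flag and recovery-bit names are the real ones from md.h):

```c
/* on a write error that was recorded in the bad block log, request a
 * replacement and prompt md to look for a spare to activate */
set_bit(WantReplacement, &rdev->flags);
set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
```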
-
NeilBrown authored
When attempting to add a spare to a RAID1 array, also consider adding it as a replacement for a want_replacement device. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Signed-off-by: NeilBrown <neilb@suse.de>
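A sketch of the slotting, in the style of raid1's setup_conf() (abbreviated; raid1 keeps replacements in the second half of the mirrors[] array, as described in the doubling patch below):

```c
/* a Replacement goes in the upper half of mirrors[], the original in
 * the lower half; a collision in either half means a duplicate slot */
if (test_bit(Replacement, &rdev->flags))
	disk = conf->mirrors + conf->raid_disks + disk_idx;
else
	disk = conf->mirrors + disk_idx;

if (disk->rdev)
	goto abort;		/* two devices claimed the same slot */
disk->rdev = rdev;
```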
-
NeilBrown authored
When recovery completes ->spare_active is called. This checks if the replacement is ready and if so it fails the original. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
Replacement devices are stored at a different offset, so look there too. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
In RAID1, a replacement is much like a normal device, so we just double the size of the relevant arrays and look at all possible devices for reads and writes. This means that the array looks like it is now double the size in some ways - we need to be careful about that. In particular, when checking whether the array is still degraded while creating a recovery request, we need to consider only the first 'half' - i.e. the real (non-replacement) devices. Signed-off-by: NeilBrown <neilb@suse.de>
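For example, the degraded test during resync only walks the real half (a sketch in the style of raid1's sync_request()):

```c
/* replacements (slots raid_disks..2*raid_disks-1) don't count here */
for (i = 0; i < conf->raid_disks; i++) {
	struct md_rdev *rdev = conf->mirrors[i].rdev;

	if (!rdev || !test_bit(In_sync, &rdev->flags))
		still_degraded = 1;
}
```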
-
NeilBrown authored
In general mddev->raid_disks can change unexpectedly while conf->raid_disks will only change in a very controlled way. So change some uses of one to the other. The use of mddev->raid_disks will not cause actual problems, but this way is more consistent and safer in the long term. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When attempting to add a spare to a RAID10 array, also consider adding it as a replacement for a want_replacement device. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When recovery finishes and spare_active is called, check for a replacement that might have just become fully synced and mark it as such, marking the original as failed. Then when the original is removed, move the replacement into its position. This means that 'replacement' can spontaneously become NULL in some situations. Make sure we check for those. It also means that 'rdev' and 'replacement' could appear to be identical - check for that too. Signed-off-by: NeilBrown <neilb@suse.de>
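A sketch of the promotion step when the original is removed (in the style of raid10_remove_disk(); the comment paraphrases the caveats above):

```c
/* Move the replacement into the main slot.  A racing reader may
 * briefly see the same rdev through both pointers - but never see
 * neither - so callers must tolerate rdev == replacement and a
 * 'replacement' that spontaneously becomes NULL. */
p->rdev = p->replacement;
clear_bit(Replacement, &p->replacement->flags);
smp_mb();		/* order the store above before the one below */
p->replacement = NULL;
```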
-
NeilBrown authored
If there is a replacement device, then recover to it, reading from any drives - maybe the one being replaced, maybe not. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
If we need to resync an array which has replacement devices, we always write any block checked to every replacement. If the resync was a bitmap-based resync, we will then complete the replacement normally. If it was a full resync, we mark the replacements as fully recovered when the resync finishes so no further recovery is needed. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When writing, we need to submit two writes: one to the original, and one to the replacement - if there is a replacement. If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original. This only handles writing new data; writing for resync/recovery will come later. Signed-off-by: NeilBrown <neilb@suse.de>
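A sketch of the submission, using the second per-slot bio added elsewhere in this series (field names from r10bio; illustrative, not the literal patch):

```c
/* one write per copy to the original device, plus one to the
 * replacement whenever a replacement bio was set up */
for (i = 0; i < conf->copies; i++) {
	if (r10_bio->devs[i].bio)
		generic_make_request(r10_bio->devs[i].bio);
	if (r10_bio->devs[i].repl_bio)
		generic_make_request(r10_bio->devs[i].repl_bio);
}
```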
-
NeilBrown authored
Enhance raid10_remove_disk to be able to remove ->replacement as well as ->rdev. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When reading (for array reads, not for recovery etc.) we read from the replacement device if it has recovered far enough. This requires storing the chosen rdev in the 'r10_bio' so we can make sure to drop the ref on the right device when the read finishes. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
It makes more sense to return an rdev than just an index, as read_balance() gets a reference to the rdev, so returning the pointer makes this more idiomatic. This will be needed in a future patch when we might return a 'replacement' rdev instead of the main rdev. Signed-off-by: NeilBrown <neilb@suse.de>
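The shape of the change (signatures paraphrased from raid10.c):

```c
/* before: returned a slot index into r10_bio->devs[] */
static int read_balance(struct r10conf *conf, struct r10bio *r10_bio,
			int *max_sectors);

/* after: returns the chosen device itself, with a reference already
 * taken, so a 'replacement' rdev can later be returned transparently */
static struct md_rdev *read_balance(struct r10conf *conf,
				    struct r10bio *r10_bio,
				    int *max_sectors);
```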
-
NeilBrown authored
Allow each slot in the RAID10 to have 2 devices: the want_replacement and the replacement. Also allow an r10bio to have 2 bios, and for resync/recovery allocate the second bio if there are any replacement devices. Signed-off-by: NeilBrown <neilb@suse.de>
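A sketch of the per-slot layout this creates (paraphrasing raid10.h; the named fields are real, surrounding members are elided):

```c
struct r10bio {
	/* ... existing fields ... */
	struct {
		struct bio	*bio;		/* I/O to the main device */
		struct bio	*repl_bio;	/* second bio, for the replacement;
						 * only allocated for resync or
						 * recovery when replacements exist */
		sector_t	addr;
		int		devnum;
	} devs[0];				/* one entry per copy */
};
```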
-
NeilBrown authored
Now that WantReplacement drives are replaced cleanly, mark a drive as WantReplacement when we see a write error. It might get failed soon so the WantReplacement flag is irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When attempting to add a spare to a RAID[456] array, also consider adding it as a replacement for a want_replacement device. This requires that common md code attempt hot_add even when the array is not formally degraded. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When recovery completes - as reported by a call to ->spare_active - we clear In_sync on the original and set it on the replacement. Then when the original gets removed we move the replacement from 'replacement' to 'rdev'. This could race with other code that is looking at these pointers, so we use memory barriers and careful ordering to ensure that a reader might see one device twice, but never no devices. Then the readers guard against using both devices, which could only happen when writing. Signed-off-by: NeilBrown <neilb@suse.de>
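A sketch of the reader-side convention this ordering enables (raid5.c style, under rcu_read_lock(); abbreviated):

```c
/* Prefer the replacement; thanks to the writer's store ordering a
 * reader can see the same device through both pointers, but never
 * see no device at all. */
struct md_rdev *rdev = rcu_dereference(conf->disks[i].replacement);

if (rdev == rcu_dereference(conf->disks[i].rdev))
	rdev = NULL;		/* same device twice: treat as original */
if (!rdev)
	rdev = rcu_dereference(conf->disks[i].rdev);
```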
-
NeilBrown authored
During recovery we want to write to the replacement but not the original. So we have two new flags:
- R5_NeedReplace if this stripe has a replacement that needs to be written at some stage
- R5_WantReplace if NeedReplace, and the data is available, and a 'sync' has been requested on this stripe
We also distinguish between 'sync and replace', which needs to read all other devices, and 'replace', which only needs to read the devices being replaced. Note that during resync we always write to any replacement device. It might not need to be written to, but as we don't read to compare, we have to write to be sure. Signed-off-by: NeilBrown <neilb@suse.de>
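As flags these might look like the following (the names are the real ones from raid5.h; their position in the enum is illustrative):

```c
enum {
	/* ... existing R5_* device flags ... */
	R5_NeedReplace,	/* this device has a replacement which needs to
			 * be written to at some stage */
	R5_WantReplace,	/* NeedReplace, and the data is available, and
			 * a 'sync' has been requested on this stripe */
};
```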
-
NeilBrown authored
When writing, we need to submit two writes, one to the original, and one to the replacement - if there is a replacement. If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original. When writing for recovery, we shouldn't write to the original. This will be addressed in a subsequent patch that generally addresses recovery. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
Enhance raid5_remove_disk to be able to remove ->replacement as well as ->rdev. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
If a replacement device is present and has been recovered far enough, then use it for reading into the stripe cache. If we get an error, we don't try to repair it; we just fail the device. A replacement device that gives errors does not sound sensible. This requires removing the setting of R5_ReadError when we get a read error during a read that bypasses the cache. It was probably a bad idea anyway, as we don't know that every block in the read caused an error, and it could cause ReadError to be set for the replacement device, which is bad. Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
We currently initialise some fields of a bio when preparing a stripe_head, and again just before submitting the request. Remove the duplication by only setting the fields that lower-level devices don't touch in raid5_build_block, and only setting the changeable fields in ops_run_io. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
Remove some #defines that are no longer used, and replace some others with an enum. And remove an unused field. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
Just enhance data structures to record a second device per slot to be used as a 'replacement' device, replacing the original. We also have a second bio in each slot in each stripe_head. This will only be used when writing to the array - we need to write to both the original and the replacement at the same time, so will need two bios. For now, only try using the replacement drive for aligned-reads. In this case, we prefer the replacement if it has been recovered far enough, otherwise use the original. This includes a small enhancement. Previously we would only do aligned reads if the target device was fully recovered. Now we also do them if it has recovered far enough. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
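A sketch of the enhanced structures (paraphrasing raid5.h; the named members are real, others are elided):

```c
struct disk_info {
	struct md_rdev	*rdev, *replacement;	/* original + hot-replace */
};

struct r5dev {
	struct bio	req, rreq;	/* 'rreq' targets the replacement,
					 * so a write can go to both */
	struct bio_vec	vec, rvec;
	/* ... page, towrite, written, sector, flags, ... */
};
```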
-
NeilBrown authored
hot-replace is a feature being added to md which will allow a device to be replaced without removing it from the array first. With hot-replace a spare can be activated and recovery can start while the original device is still in place, thus allowing a transition from an unreliable device to a reliable device without leaving the array degraded during the transition. It can also be used when the original device is still reliable but is not wanted for some reason. This will eventually be supported in RAID4/5/6 and RAID10. This patch adds a super-block flag to distinguish the replacement device. If an old kernel sees this flag it will reject the device. It also adds two per-device flags which are viewable and settable via sysfs: "want_replacement" can be set to request that a device be replaced, and "replacement" is set to show that this device is replacing another device. The "rd%d" links in /sys/block/mdXx/md only apply to the original device, not the replacement. We currently don't make links for the replacement - there doesn't seem to be a need. Signed-off-by: NeilBrown <neilb@suse.de>
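A sketch of the new flags (the names are real; the feature-bit value follows md_p.h of this era):

```c
/* v1.x superblock feature bit: this device is a replacement */
#define MD_FEATURE_REPLACEMENT	4

/* per-device flag bits in md.h */
enum flag_bits {
	/* ... */
	WantReplacement,	/* please replace this device */
	Replacement,		/* this device replaces another */
};
```

In use, the request goes through the device's sysfs state file, e.g. (device name illustrative): echo want_replacement > /sys/block/md0/md/dev-sda1/state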
-
NeilBrown authored
Soon an array will be able to have multiple devices with the same raid_disk number (an original and a replacement). So removing a device based on the number won't work. So pass the actual device handle instead. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
When setting the slot number on a device in an active array we currently check that the number is not already in use. We then call into the personality's hot_add_disk function which performs the same test and returns the same error. Thus the common test is not needed. As we will shortly be changing some personalities to allow duplicates in some cases (to support hot-replace), the common test will become inconvenient. So remove the common test. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
NeilBrown authored
For each active region corresponding to a bit in the bitmap we have a 14-bit counter (and some flags). This counts: the number of active writes, plus one for the bit being set in the on-disk bitmap, plus one for 'delay-needed'. The 'delay-needed' is because we always want a delay before clearing a bit, so the number here is normally the number of active writes plus 2. If there have been no writes for a while, we drop to 1. If there are still no writes, we clear the bit and drop to 0. So for consistency, when setting a bit from the on-disk bitmap or by request from user-space, it is best to set the counter to '2' to start with. In particular we might also set the NEEDED_MASK flag at this time, and in all other cases NEEDED_MASK is only set when the counter is 2 or more. Signed-off-by: NeilBrown <neilb@suse.de>
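The layout and the new starting value, sketched from bitmap.h and bitmap_set_memory_bits() (the macro names are real):

```c
/* the top two of the 16 counter bits are flags; 14 bits remain */
#define NEEDED_MASK ((bitmap_counter_t)(1 << (COUNTER_BITS - 1)))
#define RESYNC_MASK ((bitmap_counter_t)(1 << (COUNTER_BITS - 2)))
#define COUNTER_MAX ((bitmap_counter_t)RESYNC_MASK - 1)

/* setting a bit from the on-disk bitmap or from user-space:
 * start the counter at 2 so the usual "writes + 2" rule holds */
*bmc = 2 | (needed ? NEEDED_MASK : 0);
```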
-
Steven Rostedt authored
While using etags to find free_pages(), I stumbled across this debug definition of free_pages() that is to be used while debugging some raid code in userspace. The __get_free_pages() allocates the correct size, but the free_pages() does not match. free_pages(), like __get_free_pages(), takes an order and not a size. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NeilBrown <neilb@suse.de>
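The point, as a small illustrative userspace shim (mirroring the kernel API's semantics; the actual definitions in the tree may differ):

```c
#include <stdlib.h>

#define PAGE_SIZE 4096
/* both macros take an order (log2 of the page count), never a size */
#define __get_free_pages(gfp, order) \
	((unsigned long)malloc(PAGE_SIZE << (order)))
#define free_pages(addr, order) \
	free((void *)(addr))	/* must be passed the same order */

int main(void)
{
	unsigned long p = __get_free_pages(0, 2);	/* 1 << 2 = 4 pages */
	free_pages(p, 2);	/* order 2 again - not 4 * PAGE_SIZE */
	return 0;
}
```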
-