1. 08 Feb, 2011 1 commit
    • FIX: md: process hangs at wait_barrier after 0->10 takeover · 02214dc5
      Krzysztof Wojcik authored
      The following symptoms were observed:
      1. After a raid0->raid10 takeover operation we have an array with 2
      missing disks.
      When we add a disk for rebuild, the recovery process starts as expected
      but does not finish: it stops at about 90% and the md126_resync process
      hangs in "D" state.
      2. Similar behavior occurs when we have a mounted raid0 array and
      execute a takeover to raid10. When we then try to unmount the array,
      the umount process hangs in "D" state.
      
      In both scenarios the processes hang in the same function, wait_barrier
      in raid10.c.
      The process waits in the macro "wait_event_lock_irq" until the
      "!conf->barrier" condition becomes true.
      In the scenarios above that never happens.
      
      The reason is that at the end of level_store, after calling pers->run,
      we call mddev_resume. This calls pers->quiesce(mddev, 0) which, with
      RAID10, calls lower_barrier.
      However raise_barrier hadn't been called on that 'conf' yet,
      so conf->barrier becomes negative, which is bad.
      
      This patch sets conf->barrier=1 after the takeover operation.
      This prevents the barrier from becoming negative after the call to
      lower_barrier().
      Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
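
The counter underflow described in this commit can be illustrated with a small stand-alone model. This is only a sketch, not the raid10.c code: `toy_conf`, `toy_lower_barrier`, `toy_io_may_proceed` and `toy_resume_after_takeover` are hypothetical names, and the real driver uses locking and wait queues that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the RAID10 private data; only the counter matters here. */
struct toy_conf {
    int barrier;   /* number of raise_barrier calls not yet lowered */
};

/* Models lower_barrier(): drop one barrier reference. */
static void toy_lower_barrier(struct toy_conf *conf)
{
    conf->barrier--;
}

/* Models the "!conf->barrier" condition that wait_barrier sleeps on. */
static bool toy_io_may_proceed(const struct toy_conf *conf)
{
    return !conf->barrier;
}

/*
 * Models the resume step after a takeover.
 * 'preset_barrier' selects the patched behaviour (barrier set to 1 first).
 */
static bool toy_resume_after_takeover(bool preset_barrier)
{
    struct toy_conf conf = { .barrier = 0 };   /* freshly created conf */

    if (preset_barrier)
        conf.barrier = 1;      /* what the patch does after takeover */

    toy_lower_barrier(&conf);  /* mddev_resume -> quiesce(mddev, 0) */
    return toy_io_may_proceed(&conf);
}
```

Without the preset the counter ends at -1, so the wait condition is never satisfied; with the preset it ends at 0 and I/O may proceed.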
  2. 07 Feb, 2011 1 commit
    • md_make_request: don't touch the bio after calling make_request · e91ece55
      Chris Mason authored
      md_make_request was calling bio_sectors() for part_stat_add
      after it was calling the make_request function.  This is
      bad because the make_request function can free the bio and
      because the bi_size field can change.
      
      The fix here was suggested by Jens Axboe.  It saves the
      sector count before the make_request call.  I hit this
      with CONFIG_DEBUG_PAGEALLOC turned on while trying to break
      his pretty fusionio card.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
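
The pattern behind this fix, reading everything needed from an object before handing it to a callee that may consume it, can be sketched as follows. The types and functions (`toy_bio`, `toy_submit`, `toy_consuming_handler`) are hypothetical stand-ins, not the block-layer API, and the 512-byte sector size is an assumption of the sketch.

```c
#include <assert.h>

#define TOY_SECTOR_SHIFT 9   /* assumes 512-byte sectors */

/* Toy request; size_bytes plays the role of bi_size. */
struct toy_bio {
    unsigned int size_bytes;
};

typedef void (*toy_make_request_fn)(struct toy_bio *bio);

static unsigned int toy_bio_sectors(const struct toy_bio *bio)
{
    return bio->size_bytes >> TOY_SECTOR_SHIFT;
}

/* A handler that consumes the request: afterwards its size is meaningless. */
static void toy_consuming_handler(struct toy_bio *bio)
{
    bio->size_bytes = 0;
}

/* Submits the request and returns the sector count to account for. */
static unsigned int toy_submit(struct toy_bio *bio, toy_make_request_fn fn)
{
    unsigned int sectors = toy_bio_sectors(bio);  /* saved before the call */

    fn(bio);          /* the request must not be read after this point */
    return sectors;   /* statistics use the saved value */
}
```

In this model a 4096-byte request is still accounted as 8 sectors even though the handler has zeroed its size.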
  3. 02 Feb, 2011 1 commit
  4. 31 Jan, 2011 8 commits
    • md: don't clear curr_resync_completed at end of resync. · 7281f812
      NeilBrown authored
      There is no need to set this to zero at this point.  It will be
      set to zero by remove_and_add_spares or at the start of
      md_do_sync at the latest.
      And setting it to zero before MD_RECOVERY_RUNNING is cleared can
      make a 'zero' appear briefly in the 'sync_completed' sysfs attribute
      just as resync is finishing.
      
      So simply remove this setting to zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Don't use remove_and_add_spares to remove failed devices from a read-only array · a8c42c7f
      NeilBrown authored
      remove_and_add_spares is called in two places where the needs really
      are very different.
      remove_and_add_spares should not be called on an array which is about
      to be reshaped as some extra devices might have been manually added
      and that would remove them.  However if the array is 'read-auto',
      that will currently happen, which is bad.
      
      So in the 'ro != 0' case don't call remove_and_add_spares but simply
      remove the failed devices as the comment suggests is needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • Add raid1->raid0 takeover support · fc3a08b8
      Krzysztof Wojcik authored
      This patch introduces the raid1 to raid0 takeover operation
      in kernel space.
      Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: Neil Brown <neilb@nbeee.brown>
    • md: Remove the AllReserved flag for component devices. · f21e9ff7
      NeilBrown authored
      This flag is not needed and is used badly.
      
      Devices that are included in a native-metadata array are reserved
      exclusively for that array - and currently have AllReserved set.
      They all are bd_claimed for the rdev and so cannot be shared.
      
      Devices that are included in external-metadata arrays can be shared
      among multiple arrays - providing there is no overlap.
      These are bd_claimed for md in general - not for a particular rdev.
      
      When changing the amount of a device that is used in an array we need
      to check for overlap.  This currently includes a check on AllReserved,
      so even without overlap, sharing with an AllReserved device is not
      allowed.
      However the bd_claim usage already precludes sharing with these
      devices, so the test on AllReserved is not needed.  And in fact it is
      wrong.
      
      As this is the only use of AllReserved, simply remove all usage and
      definition of AllReserved.
      Signed-off-by: NeilBrown <neilb@suse.de>
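
The overlap check mentioned in this commit comes down to testing whether two regions of one device intersect. The sketch below shows one common way to write that test; `toy_extent` and `toy_extents_overlap` are hypothetical names and half-open ranges are an assumption here, so this is not the code in md.c.

```c
#include <assert.h>
#include <stdbool.h>

/* A region of a component device, as the half-open range [start, start + len). */
struct toy_extent {
    unsigned long long start;
    unsigned long long len;
};

/* True if the two regions share at least one sector. */
static bool toy_extents_overlap(const struct toy_extent *a,
                                const struct toy_extent *b)
{
    return a->start < b->start + b->len &&
           b->start < a->start + a->len;
}
```

Two arrays that use adjacent but disjoint regions of one device pass this test, which is the sharing case described for external-metadata arrays.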
    • md: don't abort checking spares as soon as one cannot be added. · 50da0840
      NeilBrown authored
      As spares can be added manually before a reshape starts, we need to
      find them all to mark some of them as in_sync.
      
      Previously we would abort looking for spares when we found an
      unallocated spare that could not be added to the array (implying there
      was no room for new spares).  However already-added spares could be
      later in the list, so we need to keep searching.
      Signed-off-by: NeilBrown <neilb@suse.de>
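
The difference between stopping at the first spare that cannot be added and scanning the whole list can be shown with a toy model. All names here (`toy_rdev`, `toy_count_usable_spares`, the `stop_on_failure` switch) are hypothetical and exist only to contrast the two behaviours; the real loop lives in md.c and does more work per device.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rdev {
    int  raid_disk;   /* slot in the array, or -1 if not yet allocated */
    bool in_sync;     /* true for an active, up-to-date member */
};

/*
 * Counts spares that are (or can be) part of the array.
 * 'free_slots' is how many unallocated spares can still be added.
 * 'stop_on_failure' reproduces the older behaviour for comparison.
 */
static int toy_count_usable_spares(const struct toy_rdev *devs, int n,
                                   int free_slots, bool stop_on_failure)
{
    int usable = 0;

    for (int i = 0; i < n; i++) {
        if (devs[i].in_sync)
            continue;               /* active member, not a spare */
        if (devs[i].raid_disk >= 0) {
            usable++;               /* spare that was already added */
            continue;
        }
        if (free_slots > 0) {
            free_slots--;           /* room left: this spare can be added */
            usable++;
            continue;
        }
        if (stop_on_failure)
            break;                  /* old behaviour: give up here */
    }
    return usable;
}
```

With an unallocated spare listed before an already-added one and no free slots, the early exit misses the already-added spare while the full scan finds it.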
    • md: fix the test for finding spares in raid5_start_reshape. · 469518a3
      NeilBrown authored
      As spares can be added to the array before the reshape is started,
      we need to find and count them when checking there are enough.
      The array could have been degraded, so we need to check all devices,
      not just those outside of the range of devices in the array before
      the reshape.
      
      So instead of checking the index, check the In_sync flag as that
      reliably tells if the device is a spare for this purpose.
      Signed-off-by: NeilBrown <neilb@suse.de>
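
Why the index test undercounts on a degraded array can be seen in a small model. This is a sketch under assumptions: `toy_rdev`, `toy_count_spares_by_index` and `toy_count_spares_by_flag` are hypothetical names, and the `in_sync` field only stands in for the In_sync flag.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rdev {
    int  raid_disk;   /* slot number assigned to the device */
    bool in_sync;     /* stands in for the In_sync flag */
};

/* Index test: only devices beyond the pre-reshape range count as spares. */
static int toy_count_spares_by_index(const struct toy_rdev *devs, int n,
                                     int old_raid_disks)
{
    int spares = 0;

    for (int i = 0; i < n; i++)
        if (devs[i].raid_disk >= old_raid_disks)
            spares++;
    return spares;
}

/* Flag test: any device that is not in sync counts as a spare. */
static int toy_count_spares_by_flag(const struct toy_rdev *devs, int n)
{
    int spares = 0;

    for (int i = 0; i < n; i++)
        if (!devs[i].in_sync)
            spares++;
    return spares;
}
```

For a three-disk array whose slot 1 holds a not-yet-synced replacement plus one new device in slot 3, the index test sees one spare and the flag test sees two.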
    • md: simplify some 'if' conditionals in raid5_start_reshape. · 87a8dec9
      NeilBrown authored
      There are two consecutive 'if' statements.
      
       if (mddev->delta_disks >= 0)
            ....
       if (mddev->delta_disks > 0)
      
      The code in the second is equally valid if delta_disks == 0, and these
      two statements are the only place that 'added_devices' is used.
      
      So make them a single if statement, make added_devices a local
      variable, and re-indent it all.
      
      No functional change.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: revert change to raid_disks on failure. · de171cb9
      NeilBrown authored
      If we try to update_raid_disks and it fails, we should put
      'delta_disks' back to zero.  This is important because some code,
      such as slot_store, assumes that delta_disks has been validated.
      Signed-off-by: NeilBrown <neilb@suse.de>
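
The idea of undoing a tentative field update when validation fails can be sketched as below. The structure and functions (`toy_mddev`, `toy_update_raid_disks`, the `toy_accept` and `toy_reject` callbacks) are hypothetical; the real update_raid_disks has more checks, and the error value used here is just an example negative code.

```c
#include <assert.h>

struct toy_mddev {
    int raid_disks;    /* current number of devices */
    int delta_disks;   /* pending change, 0 when none is validated */
};

typedef int (*toy_check_reshape_fn)(const struct toy_mddev *mddev);

/* Example validators: one accepts any request, one always refuses. */
static int toy_accept(const struct toy_mddev *mddev) { (void)mddev; return 0; }
static int toy_reject(const struct toy_mddev *mddev) { (void)mddev; return -22; }

/* Records the requested change, then rolls it back if validation fails. */
static int toy_update_raid_disks(struct toy_mddev *mddev, int new_disks,
                                 toy_check_reshape_fn check)
{
    int rv;

    mddev->delta_disks = new_disks - mddev->raid_disks;
    rv = check(mddev);
    if (rv < 0)
        mddev->delta_disks = 0;   /* leave no unvalidated delta behind */
    return rv;
}
```

Later code that reads `delta_disks` can then rely on it being either zero or a value that passed validation.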
  5. 28 Jan, 2011 8 commits
  6. 27 Jan, 2011 14 commits
  7. 26 Jan, 2011 7 commits