1. 07 Aug, 2019 10 commits
    • Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.4/block · e8fc87f6
      Jens Axboe authored
      Pull MD changes from Song.
      
      * 'md-next' of https://github.com/liu-song-6/linux:
        raid1: factor out a common routine to handle the completion of sync write
        md: don't call spare_active in md_reap_sync_thread if all member devices can't work
        md: don't set In_sync if array is frozen
        md: allow last device to be forcibly removed from RAID1/RAID10.
        md: Convert to use int_pow()
        md/raid10: end bio when the device faulty
        md/raid1: end bio when the device faulty
        md/raid6: Set R5_ReadError when there is read failure on parity disk
        raid1: use an int as the return value of raise_barrier()
    • raid1: factor out a common routine to handle the completion of sync write · 449808a2
      Hou Tao authored
      It's just code clean-up.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: don't call spare_active in md_reap_sync_thread if all member devices can't work · 0d8ed0e9
      Guoqing Jiang authored
      When a disk is added to an array, md_reap_sync_thread is responsible
      for activating the spare and setting the In_sync flag for the new
      member in spare_active().
      
      But suppose a raid1 array has one member disk A, and disk B is added
      to it. If A goes offline before all the data has been synchronized
      from A to B, B obviously does not have the latest data that A has,
      yet B is still marked with the In_sync flag.
      
      So don't call spare_active in that case; otherwise B is still shown
      in the 'U' state, which is incorrect.
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: don't set In_sync if array is frozen · 062f5b2a
      Guoqing Jiang authored
      When a disk is added to an array, mdadm follows this call path:
      
      Manage_subdevs -> sysfs_freeze_array
                     -> Manage_add
                     -> sysfs_set_str(&info, NULL, "sync_action","idle")
      
      On the kernel side, Manage_add then invokes the path (add_new_disk ->
      validate_super = super_1_validate), which sets the In_sync flag.
      
      In_sync means "device is in_sync with rest of array", but the newly
      added disk still needs the resync thread to synchronize its data.
      md_reap_sync_thread eventually calls spare_active to set In_sync for
      the newly added disk, so don't set In_sync while the array is frozen.
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: allow last device to be forcibly removed from RAID1/RAID10. · 9a567843
      Guoqing Jiang authored
      When the 'last' device in a RAID1 or RAID10 reports an error,
      we do not mark it as failed.  This would serve little purpose
      as there is no risk of losing data beyond that which is obviously
      lost (as there is with RAID5), and there could be other sectors
      on the device which are readable, and only readable from this device.
      In general, this maximises access to data.
      
      However the current implementation also stops an admin from removing
      the last device by direct action.  Removing it is rarely useful, but
      in many cases it is not harmful and can make automation easier by
      removing special cases.
      
      Also, if an attempt to write metadata fails the device must be marked
      as faulty, else an infinite loop will result, attempting to update
      the metadata on all non-faulty devices.
      
      So add a 'fail_last_dev' member to 'struct mddev' so that the
      'last disk' checks for RAID1 and RAID10 can be bypassed, and control
      the behavior per array through a sysfs node.
      Signed-off-by: NeilBrown <neilb@suse.de>
      [add sysfs node for fail_last_dev by Guoqing]
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: Convert to use int_pow() · cf891607
      Andy Shevchenko authored
      Instead of a linear approach to calculating a power of 10, use the
      generic int_pow(), which does it better.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
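To illustrate the difference the int_pow() commit message refers to, here is a minimal userspace sketch that contrasts a linear power loop with exponentiation by squaring. The function names `pow_linear` and `pow_by_squaring` are hypothetical, and the sketch assumes the generic helper uses a square-and-multiply scheme; it is not the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Linear approach: one multiplication per exponent step. */
uint64_t pow_linear(uint64_t base, unsigned int exp)
{
    uint64_t result = 1;

    while (exp--)
        result *= base;
    return result;
}

/* Square-and-multiply: roughly log2(exp) loop iterations. */
uint64_t pow_by_squaring(uint64_t base, unsigned int exp)
{
    uint64_t result = 1;

    while (exp) {
        if (exp & 1)        /* lowest bit set: fold current base in */
            result *= base;
        exp >>= 1;
        base *= base;       /* base^1, base^2, base^4, ... */
    }
    return result;
}
```

Both functions give the same value for any result that fits in 64 bits; for example, `pow_by_squaring(10, 3)` returns 1000.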
    • md/raid10: end bio when the device faulty · 7cee6d4e
      Yufen Yu authored
      Just like raid1, do not queue a failed write bio for a write retry
      and bad-block acknowledgement when the device is faulty.
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md/raid1: end bio when the device faulty · eeba6809
      Yufen Yu authored
      When a write bio returns an error, it is added to conf->retry_list
      and waits for the raid1d thread to retry the write and acknowledge
      bad blocks.
      
      In narrow_write_error(), the failed bio is split into units of the
      badblock shift (such as one sector), and the raid1d thread issues
      them one by one. The raid1d thread cannot go on to process other
      work until all of the split bios have finished, which is time
      consuming.
      
      But there is a scenario in which this error handling is unnecessary.
      When the device has been set faulty, flush_bio_list() may end
      bios in pending_bio_list with an error status. Since these bios
      have not actually been issued to the device, error handling that
      retries the write and acknowledges bad blocks makes no sense.
      
      Even outside that scenario, when the device is faulty, badblocks
      info cannot be written out to the device. Thus there is also no
      need to handle the failed IO.
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
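The cost argument in the raid1 commit message can be sketched with a small model. This is a hypothetical userspace illustration, not kernel code: `retry_write_count` and its parameters are invented names, and the sketch simply assumes a failed write is retried in pieces aligned to (1 << shift)-sector boundaries, with no retry at all once the device is faulty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: number of separate writes a retry would issue for
 * a failed write of `sectors` sectors starting at sector `start`, when
 * it is split on (1 << shift)-sector boundaries.
 * Returns 0 when the device is already faulty, modelling the decision
 * to end the bio instead of retrying it.
 */
uint64_t retry_write_count(uint64_t start, uint64_t sectors,
                           unsigned int shift, bool device_faulty)
{
    if (device_faulty || sectors == 0)
        return 0;

    uint64_t first_block = start >> shift;
    uint64_t last_block = (start + sectors - 1) >> shift;

    return last_block - first_block + 1;
}
```

Under these assumptions, an 8-sector write with a one-sector unit (`shift` of 0) turns into 8 sequential retries on a working device, and into none on a faulty one.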
    • md/raid6: Set R5_ReadError when there is read failure on parity disk · 143f6e73
      Xiao Ni authored
      7471fb77 ("md/raid6: Fix anomily when recovering a single device in
      RAID6.") avoids rereading P when it can be computed from other members.
      However, this misses the chance to re-write the right data to P. This
      patch sets R5_ReadError if the re-read fails.
      
      Also, when the re-read is skipped, the chance to reset
      rdev->read_errors to 0 is missed as well. This can fail the disk
      when there are many read errors on the P member disk (while the
      other disks have no read errors).
      
      V2: upper-layer read requests don't read parity/Q data, so there is
      no need to consider that situation.
      
      This is Reported-by: kbuild test robot <lkp@intel.com>
      
      Fixes: 7471fb77 ("md/raid6: Fix anomily when recovering a single device in RAID6.")
      Cc: <stable@vger.kernel.org> #4.4+
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • raid1: use an int as the return value of raise_barrier() · 4675719d
      Hou Tao authored
      Using a sector_t as the return value is misleading, because
      raise_barrier() only returns 0 or -EINTR.
      
      Also add comments for the return values of raise_barrier().
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
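A short sketch of why an unsigned return type is misleading for a function that returns 0 or -EINTR. It assumes sector_t is an unsigned 64-bit type, modelled here with a local typedef; the two function names are hypothetical stand-ins and not the raid1 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's sector_t, modelled as unsigned 64-bit. */
typedef uint64_t sector_t;

/* Old style: the error code is squeezed into an unsigned sector type. */
sector_t barrier_status_as_sector(int interrupted)
{
    return interrupted ? (sector_t)(-EINTR) : 0;
}

/* New style: a plain int carries 0 or a negative errno directly. */
int barrier_status_as_int(int interrupted)
{
    return interrupted ? -EINTR : 0;
}
```

With the unsigned type, the error comes back as a very large positive value, so a caller testing `ret < 0` would never see it; with `int`, the usual negative-errno check works as expected.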
  2. 06 Aug, 2019 4 commits
  3. 05 Aug, 2019 20 commits
  4. 04 Aug, 2019 6 commits