1. 12 Feb, 2015 1 commit
    • md/raid10: fix conversion from RAID0 to RAID10 · 53a6ab4d
      NeilBrown authored
      A RAID0 array (like a LINEAR array) does not have a concept
      of 'size' being the amount of each device that is in use.
      Rather, as much of each device as is available is used.
      So the 'size' is set to 0 and ignored.
      
      RAID10 does have this concept and needs it to be set correctly.
      So when we convert RAID0 to RAID10 we must determine the
      'size' (that being the size of the first 'strip_zone' in the
      RAID0), and set it correctly.
      Reported-and-tested-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 05 Feb, 2015 13 commits
  3. 03 Feb, 2015 14 commits
    • md: protect ->pers changes with mddev->lock · 36d091f4
      NeilBrown authored
      ->pers is already protected by ->reconfig_mutex, and
      cannot possibly change when there are threads running or
      outstanding IO.
      
      However there are some places where we access ->pers
      not in a thread or IO context, and where ->reconfig_mutex
      is unnecessarily heavy-weight:  level_show and md_seq_show().
      
      So protect all changes, and those accesses, with ->lock.
      This is a step toward taking those accesses out from under
      reconfig_mutex.
      
      [Fixed missing "mddev->pers" -> "pers" conversion, thanks to
       Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: level_store: group all important changes into one place. · db721d32
      NeilBrown authored
      Gather all the changes that can happen atomically and might
      be relevant to other code into one place.  This will
      make it easier to refine the locking.
      
      Note that this puts quite a few things between mddev_detach()
      and ->free().  Enabling this was the point of some recent patches.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: rename ->stop to ->free · afa0f557
      NeilBrown authored
      Now that the ->stop function only frees the private data,
      rename it accordingly.
      
      Also pass in the private pointer as an arg rather than using
      mddev->private.  This flexibility will be useful in level_store().
      
      Finally, don't clear ->private.  It doesn't make sense to clear
      it, seeing that it isn't what we free, and it is no longer necessary
      to clear ->private (it was some time ago, before ->to_remove was
      introduced).
      
      Setting ->to_remove in ->free() is a bit of a wart, but not a
      big problem at the moment.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: split detach operation out from ->stop. · 5aa61f42
      NeilBrown authored
      Each md personality has a 'stop' operation which does two
      things:
       1/ it finalizes some aspects of the array to ensure nothing
          is accessing the ->private data
       2/ it frees the ->private data.
      
      All the steps in '1' can apply to all arrays and so can be
      performed in common code.
      
      This is useful as in the case where we change the personality which
      manages an array (in level_store()), it would be helpful to do
      step 1 early, and step 2 later.
      
      So split the 'step 1' functionality out into a new mddev_detach().
      Signed-off-by: NeilBrown <neilb@suse.de>
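      The two-step teardown described above can be sketched in user-space C.
      Everything here is hypothetical (struct, field and function names are
      illustrative stand-ins, not the kernel's): it only shows why splitting
      "stop using the private data" from "free the private data" lets a level
      change do step 1 early and step 2 after the new data is in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an array descriptor. */
struct array_sketch {
    bool io_active;      /* true while anything may touch private_data */
    void *private_data;  /* personality-specific state */
};

/* Step 1: common code that makes sure nothing is accessing private_data. */
static void detach_sketch(struct array_sketch *a)
{
    assert(a != NULL);
    a->io_active = false;
}

/* Step 2: personality-specific release of the private data passed in. */
static void free_sketch(void *priv)
{
    free(priv);
}

/* A level change: detach early, swap in the new data, free the old data late. */
static void change_level_sketch(struct array_sketch *a, void *new_priv)
{
    void *old_priv = a->private_data;

    detach_sketch(a);            /* step 1, done early */
    a->private_data = new_priv;  /* other changes can sit between the steps */
    a->io_active = true;
    free_sketch(old_priv);       /* step 2, done later */
}
```

      Passing the old pointer to the free step explicitly, rather than reading
      it back from the array, is what allows the swap to happen in between.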
    • md/linear: remove rcu protections in favour of suspend/resume · 3be260cc
      NeilBrown authored
      The use of 'rcu' to protect accesses to ->private_data so that
      the ->private_data could be updated predates the introduction
      of mddev_suspend/mddev_resume.
      These are a cleaner mechanism for providing stability while
      swapping in a new ->private data - they are used by level_store()
      to support changing of raid levels.
      
      So get rid of the RCU stuff and just use mddev_suspend, mddev_resume.
      
      As these functions call ->quiesce(), we add an empty function for
      linear just like for raid0.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make merge_bvec_fn more robust in face of personality changes. · 64590f45
      NeilBrown authored
      There is no locking around calls to merge_bvec_fn(), so
      it is possible that calls which coincide with a level (or personality)
      change could go wrong.
      
      So create a central dispatch point for these functions and use
      rcu_read_lock().
      If the array is suspended, reject any merge that can be rejected.
      If not, we know it is safe to call the function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown authored
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      'mddev_congested'.
      
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by including the whole call inside an rcu_read_lock()
      region.
      This requires that the congested functions for all subordinate devices
      can be run under rcu_read_lock().  Fortunately this is the case.
      Signed-off-by: NeilBrown <neilb@suse.de>
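      The central dispatch described above might look roughly like the
      following user-space sketch. All names are hypothetical stand-ins, and
      the RCU read-side critical section is only marked by comments because
      this sketch runs outside the kernel. A suspended array reports itself as
      congested; otherwise the personality's own function, if any, is consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct md_sketch;

/* Hypothetical personality operations table with a 'congested' hook. */
struct personality_sketch {
    int (*congested)(struct md_sketch *md, int bits);
};

/* Hypothetical array descriptor. */
struct md_sketch {
    bool suspended;                        /* set while the personality is changing */
    const struct personality_sketch *pers; /* may be NULL when no personality is set */
};

/* Central dispatch point: the only place that calls into the personality. */
static int mddev_congested_sketch(struct md_sketch *md, int bits)
{
    int ret = 0;

    assert(md != NULL);
    /* In the kernel, an rcu_read_lock() region would start here. */
    if (md->suspended)
        ret = 1; /* safe guess: no IO is processed while suspended */
    else if (md->pers != NULL && md->pers->congested != NULL)
        ret = md->pers->congested(md, bits);
    /* ...and rcu_read_unlock() would end it here. */
    return ret;
}

/* Example personality hook that never reports congestion. */
static int never_congested_sketch(struct md_sketch *md, int bits)
{
    (void)md;
    (void)bits;
    return 0;
}
```

      Treating "suspended" as "congested" errs on the side of caution: callers
      back off rather than reaching into a personality that is being replaced.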
    • md: rename mddev->write_lock to mddev->lock · 85572d7c
      NeilBrown authored
      This lock is used for (slightly) more than helping with writing
      superblocks, and it will soon be extended further.  So the
      name is inappropriate.
      
      Also, the _irq variant hasn't been needed since 2.6.37 as it is
      never taken from interrupt or bh context.
      
      So:
        -rename write_lock to lock
        -document what it protects
        -remove _irq ... except in md_flush_request() as there
           is no wait_event_lock() (with no _irq).  This can be
           cleaned up after appropriate changes to wait.h.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: need_this_block: tidy/fix last condition. · ea664c82
      NeilBrown authored
      That last condition is unclear and overcautious.
      
      There are two related issues here.
      
      If a partial write is destined for a missing device, then
      either RMW or RCW can work.  We must read all the available
      blocks.  Only then can the missing blocks be calculated, and
      then the parity update performed.
      
      If RMW is not an option, then there is a complication even
      without partial writes.  If we would need to read a missing
      device to perform the reconstruction, then we must first read every
      block so the missing device data can be computed.
      This is the case for RAID6 (which currently does not support
      RMW) and for times when we don't trust the parity (after a crash)
      and so are in the process of resyncing it.
      
      So make these two cases more clear and separate, and perform
      the relevant tests more thoroughly.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: need_this_block: start simplifying the last two conditions. · a9d56950
      NeilBrown authored
      Both the last two cases are only relevant if something has failed and
      something needs to be written (but not over-written), and if it is OK
      to pre-read blocks at this point.  So factor out those tests and
      explain them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: separate out the easy conditions in need_this_block. · a79cfe12
      NeilBrown authored
      Some of the conditions in need_this_block have very straightforward
      motivation.  Separate those out and document them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: separate large if clause out of fetch_block(). · 2c58f06e
      NeilBrown authored
      fetch_block() has a very large and hard to read 'if' condition.
      
      Separate it into its own function so that it can be
      made more readable.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: do_release_stripe(): No need to call md_wakeup_thread() twice · ad3ab8b6
      Jes Sorensen authored
      67f45548 introduced a call to
      md_wakeup_thread() when adding to the delayed_list. However the md
      thread is woken up unconditionally just below.
      
      Remove the unnecessary wakeup call.
      Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • x86/raid6: correctly check for assembler capabilities · 75aaf4c3
      Jan Beulich authored
      Just like for AVX2 (which simply needs an #if -> #ifdef conversion),
      SSSE3 assembler support should be checked for before using it.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  4. 02 Feb, 2015 2 commits
  5. 27 Jan, 2015 10 commits