1. 28 Aug, 2013 2 commits
    • raid5: fix stripe release order · d265d9dc
      Shaohua Li authored
      The patch "make release_stripe lockless" changed the order in which stripes
      are released. Originally I thought the block layer could take care of request
      merging, but it appears some requests are still not merged. The order is easy to fix.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
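To illustrate why the order is easy to fix: a list that is built by prepending hands entries back newest-first, and reversing the drained chain once restores submission order. The following is a minimal userspace sketch, not the patch itself; `struct node` and `reverse_chain` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a stripe queued on a singly linked list. */
struct node {
    int id;
    struct node *next;
};

/* Reverse a NULL-terminated chain in place and return the new head.
 * A chain built by prepending (newest first) becomes oldest first. */
static struct node *reverse_chain(struct node *head)
{
    struct node *reversed = NULL;

    while (head) {
        struct node *next = head->next;

        head->next = reversed;  /* move the current node to the front */
        reversed = head;
        head = next;
    }
    return reversed;
}
```

The reversal is a single O(n) pass over a chain that the consumer already owns, so it needs no extra locking in this model.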
    • raid5: make release_stripe lockless · 773ca82f
      Shaohua Li authored
      release_stripe still has heavy lock contention. Instead, we just add the stripe
      to a llist without taking device_lock and let the raid5d thread do the real
      stripe release, which must hold device_lock anyway. This way, release_stripe
      doesn't hold any locks.
      
      The side effect is that the order of released stripes changes. That doesn't
      sound like a big deal: stripes are never handled in order, and I thought the
      block layer could already merge requests well, which means order isn't that
      important.
      
      I kept the unplug release batch. It is unnecessary with this patch from a lock
      contention point of view, and if we deleted it, the stripe_head release_list
      and lru could share storage. But the unplug release batch also helps request
      merging. We could probably delay waking raid5d until unplug, but I'm still
      worried about the case where raid5d is running.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
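The lock-free hand-off described in this commit can be sketched in userspace with C11 atomics standing in for the kernel's llist. This is an illustration under assumptions, not the kernel code: `struct stripe`, `struct release_list`, `release_push` and `release_take_all` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a stripe_head with an embedded list link. */
struct stripe {
    int id;
    struct stripe *next;
};

/* Hypothetical lock-free list head shared by producers and one consumer. */
struct release_list {
    _Atomic(struct stripe *) first;
};

/* Producer side: prepend with compare-and-swap; no lock is taken. */
static void release_push(struct release_list *l, struct stripe *s)
{
    struct stripe *old = atomic_load_explicit(&l->first, memory_order_relaxed);

    do {
        s->next = old;  /* on CAS failure, old is reloaded and we retry */
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->first, &old, s,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer side: detach the whole chain with one atomic exchange.
 * The caller can then process it under whatever lock it already holds. */
static struct stripe *release_take_all(struct release_list *l)
{
    return atomic_exchange_explicit(&l->first, (struct stripe *)NULL,
                                    memory_order_acquire);
}
```

Because every push prepends, the detached chain is in last-in, first-out order, which is the changed release order the message mentions.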
  2. 27 Aug, 2013 7 commits
    • md: avoid deadlock when dirty buffers during md_stop. · 260fa034
      NeilBrown authored
      When the last process closes /dev/mdX sync_blockdev will be called so
      that all buffers get flushed.
      So if it is then opened for the STOP_ARRAY ioctl to be sent there will
      be nothing to flush.
      
      However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
      moments before some other process which was writing closes their file
      descriptor, then there won't be a 'last close' and the buffers might
      not get flushed.
      
      So do_md_stop() calls sync_blockdev().  However at this point it is
      holding ->reconfig_mutex.  So if the array is currently 'clean' then
      the writes from sync_blockdev() will not complete until the array
      can be marked dirty and that won't happen until some other thread
      can get ->reconfig_mutex.  So we deadlock.
      
      We need to move the sync_blockdev() call to before we take
      ->reconfig_mutex.
      However then some other thread could open /dev/mdX and write to it
      after we call sync_blockdev() and before we actually stop the array.
      This can leave dirty data in the page cache which is awkward.
      
      So introduce a new flag, MD_STILL_CLOSED.  Set it before calling
      sync_blockdev(), clear it if anyone does open the file, and abort the
      STOP_ARRAY attempt if it gets cleared before we lock against further
      opens.
      
      It is still possible to get problems if you open /dev/mdX, write to
      it, then issue the STOP_ARRAY ioctl.  Just don't do that.
      Signed-off-by: NeilBrown <neilb@suse.de>
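The flag protocol in this commit can be modelled sequentially as follows. This is a simplified sketch under assumptions: `struct dev_model` and every function name are hypothetical, the flush is reduced to zeroing a counter, and the real code uses atomic bit operations and a mutex rather than plain fields.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the device; not the kernel's struct mddev. */
struct dev_model {
    bool still_closed;  /* models the MD_STILL_CLOSED flag */
    int dirty_pages;    /* models dirty data in the page cache */
    bool stopped;
};

/* Step 1, before the reconfiguration lock is taken: mark, then flush. */
static void stop_prepare(struct dev_model *d)
{
    d->still_closed = true;
    d->dirty_pages = 0;  /* stands in for sync_blockdev() */
}

/* Any open of the device clears the flag. */
static void model_open(struct dev_model *d)
{
    d->still_closed = false;
}

/* A write after the flush leaves new dirty data behind. */
static void model_write(struct dev_model *d)
{
    d->dirty_pages++;
}

/* Step 2, with the lock held: refuse to stop if anyone opened the
 * device since the flush. Returns 0 on success, -1 if busy. */
static int stop_commit(struct dev_model *d)
{
    if (!d->still_closed)
        return -1;
    d->stopped = true;
    return 0;
}
```

In this model an open between `stop_prepare` and `stop_commit` makes the stop fail instead of stopping the array with unflushed data.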
    • md: Don't test all of mddev->flags at once. · 7a0a5355
      NeilBrown authored
      mddev->flags is mostly used to record if an update of the
      metadata is needed.  Sometimes the whole field is tested
      instead of just the important bits.  This makes it difficult
      to introduce more state bits.
      
      So replace all bare tests of mddev->flags with tests for the bits
      that actually need testing.
      Signed-off-by: NeilBrown <neilb@suse.de>
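A small sketch of why a bare test of the whole field gets in the way of new state bits. The flag names and bit values below are hypothetical and chosen only for illustration; they are not taken from the kernel headers.

```c
#include <stdbool.h>

/* Hypothetical bits: three "metadata update needed" requests ... */
#define FLAG_CHANGE_DEVS    (1UL << 0)
#define FLAG_CHANGE_CLEAN   (1UL << 1)
#define FLAG_CHANGE_PENDING (1UL << 2)
/* ... and one unrelated state bit sharing the same field. */
#define FLAG_STILL_CLOSED   (1UL << 3)

#define FLAG_UPDATE_MASK \
    (FLAG_CHANGE_DEVS | FLAG_CHANGE_CLEAN | FLAG_CHANGE_PENDING)

/* Bare test: any set bit, including the state bit, looks like a request. */
static bool needs_update_bare(unsigned long flags)
{
    return flags != 0;
}

/* Masked test: only the bits that actually request an update count. */
static bool needs_update_masked(unsigned long flags)
{
    return (flags & FLAG_UPDATE_MASK) != 0;
}
```

With only the state bit set, the bare test reports a pending metadata update while the masked test does not.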
    • md: Fix apparent cut-and-paste error in super_90_validate · c9ad020f
      Dave Jones authored
      Setting a variable to itself probably wasn't the intention here.
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid6/test: replace echo -e with printf · c28399b5
      Max Filippov authored
      -e is a non-standard echo option; echo output is
      implementation-dependent when it is used. Replace echo -e with printf, as
      suggested by the POSIX echo manual.
      
      Cc: NeilBrown <neilb@suse.de>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • RAID: add tilegx SIMD implementation of raid6 · ae77cbc1
      Ken Steele authored
      This change adds TILE-Gx SIMD instructions to the software raid
      (md), modeling the Altivec implementation. This is only for Syndrome
      generation; there is more that could be done to improve recovery,
      as in the recent Intel SSE3 recovery implementation.
      
      The code unrolls 8 times; this turns out to be the best on tilegx
      hardware among the set 1, 2, 4, 8 or 16.  The code reads one
      cache-line of data from each disk, stores P and Q then goes to the
      next cache-line.
      
      The test code in sys/linux/lib/raid6/test reports a 2008 MB/s data
      read rate for syndrome generation using 18 disks (16 data and 2
      parity). It was 1512 MB/s before this SIMD optimization. This is
      running on 1 core with all the data in cache.
      
      This is based on the paper The Mathematics of RAID-6.
      (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).
      Signed-off-by: Ken Steele <ken@tilera.com>
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
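For readers unfamiliar with syndrome generation, here is a scalar, byte-at-a-time sketch of computing P and Q in GF(2^8) with the generator polynomial 0x11d, using Horner's rule from the highest data disk down. It is not the TILE-Gx code and does no SIMD or unrolling; `gf_mul2` and `gen_syndrome` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* data[z][i] is byte i of data disk z.
 * P is the XOR of all data bytes; Q is the sum of 2^z * D_z in GF(2^8). */
static void gen_syndrome(size_t ndata, size_t bytes,
                         const uint8_t *const *data,
                         uint8_t *p, uint8_t *q)
{
    assert(ndata > 0);

    for (size_t i = 0; i < bytes; i++) {
        uint8_t wp = data[ndata - 1][i];
        uint8_t wq = wp;

        /* Horner's rule: walk from disk ndata-2 down to disk 0. */
        for (size_t z = ndata - 1; z-- > 0; ) {
            wp ^= data[z][i];
            wq = (uint8_t)(gf_mul2(wq) ^ data[z][i]);
        }
        p[i] = wp;
        q[i] = wq;
    }
}
```

A vectorised version applies the same multiply-by-2 and XOR steps to many bytes per instruction, which is where unrolling and cache-line-sized reads come in.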
    • md: fix safe_mode buglet. · 275c51c4
      NeilBrown authored
      When we set the safe_mode_timeout to a smaller value we trigger a timeout
      immediately; otherwise the small value might not be honoured.
      However, if the previous timeout was 0, meaning "no timeout", we didn't.
      This would mean that no timeout happens until the next write completes,
      which could be a long time.
      Signed-off-by: NeilBrown <neilb@suse.de>
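The condition described above can be sketched as two predicates, one for the behaviour before the fix and one after. This is reconstructed from the commit message only, so treat it as an assumption; the function names are hypothetical, and a delay of 0 means "no timeout".

```c
#include <stdbool.h>

/* Before the fix: fire immediately only when the delay shrinks.
 * A previous delay of 0 ("no timeout") is never larger than the new
 * value, so nothing fires in that case. */
static bool fire_now_before_fix(unsigned long old_delay,
                                unsigned long new_delay)
{
    return new_delay != 0 && new_delay < old_delay;
}

/* After the fix: also fire when there was no timeout at all before. */
static bool fire_now_after_fix(unsigned long old_delay,
                               unsigned long new_delay)
{
    return new_delay != 0 &&
           (new_delay < old_delay || old_delay == 0);
}
```

The two predicates differ only when the old delay is 0 and the new delay is nonzero, which is the case the commit addresses.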
    • md: don't call md_allow_write in get_bitmap_file. · 60559da4
      NeilBrown authored
      There is no real need, as GFP_NOIO is very likely sufficient,
      and failure is not catastrophic.
      
      Calling md_allow_write here will convert a read-auto array to
      read/write which could be confusing when you are just performing
      a read operation.
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 26 Aug, 2013 1 commit
  4. 25 Aug, 2013 4 commits
  5. 24 Aug, 2013 8 commits
  6. 23 Aug, 2013 17 commits
  7. 22 Aug, 2013 1 commit