1. 02 Aug, 2011 19 commits
  2. 28 Jul, 2011 21 commits
    • Merge branch 'for-linus' of git://neil.brown.name/md · 6140333d
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md: (75 commits)
        md/raid10: handle further errors during fix_read_error better.
        md/raid10: Handle read errors during recovery better.
        md/raid10: simplify read error handling during recovery.
        md/raid10: record bad blocks due to write errors during resync/recovery.
        md/raid10:  attempt to fix read errors during resync/check
        md/raid10:  Handle write errors by updating badblock log.
        md/raid10: clear bad-block record when write succeeds.
        md/raid10: avoid writing to known bad blocks on known bad drives.
        md/raid10 record bad blocks as needed during recovery.
        md/raid10: avoid reading known bad blocks during resync/recovery.
        md/raid10 - avoid reading from known bad blocks - part 3
        md/raid10: avoid reading from known bad blocks - part 2
        md/raid10: avoid reading from known bad blocks - part 1
        md/raid10: Split handle_read_error out from raid10d.
        md/raid10: simplify/reindent some loops.
        md/raid5: Clear bad blocks on successful write.
        md/raid5.  Don't write to known bad block on doubtful devices.
        md/raid5: write errors should be recorded as bad blocks if possible.
        md/raid5: use bad-block log to improve handling of uncorrectable read errors.
        md/raid5: avoid reading from known bad blocks.
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 6f56c218
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
        sound: oss: rename local change_bits to avoid powerpc bitsops.h definition
        ALSA: hda - Fix duplicated DAC assignments for Realtek
        ALSA: asihpi - off by one in asihpi_hpi_ioctl()
        ALSA: hda - Fix Oops with Realtek quirks with NULL adc_nids
        ALSA: asihpi - bug fix pa use before init.
        ALSA: hda - Add support for vref-out based mute LED control on IDT codecs
    • Merge branch 'for-linus' of... · 95b68865
      Linus Torvalds authored
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (54 commits)
        tpm_nsc: Fix bug when loading multiple TPM drivers
        tpm: Move tpm_tis_reenable_interrupts out of CONFIG_PNP block
        tpm: Fix compilation warning when CONFIG_PNP is not defined
        TOMOYO: Update kernel-doc.
        tpm: Fix a typo
        tpm_tis: Probing function for Intel iTPM bug
        tpm_tis: Fix the probing for interrupts
        tpm_tis: Delay ACPI S3 suspend while the TPM is busy
        tpm_tis: Re-enable interrupts upon (S3) resume
        tpm: Fix display of data in pubek sysfs entry
        tpm_tis: Add timeouts sysfs entry
        tpm: Adjust interface timeouts if they are too small
        tpm: Use interface timeouts returned from the TPM
        tpm_tis: Introduce durations sysfs entry
        tpm: Adjust the durations if they are too small
        tpm: Use durations returned from TPM
        TOMOYO: Enable conditional ACL.
        TOMOYO: Allow using argv[]/envp[] of execve() as conditions.
        TOMOYO: Allow using executable's realpath and symlink's target as conditions.
        TOMOYO: Allow using owner/group etc. of file objects as conditions.
        ...
      
      Fix up trivial conflict in security/tomoyo/realpath.c
    • md/raid10: handle further errors during fix_read_error better. · 58c54fcc
      NeilBrown authored
      If we find more read/write errors we should record a bad block before
      failing the device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Handle read errors during recovery better. · 5e570289
      NeilBrown authored
      Currently when we get a read error during recovery, we simply abort
      the recovery.
      
      Instead, repeat the read in page-sized blocks.
      On successful reads, write to the target.
      On read errors, record a bad block on the destination,
      and only if that fails do we abort the recovery.
      
      As we now retry reads we need to know where we read from.  This was in
      bi_sector but that can be changed during a read attempt.
      So store the correct from_addr and to_addr in the r10_bio for later
      access.
      
      
      Signed-off-by: NeilBrown <neilb@suse.de>
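The retry strategy described in this commit message can be sketched as a small user-space model. This is not the kernel code: `PAGE_SECTORS`, `recover_range`, and the two callbacks are hypothetical names chosen for illustration, and a page is assumed to be eight 512-byte sectors.

```python
PAGE_SECTORS = 8  # assumption: 4 KiB pages made of 512-byte sectors


def recover_range(start, sectors, read_ok, record_bad_block):
    """Model recovery of [start, start + sectors) one page at a time.

    read_ok(sector, count) -> bool: hypothetical read from the source device.
    record_bad_block(sector, count) -> bool: hypothetical call that records a
    bad block on the destination and returns False if that is not possible.
    Returns (pages_written, aborted).
    """
    written = 0
    sector = start
    end = start + sectors
    while sector < end:
        count = min(PAGE_SECTORS, end - sector)
        if read_ok(sector, count):
            written += 1  # stands in for writing the page to the target
        elif not record_bad_block(sector, count):
            return written, True  # only now is the recovery aborted
        sector += count
    return written, False
```

A failed page read costs one bad-block record on the destination instead of the whole recovery; the abort path is reached only when the record itself cannot be made.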
    • md/raid10: simplify read error handling during recovery. · e684e41d
      NeilBrown authored
      If a read error is detected during recovery the code currently
      fails the read device.
      This isn't really necessary.  recovery_request_write will signal
      a write error to end_sync_write and it will record a write
      error on the destination device which will record a bad block
      there or kick it from the array.
      
      So just remove this call to do md_error.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: record bad blocks due to write errors during resync/recovery. · 1a0b7cd8
      NeilBrown authored
      If we get a write error during resync/recovery don't fail the device
      but instead record a bad block.  If that fails we can then fail the
      device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: attempt to fix read errors during resync/check · f84ee364
      NeilBrown authored
      We already attempt to fix read errors found during normal IO
      and a 'repair' process.
      It is best to try to repair them at any time they are found,
      so move a test so that during sync and check a read error will
      be corrected by over-writing with good data.
      
      If both (all) devices have known bad blocks in the sync section we
      won't try to fix even though the bad blocks might not overlap.  That
      should be considered later.
      
      Also if we hit a read error during recovery we don't try to fix it.
      It would only be possible to fix if there were at least three copies
      of data, which is not very common with RAID10.  But it should still
      be considered later.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Handle write errors by updating badblock log. · bd870a16
      NeilBrown authored
      When we get a write error (in the data area, not in metadata),
      update the badblock log rather than failing the whole device.
      
      As the write may well be many blocks, we try writing each
      block individually and only log the ones which fail.
      Signed-off-by: NeilBrown <neilb@suse.de>
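A minimal sketch of the per-block retry described above, written as a user-space model and not taken from the kernel source. The function name, the `write_ok` callback, and the choice to align chunks to `block_sectors` boundaries are all assumptions made for illustration.

```python
def narrow_write_error(start, sectors, block_sectors, write_ok):
    """Retry a failed write block by block and collect the ranges that fail.

    write_ok(sector, count) -> bool is a hypothetical single-block write.
    Returns a list of (sector, count) ranges to enter in the bad-block log.
    """
    failed = []
    sector = start
    end = start + sectors
    while sector < end:
        # Assumption: the first chunk is shortened so that later chunks
        # start on a block_sectors boundary.
        count = min(block_sectors - sector % block_sectors, end - sector)
        if not write_ok(sector, count):
            failed.append((sector, count))
        sector += count
    return failed
```

Only the ranges returned here would be logged, so one bad block inside a large write does not condemn the blocks around it.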
    • md/raid10: clear bad-block record when write succeeds. · 749c55e9
      NeilBrown authored
      If we succeed in writing to a block that was recorded as
      being bad, we clear the bad-block record.
      
      This requires some delayed handling as the bad-block-list update has
      to happen in process-context.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid writing to known bad blocks on known bad drives. · d4432c23
      NeilBrown authored
      Writing to known bad blocks on drives that have seen a write error
      is asking for trouble.  So try to avoid these blocks.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10 record bad blocks as needed during recovery. · e875ecea
      NeilBrown authored
      When recovering one or more devices, if all the good devices have
      bad blocks we should record a bad block on the device being rebuilt.
      
      If this fails, we need to abort the recovery.
      
      To ensure we don't think that we aborted later than we actually did,
      we need to move the check for MD_RECOVERY_INTR earlier in md_do_sync,
      in particular before mddev->curr_resync is updated.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading known bad blocks during resync/recovery. · 40c356ce
      NeilBrown authored
      During resync/recovery limit the size of the request to avoid
      reading into a bad block that does not start at-or-before the current
      read address.
      
      Similarly if there is a bad block at this address, don't allow the
      current request to extend beyond the end of that bad block.
      
      Now that we don't ever read from known bad blocks, it is safe to allow
      devices with those blocks into the array.
      Signed-off-by: NeilBrown <neilb@suse.de>
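The two clamping rules in this commit message can be illustrated with a short model. This is a sketch under stated assumptions, not the kernel implementation: `clamp_read` is a hypothetical helper, and bad blocks are represented as a plain list of `(first_bad, length)` ranges in sectors.

```python
def clamp_read(sector, max_sectors, bad_ranges):
    """Limit a resync/recovery read so it never crosses into a bad range.

    Returns (readable, sectors):
      readable=True:  `sectors` good sectors can be read from `sector`.
      readable=False: `sector` lies inside a bad range; `sectors` is how far
                      the request may extend, at most to the end of that range.
    """
    # Rule 2: a bad range covers this address, so stop at its end.
    for first_bad, length in bad_ranges:
        bad_end = first_bad + length
        if first_bad <= sector < bad_end:
            return False, min(max_sectors, bad_end - sector)
    # Rule 1: stop short of the nearest bad range that starts after `sector`.
    allowed = max_sectors
    for first_bad, _length in bad_ranges:
        if sector < first_bad < sector + allowed:
            allowed = first_bad - sector
    return True, allowed
```

For example, with one bad range of 8 sectors at sector 16, a 64-sector request at sector 0 is cut to 16 sectors, and a request at sector 18 is limited to the 6 sectors left in the bad range.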
    • md/raid10 - avoid reading from known bad blocks - part 3 · 8dbed5ce
      NeilBrown authored
      When attempting to repair a read error, don't read from
      devices with a known bad block.
      
      As we are only reading PAGE_SIZE blocks, we don't try to
      narrow down to smaller regions in the hope that only part of this
      page is bad - it isn't worth the effort.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading from known bad blocks - part 2 · 7399c31b
      NeilBrown authored
      When redirecting a read error to a different device, we must
      again avoid bad blocks and possibly split the request.
      
      Spin_lock typo fixed thanks to Dan Carpenter <error27@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading from known bad blocks - part 1 · 856e08e2
      NeilBrown authored
      This patch just covers the basic read path:
       1/ read_balance needs to check for badblocks, and return not only
          the chosen slot, but also how many good blocks are available
          there.
       2/ read submission must be ready to issue multiple reads to
          different devices as different bad blocks on different devices
          could mean that a single large read cannot be served by any one
          device, but can still be served by the array.
          This requires keeping count of the number of outstanding requests
          per bio.  This count is stored in 'bi_phys_segments'
      
      On read error we currently just fail the request if another target
      cannot handle the whole request.  Next patch refines that a bit.
      Signed-off-by: NeilBrown <neilb@suse.de>
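The two points in this commit message can be sketched as a toy model: a balance function that reports both a slot and a count of good sectors, and a submission loop that splits one read across devices. The selection policy here (pick the device with the longest good run) is an assumption for illustration only; the message does not say how the slot is chosen. All helper names other than `read_balance` are hypothetical.

```python
def good_sectors(sector, max_sectors, bad_ranges):
    """Sectors readable from `sector` before a bad range (0 if inside one)."""
    allowed = max_sectors
    for first_bad, length in bad_ranges:
        if first_bad <= sector < first_bad + length:
            return 0
        if sector < first_bad < sector + allowed:
            allowed = first_bad - sector
    return allowed


def read_balance(sector, max_sectors, devices):
    """Return (slot, good_sectors); slot is -1 if no device can serve `sector`.

    devices: one list of (first_bad, length) ranges per slot.
    """
    best_slot, best = -1, 0
    for slot, bad_ranges in enumerate(devices):
        n = good_sectors(sector, max_sectors, bad_ranges)
        if n > best:
            best_slot, best = slot, n
    return best_slot, best


def plan_read(sector, sectors, devices):
    """Split one read into (slot, sector, count) pieces, or None on failure."""
    pieces = []
    while sectors > 0:
        slot, n = read_balance(sector, sectors, devices)
        if slot < 0:
            return None  # some sector is bad on every device
        pieces.append((slot, sector, n))
        sector += n
        sectors -= n
    return pieces
```

In this model `len(pieces)` corresponds to the number of outstanding requests per bio, the count the message says is kept in `bi_phys_segments`.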
    • md/raid10: Split handle_read_error out from raid10d. · 560f8e55
      NeilBrown authored
      raid10d() is too big and is about to get bigger, so split
      handle_read_error() out as a separate function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: simplify/reindent some loops. · 1294b9c9
      NeilBrown authored
      When a loop ends with a large if, it can be neater to change the
      if to invert the condition and just 'continue'.
      Then the body of the if can be indented to a lower level.
      Signed-off-by: NeilBrown <neilb@suse.de>
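The refactoring pattern described above, shown on a made-up loop (the functions below are illustrative only and unrelated to the raid10 code). Both versions compute the same result; the second inverts the condition and uses `continue`, so the loop body sits one indentation level lower.

```python
def sum_even_squares_nested(values):
    """Loop that ends with a large `if` wrapping the real work."""
    total = 0
    for v in values:
        if v % 2 == 0:
            square = v * v
            total += square
    return total


def sum_even_squares_flat(values):
    """Same loop with the condition inverted and an early `continue`."""
    total = 0
    for v in values:
        if v % 2 != 0:
            continue
        square = v * v
        total += square
    return total
```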
    • md/raid5: Clear bad blocks on successful write. · b84db560
      NeilBrown authored
      On a successful write to a known bad block, flag the sh
      so that raid5d can remove the known bad block from the list.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5. Don't write to known bad block on doubtful devices. · 73e92e51
      NeilBrown authored
      If a device has seen write errors, don't write to any known
      bad blocks on that device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: write errors should be recorded as bad blocks if possible. · bc2607f3
      NeilBrown authored
      When a write error is detected, don't mark the device as failed
      immediately but rather record the fact for handle_stripe to deal with.
      
      Handle_stripe then attempts to record a bad block.  Only if that fails
      does the device get marked as faulty.
      Signed-off-by: NeilBrown <neilb@suse.de>
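The two-stage handling in this commit message (note the error first, decide later) might be modelled as follows. This is a user-space sketch, not the raid5 code: the `Device` class, its method names, and the fixed `log_capacity` that makes recording fail are all assumptions made for illustration.

```python
class Device:
    """Toy member device with a bounded bad-block log."""

    def __init__(self, log_capacity):
        self.log_capacity = log_capacity  # hypothetical limit on log entries
        self.bad_blocks = []
        self.pending_write_errors = []
        self.faulty = False

    def note_write_error(self, sector, count):
        # Completion path: only remember the error, do not fail the device.
        self.pending_write_errors.append((sector, count))

    def handle_pending(self):
        # Deferred path (the role handle_stripe plays in the message):
        # try to record a bad block, and mark faulty only if that fails.
        while self.pending_write_errors:
            entry = self.pending_write_errors.pop(0)
            if len(self.bad_blocks) < self.log_capacity:
                self.bad_blocks.append(entry)
            else:
                self.faulty = True
```

With room in the log, a write error leaves the device in the array with one more bad-block entry; the device is marked faulty only once an error can no longer be recorded.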