    btrfs: raid56: introduce btrfs_raid_bio::error_bitmap · 2942a50d
    Qu Wenruo authored
    Currently btrfs raid56 uses btrfs_raid_bio::faila and failb to indicate
    which stripe(s) had IO errors.
    
    But that has some problems:
    
    - If one sector fails its csum check, the whole stripe containing the
      corruption is marked as errored.
      This can reduce our chance of recovery, for example:
    
              0  4K 8K
      Data 1  |XX|  |
      Data 2  |  |XX|
      Parity  |  |  |
    
      In the above case, 0~4K in data 1 should be recovered using data 2 and
      parity, while 4K~8K in data 2 should be recovered using data 1 and
      parity.
    
      Currently, if we trigger a read on 0~4K of data 1, we will also recover
      4K~8K of data 1 using the corrupted data 2 and parity, leaving a wrong
      result in the rbio cache.
    
    - Harder to extend for a future M-N scheme,
      as we're limited to just faila/b, i.e. two corruptions.
    
    - Harder to extend to handle extra csum errors.
      This can be problematic once we start doing csum verification.
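The data 1 / data 2 example above can be reproduced with a toy RAID5 model. This is a minimal sketch, not btrfs code: the one-byte "sectors", the two-sector stripes and the helper `rebuild_data()` are hypothetical, and recovery is plain XOR parity. Rebuilding the whole of data 1 (the faila-style behaviour) pulls the corrupted sector of data 2 into data 1's 4K~8K, while rebuilding only the sectors flagged as bad leaves good data untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy RAID5: two data stripes plus one XOR parity stripe. Each stripe has
 * NSECTORS "sectors" of one byte each (real sectors are 4K). */
#define NSECTORS 2

/*
 * Rebuild data stripe @target (0 or 1) from the other data stripe and parity.
 * @sector_err == NULL: rebuild every sector of @target, i.e. whole-stripe
 *                      error marking as with faila.
 * @sector_err != NULL: rebuild only the sectors flagged as bad.
 */
static void rebuild_data(uint8_t data[2][NSECTORS],
                         const uint8_t parity[NSECTORS],
                         int target, const bool *sector_err)
{
    int other;

    assert(target == 0 || target == 1);
    other = 1 - target;

    for (int s = 0; s < NSECTORS; s++) {
        if (sector_err && !sector_err[s])
            continue; /* this sector is fine, keep it as-is */
        data[target][s] = data[other][s] ^ parity[s];
    }
}
```

With per-sector flags, data 1's sector 0 is fixed first, and data 2's sector 1 can then be rebuilt from the now-correct data 1, so both corruptions are recovered with a single parity.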
    
    This patch introduces an extra @error_bitmap, where each bit records
    whether an error happened for the corresponding sector.
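A minimal sketch of such a per-sector bitmap, assuming a stripe-major layout where sector `sector_nr` of stripe `stripe_nr` maps to bit `stripe_nr * stripe_nsectors + sector_nr`. `struct toy_rbio` and all helpers here are hypothetical stand-ins, not the real btrfs_raid_bio. Counting errors per vertical stripe (the same sector_nr across all stripes) shows why the bitmap helps: in the data 1 / data 2 example each vertical stripe has only one bad sector, so each is recoverable with a single parity even though two data stripes contain errors.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, heavily reduced rbio: just enough to show the bitmap. */
struct toy_rbio {
    int nr_stripes;                /* data stripes + parity stripes */
    int stripe_nsectors;           /* sectors per stripe */
    unsigned long error_bitmap[4]; /* fixed size for the sketch */
};

/* Assumed layout: stripe-major, one bit per sector. */
static unsigned int sector_bit(const struct toy_rbio *rbio, int stripe_nr,
                               int sector_nr)
{
    unsigned int nr;

    assert(stripe_nr >= 0 && stripe_nr < rbio->nr_stripes);
    assert(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
    nr = (unsigned int)(stripe_nr * rbio->stripe_nsectors + sector_nr);
    assert(nr < sizeof(rbio->error_bitmap) * CHAR_BIT);
    return nr;
}

static void mark_sector_error(struct toy_rbio *rbio, int stripe_nr, int sector_nr)
{
    unsigned int nr = sector_bit(rbio, stripe_nr, sector_nr);

    rbio->error_bitmap[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

static bool sector_has_error(const struct toy_rbio *rbio, int stripe_nr, int sector_nr)
{
    unsigned int nr = sector_bit(rbio, stripe_nr, sector_nr);

    return (rbio->error_bitmap[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}

/* Number of bad sectors in one vertical stripe. It is recoverable on its own
 * as long as this does not exceed the number of parity stripes. */
static int vertical_error_count(const struct toy_rbio *rbio, int sector_nr)
{
    int count = 0;

    for (int stripe_nr = 0; stripe_nr < rbio->nr_stripes; stripe_nr++)
        if (sector_has_error(rbio, stripe_nr, sector_nr))
            count++;
    return count;
}
```

Inside the kernel, the hand-rolled shifts would normally be replaced by the generic helpers from linux/bitmap.h and the bitops API (for example `set_bit()` and `test_bit()`).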
    
    We introduce a new error bitmap rather than reusing sector_ptr to avoid
    an extra search between rbio::stripe_sectors[] and rbio::bio_sectors[].
    
    Since we can submit bios using sectors from both arrays, doing a proper
    search on both arrays would be more complex.
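One way to read this trade-off, as a hypothetical sketch: if the error state lived in a flag inside each sector_ptr, a given sector could have been submitted from either array, so every lookup would have to consult both. A single bitmap indexed by sector number answers the same question with one bit test. `struct toy_sector_ptr` and both helpers are illustrative only, and the sketch assumes both arrays are indexed by the same sector number.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SECTORS 8

/* Hypothetical per-sector flag, standing in for the "reuse sector_ptr" option. */
struct toy_sector_ptr {
    bool error;
};

/* Option A: error state stored in sector_ptr. The sector may have been
 * submitted from either array, so each query must check both of them. */
static bool has_error_flags(const struct toy_sector_ptr *stripe_sectors,
                            const struct toy_sector_ptr *bio_sectors, int nr)
{
    assert(nr >= 0 && nr < TOY_NR_SECTORS);
    return stripe_sectors[nr].error || bio_sectors[nr].error;
}

/* Option B: one bitmap shared by both arrays, indexed by sector number. */
static bool has_error_bitmap(unsigned int error_bitmap, int nr)
{
    assert(nr >= 0 && nr < TOY_NR_SECTORS);
    return (error_bitmap >> nr) & 1U;
}
```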
    
    Although the new bitmap takes extra memory, we can later remove
    things like @error and faila/b to save some memory.
    
    Currently the new error bitmap and the faila/b mechanism coexist; the
    error bitmap is only updated at endio time and at the recovery entrance.
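At endio time, a failed bio has to mark every sector it covered. A minimal sketch under stated assumptions: the caller already knows the bitmap index of the bio's first sector and the bio's size in bytes, and the sector size is a power of two given as `sectorsize_bits`. `toy_bitmap_set()` is a userspace stand-in for the kernel's `bitmap_set()`; `toy_endio_mark_error()` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for bitmap_set(): set @nbits bits starting at @start. */
static void toy_bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits)
{
    for (unsigned int i = start; i < start + nbits; i++)
        map[i / TOY_BITS_PER_LONG] |= 1UL << (i % TOY_BITS_PER_LONG);
}

static bool toy_test_bit(const unsigned long *map, unsigned int nr)
{
    return (map[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}

/* Hypothetical endio hook: the failed bio started at bitmap index
 * @first_sector_nr and covered @bio_size bytes; mark all of those sectors. */
static void toy_endio_mark_error(unsigned long *error_bitmap,
                                 unsigned int first_sector_nr,
                                 unsigned int bio_size,
                                 unsigned int sectorsize_bits)
{
    /* bio sizes are assumed to be whole sectors */
    assert((bio_size & ((1U << sectorsize_bits) - 1)) == 0);
    toy_bitmap_set(error_bitmap, first_sector_nr, bio_size >> sectorsize_bits);
}
```

For example, an 8K bio starting at sector index 17 with 4K sectors (`sectorsize_bits == 12`) marks bits 17 and 18 and nothing else.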

    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>