    btrfs: update stripe_sectors::uptodate in steal_rbio · 4d100466
    Qu Wenruo authored
    [BUG]
    With added debugging, it turns out that the following write sequence
    causes unnecessary extra reads:
    
      # xfs_io -f -s -c "pwrite -b 32k 0 32k" -c "pwrite -b 32k 32k 32k" \
    		 -c "pwrite -b 32k 64k 32k" -c "pwrite -b 32k 96k 32k" \
    		 $mnt/file
    
    The debug message looks like this (btrfs header skipped):
    
      partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
      partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
      full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
      full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
      full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
      full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
      partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=0 physical=323026944 len=32768
      partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
      ^^^^
       Still a partial read, even though 389152768 is already cached by the
       first write.
    
      full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
      full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=0 physical=22020096 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
      ^^^^
       Still a partial read for 298844160.
    
      full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
      full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768
    
    This means every 32K write, even when it lands in the same full stripe,
    still triggers a read of previously cached data.
    
    This causes extra RAID56 IO and makes the btrfs raid56 cache useless.
    
    [CAUSE]
    Commit d4e28d9b ("btrfs: raid56: make steal_rbio() subpage
    compatible") made steal_rbio() subpage compatible, but the conversion
    missed one thing.
    
    We no longer rely on PageUptodate(rbio->stripe_pages[i]), but on
    rbio->stripe_sectors[i].uptodate, to determine if a sector is uptodate.
    
    Previously, switching the page pointer was all that was needed, as the
    PageUptodate flag stays bound to that page.
    
    But now we have to manually mark the involved sectors uptodate, or later
    raid56_rmw_stripe() will find that the stolen sector is not uptodate and
    assemble a read bio for it, wasting IO.
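
    The effect described above can be illustrated with a small userspace
    model. This is only a sketch, not the kernel code: struct model_sector,
    struct model_rbio, MODEL_NR_SECTORS and both helpers are hypothetical
    names, and the real steal_rbio() works on pages and carries more checks.
    It only shows that moving the page pointers alone leaves every stolen
    sector looking stale to a read decision based on the per-sector flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define MODEL_NR_SECTORS 16

struct model_sector {
	void *page;	/* backing memory, NULL if nothing is attached */
	bool uptodate;	/* per-sector flag that replaced PageUptodate() */
};

struct model_rbio {
	struct model_sector stripe_sectors[MODEL_NR_SECTORS];
};

/* Count the sectors an RMW read phase would have to fetch from disk. */
static int model_count_sectors_to_read(const struct model_rbio *rbio)
{
	int nr = 0;

	for (int i = 0; i < MODEL_NR_SECTORS; i++) {
		if (!rbio->stripe_sectors[i].uptodate)
			nr++;
	}
	return nr;
}

/*
 * Model of the behaviour described in [CAUSE]: only the backing pointers
 * move from the cached rbio to the new one, the uptodate flags do not.
 */
static void model_steal_pointers_only(struct model_rbio *src,
				      struct model_rbio *dest)
{
	assert(src != NULL && dest != NULL);

	for (int i = 0; i < MODEL_NR_SECTORS; i++) {
		if (src->stripe_sectors[i].page == NULL)
			continue;
		dest->stripe_sectors[i].page = src->stripe_sectors[i].page;
		src->stripe_sectors[i].page = NULL;
	}
}
```

    In this model, stealing from a fully cached rbio into a fresh one still
    leaves model_count_sectors_to_read() reporting every sector, which
    corresponds to the repeated "partial rmw" lines in the log.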
    
    [FIX]
    We can easily fix the bug by also updating
    rbio->stripe_sectors[].uptodate in steal_rbio().
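
    A minimal userspace sketch of that idea follows. It is not the actual
    patch: struct sketch_rbio, sketch_steal_page() and the SKETCH_*
    constants are hypothetical, and the sectors-per-page value is picked
    only for illustration. The point is that after the page pointer moves,
    every sector backed by that page is marked uptodate in the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical geometry: 2 pages, 4 sectors per page. */
#define SKETCH_NR_PAGES		2
#define SKETCH_SECTORS_PER_PAGE	4
#define SKETCH_NR_SECTORS	(SKETCH_NR_PAGES * SKETCH_SECTORS_PER_PAGE)

struct sketch_sector {
	bool uptodate;
};

struct sketch_rbio {
	void *stripe_pages[SKETCH_NR_PAGES];
	struct sketch_sector stripe_sectors[SKETCH_NR_SECTORS];
};

/*
 * Move one page from src to dest and mark the sectors it backs as
 * uptodate in dest. The model does not own the memory, so nothing is
 * freed here.
 */
static void sketch_steal_page(struct sketch_rbio *src,
			      struct sketch_rbio *dest, int page_nr)
{
	const int first = page_nr * SKETCH_SECTORS_PER_PAGE;

	assert(page_nr >= 0 && page_nr < SKETCH_NR_PAGES);

	dest->stripe_pages[page_nr] = src->stripe_pages[page_nr];
	src->stripe_pages[page_nr] = NULL;

	/* The step the pointer switch alone no longer covers. */
	for (int i = first; i < first + SKETCH_SECTORS_PER_PAGE; i++)
		dest->stripe_sectors[i].uptodate = true;
}
```

    With page_nr = 1 in this geometry, sectors 4 to 7 of the destination
    become uptodate while sectors 0 to 3 are left untouched, so a later
    read decision would skip only the stolen range.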
    
    With this fixed, the same write pattern no longer leads to the
    unnecessary reads:
    
      partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
      partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
      full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
      full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
      partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
      full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
      full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
      ^^^ No more partial read, directly into the write path.
      full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
      full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
      full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
      full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768
    
    Fixes: d4e28d9b ("btrfs: raid56: make steal_rbio() subpage compatible")
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
raid56.c 74.1 KB