    Btrfs: Use PagePrivate2 to track pages in the data=ordered code. · 8b62b72b
    Chris Mason authored
    Btrfs writes go through delalloc to the data=ordered code.  This
    makes sure that all of the data is on disk before the metadata
    that references it.  The tracking means that we have to make sure
    each page in an extent is fully written before we add that extent into
    the on-disk btree.
    
    This was done in the past by setting the EXTENT_ORDERED bit for the
    range of an extent when it was added to the data=ordered code, and then
    clearing the EXTENT_ORDERED bit in the extent state tree as each page
    finished IO.
    
    One of the reasons we had to do this was because sometimes pages are
    magically dirtied without page_mkwrite being called.  The EXTENT_ORDERED
    bit is checked at writepage time, and if it isn't there, the page became
    dirty without going through the proper path.
    
    These bit operations make for a number of rbtree searches for each page,
    and can cause considerable lock contention.
    
    This commit switches from the EXTENT_ORDERED bit to use PagePrivate2.
    As pages go into the ordered code, PagePrivate2 is set on each one.
    This is a cheap operation because we already have all the pages locked
    and ready to go.
    
    As IO finishes, the PagePrivate2 bit is cleared and the ordered
    accounting is updated for each page.
    
    At writepage time, if the PagePrivate2 bit is missing, we go into the
    writepage fixup code to handle improperly dirtied pages.
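
The lifecycle described above can be modelled in a few lines of userspace C. This is only an illustrative sketch, not the kernel code: the `model_page` and `model_ordered_extent` types, the `PG_PRIVATE2` flag value, and all function names are hypothetical stand-ins for the real page flags and ordered-extent accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
#define MODEL_PAGE_SIZE 4096u
#define PG_PRIVATE2 (1u << 0)

struct model_page { unsigned int flags; };
struct model_ordered_extent { size_t bytes_left; };

/* Pages entering the ordered code: tag each one and record how many
 * bytes must finish IO before the extent may be added to the btree. */
static void ordered_add_pages(struct model_ordered_extent *oe,
                              struct model_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].flags |= PG_PRIVATE2;
    oe->bytes_left = n * MODEL_PAGE_SIZE;
}

/* Clear the tag and report whether it was set. */
static bool test_clear_private2(struct model_page *page)
{
    bool was_set = (page->flags & PG_PRIVATE2) != 0;
    page->flags &= ~PG_PRIVATE2;
    return was_set;
}

/* IO completion for one page: update the ordered accounting only if
 * the page was tagged. Returns true once the whole extent is written. */
static bool end_page_io(struct model_ordered_extent *oe,
                        struct model_page *page)
{
    if (!test_clear_private2(page))
        return false;
    assert(oe->bytes_left >= MODEL_PAGE_SIZE);
    oe->bytes_left -= MODEL_PAGE_SIZE;
    return oe->bytes_left == 0;
}

/* writepage-time check: a page without the tag was dirtied outside
 * the proper path and needs the fixup code. */
static bool writepage_needs_fixup(const struct model_page *page)
{
    return (page->flags & PG_PRIVATE2) == 0;
}
```

In this model, tagging and checking a page are single flag operations on the page itself, with no tree search, which is the saving the commit is after.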
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
extent_io.h 10.8 KB