    btrfs: add BTRFS_IOC_ENCODED_READ ioctl · 1881fba8
    Omar Sandoval authored
    There are 4 main cases:
    
    1. Inline extents: we copy the data straight out of the extent buffer.
    2. Hole/preallocated extents: we fill in zeroes.
    3. Regular, uncompressed extents: we read the sectors we need directly
       from disk.
    4. Regular, compressed extents: we read the entire compressed extent
       from disk and indicate what subset of the decompressed extent is in
       the file.
    
    This initial implementation simplifies a few things that can be improved
    in the future:
    
    - Cases 1, 3, and 4 allocate temporary memory to read into before
      copying out to userspace.
    - We don't do read repair, because it turns out that read repair is
      currently broken for compressed data.
    - We hold the inode lock during the operation.
    
    Note that we don't need to hold the mmap lock. We may race with
    btrfs_page_mkwrite() and read the old data from before the page was
    dirtied:
    
    btrfs_page_mkwrite         btrfs_encoded_read
    ---------------------------------------------------
    (enter)                    (enter)
                               btrfs_wait_ordered_range
    lock_extent_bits
    btrfs_page_set_dirty
    unlock_extent_cached
    (exit)
                               lock_extent_bits
                               read extent (dirty page hasn't been flushed,
                                            so this is the old data)
                               unlock_extent_cached
                               (exit)
    
    In this interleaving, we read the old data from before the page was
    dirtied. But that's true even if we were to hold the mmap lock:
    
    btrfs_page_mkwrite               btrfs_encoded_read
    -------------------------------------------------------------------
    (enter)                          (enter)
                                     btrfs_inode_lock(BTRFS_ILOCK_MMAP)
    down_read(i_mmap_lock) (blocked)
                                     btrfs_wait_ordered_range
                                     lock_extent_bits
                                     read extent (page hasn't been dirtied,
                                                  so this is the old data)
                                     unlock_extent_cached
                                     btrfs_inode_unlock(BTRFS_ILOCK_MMAP)
    down_read(i_mmap_lock) returns
    lock_extent_bits
    btrfs_page_set_dirty
    unlock_extent_cached
    
    In other words, this is inherently racy, so it's fine that we return the
    old data in this tiny window.
    
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Signed-off-by: David Sterba <dsterba@suse.com>