    btrfs: track data relocation with simple quota · 2672a051
    Boris Burkov authored
    Relocation data allocations are quite tricky for simple quotas. The
    basic data relocation sequence is (ignoring details that aren't relevant
    to this fix):
    
    - create a fake relocation data fs root
    - create a fake relocation inode in that root
    - for each data extent:
      - preallocate a data extent on behalf of the fake inode
      - copy over the data
    - for each extent:
      - swap the refs so that the original file extent now refers to the new
        extent item
    - drop the fake root, dropping its refs on the old extents, which lets
      us delete them.
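    To make the sequence above concrete, here is a minimal userspace model in C. It is not btrfs kernel code. The names struct extent_item, naive_relocate() and FAKE_RELOC_ROOT are hypothetical, extents are assumed to be 4 KiB, and the data copy itself is not modelled. The model only shows which root ends up in the owner_ref of each new extent in the naive flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical id for this model only, not the on-disk btrfs value. */
#define FAKE_RELOC_ROOT UINT64_MAX /* stands in for the relocation data root */

struct extent_item {
    uint64_t bytenr;    /* where the extent lives */
    uint64_t owner_ref; /* root recorded as the extent's owner */
    bool freed;         /* set once the last ref on the extent is dropped */
};

/*
 * Naive relocation of n file extents (model assumes 4 KiB extents).
 * old[i] is the extent file slot i points at before relocation;
 * new_ext[i] receives the relocated copy that slot i points at afterwards.
 */
static void naive_relocate(struct extent_item *old, struct extent_item *new_ext,
                           size_t n, uint64_t dest_bytenr)
{
    /* For each data extent: preallocate on behalf of the fake relocation
     * inode, so the new extent item is owned by the fake root. */
    for (size_t i = 0; i < n; i++) {
        new_ext[i].bytenr = dest_bytenr;
        new_ext[i].owner_ref = FAKE_RELOC_ROOT;
        new_ext[i].freed = false;
        dest_bytenr += 4096;
    }

    /* Swap refs (implicit here: the file now uses new_ext[i]), then drop
     * the fake root. Its refs were the last ones on the old extents. */
    for (size_t i = 0; i < n; i++)
        old[i].freed = true;
}
```

    After this runs, every extent the file now references carries the fake root as its owner_ref. That is the state the next paragraph describes.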
    
    Done naively, this stores an extent item in the extent tree whose
    owner_ref points at the relocation data root. It also records only a
    no-op squota update, since the reloc root is not a legitimate fstree. So
    far, that's OK. The problem comes when you do the swap and leave an
    extent item owned by this bogus root as a real, permanent extent of the
    file. If the file then drops that ref, we free the extent and account
    the free as a no-op against the fake relocation root. Essentially, this
    makes relocation a form of simple quota "extent laundering", since it
    re-owns the extents into a fake root.
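    A tiny accounting model of the laundering effect follows. It assumes that root ids below 8 are real subvolumes. squota_alloc() and squota_free() are hypothetical stand-ins for simple quota recording, not kernel functions. The real btrfs accounting is more involved; this model only tracks bytes charged per owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ROOTS 8                /* model: ids 0..7 are "real" subvolumes */
#define FAKE_RELOC_ROOT UINT64_MAX /* hypothetical stand-in for the reloc data root */

static uint64_t usage[MAX_ROOTS]; /* bytes charged to each real root */

static bool is_real_root(uint64_t root) { return root < MAX_ROOTS; }

/* Charge an allocation to its owner; a no-op for the fake reloc root. */
static void squota_alloc(uint64_t owner, uint64_t bytes)
{
    if (is_real_root(owner))
        usage[owner] += bytes;
}

/* Release a freed extent from its owner; also a no-op for the fake root. */
static void squota_free(uint64_t owner, uint64_t bytes)
{
    if (is_real_root(owner)) {
        assert(usage[owner] >= bytes);
        usage[owner] -= bytes;
    }
}

/*
 * Naive relocation of one extent of `bytes` owned by `owner`: the new
 * extent is recorded as owned by the fake root, and dropping the fake
 * root frees the old extent, which is accounted against `owner`.
 */
static void naive_relocate(uint64_t owner, uint64_t bytes)
{
    squota_alloc(FAKE_RELOC_ROOT, bytes); /* new extent: no-op recording */
    squota_free(owner, bytes);            /* old extent freed with the fake root */
}
```

    After naive_relocate(5, 4096), root 5 is charged nothing although its file still references 4 KiB of data. A later free of that extent is also a no-op, so the usage never comes back.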
    
    The simple quotas design very intentionally has no mechanism for
    transferring ownership of extents, as that is exactly the complicated
    thing we are trying to avoid. Further, ownership cannot be transferred
    correctly in this case. By the time the new "real" refs are created,
    there is no way to know which root originally owned the extent before
    relocation unless we track it.
    
    Therefore, it makes more sense to trick the preallocation into handling
    relocation as a special case and record the proper owner ref from the
    beginning. That way, we never write out an extent item without the
    correct owner ref that it will eventually have.
    
    This could be done by wiring a special root parameter all the way
    through the allocation code path. To keep that special case from
    touching all of that code, we instead take advantage of the serial
    nature of relocation and store the source root on the relocation root
    object. Then, when the prealloc finishes and it is this relocation case,
    we prepare the delayed ref with that source root as the owner.
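    The fix can be sketched as a small model. This is a sketch under stated assumptions: struct root, its reloc_src_root field, struct delayed_ref, reloc_begin_extent() and finish_prealloc() are hypothetical stand-ins, not the kernel's structures or functions. Because relocation is serial, one field on the reloc root is enough to remember whose extent is being copied. The prealloc completion path consults it only when allocating on behalf of the fake root.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_RELOC_ROOT UINT64_MAX /* hypothetical id for the data reloc root */

struct root {
    uint64_t id;
    uint64_t reloc_src_root; /* hypothetical: owner of the extent being relocated, 0 if none */
};

struct delayed_ref {
    uint64_t bytenr;
    uint64_t owning_root; /* owner to record in the new extent item */
};

/* Relocation handles one source extent at a time, so remember its owner
 * on the reloc root before preallocating the copy. */
static void reloc_begin_extent(struct root *reloc_root, uint64_t src_root)
{
    assert(src_root != FAKE_RELOC_ROOT);
    reloc_root->reloc_src_root = src_root;
}

/* At the end of preallocation, pick the owner for the delayed ref. */
static struct delayed_ref finish_prealloc(const struct root *root, uint64_t bytenr)
{
    struct delayed_ref ref = { .bytenr = bytenr, .owning_root = root->id };

    /* Special case: allocating on behalf of the fake relocation inode. */
    if (root->id == FAKE_RELOC_ROOT && root->reloc_src_root != 0)
        ref.owning_root = root->reloc_src_root;
    return ref;
}
```

    Ordinary subvolume allocations take the default branch unchanged. That is the point of keeping the special case out of the rest of the allocation path.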
    
    We must also add logic to handle relocating adjacent extents with
    different owning roots. Those cannot be preallocated together in a
    cluster, as that would lose the separate ownership information.
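    Owner-aware clustering can be sketched as follows, assuming the extents arrive sorted by start. struct data_extent and build_clusters() are hypothetical names, not the relocation code's own. A new cluster begins at any gap or any change of owning root, so no single preallocation spans two owners.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct data_extent {
    uint64_t start;       /* logical start in bytes */
    uint64_t len;         /* length in bytes */
    uint64_t owning_root; /* root that owns the extent under simple quotas */
};

/*
 * Assign each extent (sorted by start) a cluster id. A new cluster starts
 * when the next extent is not contiguous with the previous one or has a
 * different owning root. Writes ids into cluster_of[] and returns the
 * number of clusters.
 */
static size_t build_clusters(const struct data_extent *ext, size_t n,
                             size_t *cluster_of)
{
    size_t nr_clusters = 0;

    for (size_t i = 0; i < n; i++) {
        int new_cluster = i == 0 ||
            ext[i - 1].start + ext[i - 1].len != ext[i].start ||
            ext[i - 1].owning_root != ext[i].owning_root;

        if (new_cluster)
            nr_clusters++;
        cluster_of[i] = nr_clusters - 1;
    }
    return nr_clusters;
}
```

    For example, two contiguous extents owned by root 257 followed by a contiguous extent owned by root 258 yield two clusters rather than one.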
    
    This is obviously a smelly bit of code, but I think it is the best
    solution to the problem, given the relocation implementation.

    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>