    btrfs: send, recompute reference path after orphanization of a directory · 9c2b4e03
    Filipe Manana authored
    During an incremental send, when an inode has multiple new references we
    might end up emitting rename operations for orphanizations that have a
    source path that is no longer valid due to a previous orphanization of
    some directory inode. This causes the receiver to fail since it tries
    to rename a path that does not exist.
    
    Example reproducer:
    
      $ cat reproducer.sh
      #!/bin/bash
    
      mkfs.btrfs -f /dev/sdi >/dev/null
      mount /dev/sdi /mnt/sdi
    
      touch /mnt/sdi/f1
      touch /mnt/sdi/f2
      mkdir /mnt/sdi/d1
      mkdir /mnt/sdi/d1/d2
    
      # Filesystem looks like:
      #
      # .                           (ino 256)
      # |----- f1                   (ino 257)
      # |----- f2                   (ino 258)
      # |----- d1/                  (ino 259)
      #        |----- d2/           (ino 260)
    
      btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
      btrfs send -f /tmp/snap1.send /mnt/sdi/snap1
    
      # Now do a series of changes such that:
      #
      # *) inode 258 has one new hardlink and the previous name changed
      #
      # *) both names conflict with the old names of two other inodes:
      #
      #    1) the new name "d1" conflicts with the old name of inode 259,
      #       under directory inode 256 (root)
      #
      #    2) the new name "d2" conflicts with the old name of inode 260
      #       under directory inode 259
      #
      # *) inodes 259 and 260 now have the old names of inode 258
      #
      # *) inode 257 is now located under inode 260. Inode 257 has a number
      #    smaller than that of inode 258, the inode for which we created a
      #    second hard link and swapped its names with inodes 259 and 260
      #
      ln /mnt/sdi/f2 /mnt/sdi/d1/f2_link
      mv /mnt/sdi/f1 /mnt/sdi/d1/d2/f1
    
      # Swap d1 and f2.
      mv /mnt/sdi/d1 /mnt/sdi/tmp
      mv /mnt/sdi/f2 /mnt/sdi/d1
      mv /mnt/sdi/tmp /mnt/sdi/f2
    
      # Swap d2 and f2_link
      mv /mnt/sdi/f2/d2 /mnt/sdi/tmp
      mv /mnt/sdi/f2/f2_link /mnt/sdi/f2/d2
      mv /mnt/sdi/tmp /mnt/sdi/f2/f2_link
    
      # Filesystem now looks like:
      #
      # .                                (ino 256)
      # |----- d1                        (ino 258)
      # |----- f2/                       (ino 259)
      #        |----- f2_link/           (ino 260)
      #        |       |----- f1         (ino 257)
      #        |
      #        |----- d2                 (ino 258)
    
      btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
      btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2
    
      mkfs.btrfs -f /dev/sdj >/dev/null
      mount /dev/sdj /mnt/sdj
    
      btrfs receive -f /tmp/snap1.send /mnt/sdj
      btrfs receive -f /tmp/snap2.send /mnt/sdj
    
      umount /mnt/sdi
      umount /mnt/sdj
    
    When executed, the receive of the incremental stream fails:
    
      $ ./reproducer.sh
      Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
      At subvol /mnt/sdi/snap1
      Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
      At subvol /mnt/sdi/snap2
      At subvol snap1
      At snapshot snap2
      ERROR: rename d1/d2 -> o260-6-0 failed: No such file or directory
    
    This happens because:
    
    1) When processing inode 257 we end up computing the name for inode 259
       because it is an ancestor in the send snapshot, and at that point it
       still has its old name, "d1", from the parent snapshot because inode
       259 was not yet processed. We then cache that name, which is valid
       until we start processing inode 259 (or set the progress to 260 after
       processing its references);
    
    2) Later we start processing inode 258 and collecting all its new
       references into the list sctx->new_refs. The first reference in the
       list happens to be the reference for name "d1" while the reference for
       name "d2" is next (the last element of the list).
       We compute the full path "d1/d2" for this second reference and store
       it in the reference (its ->full_path member). The path used for the
       new parent directory was "d1" and not "f2" because inode 259, the
       new parent, was not yet processed;
    
    3) When we start processing the new references at process_recorded_refs()
       we start with the first reference in the list, for the new name "d1".
       Because there is a conflicting inode that was not yet processed, which
       is directory inode 259, we orphanize it, renaming it from "d1" to
       "o259-6-0";
    
    4) Then we start processing the new reference for name "d2", and we
       realize it conflicts with the reference of inode 260 in the parent
       snapshot. So we issue an orphanization operation for inode 260 by
       emitting a rename operation with a destination path of "o260-6-0"
       and a source path of "d1/d2". This source path is the value we
       stored in the reference earlier at step 2), in the ->full_path
       member of the reference. However, that path is no longer valid due
       to the orphanization of the directory inode 259 in step 3). This
       makes the receiver fail since the path does not exist; it should
       have been "o259-6-0/d2".
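
    The sequence in steps 2) to 4) can be replayed with a small userspace
    toy model. This is only a sketch: the toy_* names, the fixed three-entry
    table and the buffer sizes are assumptions made for illustration and do
    not correspond to anything in fs/btrfs/send.c. It shows that a path
    computed before the orphanization of inode 259 no longer matches where
    inode 260 actually lives afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only; none of these names exist in fs/btrfs/send.c. */
struct toy_inode {
    int parent;      /* index of the parent in tree[], -1 for the root */
    char name[16];   /* current name on the receiver side */
};

/* Receiver-side state after applying the snap1 stream. */
static struct toy_inode tree[] = {
    { -1, "" },      /* [0] inode 256, subvolume root */
    {  0, "d1" },    /* [1] inode 259 */
    {  1, "d2" },    /* [2] inode 260 */
};

/* Build the path of tree[idx] by walking up to the root. */
static void toy_build_path(int idx, char *out, size_t len)
{
    char tmp[64];

    assert(len > 0);
    out[0] = '\0';
    for (; tree[idx].parent >= 0; idx = tree[idx].parent) {
        if (out[0] != '\0')
            snprintf(tmp, sizeof(tmp), "%s/%s", tree[idx].name, out);
        else
            snprintf(tmp, sizeof(tmp), "%s", tree[idx].name);
        snprintf(out, len, "%s", tmp);
    }
}

/* Model an orphanization: the inode is renamed on the receiver. */
static void toy_orphanize(int idx, const char *orphan_name)
{
    snprintf(tree[idx].name, sizeof(tree[idx].name), "%s", orphan_name);
}

/* Replay steps 2) to 4). Returns 1 if the cached source path is stale. */
static int toy_replay(char *cached, char *actual, size_t len)
{
    toy_build_path(2, cached, len);  /* step 2: path stored in the ref */
    toy_orphanize(1, "o259-6-0");    /* step 3: inode 259 is orphanized */
    toy_build_path(2, actual, len);  /* step 4: where inode 260 really is */
    return strcmp(cached, actual) != 0;
}
```

    Calling toy_replay() with two 64-byte buffers leaves "d1/d2" in the
    cached buffer and "o259-6-0/d2" in the other, which is the same mismatch
    the receiver reports.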
    
    Fix this by recomputing the full path of a reference before emitting an
    orphanization if we previously orphanized any directory, since that
    directory could be a parent in the new path. This is a rare scenario, so
    keep it simple and do not check whether that previously orphanized
    directory is in fact an ancestor of the inode we are trying to orphanize.
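
    The shape of this fix can be sketched in userspace C. The sketch below
    is not the kernel change itself: struct toy_dir, struct toy_ref,
    toy_compute_path() and toy_process_refs() are hypothetical stand-ins,
    and the real reference processing does considerably more work. It only
    illustrates the idea of remembering that some directory was orphanized
    and refreshing a reference's full path before using it as a rename
    source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; not the structures used by fs/btrfs/send.c. */
struct toy_dir {
    char path[32];   /* current path of this inode on the receiver */
    bool is_dir;
};

struct toy_ref {
    const struct toy_dir *parent;  /* new parent directory, NULL for root */
    const char *name;              /* new name of the inode */
    char full_path[64];            /* computed when the ref was recorded */
    struct toy_dir *conflict;      /* inode to orphanize first, or NULL */
    const char *orphan_name;       /* orphan name for that inode */
};

/* (Re)compute full_path from the parent's current path. */
static void toy_compute_path(struct toy_ref *ref)
{
    if (ref->parent)
        snprintf(ref->full_path, sizeof(ref->full_path), "%s/%s",
                 ref->parent->path, ref->name);
    else
        snprintf(ref->full_path, sizeof(ref->full_path), "%s", ref->name);
}

/* Record in src[i] the source path used to orphanize each conflict. */
static void toy_process_refs(struct toy_ref *refs, int n, char src[][64])
{
    bool orphanized_dir = false;

    for (int i = 0; i < n; i++) {
        struct toy_ref *cur = &refs[i];

        if (!cur->conflict)
            continue;
        /* A directory was orphanized earlier, so the path may be stale. */
        if (orphanized_dir)
            toy_compute_path(cur);
        snprintf(src[i], 64, "%s", cur->full_path);
        /* The conflicting inode now lives under its orphan name. */
        snprintf(cur->conflict->path, sizeof(cur->conflict->path), "%s",
                 cur->orphan_name);
        if (cur->conflict->is_dir)
            orphanized_dir = true;
    }
}
```

    With the two references from the example above ("d1" in the root
    conflicting with inode 259, then "d2" under inode 259 conflicting with
    inode 260), the second rename source comes out as "o259-6-0/d2". Without
    the orphanized_dir check it would stay at the stale "d1/d2".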
    
    A test case for fstests follows soon.
    
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>