    btrfs: send: fix failures when processing inodes with no links · 9ed0a72e
    BingJing Chang authored
    Send fails when it processes an orphan directory, that is, a directory
    inode with no links. Commit 46b2f459 ("Btrfs: fix send failure when
    root has deleted files still open") addressed the orphan inode issue:
    the send operation failed with an ENOENT error on any attempt to
    generate a path for an inode with a link count of zero. That patch
    introduced sctx->ignore_cur_inode, which is set when the current inode
    has a link count of zero so that unnecessary steps are skipped, and a
    helper function, btrfs_unlink_all_paths(), which cleans up old paths
    found in the parent snapshot. However, directories as well as regular
    files can be orphan inodes. When the send operation meets an orphan
    directory, it currently issues a wrong unlink command for that
    directory, and the receive operation then fails with an EISDIR error.
    The send operation itself also fails later with an ENOENT error when it
    tries to generate a path for the directory.
    
    A similar example, this time creating an orphan directory for an
    incremental send:
    
      $ btrfs subvolume create vol
      $ mkdir vol/dir
      $ touch vol/dir/foo
    
      $ btrfs subvolume snapshot -r vol snap1
      $ btrfs subvolume snapshot -r vol snap2
    
      # Turn the second snapshot to RW mode and delete the whole dir while
      # holding an open file descriptor on it.
      $ btrfs property set snap2 ro false
      $ exec 73<snap2/dir
      $ rm -rf snap2/dir
    
      # Set the second snapshot back to RO mode and do an incremental send.
      $ btrfs property set snap2 ro true
      $ mkdir receive_dir
      $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
      At subvol snap2
      At snapshot snap2
      ERROR: send ioctl failed with -2: No such file or directory
      ERROR: unlink dir failed. Is a directory
    
    In practice, orphan inodes occur more often in cascading backups (see
    the illustration below). In a cascading backup, a user wants to
    replicate a couple of snapshots from Machine A to Machine B and from
    Machine B to Machine C. Machine B doesn't take any RO snapshots for
    sending. All a receiver does is create an RW snapshot of its parent
    snapshot, apply the send stream and turn it into RO mode at the end.
    Even if all paths of some inodes are deleted while applying the send
    stream, these inodes are not deleted and become orphans once the
    subvolume is changed from RW to RO. Moreover, orphan inodes can occur
    not only in send snapshots but also in parent snapshots, because
    Machine B may do a batch replication of a couple of snapshots.
    
    An illustration for cascading backups:
    
      Machine A (snapshot {1..n}) --> Machine B --> Machine C
    
    One idea for solving the problem is to delete all the items of orphan
    inodes before these snapshots are used for sending. I used to think
    that the right time to do that was during the ioctl that changes the
    subvolume from RW to RO, because after that point the fs tree of an RO
    snapshot is not modified anymore. However, attempting the orphan
    cleanup in the ioctl would be pointless: if someone is holding an open
    file descriptor on the inode, the reference count of the inode never
    drops to 0, so iput() cannot trigger the eviction that finally deletes
    all of its items. So instead we extend the original patch to handle
    orphans in send/parent snapshots. Here are several cases that need to
    be considered:
    
    Case 1: BTRFS_COMPARE_TREE_NEW
           |  send snapshot  | action
     --------------------------------
     nlink |        0        | ignore
    
    In case 1, a BTRFS_COMPARE_TREE_NEW tree comparison result means that a
    new inode is found in the send snapshot and it doesn't appear in the
    parent snapshot. Since this inode has a link count of zero (it's an
    orphan and there are no paths for it), we can leverage
    sctx->ignore_cur_inode from the original patch to prevent it from being
    created.
    
    Case 2: BTRFS_COMPARE_TREE_DELETED
           | parent snapshot | action
     ----------------------------------
     nlink |        0        | as usual
    
    In case 2, a BTRFS_COMPARE_TREE_DELETED tree comparison result means
    that the inode only appears in the parent snapshot. As usual, the send
    operation will try to delete all its paths. However, this inode has a
    link count of zero, so no paths will be found for it and no deletion
    operations will be issued. We don't need to change any logic.
    
    Case 3: BTRFS_COMPARE_TREE_CHANGED
               |       | parent snapshot | send snapshot | action
     -----------------------------------------------------------------------
     subcase 1 | nlink |        0        |       0       | ignore
     subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
     subcase 3 | nlink |        0        |      >0       | new_gen(creation)
    
    In case 3, a BTRFS_COMPARE_TREE_CHANGED tree comparison result means
    that the inode appears in both snapshots. There are 3 subcases.

    In the first subcase, the inode has a link count of zero in both
    snapshots. Since there are no paths for this inode in the
    (source/destination) parent snapshots and we don't care whether there
    is also an orphan inode in the destination, we can set
    sctx->ignore_cur_inode to prevent it from being created.

    In the second and third subcases, there are paths for this inode in one
    snapshot and no paths in the other. We can treat this inode as a new
    generation and reuse the logic that handles a new generation of an
    inode, with small adjustments. It then deletes all old paths, and
    creates a new inode with new attributes and paths only when there is a
    positive link count in the send snapshot.
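
    The three cases above can be read as one decision table. The following
    is only a sketch of that table in C, not the code in send.c: the enum
    names and classify_inode() are hypothetical, and "as usual" stands for
    whatever the existing send logic already does.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers used in send.c. */
enum cmp_result { CMP_NEW, CMP_DELETED, CMP_CHANGED };

enum send_action {
	ACT_IGNORE,         /* set ignore_cur_inode, emit nothing */
	ACT_AS_USUAL,       /* existing logic, unchanged */
	ACT_NEW_GEN_DELETE, /* new generation: only delete old paths */
	ACT_NEW_GEN_CREATE, /* new generation: only create inode and paths */
};

/*
 * Map a tree comparison result and the link counts in the parent and
 * send snapshots to the action described in the commit message.
 * parent_nlink is unused for CMP_NEW, send_nlink for CMP_DELETED.
 */
static enum send_action classify_inode(enum cmp_result result,
				       uint32_t parent_nlink,
				       uint32_t send_nlink)
{
	switch (result) {
	case CMP_NEW:
		/* Case 1: an orphan that only exists in the send snapshot. */
		return send_nlink == 0 ? ACT_IGNORE : ACT_AS_USUAL;
	case CMP_DELETED:
		/* Case 2: no paths are found for an orphan, nothing to do. */
		return ACT_AS_USUAL;
	case CMP_CHANGED:
		if (parent_nlink == 0 && send_nlink == 0)
			return ACT_IGNORE;         /* subcase 1 */
		if (parent_nlink > 0 && send_nlink == 0)
			return ACT_NEW_GEN_DELETE; /* subcase 2 */
		if (parent_nlink == 0 && send_nlink > 0)
			return ACT_NEW_GEN_CREATE; /* subcase 3 */
		return ACT_AS_USUAL;               /* both linked */
	}
	return ACT_AS_USUAL;
}
```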
    
    In subcase 2, the send operation only needs to delete all the old paths
    found in the parent snapshot. But removing the old paths of a directory
    may require more operations. If a non-empty directory is going to be
    deleted (because it has a link count of zero in the send snapshot) but
    there are files/directories with bigger inode numbers under it, the
    send operation needs to rename it to its orphan name first. After
    processing and deleting the last item under this directory, the send
    operation checks this directory, i.e. the parent directory of the last
    item, again and issues an rmdir operation to finally remove it.
    
    Therefore, we also need to treat inodes with a link count of zero as if
    they didn't exist in get_cur_inode_state(), which is used in
    process_recorded_refs(). With this change, when the send operation
    checks a directory that carries an orphan name after the last item
    under it has been deleted, it can properly issue an rmdir operation.
    Without it, the orphan directory would be left behind under its orphan
    name, because the existing inode with a link count of zero would still
    be found.
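
    A minimal sketch of that idea follows. It is not the real
    get_cur_inode_state(): struct lookup_result, lookup_status() and
    sketch_inode_state() are hypothetical, and the sketch ignores inode
    generations and how far the send has progressed. It only shows a
    lookup that finds an inode with a link count of zero being reported
    as -ENOENT.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of looking an inode up in one snapshot. */
struct lookup_result {
	bool found;     /* an inode item exists in that snapshot */
	uint32_t nlink; /* its link count, meaningful only if found */
};

enum sketch_state { SKETCH_NO_CHANGE, SKETCH_CREATED, SKETCH_DELETED };

/* An orphan (nlink == 0) is reported exactly like a missing inode. */
static int lookup_status(const struct lookup_result *r)
{
	if (!r->found || r->nlink == 0)
		return -ENOENT;
	return 0;
}

/*
 * Simplified state of an inode across the send and parent snapshots.
 * Returns -ENOENT when the inode effectively exists in neither.
 */
static int sketch_inode_state(const struct lookup_result *send,
			      const struct lookup_result *parent,
			      enum sketch_state *state)
{
	int in_send = lookup_status(send);
	int in_parent = lookup_status(parent);

	if (in_send == -ENOENT && in_parent == -ENOENT)
		return -ENOENT;

	if (in_send == 0 && in_parent == 0)
		*state = SKETCH_NO_CHANGE;
	else if (in_send == 0)
		*state = SKETCH_CREATED;
	else
		*state = SKETCH_DELETED;
	return 0;
}
```

    In this sketch, a directory that is linked in the parent snapshot but
    is an orphan in the send snapshot comes out as SKETCH_DELETED, which
    is the condition the paragraph above relies on for the final rmdir.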
    
    In subcase 3, as in case 2, no old paths would be found, so no deletion
    operations will be issued. The send operation will only create a new one
    for that inode.
    
    Note that subcase 3 is not common. It's easy to reduce the number of
    hard links of an inode, but once all valid paths are removed, there is
    no valid path left from which to create other hard links. The only way
    to hit this subcase is to send an older snapshot after a newer snapshot
    has been sent.
    
    Reviewed-by: Robbie Ko <robbieko@synology.com>
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: BingJing Chang <bingjingc@synology.com>
    Signed-off-by: David Sterba <dsterba@suse.com>