• Filipe Manana's avatar
    Btrfs: fix log replay failure after unlink and link combination · 1f250e92
    Filipe Manana authored
    If we have a file with 2 (or more) hard links in the same directory,
    remove one of the hard links, create a new file (or link an existing file)
    in the same directory with the name of the removed hard link, and then
    finally fsync the new file, we end up with a log that fails to replay,
    causing a mount failure.
    
    Example:
    
      $ mkfs.btrfs -f /dev/sdb
      $ mount /dev/sdb /mnt
    
      $ mkdir /mnt/testdir
      $ touch /mnt/testdir/foo
      $ ln /mnt/testdir/foo /mnt/testdir/bar
    
      $ sync
    
      $ unlink /mnt/testdir/bar
      $ touch /mnt/testdir/bar
      $ xfs_io -c "fsync" /mnt/testdir/bar
    
      <power failure>
    
      $ mount /dev/sdb /mnt
      mount: mount(2) failed: /mnt: No such file or directory
    
    When replaying the log, for that example, we also see the following in
    dmesg/syslog:
    
      [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
      [71813.674204] ------------[ cut here ]------------
      [71813.675694] BTRFS: Transaction aborted (error -2)
      [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
      [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
      [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        4.15.0-rc9-btrfs-next-56+ #1
      [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
      [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
      [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
      [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
      [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
      [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
      [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
      [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
      [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
      [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
      [71813.679669] Call Trace:
      [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
      [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
      [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
      [71813.679669]  ? __lock_is_held+0x39/0x71
      [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
      [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
      [71813.679669]  ? rcu_read_unlock+0x3a/0x57
      [71813.679669]  ? __lock_is_held+0x39/0x71
      [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
      [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
      [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
      [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
      [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
      [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
      [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
      [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
      [71813.679669]  ? mount_fs+0x64/0x10b
      [71813.679669]  mount_fs+0x64/0x10b
      [71813.679669]  vfs_kern_mount+0x68/0xce
      [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
      [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
      [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
      [71813.679669]  ? mount_fs+0x64/0x10b
      [71813.679669]  mount_fs+0x64/0x10b
      [71813.679669]  vfs_kern_mount+0x68/0xce
      [71813.679669]  do_mount+0x6e5/0x973
      [71813.679669]  ? memdup_user+0x3e/0x5c
      [71813.679669]  SyS_mount+0x72/0x98
      [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
      [71813.679669] RIP: 0033:0x7f7cedf150ba
      [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
      [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
      [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
      [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
      [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
      [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
      [71814.128078] BTRFS error (device dm-0): open_ctree failed
    
    This happens because the log has inode reference items for both inode 258
    (the first file we created) and inode 259 (the second file created), and
    when processing the reference item for inode 258, we replace the
    corresponding item in the subvolume tree (which has two names, "foo" and
    "bar") witht he one in the log (which only has one name, "foo") without
    removing the corresponding dir index keys from the parent directory.
    Later, when processing the inode reference item for inode 259, which has
    a name of "bar" associated to it, we notice that dir index entries exist
    for that name and for a different inode, so we attempt to unlink that
    name, which fails because the inode reference item for inode 258 no longer
    has the name "bar" associated to it, making a call to btrfs_unlink_inode()
    fail with a -ENOENT error.
    
    Fix this by unlinking all the names in an inode reference item from a
    subvolume tree that are not present in the inode reference item found in
    the log tree, before overwriting it with the item from the log tree.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    1f250e92
tree-log.c 159 KB