• Filipe Manana's avatar
    Btrfs: fix freeing used extents after removing empty block group · ae0ab003
    Filipe Manana authored
    There's a race between adding a block group to the list of the unused
    block groups and removing an unused block group (cleaner kthread) that
    leads to freeing extents that are in use or a crash during transaction
    commmit. Basically the cleaner kthread, when executing
    btrfs_delete_unused_bgs(), might catch the newly added block group to
    the list fs_info->unused_bgs and clear the range representing the whole
    group from fs_info->freed_extents[] before the task that added the block
    group to the list (running update_block_group()) marked the last freed
    extent as dirty in fs_info->freed_extents (pinned_extents).
    
    That is:
    
         CPU 1                                CPU 2
    
                                      btrfs_delete_unused_bgs()
    update_block_group()
       add block group to
       fs_info->unused_bgs
                                        got block group from the list
                                        clear_extent_bits for the whole
                                        block group range in freed_extents[]
       set_extent_dirty for the
       range covering the freed
       extent in freed_extents[]
       (fs_info->pinned_extents)
    
                                      block group deleted, and a new block
                                      group with the same logical address is
                                      created
    
                                      reserve space from the new block group
                                      for new data or metadata - the reserved
                                      space overlaps the range specified by
                                      CPU 1 for set_extent_dirty()
    
                                      commit transaction
                                        find all ranges marked as dirty in
                                        fs_info->pinned_extents, clear them
                                        and add them to the free space cache
    
    Alternatively, if CPU 2 doesn't create a new block group with the same
    logical address, we get a crash/BUG_ON at transaction commit when unpining
    extent ranges because we can't find a block group for the range marked as
    dirty by CPU 1. Sample trace:
    
    [ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    [ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc
    gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache
     sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio
    [ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
    [ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
    [ 2163.428875] task: ffff88009f2c0650 ti: ffff8801356bc000 task.ti: ffff8801356bc000
    [ 2163.429157] RIP: 0010:[<ffffffffa037728e>]  [<ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs]
    [ 2163.429562] RSP: 0018:ffff8801356bfda8  EFLAGS: 00010246
    [ 2163.429802] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    [ 2163.429990] RDX: 0000000041bfffff RSI: 0000000001c00000 RDI: ffff880024307080
    [ 2163.430042] RBP: ffff8801356bfde8 R08: 0000000000000068 R09: ffff88003734f118
    [ 2163.430042] R10: ffff8801356bfcb8 R11: fffffffffffffb69 R12: ffff8800243070d0
    [ 2163.430042] R13: 0000000083c04000 R14: ffff8800751b0f00 R15: ffff880024307000
    [ 2163.430042] FS:  0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000
    [ 2163.430042] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 2163.430042] CR2: 00007ff10eb43fc0 CR3: 0000000004cb8000 CR4: 00000000000006f0
    [ 2163.430042] Stack:
    [ 2163.430042]  ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800
    [ 2163.430042]  ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000
    [ 2163.430042]  ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff
    [ 2163.430042] Call Trace:
    [ 2163.430042]  [<ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs]
    [ 2163.430042]  [<ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs]
    [ 2163.430042]  [<ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs]
    [ 2163.430042]  [<ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
    [ 2163.430042]  [<ffffffff8105966b>] kthread+0xb7/0xbf
    [ 2163.430042]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
    [ 2163.430042]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
    [ 2163.430042]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
    
    So fix this by making update_block_group() first set the range as dirty
    in pinned_extents before adding the block group to the unused_bgs list.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    ae0ab003
extent-tree.c 258 KB