• Johannes Thumshirn's avatar
    btrfs: zoned: fix use-after-free in do_zone_finish() · 1ec17ef5
    Johannes Thumshirn authored
    Shinichiro reported the following use-after-free triggered by the device
    replace operation in fstests btrfs/070.
    
     BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
     ==================================================================
     BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
     Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
    
     CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
     Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
     Call Trace:
      <TASK>
      dump_stack_lvl+0x5b/0x90
      print_report+0xcf/0x670
      ? __virt_addr_valid+0x200/0x3e0
      kasan_report+0xd8/0x110
      ? do_zone_finish+0x91a/0xb90 [btrfs]
      ? do_zone_finish+0x91a/0xb90 [btrfs]
      do_zone_finish+0x91a/0xb90 [btrfs]
      btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
      ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
      ? btrfs_put_root+0x2d/0x220 [btrfs]
      ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
      cleaner_kthread+0x21e/0x380 [btrfs]
      ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
      kthread+0x2e3/0x3c0
      ? __pfx_kthread+0x10/0x10
      ret_from_fork+0x31/0x70
      ? __pfx_kthread+0x10/0x10
      ret_from_fork_asm+0x1b/0x30
      </TASK>
    
     Allocated by task 3493983:
      kasan_save_stack+0x33/0x60
      kasan_save_track+0x14/0x30
      __kasan_kmalloc+0xaa/0xb0
      btrfs_alloc_device+0xb3/0x4e0 [btrfs]
      device_list_add.constprop.0+0x993/0x1630 [btrfs]
      btrfs_scan_one_device+0x219/0x3d0 [btrfs]
      btrfs_control_ioctl+0x26e/0x310 [btrfs]
      __x64_sys_ioctl+0x134/0x1b0
      do_syscall_64+0x99/0x190
      entry_SYSCALL_64_after_hwframe+0x6e/0x76
    
     Freed by task 3494056:
      kasan_save_stack+0x33/0x60
      kasan_save_track+0x14/0x30
      kasan_save_free_info+0x3f/0x60
      poison_slab_object+0x102/0x170
      __kasan_slab_free+0x32/0x70
      kfree+0x11b/0x320
      btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
      btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
      btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
      btrfs_ioctl+0xb27/0x57d0 [btrfs]
      __x64_sys_ioctl+0x134/0x1b0
      do_syscall_64+0x99/0x190
      entry_SYSCALL_64_after_hwframe+0x6e/0x76
    
     The buggy address belongs to the object at ffff8881543c8000
      which belongs to the cache kmalloc-1k of size 1024
     The buggy address is located 96 bytes inside of
      freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
    
     The buggy address belongs to the physical page:
     page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
     head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
     flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
     page_type: 0xffffffff()
     raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
     raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
     page dumped because: kasan: bad access detected
    
     Memory state around the buggy address:
      ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                            ^
      ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    
    This UAF happens because we're accessing stale zone information of a
    already removed btrfs_device in do_zone_finish().
    
    The sequence of events is as follows:
    
    btrfs_dev_replace_start
      btrfs_scrub_dev
       btrfs_dev_replace_finishing
        btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
        btrfs_rm_dev_replace_free_srcdev
         btrfs_free_device                              <-- device freed
    
    cleaner_kthread
     btrfs_delete_unused_bgs
      btrfs_zone_finish
       do_zone_finish              <-- refers the freed device
    
    The reason for this is that we're using a cached pointer to the chunk_map
    from the block group, but on device replace this cached pointer can
    contain stale device entries.
    
    The staleness comes from the fact, that btrfs_block_group::physical_map is
    not a pointer to a btrfs_chunk_map but a memory copy of it.
    
    Also take the fs_info::dev_replace::rwsem to prevent
    btrfs_dev_replace_update_device_in_mapping_tree() from changing the device
    underneath us again.
    
    Note: btrfs_dev_replace_update_device_in_mapping_tree() is holding
    fs_info::mapping_tree_lock, but as this is a spinning read/write lock we
    cannot take it as the call to blkdev_zone_mgmt() requires a memory
    allocation which may not sleep.
    But btrfs_dev_replace_update_device_in_mapping_tree() is always called with
    the fs_info::dev_replace::rwsem held in write mode.
    
    Many thanks to Shinichiro for analyzing the bug.
    Reported-by: default avatarShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    CC: stable@vger.kernel.org # 6.8
    Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    1ec17ef5
zoned.c 68.9 KB