1. 16 Jan, 2012 25 commits
  2. 11 Jan, 2012 12 commits
    • Btrfs: fix possible deadlock when opening a seed device · b367e47f
      Li Zefan authored
      The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
      but when we mount a filesystem which has backing seed devices, we have
      this lock chain:
      
          open_ctree()
              lock(chunk_mutex);
              read_chunk_tree();
                  read_one_dev();
                      open_seed_devices();
                          lock(uuid_mutex);
      
      and then we hit a lockdep splat.
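
A minimal sketch of the ordering rule above, not the kernel's lockdep implementation. The `lock_rank` enum and `first_order_violation()` are hypothetical names introduced for illustration. The helper assumes every lock in the sequence is still held when the next one is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Ranks follow the order stated in the commit message:
 * uuid_mutex -> volume_mutex -> chunk_mutex. */
enum lock_rank { UUID_MUTEX = 0, VOLUME_MUTEX = 1, CHUNK_MUTEX = 2 };

/*
 * Hypothetical helper (not a kernel API): returns the index of the first
 * acquisition that violates the lock order, or -1 if the nesting is valid.
 * Assumes each lock is taken while all earlier ones are still held.
 */
static int first_order_violation(const enum lock_rank *acquired, size_t n)
{
    assert(acquired != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* Taking a lock whose rank is not higher than the one
         * taken just before it inverts the documented order. */
        if (acquired[i] <= acquired[i - 1])
            return (int)i;
    }
    return -1;
}
```

Under these assumptions, the mount path shown above (chunk_mutex, then uuid_mutex) is flagged at its second acquisition, while the sequence uuid_mutex, volume_mutex, chunk_mutex passes.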
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: update global block_rsv when creating a new block group · c7c144db
      Li Zefan authored
      A bug was triggered while using a seed device:
      
          # mkfs.btrfs /dev/loop1
          # btrfstune -S 1 /dev/loop1
          # mount /dev/loop1 /mnt
          # btrfs dev add /dev/loop2 /mnt
      
      btrfs: block rsv returned -28
      ------------[ cut here ]------------
      WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
      ...
      Call Trace:
      ...
      [<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
      [<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
      [<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
      [<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
      [<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
      [<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
      [<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
      [<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
      [<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
      [<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
      [<c04f5660>] sys_ioctl+0x33/0x4f
      [<c07b9edf>] sysenter_do_call+0x12/0x38
      ---[ end trace 906adac595facc7d ]---
      
      Since the seed device is read-only, the filesystem has no usable space.
      When we then add a sprout device, the kernel creates a METADATA block
      group and a SYSTEM block group, which provide free space we can reserve.
      The reservation still fails, however, because the global block_rsv hasn't
      been updated accordingly.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: rewrite btrfs_trim_block_group() · 7fe1e641
      Li Zefan authored
      There are various bugs in block group trimming:
      
      - It may trim from an offset smaller than the user-specified offset.
      - It may trim beyond the user-specified range.
      - It may leak free space for extents smaller than the specified minlen.
      - It may truncate the last trimmed extent and thus leak free space.
      - With mixed extents+bitmaps, some extents may not be trimmed.
      - With mixed extents+bitmaps, some bitmaps may not be trimmed (possibly
        none at all). Even for those that are trimmed, not all the free space
        in the bitmaps will be trimmed.
      
      This patch rewrites btrfs_trim_block_group() and splits it into two
      functions: one trims extents only, and the other trims bitmaps only.
      
      Before patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 1496465408 bytes were trimmed
      
      After patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 2193768448 bytes were trimmed
      
      And this matches the total free space:
      
      	# btrfs fi df /mnt
      	Data: total=3.58GB, used=1.79GB
      	System, DUP: total=8.00MB, used=4.00KB
      	System: total=4.00MB, used=0.00
      	Metadata, DUP: total=205.12MB, used=97.14MB
      	Metadata: total=8.00MB, used=0.00
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: simplfy calculation of stripe length for discard operation · ec9ef7a1
      Li Zefan authored
      For btrfs raid, while discarding a range of space, we'll need to know
      the start offset and length to discard for each device, and it's done
      in btrfs_map_block().
      
      However, the calculation is a bit complex for raid0 and raid10, so I
      reimplemented it based on the following fact:
      
              dev1          dev2           dev3    (raid0)
              -----------------------------------
              s0 s3 s6      s1 s4 s7       s2 s5
      
      Each device has either (total_stripes / nr_dev) stripes or one more.
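
A minimal sketch of the counting rule above, not the code in btrfs_map_block(). `stripes_on_device()` is a hypothetical helper, and it assumes the range begins with stripe s0 on the first device (index 0), as in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of stripes that land on device dev_index
 * when total_stripes consecutive stripes are laid out round-robin over
 * nr_dev devices, starting with stripe 0 on device 0.
 */
static uint64_t stripes_on_device(uint64_t total_stripes, uint32_t nr_dev,
                                  uint32_t dev_index)
{
    assert(nr_dev > 0 && dev_index < nr_dev);

    uint64_t base = total_stripes / nr_dev;   /* every device gets this many */
    uint64_t extra = total_stripes % nr_dev;  /* the first 'extra' devices get one more */

    return base + (dev_index < extra ? 1 : 0);
}
```

For the diagram's eight stripes on three devices, this gives 3, 3 and 2 stripes for dev1, dev2 and dev3.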
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: don't pre-allocate btrfs bio · de11cc12
      Li Zefan authored
      We pre-allocate a btrfs bio with fixed size, and then may re-allocate
      memory if we find stripes are bigger than the fixed size. But this
      pre-allocation is not necessary.
      
      Also, we don't have to calculate the stripe number twice.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: don't pass a trans handle unnecessarily in volumes.c · 125ccb0a
      Li Zefan authored
      Some functions never use the transaction handle passed to them.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: reserve metadata space in btrfs_ioctl_setflags() · 4da6f1a3
      Li Zefan authored
      Check and reserve space for btrfs_update_inode().
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags() · f062abf0
      Li Zefan authored
      We can recover from errors and return -errno to user space.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: check the return value of io_ctl_init() · 706efc66
      Li Zefan authored
      It can return -ENOMEM.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: avoid possible NULL deref in io_ctl_drop_pages() · a1ee5a45
      Li Zefan authored
      If we run into some failure path in io_ctl_prepare_pages(),
      io_ctl->pages[] array may have some NULL pointers.
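
A minimal userspace sketch of the guard this implies, not the kernel function itself. `struct page_set` and `drop_pages()` are hypothetical stand-ins, and `free()` takes the place of the kernel's page release calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the io_ctl page array. */
struct page_set {
    void **pages;      /* slots may be NULL after a failed prepare step */
    size_t num_pages;
};

/*
 * Release every populated slot and skip the NULL ones, so a partially
 * filled array can be cleaned up safely. Returns the number of slots
 * that were actually released.
 */
static size_t drop_pages(struct page_set *set)
{
    assert(set != NULL && (set->pages != NULL || set->num_pages == 0));

    size_t released = 0;
    for (size_t i = 0; i < set->num_pages; i++) {
        if (set->pages[i] == NULL)
            continue;              /* nothing was allocated for this slot */
        free(set->pages[i]);
        set->pages[i] = NULL;      /* makes a second call harmless */
        released++;
    }
    return released;
}
```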
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: add pinned extents to on-disk free space cache correctly · db804f23
      Li Zefan authored
      I got this while running xfstests:
      
      [24256.836098] block group 317849600 has an wrong amount of free space
      [24256.836100] btrfs: failed to load free space cache for block group 317849600
      
      We should clamp the extent returned by find_first_extent_bit(),
      so that the start of the extent is never smaller than the start of
      the block group.
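
A minimal sketch of the clamping step, not the code from this patch. `clamped_extent_len()` is a hypothetical helper, and it assumes the extent is given as a half-open range [ext_start, ext_end).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length of the part of [ext_start, ext_end) that
 * lies at or after bg_start. An extent that begins before the block
 * group is clamped so only the part inside the group is counted.
 */
static uint64_t clamped_extent_len(uint64_t bg_start, uint64_t ext_start,
                                   uint64_t ext_end)
{
    assert(ext_start <= ext_end);

    /* Clamp the start so it is never below the block group start. */
    uint64_t start = ext_start < bg_start ? bg_start : ext_start;

    /* An extent that ends before the block group contributes nothing. */
    return ext_end > start ? ext_end - start : 0;
}
```

Without the clamp, the bytes before the block group start would be added to the cache as well, which is consistent with the "wrong amount of free space" message quoted above.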
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Merge branch 'for-linus' of... · d25223a0
      Li Zefan authored
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs into for-linus
  3. 08 Jan, 2012 2 commits
  4. 06 Jan, 2012 1 commit
    • Btrfs: test free space only for unclustered allocation · a5f6f719
      Alexandre Oliva authored
      Since the clustered allocation may be taking extents from a different
      block group, there's no point in spin-locking and testing the current
      block group free space before attempting to allocate space from a
      cluster, even more so when we might refrain from even trying the
      cluster in the current block group because, after the cluster was set
      up, not enough free space remained.  Furthermore, cluster creation
      attempts fail fast when the block group doesn't have enough free
      space, so the test was completely superfluous.
      
      I've moved the free space test past the cluster allocation attempt,
      where it is more useful, and arranged for a cluster in the current
      block group to be released before trying an unclustered allocation,
      when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
      the cluster stands a chance of being combined with additional free
      space in the block group so as to succeed in the allocation attempt.
      Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>