1. 28 Jan, 2011 4 commits
  2. 26 Jan, 2011 12 commits
  3. 17 Jan, 2011 1 commit
    • Btrfs: forced readonly mounts on errors · acce952b
      liubo authored
      This patch is based on the "Forced readonly mounts on errors" idea.
      
      It is the first step toward making btrfs more tolerant of disk
      corruption, instead of just using BUG() statements.
      
      The major changes:
      - add a framework for raising errors that should cause the filesystem
        to go readonly.
      - keep the FS error state in the on-disk super block.
      - make sure that all resources are freed and released at umount time.
      - make sure that, once the FS has been forced readonly because of an
        error, no further changes reach the disk until the FS is repaired.
        To guarantee this, write operations are stopped.
      
      After this patch is applied, the conversion from BUG() to such a framework can
      happen incrementally.
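      A minimal userspace sketch of the mechanism, assuming a hypothetical
      fs_info with an error bit (FS_STATE_ERROR, fs_handle_error() and
      fs_start_write() are illustrative names, not the btrfs API): an error is
      recorded instead of calling BUG(), and every later write attempt is
      refused with -EROFS. btrfs would also persist the state in the super
      block; this sketch only keeps it in memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error bit; btrfs keeps its FS state in the super block. */
#define FS_STATE_ERROR (1u << 0)

struct fs_info {
	unsigned int fs_state;  /* error state (in memory only in this sketch) */
	bool readonly;          /* mounted or forced read-only */
};

/* Record an error and force the FS read-only instead of calling BUG(). */
static void fs_handle_error(struct fs_info *fs, int err, const char *where)
{
	assert(fs != NULL && err < 0);
	fprintf(stderr, "error %d in %s: forcing readonly\n", err, where);
	fs->fs_state |= FS_STATE_ERROR;
	fs->readonly = true;
}

/* Every write path calls this first, so nothing reaches the disk after an error. */
static int fs_start_write(const struct fs_info *fs)
{
	if ((fs->fs_state & FS_STATE_ERROR) || fs->readonly)
		return -EROFS;
	return 0;
}
```

      Because the check lives in one gate function, converting a BUG() call
      site only requires calling fs_handle_error() there, which fits the
      incremental conversion described above.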
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 16 Jan, 2011 16 commits
    • btrfs: Require CAP_SYS_ADMIN for filesystem rebalance · 6f88a440
      Ben Hutchings authored
      Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
      filesystem and may run uninterruptibly for a long time.  This does not
      seem to be something that an unprivileged user should be able to do.
      Reported-by: Aron Xu <happyaron.xu@gmail.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check · f690efb1
      Josef Bacik authored
      If we run low on space, btrfs_block_rsv_check can emit a bunch of
      warnings. But it is mostly called from the transaction code to decide
      whether the transaction needs to end, and that caller expects failures,
      so don't WARN and alarm everybody for no reason.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() · 5e540f77
      Tsutomu Itoh authored
      In btrfs_read_fs_root_no_radix(), 'root' is not freed if
      btrfs_search_slot() returns error.
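      A sketch of the goto-based cleanup pattern such a fix typically uses,
      assuming a hypothetical struct root, a stubbed search_slot() that can
      fail, and an allocation counter added only to make a leak observable:
      the error path frees 'root' before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct root { int id; };   /* hypothetical stand-in for struct btrfs_root */

static int live_roots;     /* outstanding allocations, for the demo only */

static struct root *alloc_root(void)
{
	struct root *r = calloc(1, sizeof(*r));
	if (r)
		live_roots++;
	return r;
}

static void free_root(struct root *r)
{
	if (r)
		live_roots--;
	free(r);
}

/* Stub for btrfs_search_slot(): returns a negative errno when asked to fail. */
static int search_slot(int fail)
{
	return fail ? -EIO : 0;
}

/* Returns the root on success, or NULL with *err set; never leaks 'root'. */
static struct root *read_root(int fail, int *err)
{
	struct root *root;
	int ret;

	assert(err != NULL);
	root = alloc_root();
	if (!root) {
		*err = -ENOMEM;
		return NULL;
	}
	ret = search_slot(fail);
	if (ret < 0)
		goto out_free;   /* returning here without freeing would leak 'root' */
	*err = 0;
	return root;

out_free:
	free_root(root);
	*err = ret;
	return NULL;
}
```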
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: check NULL or not · 91ca338d
      Tsutomu Itoh authored
      Check whether these functions return NULL before using the result.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: Don't pass NULL ptr to func that may deref it. · ff175d57
      Jesper Juhl authored
      In fs/btrfs/inode.c::fixup_tree_root_location() we have this code:
      
      ...
       		if (!path) {
       			err = -ENOMEM;
       			goto out;
       		}
      ...
       	out:
       		btrfs_free_path(path);
       		return err;
      
      btrfs_free_path() passes its argument on to other functions and some of
      them end up dereferencing the pointer.
      In the code above that pointer is clearly NULL, so btrfs_free_path() will
      eventually cause a NULL dereference.
      
      There are many ways to cut this cake (fix the bug). The one I chose was to
      make btrfs_free_path() deal gracefully with NULL pointers. If you
      disagree, feel free to come up with an alternative patch.
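      A sketch of a free helper that tolerates NULL the way free() and kfree()
      do, so error paths like the one above can call it unconditionally.
      struct path and path_release_nodes() are hypothetical stand-ins for
      struct btrfs_path and the helpers btrfs_free_path() calls.

```c
#include <assert.h>
#include <stdlib.h>

struct path {            /* hypothetical stand-in for struct btrfs_path */
	int nodes_held;
};

/* Dereferences its argument, like the helpers btrfs_free_path() passes it to. */
static void path_release_nodes(struct path *p)
{
	assert(p != NULL);
	p->nodes_held = 0;
}

/* Safe to call with NULL, following the free()/kfree() convention. */
static void free_path(struct path *p)
{
	if (!p)
		return;
	path_release_nodes(p);
	free(p);
}
```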
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: mount failure return value fix · 20b45077
      Dave Young authored
      I accidentally passed a swap partition as the root partition on the
      kernel command line, and the kernel panicked with "Cannot open root
      device". That message is misleading: the real problem is a filesystem
      type mismatch, not a missing device.
      
      Eventually I found that btrfs fails the mount with -EIO; it should
      return -EINVAL.
      The logic in init/do_mounts.c:
              for (p = fs_names; *p; p += strlen(p)+1) {
                      int err = do_mount_root(name, p, flags, root_mount_data);
                      switch (err) {
                              case 0:
                                      goto out;
                              case -EACCES:
                                      flags |= MS_RDONLY;
                                      goto retry;
                              case -EINVAL:
                                      continue;
                      }
                      /* print "Cannot open root device", then panic */
              }
      So any filesystem type listed after btrfs never gets a chance to mount.
      
      This patch changes the return value to -EINVAL.
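      A userspace simulation of the loop above, with the -EACCES read-only
      retry left out; try_fs_types() and the two probe callbacks are
      hypothetical. It shows why returning -EIO for a foreign device ends the
      search, while -EINVAL lets the next filesystem type be tried.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical per-type mount attempt: returns 0 or a negative errno. */
typedef int (*mount_fn)(const char *fstype);

/*
 * Walk a NUL-separated, double-NUL-terminated list of fs types, as
 * init/do_mounts.c does. Returns 0 on success, or the error that stopped
 * the loop (the kernel would panic at that point).
 */
static int try_fs_types(const char *fs_names, mount_fn do_mount)
{
	const char *p;
	int err = -EINVAL;

	assert(fs_names != NULL && do_mount != NULL);
	for (p = fs_names; *p; p += strlen(p) + 1) {
		err = do_mount(p);
		if (err == 0)
			return 0;
		if (err == -EINVAL)
			continue;   /* wrong fs type: try the next one */
		return err;         /* any other error ends the search */
	}
	return err;
}

/* Demo device holds ext4: old btrfs answers -EIO, fixed btrfs -EINVAL. */
static int probe_old(const char *t)   { return strcmp(t, "ext4") == 0 ? 0 : -EIO; }
static int probe_fixed(const char *t) { return strcmp(t, "ext4") == 0 ? 0 : -EINVAL; }
```

      With the list "btrfs\0ext4\0", probe_old stops at btrfs with -EIO,
      while probe_fixed falls through to ext4 and succeeds.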
      Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: Mem leak in btrfs_get_acl() · 42838bb2
      Jesper Juhl authored
      It seems to me that we leak the memory allocated to 'value' in
      btrfs_get_acl() if the call to posix_acl_from_xattr() fails.
      Here's a patch that attempts to correct that problem.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix wrong free space information of btrfs · 6d07bcec
      Miao Xie authored
      When data is stored with a RAID profile on two or more disks of different
      sizes, df shows free space in the filesystem even though the user cannot
      actually write any more data; that is, df reports wrong free space
      information for btrfs.
      
       # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
       # btrfs-show
       Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
       	Total devices 2 FS bytes used 28.00KB
       	devid    1 size 5.01GB used 2.03GB path /dev/sda9
       	devid    2 size 10.00GB used 2.01GB path /dev/sda10
       # btrfs device scan /dev/sda9 /dev/sda10
       # mount /dev/sda9 /mnt
       # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
         (fill the filesystem)
       # sync
       # df -TH
       Filesystem	Type	Size	Used	Avail	Use%	Mounted on
       /dev/sda9	btrfs	17G	8.6G	5.4G	62%	/mnt
       # btrfs-show
       Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
       	Total devices 2 FS bytes used 3.99GB
       	devid    1 size 5.01GB used 5.01GB path /dev/sda9
       	devid    2 size 10.00GB used 4.99GB path /dev/sda10
      
      This happens because btrfs cannot allocate chunks once one of the paired
      disks has no space left, so the free space on the other disks can never be
      used. That space should be subtracted from the total, but btrfs doesn't
      subtract it, which confuses users.
      
      This patch fixes it by calculating the free space that can actually be used
      to allocate chunks.
      
      Implementation:
      1. get the free space of every device, and align it to the stripe length.
      2. sort the devices by free space.
      3. check the free space of each device:
         3.1. if it is not zero, count the devices that have more free space
              than this device.
              If that count is not below the minimum stripe number, the free
              space can be used and is added to the total free space.
              If the count is below the minimum stripe number, the free space
              cannot be used, and the check ends.
         3.2. if the free space is zero, check the next device (goto 3.1).
      
      This implementation is essentially a fake chunk allocation.
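      A simplified sketch of the "fake chunk allocation" idea, not the patch's
      code: it repeatedly places one stripe_len-sized stripe on each of the
      num_stripes devices with the most free space, and stops once the
      num_stripes-th device has no room left. usable_raw_space() and its
      parameters are hypothetical; sizes are in arbitrary units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: largest free space first. */
static int cmp_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	return (x < y) - (x > y);
}

/*
 * Returns the raw bytes that chunk allocation could really consume.
 * free_space[] is modified in place.
 */
static uint64_t usable_raw_space(uint64_t *free_space, size_t ndev,
				 size_t num_stripes, uint64_t stripe_len)
{
	uint64_t total = 0;
	size_t i;

	assert(num_stripes > 0 && stripe_len > 0);
	/* Align every device's free space down to the stripe length. */
	for (i = 0; i < ndev; i++)
		free_space[i] -= free_space[i] % stripe_len;
	if (ndev < num_stripes)
		return 0;

	for (;;) {
		/* Sort by free space, largest first. */
		qsort(free_space, ndev, sizeof(*free_space), cmp_desc);
		/* Too few devices with room left: the rest is unusable. */
		if (free_space[num_stripes - 1] < stripe_len)
			break;
		for (i = 0; i < num_stripes; i++) {
			free_space[i] -= stripe_len;
			total += stripe_len;
		}
	}
	return total;
}
```

      For the raid1 example above, inputs {5, 10} with num_stripes = 2 and
      stripe_len = 1 give 10 raw units, i.e. about 5 units of data once both
      copies are counted, which is consistent with devid 1 filling up first.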
      
      After applying this patch, df shows the correct space information:
       # df -TH
       Filesystem	Type	Size	Used	Avail	Use%	Mounted on
       /dev/sda9	btrfs	17G	8.6G	0	100%	/mnt
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: make the chunk allocator utilize the devices better · b2117a39
      Miao Xie authored
      This patch changes how we handle the case where we cannot get enough free
      extents of the default size.
      
      Implementation:
      1. Look up a suitable free extent on each device and keep the search result.
         If no suitable free extent is found, keep the largest free extent.
      2. If we get enough suitable free extents of the default size, chunk
         allocation succeeds.
      3. If we cannot get enough free extents, but the number of extents of the
         default size is >= min_stripes, we just change the mapping information
         (reduce the number of stripes in the extent map), and chunk allocation
         succeeds.
      4. If the number of extents of the default size is < min_stripes, sort the
         devices by the size of their largest free extent, in descending order.
      5. Use the size of the largest free extent on the (num_stripes - 1)th device
         as the stripe size when allocating device space.
      
      This way, the chunk allocator can allocate chunks as large as possible when
      the devices are short of space, and make full use of the devices.
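      A sketch of the decision in steps 2 to 5, under simplifying assumptions:
      max_free[] holds each device's largest free extent (the result of step 1),
      the (num_stripes - 1)th device is read as 0-based index num_stripes - 1,
      and profile rules such as even stripe counts are ignored. pick_stripes()
      is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_desc_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	return (x < y) - (x > y);
}

/* On success stores the stripe size and count; max_free[] may be reordered. */
static int pick_stripes(uint64_t *max_free, size_t ndev, size_t num_stripes,
			size_t min_stripes, uint64_t default_size,
			uint64_t *stripe_size, size_t *stripes)
{
	size_t i, fit = 0;

	assert(min_stripes > 0 && min_stripes <= num_stripes);
	for (i = 0; i < ndev; i++)
		if (max_free[i] >= default_size)
			fit++;

	if (fit >= num_stripes) {            /* step 2: enough default-size extents */
		*stripe_size = default_size;
		*stripes = num_stripes;
		return 0;
	}
	if (fit >= min_stripes) {            /* step 3: fewer stripes, same size */
		*stripe_size = default_size;
		*stripes = fit;
		return 0;
	}
	if (ndev < num_stripes)
		return -ENOSPC;
	/* Step 4: sort devices by largest free extent, descending. */
	qsort(max_free, ndev, sizeof(*max_free), cmp_desc_u64);
	/* Step 5: the num_stripes-th largest extent bounds the stripe size. */
	if (max_free[num_stripes - 1] == 0)
		return -ENOSPC;
	*stripe_size = max_free[num_stripes - 1];
	*stripes = num_stripes;
	return 0;
}
```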
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: restructure find_free_dev_extent() · 7bfc837d
      Miao Xie authored
      - make it return the start position and length of the largest free space
        when it cannot find a suitable free space.
      - make it more readable.
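      A sketch of that contract over a hypothetical sorted array of allocated
      extents (struct extent and find_free_extent() are illustrative, and on
      success this version reports the whole hole length): it returns the first
      hole of at least num_bytes, or -ENOSPC together with the start and length
      of the largest hole seen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct extent { uint64_t start, len; };  /* an allocated range (hypothetical) */

/* used[] is sorted by start, non-overlapping, and inside [0, dev_size). */
static int find_free_extent(const struct extent *used, size_t n,
			    uint64_t search_start, uint64_t dev_size,
			    uint64_t num_bytes, uint64_t *start, uint64_t *len)
{
	uint64_t pos = search_start, max_start = search_start, max_len = 0;
	size_t i;

	assert(start != NULL && len != NULL);
	for (i = 0; i <= n; i++) {
		/* A hole ends at the next allocated extent, or at the device end. */
		uint64_t hole_end = (i < n) ? used[i].start : dev_size;

		if (hole_end > pos) {
			uint64_t hole = hole_end - pos;

			if (hole >= num_bytes) {
				*start = pos;
				*len = hole;
				return 0;
			}
			if (hole > max_len) {
				max_start = pos;
				max_len = hole;
			}
		}
		if (i < n && used[i].start + used[i].len > pos)
			pos = used[i].start + used[i].len;
	}
	*start = max_start;    /* no fit: report the largest hole instead */
	*len = max_len;
	return -ENOSPC;
}
```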
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix wrong calculation of stripe size · 1974a3b4
      Miao Xie authored
      There are two small problems:
      - When checking whether the chunk size is greater than the maximum chunk
        size, we should take mirrors into account, but the original code didn't.
      - btrfs shouldn't use the size of the residual free space as the length of
        a dup chunk when allocating chunks. A dup chunk needs twice as much device
        space as its chunk size, so if we use the residual free space as the
        length of a dup chunk, there is not enough free space for it. Fix it.
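      A sketch of the arithmetic behind the second fix only, with hypothetical
      helper names: a dup chunk stores both copies on the same device, so the
      longest dup chunk that fits is half of the residual free space.

```c
#include <assert.h>
#include <stdint.h>

/* Device bytes one chunk of 'len' bytes consumes; dup keeps two copies on one device. */
static uint64_t dev_bytes_needed(uint64_t len, int is_dup)
{
	assert(!is_dup || len <= UINT64_MAX / 2);  /* avoid overflow */
	return is_dup ? 2 * len : len;
}

/* Longest chunk that fits in 'residual' free bytes on a single device. */
static uint64_t max_chunk_len(uint64_t residual, int is_dup)
{
	return is_dup ? residual / 2 : residual;
}
```

      Using the residual space itself as the dup length (100 bytes free, chunk
      length 100) would need 200 device bytes, which is exactly the failure
      described above.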
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: try to reclaim some space when chunk allocation fails · d52a5b5f
      Miao Xie authored
      We cannot write data into files when there is very little space left in the filesystem.
      
      Reproduce steps:
       # mkfs.btrfs /dev/sda1
       # mount /dev/sda1 /mnt
       # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
       # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999
         (fill the filesystem)
       # umount /mnt
       # mount /dev/sda1 /mnt
       # rm -f /mnt/tmpfile0
       # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
        (fails with ENOSPC)
      
      But if we run the last step again, the write succeeds. The cause of the
      problem is that btrfs didn't try to commit the current transaction to
      reclaim some space when chunk allocation failed.
      
      This patch fixes it by committing the current transaction to reclaim some
      space when chunk allocation fails.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix wrong data space statistics · 299a08b1
      Miao Xie authored
      Josef has implemented mixed data/metadata chunks, so we must count those
      chunks' space just as we count data chunks.
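      A sketch of the counting rule with illustrative flag values (BG_DATA,
      BG_METADATA and struct space_info here are hypothetical): testing the
      data bit with a mask means a mixed chunk, which carries both bits, is
      counted as data space too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; a mixed chunk carries both DATA and METADATA. */
#define BG_DATA     (1u << 0)
#define BG_METADATA (1u << 2)

struct space_info {
	unsigned int flags;
	uint64_t total_bytes;
};

/* Sum every space_info that holds data, including mixed data/metadata ones. */
static uint64_t data_total(const struct space_info *si, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(si != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (si[i].flags & BG_DATA)   /* bitmask test, so mixed chunks count */
			sum += si[i].total_bytes;
	return sum;
}
```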
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • fs/btrfs: Fix build of ctree · f580eb09
      Stefan Schmidt authored
      CC [M]  fs/btrfs/ctree.o
      In file included from fs/btrfs/ctree.c:21:0:
      fs/btrfs/ctree.h:1003:17: error: field ‘super_kobj’ has incomplete type
      fs/btrfs/ctree.h:1074:17: error: field ‘root_kobj’ has incomplete type
      make[2]: *** [fs/btrfs/ctree.o] Error 1
      make[1]: *** [fs/btrfs] Error 2
      make: *** [fs] Error 2
      
      We need to include kobject.h here.
      Reported-by: Jeff Garzik <jeff@garzik.org>
      Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 04 Jan, 2011 1 commit
  6. 23 Dec, 2010 3 commits
    • Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls · 0caa102d
      Li Zefan authored
      This allows us to set a snapshot or a subvolume readonly or writable
      on the fly.
      
      Usage:
      
      Set BTRFS_SUBVOL_RDONLY in btrfs_ioctl_vol_args_v2->flags, and then
      call ioctl(BTRFS_IOC_SUBVOL_SETFLAGS);
      
      Changelog for v3:
      
      - Change to pass __u64 as ioctl parameter.
      
      Changelog for v2:
      
      - Add _GETFLAGS ioctl.
      - Check if the passed fd is the root of a subvolume.
      - Change the name from _SNAP_SETFLAGS to _SUBVOL_SETFLAGS.
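      A userspace sketch of toggling the flag, following the v3 changelog
      (the flags word is passed as a __u64). The ioctl numbers and
      BTRFS_SUBVOL_RDONLY are written out to match the uapi header
      <linux/btrfs.h> as we understand it; prefer including that header where
      available. subvol_set_readonly() and with_rdonly() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Assumed to match <linux/btrfs.h>; defined here to stay self-contained. */
#ifndef BTRFS_IOC_SUBVOL_GETFLAGS
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SUBVOL_GETFLAGS _IOR(BTRFS_IOCTL_MAGIC, 25, __u64)
#define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)
#define BTRFS_SUBVOL_RDONLY (1ULL << 1)
#endif

/* Set or clear the read-only bit in a subvolume flags word. */
static uint64_t with_rdonly(uint64_t flags, int readonly)
{
	return readonly ? (flags | BTRFS_SUBVOL_RDONLY)
			: (flags & ~BTRFS_SUBVOL_RDONLY);
}

/* Make the subvolume rooted at 'path' read-only or writable; 0 or -errno. */
static int subvol_set_readonly(const char *path, int readonly)
{
	__u64 flags;
	int fd, ret = 0;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0) {
		ret = -errno;
	} else {
		flags = with_rdonly(flags, readonly);
		if (ioctl(fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags) < 0)
			ret = -errno;
	}
	close(fd);
	return ret;
}
```

      Reading the flags first and changing only the read-only bit keeps any
      other subvolume flags intact.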
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: Add readonly snapshots support · b83cc969
      Li Zefan authored
      Usage:
      
      Set BTRFS_SUBVOL_RDONLY in btrfs_ioctl_vol_args_v2->flags, and call
      ioctl(BTRFS_IOC_SNAP_CREATE_V2).
      
      Implementation:
      
      - Set readonly bit of btrfs_root_item->flags.
      - Add readonly checks in btrfs_permission (inode_permission),
      btrfs_setattr, btrfs_set/remove_xattr and some ioctls.
      
      Changelog for v3:
      
      - Eliminate btrfs_root->readonly, and check btrfs_root->root_item.flags instead.
      - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: Refactor btrfs_ioctl_snap_create() · fa0d2b9b
      Li Zefan authored
      Split it into two functions for two different ioctls, since they
      share no common code.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
  7. 22 Dec, 2010 3 commits
    • btrfs: Extract duplicate decompress code · 3a39c18d
      Li Zefan authored
      Add a common function to copy decompressed data from working buffer
      to bio pages.
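      A sketch of what such a shared helper does, with a tiny hypothetical page
      size and plain arrays standing in for bio pages (copy_to_pages() and
      DEMO_PAGE_SIZE are illustrative): decompressed bytes that start at a given
      output offset are split across page boundaries and copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 8   /* tiny for illustration; real pages are much larger */

/*
 * Copy 'len' bytes of decompressed output, which begin at output offset
 * 'out_off', into the pages covering the output. Returns the bytes copied,
 * stopping early if the data runs past the last page.
 */
static size_t copy_to_pages(const char *buf, size_t len, size_t out_off,
			    char (*pages)[DEMO_PAGE_SIZE], size_t nr_pages)
{
	size_t copied = 0;

	assert(buf != NULL || len == 0);
	while (copied < len) {
		size_t pos = out_off + copied;
		size_t pg = pos / DEMO_PAGE_SIZE, in_pg = pos % DEMO_PAGE_SIZE;
		size_t chunk = DEMO_PAGE_SIZE - in_pg;   /* room left in this page */

		if (pg >= nr_pages)
			break;
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(pages[pg] + in_pg, buf + copied, chunk);
		copied += chunk;
	}
	return copied;
}
```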
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • btrfs: Allow to specify compress method when defrag · 1a419d85
      Li Zefan authored
      Update defrag ioctl, so one can choose lzo or zlib when turning
      on compression in defrag operation.
      
      Changelog:
      
      v1 -> v2
      - Add incompatibility flag.
      - Fix to check invalid compress type.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • btrfs: Add lzo compression support · a6fa6fae
      Li Zefan authored
      LZO is a much faster compression algorithm than zlib, so it would allow
      more users to enable transparent compression, and lets users choose
      between compression ratio and speed for different applications.
      
      Usage:
      
       # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt
      or
       # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt
      
      "-o compress" without an argument is still allowed for compatibility.
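      A sketch of parsing one of these option tokens. parse_compress_opt() and
      the enum are hypothetical, and treating a bare "compress" as zlib is an
      assumption based on zlib having been the only algorithm before this
      patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum compress_type { COMPRESS_NONE, COMPRESS_ZLIB, COMPRESS_LZO };

struct compress_opt {
	enum compress_type type;
	int force;              /* 1 for compress-force */
};

/* Returns 0 on success, 1 if 'tok' is another option, -EINVAL if unknown type. */
static int parse_compress_opt(const char *tok, struct compress_opt *out)
{
	const char *arg;
	int force;

	assert(tok != NULL && out != NULL);
	/* Check the longer prefix first: "compress-force" starts with "compress". */
	if (strncmp(tok, "compress-force", 14) == 0) {
		force = 1;
		arg = tok + 14;
	} else if (strncmp(tok, "compress", 8) == 0) {
		force = 0;
		arg = tok + 8;
	} else {
		return 1;
	}
	if (*arg == '\0')
		out->type = COMPRESS_ZLIB;   /* assumed legacy default */
	else if (strcmp(arg, "=zlib") == 0)
		out->type = COMPRESS_ZLIB;
	else if (strcmp(arg, "=lzo") == 0)
		out->type = COMPRESS_LZO;
	else if (*arg == '=')
		return -EINVAL;              /* unknown algorithm */
	else
		return 1;                    /* e.g. "compressed": not ours */
	out->force = force;
	return 0;
}
```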
      
      Compatibility:
      
      A filesystem mounted with lzo compression can no longer be mounted by
      older kernels. One reason is that older kernels would otherwise hand the
      compressed data stored in inline extents directly to the user.
      
      Performance:
      
      The test copied a linux source tarball (~400M) from an ext4 partition
      to the btrfs partition, and then extracted it.
      
      (time in seconds)
                 lzo        zlib        nocompress
      copy:      10.6       21.7        14.9
      extract:   70.1       94.4        66.6
      
      (data size in MB)
                 lzo        zlib        nocompress
      copy:      185.87     108.69      394.49
      extract:   193.80     132.36      381.21
      
      Changelog:
      
      v1 -> v2:
      - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
      - Add incompatibility flag.
      - Fix error handling in compress code.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>