1. 26 Jan, 2011 8 commits
  2. 04 Jan, 2011 1 commit
  3. 14 Dec, 2010 2 commits
    • Btrfs: prevent RAID level downgrades when space is low · 83a50de9
      Chris Mason authored
      The extent allocator has code that allows us to fill
      allocations from any available block group, even if it doesn't
      match the raid level we've requested.
      
      This was put in because adding a new drive to a filesystem
      made with the default mkfs options actually upgrades the metadata from
      single spindle dup to full RAID1.
      
      But, the code also allows us to allocate from a raid0 chunk when we
      really want a raid1 or raid10 chunk.  This can cause big trouble because
      mkfs creates a small (4MB) raid0 chunk for data and metadata which then
      goes unused for raid1/raid10 installs.
      
      The allocator will happily wander in and allocate from that chunk when
      things get tight, which is not correct.
      
      The fix here is to make sure that we provide duplication when the
      caller has asked for it.  It does allow the dups to be any raid level,
      which preserves the dup->raid1 upgrade abilities.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
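The duplication rule described in 83a50de9 can be modeled as a bitmask test. This is a minimal sketch, not the kernel code: the flag values and the name `block_group_satisfies` are assumptions chosen for illustration.

```python
# Illustrative profile bits; the values are assumptions for this sketch.
RAID0 = 1 << 3
RAID1 = 1 << 4
DUP = 1 << 5
RAID10 = 1 << 6

# Profiles that keep more than one copy of each block.
DUPLICATING = DUP | RAID1 | RAID10


def block_group_satisfies(group_flags: int, wanted_flags: int) -> bool:
    """Return True if a block group may serve an allocation request."""
    if (wanted_flags & DUPLICATING) and not (group_flags & DUPLICATING):
        # The caller asked for duplication and this group offers none,
        # so a raid0 chunk can no longer stand in for raid1/raid10.
        return False
    # Any duplicating profile is accepted, which keeps dup -> raid1 working.
    return True
```

With these definitions a raid1 request is refused by a raid0 group but still accepted by a dup group.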
    • Btrfs: account for missing devices in RAID allocation profiles · cd02dca5
      Chris Mason authored
      When we mount in RAID degraded mode without adding a new device to
      replace the failed one, we can end up using the wrong RAID flags for
      allocations.
      
      This results in strange combinations of block groups (raid1 in a raid10
      filesystem) and corruptions when we try to allocate blocks from single
      spindle chunks on drives that are actually missing.
      
      The first device has two small 4MB chunks in it that mkfs creates and
      these are usually unused in a raid1 or raid10 setup.  But, in -o degraded,
      the allocator will fall back to these because the mask of desired raid groups
      isn't correct.
      
      The fix here is to count the missing devices as we build up the list
      of devices in the system.  This count is used when picking the
      raid level to make sure we continue using the same levels that were
      in place before we lost a drive.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
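The effect of counting missing devices (cd02dca5) can be sketched as follows. This is a simplified model, not the kernel function: the flag values, the device-count thresholds, and the name `reduce_profile` are assumptions for illustration.

```python
# Illustrative profile bits; the values are assumptions for this sketch.
RAID0 = 1 << 3
RAID1 = 1 << 4
RAID10 = 1 << 6


def reduce_profile(flags: int, rw_devices: int, missing_devices: int) -> int:
    """Drop the RAID bits that the device count cannot support."""
    # Counting missing devices keeps a degraded mount on the profile
    # that was in place before the drive was lost.
    num_devices = rw_devices + missing_devices
    if num_devices == 1:
        flags &= ~(RAID0 | RAID1)  # assumed threshold: striping/mirroring needs 2
    if num_devices < 4:
        flags &= ~RAID10  # assumed threshold: raid10 needs 4
    return flags
```

In this model a two-device raid1 filesystem mounted degraded (one writable device, one missing) keeps its raid1 bit, whereas ignoring the missing device would strip it.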
  4. 13 Dec, 2010 1 commit
    • Btrfs: EIO when we fail to read tree roots · 68433b73
      Chris Mason authored
      If we just get a plain IO error when we read tree roots, the code
      wasn't properly sending that error up the chain.  This allowed mounts to
      continue when they should have failed, and allowed operations
      on partially set up root structs.  The end result was usually oopsen
      on spinlocks that hadn't been spun up correctly.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
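The error-propagation idea in 68433b73 can be illustrated with a small model. The names `read_tree_root`, `mount`, and `TreeRootReadError` are hypothetical; the sketch only shows a read failure being passed up so that the mount aborts instead of continuing with a half-initialized root.

```python
import errno


class TreeRootReadError(OSError):
    """Hypothetical error type for a failed tree-root read."""


def read_tree_root(read_block):
    """Read a tree root; raise rather than return a partially set up root."""
    data = read_block()
    if data is None:
        # Send the I/O failure up the chain.
        raise TreeRootReadError(errno.EIO, "failed to read tree root")
    return {"node": data, "lock_initialized": True}


def mount(read_block) -> int:
    """Return 0 on success or a negative errno, as a mount path would."""
    try:
        read_tree_root(read_block)
    except OSError as exc:
        return -exc.errno
    return 0
```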
  5. 10 Dec, 2010 7 commits
  6. 09 Dec, 2010 4 commits
  7. 29 Nov, 2010 2 commits
  8. 27 Nov, 2010 6 commits
    • Btrfs: setup blank root and fs_info for mount time · 450ba0ea
      Josef Bacik authored
      There is a problem with how we use sget: it searches through the list of supers
      attached to the fs_type looking for a super with the same fs_devices as what
      we're trying to mount.  This depends on sb->s_fs_info being filled, but we don't
      fill that in until we get to btrfs_fill_super, so we could hit supers on the
      fs_type super list that have a null s_fs_info.  To fix that we need to
      go ahead and set up a blank root with a blank fs_info to hold fs_devices.  That
      way our test will work out right, we can set s_fs_info in
      btrfs_set_super, and open_ctree will simply use our pre-allocated root and
      fs_info when setting everything up.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix fiemap · 975f84fe
      Josef Bacik authored
      There are two big problems currently with FIEMAP:
      
      1) We return extents for holes.  This isn't supposed to happen; we should simply
      not return extents for holes, and userspace then interprets the lack of an
      extent as a hole.
      
      2) We sometimes don't set FIEMAP_EXTENT_LAST properly.  This is because we wait
      to see an EXTENT_FLAG_VACANCY flag on the em, but this won't happen if, say, we ask
      fiemap to map up to the last extent in a file and there is nothing but holes up
      to the i_size.  To fix this we need to look up the last extent in the file and
      save its logical offset, so if we happen to try to map that extent we can be
      sure to set FIEMAP_EXTENT_LAST.
      
      With this patch we now pass xfstest 225, which we never passed before.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
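The two FIEMAP rules from 975f84fe can be modeled in a few lines. This is a sketch under stated assumptions, not the btrfs implementation: the file is represented as a sorted list of allocated extents (holes are simply gaps), and the function name `fiemap` and tuple layout are illustrative.

```python
FIEMAP_EXTENT_LAST = 0x1  # flag value from linux/fiemap.h


def fiemap(extents, start, length):
    """Map [start, start + length) over sorted (logical_offset, length) extents.

    Only allocated extents are listed, so holes are never reported.
    """
    if not extents:
        return []
    # Remember where the file's final extent starts, so the LAST flag is
    # set whenever that extent is mapped, regardless of trailing holes.
    last_offset = extents[-1][0]
    end = start + length
    result = []
    for offset, ext_len in extents:
        if offset + ext_len <= start or offset >= end:
            continue  # extent lies outside the requested range
        flags = FIEMAP_EXTENT_LAST if offset == last_offset else 0
        result.append((offset, ext_len, flags))
    return result
```

A request that covers only a hole returns an empty list, which is how userspace is expected to recognize the hole.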
    • Btrfs - fix race between btrfs_get_sb() and umount · 619c8c76
      Ian Kent authored
      When mounting a btrfs file system btrfs_test_super() may attempt to
      use sb->s_fs_info, the btrfs root, of a super block that is going away
      and that has had the btrfs root set to NULL in its ->put_super(). But
      if the super block is going away it cannot be an existing super block
      so we can return false in this case.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
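The comparison described in 619c8c76 can be sketched as a small model. The classes and the name `super_matches` are hypothetical stand-ins for the super block, the btrfs root, and btrfs_test_super(); the point is only the early return when the root pointer has already been cleared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Root:
    fs_devices: object


@dataclass
class SuperBlock:
    s_fs_info: Optional[Root]  # None once ->put_super() has run


def super_matches(sb: SuperBlock, wanted_fs_devices: object) -> bool:
    """Return True if sb belongs to the devices being mounted."""
    root = sb.s_fs_info
    if root is None:
        # The super block is going away, so it cannot be an existing match.
        return False
    return root.fs_devices is wanted_fs_devices
```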
    • Btrfs: update inode ctime when using links · bc1cbf1f
      Josef Bacik authored
      Currently we fail xfstest 236 because we're not updating the inode ctime on
      link.  This is a simple fix, and makes it so we pass 236 now.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
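The behavior fixed in bc1cbf1f can be observed from userspace. The following is a rough check, not xfstest 236 itself: it creates a file in a temporary directory, hard-links it, and reports the ctime before and after. The function name is illustrative, and timestamp granularity on the underlying filesystem may make the two values equal.

```python
import os
import tempfile


def link_and_read_ctime():
    """Return (ctime_before_ns, ctime_after_ns, nlink) around a hard link."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "target")
        link = os.path.join(directory, "link")
        with open(target, "w"):
            pass
        before = os.stat(target).st_ctime_ns
        os.link(target, link)  # link(2) should update the inode's ctime
        after = os.stat(target)
        return before, after.st_ctime_ns, after.st_nlink
```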
    • Btrfs: make sure new inode size is ok in fallocate · 0ed42a63
      Josef Bacik authored
      We have been failing xfstest 228 forever, because we don't check to make sure
      the new inode size is acceptable as far as RLIMIT is concerned.  Just check to
      make sure it's ok to create an inode with this new size and error out if not.
      With this patch we now pass 228.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
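The size check from 0ed42a63 amounts to comparing the end of the requested range against the file-size limit. A minimal sketch, assuming the limit is passed in explicitly; the function name and the `NO_LIMIT` sentinel are hypothetical, and the real check involves more than this comparison.

```python
import errno

NO_LIMIT = -1  # hypothetical sentinel meaning "no file size limit"


def new_size_ok(offset: int, length: int, fsize_limit: int) -> int:
    """Return 0 if fallocate may grow the file to offset + length, else -EFBIG."""
    new_size = offset + length
    if fsize_limit != NO_LIMIT and new_size > fsize_limit:
        return -errno.EFBIG  # error out before allocating anything
    return 0
```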
    • Btrfs: fix typo in fallocate to make it honor actual size · 55a61d1d
      Josef Bacik authored
      There is a typo in __btrfs_prealloc_file_range() where we set the i_size to
      actual_len/cur_offset, and then just set it to cur_offset again, and do the same
      with btrfs_ordered_update_i_size().  This fixes it back to keeping i_size in a
      local variable and then updating i_size properly.  Tested this with
      
      xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo
      
      stat'ing foo gives us a size of 1 instead of 4096 like it was.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
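The i_size rule restored by 55a61d1d can be written as a pure function. This is a sketch of the logic as the commit message describes it, with a hypothetical function name: `actual_len` is the end of the range the caller asked for, and `cur_offset` is the block-aligned end of what has been preallocated so far.

```python
def updated_i_size(old_i_size: int, actual_len: int, cur_offset: int) -> int:
    """Return the i_size to record after preallocating up to cur_offset."""
    if actual_len > old_i_size and cur_offset > old_i_size:
        # Never report more than the caller asked for, even though the
        # allocation itself is rounded up to a full block.
        return min(actual_len, cur_offset)
    return old_i_size
```

For the reproducer in the commit message (`falloc 0 1` on an empty file with 4096-byte blocks), `updated_i_size(0, 1, 4096)` gives 1 rather than 4096.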
  9. 22 Nov, 2010 9 commits
    • Btrfs: avoid NULL pointer deref in try_release_extent_buffer · 45f49bce
      Chris Mason authored
      If we fail to find a pointer in the radix tree, don't try
      to deref the NULL one we do have.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: make btrfs_add_nondir take parent inode as an argument · a1b075d2
      Josef Bacik authored
      Everybody who calls btrfs_add_nondir just passes in the dentry of the new file
      and then dereferences dentry->d_parent->d_inode, but everybody who calls
      btrfs_add_nondir() is already passed the parent's inode.  So instead of
      dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as
      an argument and pass that along so we don't have to worry about d_parent.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: hold i_mutex when calling btrfs_log_dentry_safe · 495e8677
      Josef Bacik authored
      Since we walk up the path logging all of the parts of the inode's path, we need
      to hold i_mutex to make sure that the inode is not renamed while we're logging
      everything.  btrfs_log_dentry_safe does dget_parent and all of that jazz, but we
      may get unexpected results if the rename changes the inode's location while
      we're higher up the path logging those dentries, so do this for safety reasons.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: use dget_parent where we can UPDATED · 6a912213
      Josef Bacik authored
      There are lots of places where we do dentry->d_parent->d_inode without holding
      the dentry->d_lock.  This could cause problems with rename.  So instead we need
      to use dget_parent() and hold the reference to the parent as long as we are
      going to use its inode and then dput it at the end.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Cc: raven@themaw.net
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix more ESTALE problems with NFS · 76195853
      Josef Bacik authored
      When creating new inodes we don't set up inode->i_generation.  So if we generate
      an fh with a newly created inode we save the generation of 0, but if we flush
      the inode to disk and have to read it back when getting the inode on the server
      we'll have the right i_generation, so the gens won't match and we get ESTALE.  This
      patch properly sets inode->i_generation when we create the new inode and now I'm
      no longer getting ESTALE.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
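The generation mismatch behind 76195853 can be modeled with a file handle that records an inode number and a generation. The dictionary layout and function names below are hypothetical; the sketch only shows why a handle built from an in-memory inode with generation 0 looks stale once the inode is re-read from disk.

```python
def make_handle(inode: dict) -> tuple:
    """Build a simplified file handle from an in-memory inode."""
    return (inode["ino"], inode["i_generation"])


def handle_is_stale(handle: tuple, inode_from_disk: dict) -> bool:
    """A handle is stale when its generation differs from the on-disk one."""
    ino, generation = handle
    return ino != inode_from_disk["ino"] or generation != inode_from_disk["i_generation"]
```

If the new inode's `i_generation` is left at 0 in memory while the on-disk copy carries the real value, `handle_is_stale` returns True; setting it at creation time makes the two agree.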
    • Btrfs: handle NFS lookups properly · 2ede0daf
      Josef Bacik authored
      People kept reporting NFS issues, specifically getting ESTALE a lot.  I figured
      out how to reproduce the problem:
      
      SERVER
      mkfs.btrfs /dev/sda1
      mount /dev/sda1 /mnt/btrfs-test
      <add /mnt/btrfs-test to /etc/exports>
      btrfs subvol create /mnt/btrfs-test/foo
      service nfs start
      
      CLIENT
      mount server:/mnt/btrfs-test /mnt/test
      cd /mnt/test/foo
      ls
      
      SERVER
      echo 3 > /proc/sys/vm/drop_caches
      
      CLIENT
      ls			<-- get an ESTALE here
      
      This is because the standard way to look up a name in nfsd is to use readdir:
      it does a readdir on the parent directory looking for the inode of
      the child.  In this case the parent is / and the child is foo.  But
      subvols all have the same inode number, so doing a readdir of / looking for
      inode 256 will return '.', which obviously doesn't match foo.  So instead we
      need to have our own .get_name so that we can find the right name.
      
      Our .get_name will look up either the inode backref or the root backref,
      whichever we're looking for, and return the name we find.  Running the above
      reproducer with this patch results in everything acting the way it's supposed to.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
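The lookup failure explained in 2ede0daf can be reproduced in a toy model. Directory contents are a list of (name, inode number) pairs and the backref table is a plain dictionary; both, along with the function names and the subvolume id 257, are hypothetical. Only inode number 256 comes from the commit message.

```python
SUBVOL_ROOT_INO = 256  # inode number shared by subvolume roots, per the commit


def name_via_readdir(parent_entries, child_ino):
    """Generic approach: scan the parent for the first entry with child_ino."""
    for name, ino in parent_entries:
        if ino == child_ino:
            return name
    return None


def name_via_backref(root_backrefs, subvol_id):
    """Filesystem-specific approach: ask the backref for the real name."""
    return root_backrefs.get(subvol_id)
```

Because '.' and the subvolume 'foo' both report inode 256, the readdir scan stops at '.', while the backref lookup returns 'foo'.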
    • btrfs: make 1-bit signed fileds unsigned · 0410c94a
      Mariusz Kozlowski authored
      Fixes these sparse warnings:
      fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield
      fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield
      fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield
      Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
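The sparse warning fixed in 0410c94a concerns the value range of a one-bit signed field. The sketch below assumes two's-complement interpretation, which is what makes the warning relevant; the function name is illustrative.

```python
def signed_bitfield_value(raw: int, width: int) -> int:
    """Interpret the low `width` bits of raw as a two's-complement value."""
    raw &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    # When the top bit is set, the value is negative.
    return raw - (1 << width) if raw & sign_bit else raw
```

With a width of 1 the only bit is the sign bit, so the field can hold 0 and -1 but not 1, and a comparison against 1 would never succeed.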
    • btrfs: Show device attr correctly for symlinks · f209561a
      Li Zefan authored
      Symlinks and files of other types show different device numbers, though
      they are on the same partition:
      
       $ touch tmp; ln -s tmp tmp2; stat tmp tmp2
         File: `tmp'
         Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
       Device: 15h/21d	Inode: 984027      Links: 1
       --- snip ---
         File: `tmp2' -> `tmp'
         Size: 3         	Blocks: 0          IO Block: 4096   symbolic link
       Device: 13h/19d	Inode: 984028      Links: 1
      Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
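The `stat` comparison in f209561a can be scripted. This is a rough userspace check, not part of the patch: it recreates `tmp` and `tmp2` in a temporary directory and compares the device numbers reported for the regular file and for the symlink itself. The function name is illustrative.

```python
import os
import tempfile


def symlink_reports_same_device() -> bool:
    """Return True if a file and a symlink to it report the same st_dev."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "tmp")
        link = os.path.join(directory, "tmp2")
        with open(target, "w"):
            pass
        os.symlink("tmp", link)
        # lstat() inspects the symlink itself rather than following it.
        return os.stat(target).st_dev == os.lstat(link).st_dev
```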
    • btrfs: Set file size correctly in file clone · 5f3888ff
      Li Zefan authored
      Set src_offset = 0, src_length = 20K, dest_offset = 20K, where the
      original file size of the dest file 'file2' is 30K:
      
        # ls -l /mnt/file2
        -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2
      
      Now clone file1 to file2.  The dest file should be 40K, but it
      still shows 30K:
      
        # ls -l /mnt/file2
        -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
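The expected size in 5f3888ff follows from simple arithmetic: the destination must be at least as large as the end of the cloned range. A minimal sketch with a hypothetical function name:

```python
K = 1024


def size_after_clone(dest_size: int, dest_offset: int, length: int) -> int:
    """Return the destination file size after cloning `length` bytes."""
    # The file grows only if the cloned range extends past its current end.
    return max(dest_size, dest_offset + length)
```

For the example in the commit message, `size_after_clone(30 * K, 20 * K, 20 * K)` is 40K (40960 bytes), not the 30720 that `ls -l` showed before the fix.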