Commits · ed0fb78fb6aa294a719f8f5654fdff0ec8bc00bc · Kirill Smelkov / linux

20 Jan, 2013 1 commit

Btrfs: bring back balance pause/resume logic · ed0fb78f

Ilya Dryomov authored Jan 20, 2013

Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1
as part of dev-replace merge). Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away. Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ed0fb78f

14 Jan, 2013 14 commits

btrfs: update timestamps on truncate() · 3972f260

Eric Sandeen authored Jan 12, 2013

truncate() vs. ftruncate() differ in the VFS; truncate()
doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the
fs to do the timestamp updates if the size changes.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>

3972f260

btrfs: fix btrfs_cont_expand() freeing IS_ERR em · f2767956

Zach Brown authored Jan 08, 2013

btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from
btrfs_get_extent() and breaks out of its loop.

An instance of -EEXIST was reported in the wild:

  https://bugzilla.redhat.com/show_bug.cgi?id=874407

I have no idea if that -EEXIST is surprising, or not.  Regardless, this
error handling should be cleaned up to handle other reasonable errors
(ENOMEM, EIO; whatever).

This seemed to be the only buggy freeing of the relatively rare IS_ERR
em so I opted to fix the caller rather than teach free_extent_map() to
use IS_ERR_OR_NULL().
Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>

f2767956

Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents · f9e4fb53

Liu Bo authored Jan 07, 2013

xfstests case 285 complains.

It it because btrfs did not try to find unwritten delalloc
bytes(only dirty pages, not yet writeback) behind prealloc
extents, it ends up finding nothing while we're with SEEK_DATA.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

f9e4fb53

Btrfs: fix off-by-one in lseek · 1214b53f

Liu Bo authored Jan 07, 2013

Lock end is inclusive.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

1214b53f

Btrfs: reset path lock state to zero · 3268a246

Liu Bo authored Dec 28, 2012

We forgot to reset the path lock state to zero after we unlock the path block,
and this can lead to the ASSERT checker in tree unlock API.
Reported-by: Slava Barinov <rayslava@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

3268a246

Btrfs: let allocation start from the right raid type · ac5c9300

Liu Bo authored Dec 27, 2012

This'd avoid us empty looping.

Say we have only one disk and the metadata raid type will be defaultly DUP,
and we do not need to start from index=0(RAID10) and get over two empty
loops to index=2(DUP).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

ac5c9300

Btrfs: add orphan before truncating pagecache · f3fe820c

Josef Bacik authored Jan 07, 2013

Running xfstests 83 in a loop would sometimes fail the fsck. This happens
because if we invalidate a page that already has an ordered extent setup for
it we will complete the ordered extent ourselves, assuming that the truncate
will clean everything up. The problem with this is there is plenty of time
for the truncate to fail after we've done this work. So to fix this we need
to add the orphan item first to make sure the cleanup gets done properly,
and then we can truncate the pagecache and all that stuff and be safe. This
fixes the btrfsck failures I was seeing while running 83 in a loop. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

f3fe820c

Btrfs: set flushing if we're limited flushing · 72bcd99d

Josef Bacik authored Dec 18, 2012

We still need to say we're flushing if we're limit flushing to keep somebody
from coming in and stealing our reservation.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

72bcd99d

Btrfs: fix missing write access release in btrfs_ioctl_resize() · 97547676

Miao Xie authored Dec 21, 2012

We forget to give up the write access after we find some device operation
is going on. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

97547676

Btrfs: fix resize a readonly device · dba60f3f

Miao Xie authored Dec 21, 2012

We should not resize a readonly device, fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

dba60f3f

Btrfs: do not delete a subvolume which is in a R/O subvolume · 5c39da5b

Miao Xie authored Oct 22, 2012

Step to reproduce:
 # mkfs.btrfs <disk>
 # mount <disk> <mnt>
 # btrfs sub create <mnt>/subv0
 # btrfs sub snap <mnt> <mnt>/subv0/snap0
 # change <mnt>/subv0 from R/W to R/O
 # btrfs sub del <mnt>/subv0/snap0

We deleted the snapshot successfully. I think we should not be able to delete
the snapshot since the parent subvolume is R/O.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

5c39da5b

Btrfs: disable qgroup id 0 · d86e56cf

Miao Xie authored Nov 15, 2012

Qgroup id 0 is a special number, we should set the id of a qgroup to 0.
Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

d86e56cf

btrfs: get the device in write mode when deleting it · cc975eb4

Lukas Czerner authored Dec 07, 2012

When we're deleting the device we should get it in write mode since
we're going to re-write the super block magic on that device. And it
should fail if the device is read-only.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

cc975eb4

Btrfs: fix memory leak in name_cache_insert() · cfa7a9cc

Tsutomu Itoh authored Dec 17, 2012

We should free name_cache_entry before returning from the
error handling code.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

cfa7a9cc

19 Dec, 2012 1 commit

Revert "Btrfs: reorder tree mod log operations in deleting a pointer" · 57ba86c0

Chris Mason authored Dec 18, 2012

This reverts commit 6a7a665d.

This was bug was fixed differently in 3.6, so this commit
isn't needed.

Conflicts:
	fs/btrfs/ctree.c
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

57ba86c0

18 Dec, 2012 1 commit

Revert "Btrfs: MOD_LOG_KEY_REMOVE_WHILE_MOVING never change node's nritems" · 4c3e6969

Chris Mason authored Dec 18, 2012

This reverts commit 95c80bb1.

The bug addressed by this commit was fixed differently back in 3.6
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

4c3e6969

17 Dec, 2012 23 commits

Btrfs: fix a bug of per-file nocow · 213490b3

Liu Bo authored Sep 11, 2012

Users report a bug, the reproducer is:
$ mkfs.btrfs /dev/loop0
$ mount /dev/loop0 /mnt/btrfs/
$ mkdir /mnt/btrfs/dir
$ chattr +C /mnt/btrfs/dir/
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10;
$ lsattr /mnt/btrfs/dir/foo
---------------C- /mnt/btrfs/dir/foo
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 1 extent found    ---> an extent
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 3 extents found   ---> with nocow, btrfs breaks the extent into three parts

The new created file should not only inherit the NODATACOW flag, but also
honor NODATASUM flag, because we must do COW on a file extent with checksum.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

213490b3

Btrfs: fix hash overflow handling · 9c52057c

Chris Mason authored Dec 17, 2012

The handling for directory crc hash overflows was fairly obscure,
split_leaf returns EOVERFLOW when we try to extend the item and that is
supposed to bubble up to userland.  For a while it did so, but along the
way we added better handling of errors and forced the FS readonly if we
hit IO errors during the directory insertion.

Along the way, we started testing only for EEXIST and the EOVERFLOW case
was dropped.  The end result is that we may force the FS readonly if we
catch a directory hash bucket overflow.

This fixes a few problem spots.  First I add tests for EOVERFLOW in the
places where we can safely just return the error up the chain.

btrfs_rename is harder though, because it tries to insert the new
directory item only after it has already unlinked anything the rename
was going to overwrite.  Rather than adding very complex logic, I added
a helper to test for the hash overflow case early while it is still safe
to bail out.

Snapshot and subvolume creation had a similar problem, so they are using
the new helper now too.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Pascal Junod <pascal@junod.info>

9c52057c

Btrfs: don't take inode delalloc mutex if we're a free space inode · c64c2bd8

Josef Bacik authored Dec 14, 2012

This confuses and angers lockdep even though it's ok.  We don't really need
the lock for free space inodes since only the transaction committer will be
reserving space.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

c64c2bd8

Btrfs: fix autodefrag and umount lockup · 1135d6df

Josef Bacik authored Dec 14, 2012

This happens because writeback_inodes_sb_nr_if_idle does down_read.  This
doesn't work for us and it has not been fixed upstream yet, so do it
ourselves and use that instead so we can stop having this stupid long
standing lockup.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

1135d6df

Btrfs: fix permissions of empty files not affected by umask · 9185aa58

Filipe Brandenburger authored Nov 30, 2012

When a new file is created with btrfs_create(), the inode will initially be
created with permissions 0666 and later on in btrfs_init_acl() it will be
adapted to mask out the umask bits. The problem is that this change won't make
it into the btrfs_inode unless there's another change to the inode (e.g. writing
content changing the size or touching the file changing the mtime.)

This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that
the change will not get lost if the in-memory inode is flushed before other
changes are made to the file.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

9185aa58

Btrfs: put raid properties into global table · 31e50229

Liu Bo authored Nov 21, 2012

Raid properties can be shared among raid calculation code, we can put
them into a global table to keep it simple.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

31e50229

Btrfs: fix BUG() in scrub when first superblock reading gives EIO · 4ded4f63

Stefan Behrens authored Nov 14, 2012

This fixes a very special case that can be reproduced by just
disconnecting a disk at runtime, and without unmounting the
filesystem first, start scrub on the filesystem with the
disconnected disk. All read and write EIOs are handled
correctly, only the first superblock is an exception and gives
a BUG() in a subfunction. The BUG() is correct, it would crash
later otherwise. The subfunction must not be called for
superblocks and this is what the fix changes.
Reported-by: Joeri Vanthienen <mail@joerivanthienen.be>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

4ded4f63

Btrfs: do not call file_update_time in aio_write · 6c760c07

Josef Bacik authored Nov 09, 2012

This starts a transaction and dirties the inode everytime we call it, which
is super expensive if you have a write heavy workload. We will be updating
the inode when the IO completes and we reserve the space for the inode
update when we reserve space for the write, so there is no chance of loss of
information or enospc issues. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

6c760c07

Btrfs: only unlock and relock if we have to · 5124e00e

Josef Bacik authored Nov 07, 2012

I noticed while doing fsync tests that we were always dropping the path and
re-searching when we first cow the log root even though we've already gotten
the write lock on the root. That's because we don't take into account that
there might not be a parent node, so fix the check to make sure there is
actually a parent node before we undo all of this work for nothing. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

5124e00e

Btrfs: use tokens where we can in the tree log · 0b1c6cca

Josef Bacik authored Oct 23, 2012

If we are syncing over and over the overhead of doing all those maps in
fill_inode_item and log_changed_extents really starts to hurt, so use map
tokens so we can avoid all the extra mapping. Since the token maps from our
offset to the end of the page make sure to set the first thing in the item
first so we really only do one map. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

0b1c6cca

Btrfs: optimize leaf_space_used · 41be1f3b

Josef Bacik authored Oct 15, 2012

This gets called at least 4 times for every level while adding an object,
and it involves 3 kmapping calls, which on my box take about 5us a piece.
So instead use a token, which brings us down to 1 kmap call and makes this
function take 1/3 of the time per call. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

41be1f3b

Btrfs: don't memset new tokens · ad914559

Josef Bacik authored Oct 15, 2012

Our token logic depends on token->kaddr being set, and if it is not it sets
everything properly as needed.  So instead of memsetting just set
token->kaddr to NULL.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

ad914559

Btrfs: only clear dirty on the buffer if it is marked as dirty · ed7b63eb

Josef Bacik authored Oct 15, 2012

No reason to set the path blocking or loop through all of the pages if the
extent buffer isn't actually marked dirty.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

ed7b63eb

Btrfs: move checks in set_page_dirty under DEBUG · bb146eb2

Josef Bacik authored Oct 15, 2012

This is a high traffic function, let's try and do as little as possible
during normal operations shall we?
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

bb146eb2

Btrfs: log changed inodes based on the extent map tree · 70c8a91c

Josef Bacik authored Oct 11, 2012

We don't really need to copy extents from the source tree since we have all
of the information already available to us in the extent_map tree.  So
instead just write the extents straight to the log tree and don't bother to
copy the extent items from the source tree.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

70c8a91c

Btrfs: add path->really_keep_locks · d6393786

Josef Bacik authored Dec 12, 2012

You'd think path->keep_locks would keep all the locks wouldn't you? You'd
be wrong. It only keeps them if the slot is pointing to the last item in
the node. This is for use with btrfs_next_leaf, which needs this sort of
thing. But the horrible horrible things I'm going to do to the tree log
means I really need everything held from root to leaf so I can add and
delete items in the same search. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

d6393786

Btrfs: do not mark ems as prealloc if we are writing to them · b11e234d

Josef Bacik authored Dec 03, 2012

We are going to use EM's to log extents in the future, so we need to not
mark them as prealloc if they aren't actually prealloc extents. Instead
mark them with FILLING so we know to ammend mod_start/mod_len and that way
we don't confuse the extent logging code. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b11e234d

Btrfs: keep track of the extents original block length · b4939680

Josef Bacik authored Dec 03, 2012

If we've written to a prealloc extent we need to know the original block len
for the extent. We can't figure this out currently since ->block_len is
just set to the extent length. So introduce ->orig_block_len so that we
know how many bytes were in the original extent for proper extent logging
that future patches will need. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b4939680

Btrfs: inline csums if we're fsyncing · b812ce28

Josef Bacik authored Nov 16, 2012

The tree logging stuff needs the csums to be on the ordered extents in order
to log them properly, so mark that we're sync and inline the csum creation
so we don't have to wait on the csumming to be done when logging extents
that are still in flight.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b812ce28

Btrfs: don't bother copying if we're only logging the inode · a95249b3

Josef Bacik authored Oct 11, 2012

We don't copy inode items anwyay, we just copy them straight into the log
from the in memory inode. So if we know we're only logging the inode, don't
bother dropping anything, just try to insert it and either if it succeeds or
we get EEXIST we can update the inode item in the log and carry on. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

a95249b3

Btrfs: only log the inode item if we can get away with it · e9976151

Josef Bacik authored Oct 11, 2012

Currently we copy all the file information into the log, inode item, the
refs, xattrs etc. Except most of this doesn't change from fsync to fsync,
just the inode item changes. So set a flag if an xattr changes or a link is
added, and otherwise only log the inode item. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

e9976151

Btrfs: rename root_times_lock to root_item_lock · 5f3ab90a

Anand Jain authored Dec 07, 2012

Originally root_times_lock was introduced as part of send/receive
code however newly developed patch to label the subvol reused
the same lock, so renaming it for a meaningful name.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

5f3ab90a

btrfs: Notify udev when removing device · b8b8ff59

Lukas Czerner authored Dec 06, 2012

Currently udev does not know about the device being removed from the
file system. This may result in the situation where we're unable to
mount the file system by UUID or by LABEL because the by-uuid and
by-label links may still point to the device which is no longer part of
the btrfs file system and hence does not have any btrfs super block.

It can be easily reproduced by the following:

mkfs.btrfs -L bugfs /dev/loop[0-6]
mount /dev/loop0 /mnt/test
btrfs device delete /dev/loop0 /mnt/test
umount /mnt/test

mount LABEL=bugfs /mnt/test <---- this fails

then see:

ls -l /dev/disk/by-label/bugfs

which will still point to the /dev/loop0

We did not noticed this before because libblkid would send the udev
event for us when it notice that the link does not fit the reality,
however it does not do that anymore and completely relies on udev
information.

Fix this by sending the KOBJ_CHANGE event to the bdev kobject after
successful device removal.

Note that this does not affect device addition, because we will open the
device prior the addition from userspace and udev will notice that and
reread the device afterwards.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

b8b8ff59