Commits · 8e52acf70459020d7e9e9fda25066be4da520943 · nexedi / linux

18 Apr, 2012 6 commits

Btrfs: retrurn void from clear_state_bit · 8e52acf7

Li Zefan authored Mar 12, 2012

Currently it returns a set of bits that were cleared, but this return
value is not used at all.

Moreover it doesn't seem to be useful, because we may clear the bits
of a few extent_states, but only the cleared bits of last one is
returned.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

8e52acf7

btrfs: add missing unlocks to transaction abort paths · 871383be

David Sterba authored Apr 02, 2012

Added in commit 49b25e05
("btrfs: enhance transaction abort infrastructure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>

871383be

Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZE · 8d082fb7

Liu Bo authored Apr 03, 2012

Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE.
It will lead to hanging-on while writing something.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

8d082fb7

btrfs: don't add both copies of DUP to reada extent tree · 207a232c

Arne Jansen authored Feb 25, 2012

Normally when there are 2 copies of a block, we add both to the
reada extent tree and prefetch only the one that is easier to reach.
This way we can better utilize multiple devices.
In case of DUP this makes no sense as both copies reside on the
same device.
Signed-off-by: Arne Jansen <sensille@gmx.net>

207a232c

btrfs: fix race in reada · 8c9c2bf7

Arne Jansen authored Feb 25, 2012

When inserting into the radix tree returns EEXIST, get the existing
entry without giving up the spinlock in between.
There was a race for both the zones trees and the extent tree.
Signed-off-by: Arne Jansen <sensille@gmx.net>

8c9c2bf7

Btrfs: avoid setting ->d_op twice · 848cce0d

Li Zefan authored Feb 21, 2012

Follow those instructions, and you'll trigger a warning in the
beginning of d_set_d_op():

  # mkfs.btrfs /dev/loop3
  # mount /dev/loop3 /mnt
  # btrfs sub create /mnt/sub
  # btrfs sub snap /mnt /mnt/snap
  # touch /mnt/snap/sub
  touch: cannot touch `tmp': Permission denied

__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and
then simple_lookup() reset it to simple_dentry_operations, which
triggered the warning.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

848cce0d

13 Apr, 2012 1 commit

Btrfs: use commit root when loading free space cache · d53ba474

Josef Bacik authored Apr 12, 2012

A user reported that booting his box up with btrfs root on 3.4 was way
slower than on 3.3 because I removed the ideal caching code. It turns out
that we don't load the free space cache if we're in a commit for deadlock
reasons, but since we're reading the cache and it hasn't changed yet we are
safe reading the inode and free space item from the commit root, so do that
and remove all of the deadlock checks so we don't unnecessarily skip loading
the free space cache. The user reported this fixed the slowness. Thanks,
Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d53ba474

12 Apr, 2012 6 commits

Btrfs: fix use-after-free in __btrfs_end_transaction · 4edc2ca3

Dave Jones authored Apr 12, 2012

49b25e05 introduced a use-after-free bug
that caused spurious -EIO's to be returned.

Do the check before we free the transaction.

Cc: David Sterba <dsterba@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4edc2ca3

Btrfs: check return value of bio_alloc() properly · e627ee7b

Tsutomu Itoh authored Apr 12, 2012

bio_alloc() has the possibility of returning NULL.
So, it is necessary to check the return value.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e627ee7b

Btrfs: remove lock assert from get_restripe_target() · c6664b42

Ilya Dryomov authored Apr 12, 2012

This fixes a regression introduced by fc67c450.  spin_is_locked() always
returns 0 on UP kernels, which caused assert in get_restripe_target() to
be fired on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now, it's not clear if it's going to be needed
in future.
Reported-by: Bobby Powers <bobbypowers@gmail.com>
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c6664b42

Btrfs: fix eof while discarding extents · b89203f7

Liu Bo authored Apr 12, 2012

We miscalculate the length of extents we're discarding, and it leads to
an eof of device.
Reported-by: Daniel Blueman <daniel@quora.org>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b89203f7

Btrfs: fix uninit variable in repair_eb_io_failure · d95603b2

Chris Mason authored Apr 12, 2012

We'd have to be passing bogus extent buffers for this uninit variable to
actually be used, but set it to zero just in case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d95603b2

Revert "Btrfs: increase the global block reserve estimates" · 8e62c2de

Chris Mason authored Apr 12, 2012

This reverts commit 5500cdbe.

We've had a number of complaints of early enospc that bisect down
to this patch.  We'll hae to fix the reservations differently.

CC: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8e62c2de

29 Mar, 2012 16 commits

Btrfs: update the checks for mixed block groups with big metadata blocks · bc3f116f

Chris Mason authored Mar 29, 2012

Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB.  But these ended up in the wrong place
and it wasn't testing the feature flag correctly.

This updates the tests to make sure our sizes are matching
Signed-off-by: Chris Mason <chris.mason@oracle.com>

bc3f116f

Btrfs: update to the right index of defragment · e1f041e1

Liu Bo authored Mar 29, 2012

When we use autodefrag, we forget to update the index which indicates
the last page we've dirty.  And we'll set dirty flags on a same set of
pages again and again.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e1f041e1

Btrfs: do not bother to defrag an extent if it is a big real extent · 66c26892

Liu Bo authored Mar 29, 2012

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072              10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082              10 eof
/mnt/btrfs/foobar: 1 extent found

So if we already find a big real extent, we're ok about that, just skip it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

66c26892

Btrfs: add a check to decide if we should defrag the range · 17ce6ef8

Liu Bo authored Mar 29, 2012

If our file's layout is as follows:
| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17ce6ef8

Btrfs: fix recursive defragment with autodefrag option · 4cb13e5d

Liu Bo authored Mar 29, 2012

$ mkfs.btrfs disk
$ mount disk /mnt -o autodefrag
$ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
$ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
  seek=$i conv=notrunc 2> /dev/null; done && sync

then we'll get to defrag "foobar" again and again.
So does option "-o autodefrag,compress".

Reasons:
When the cleaner kthread gets to fetch inodes from the defrag tree and defrag
them, it will dirty pages and submit them, this will comes to another DATA COW
where the processing inode will be inserted to the defrag tree again.

This patch sets a rule for COW code, i.e. insert an inode when we're really
going to make some defragments.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4cb13e5d

Btrfs: fix the mismatch of page->mapping · 1f12bd06

Liu Bo authored Mar 29, 2012

commit 600a45e1
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on page, but it also introduces another bug.

A page may have been truncated after unlock & lock.
So we need to find it again to get the right one.

And since we've held i_mutex lock, inode size remains unchanged and
we can drop isize overflow checks.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1f12bd06

Btrfs: fix race between direct io and autodefrag · ecb8bea8

Liu Bo authored Mar 29, 2012

The bug is from running xfstests 209 with autodefrag.

The race is as follows:
       t1                       t2(autodefrag)
   direct IO
     invalidate pagecache
     dio(old data)             add_inode_defrag
     invalidate pagecache
   endio

   direct IO
     invalidate pagecache
                                run_defrag
                                  readpage(old data)
                                  set page dirty (old data)
     dio(new data, rewrite)
     invalidate pagecache (*)
     endio

t2(autodefrag) will get old data into pagecache via readpage and set
pagecache dirty.  Meanwhile, invalidate pagecache(*) will fail due to
dirty flags in pages.  So the old data may be flushed into disk by
flush thread, which will lead to data loss.

And so does the case of user defragment progs.

The patch fixes this race by holding i_mutex when we readpage and set page dirty.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ecb8bea8

Btrfs: fix deadlock during allocating chunks · 15d1ff81

Liu Bo authored Mar 29, 2012

This deadlock comes from xfstests 251.

We'll hold the chunk_mutex throughout the whole of a chunk allocation.
But if we find that we've used up system chunk space, we need to allocate a
new system chunk, but this will lead to a recursion of chunk allocation and end
up with a deadlock on chunk_mutex.
So instead we need to allocate the system chunk first if we find we're in ENOSPC.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

15d1ff81

Btrfs: show useful info in space reservation tracepoint · 2bcc0328

Liu Bo authored Mar 29, 2012

o For space info, the type of space info is useful for debug.
o For transaction handle, its transid is useful.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2bcc0328

Btrfs: don't use crc items bigger than 4KB · 7ca4be45

Chris Mason authored Jan 31, 2012

With the big metadata blocks, we can have crc items
that are much bigger than a page.  There are a few
places that we try to kmalloc memory to hold the
items during a split.

Items bigger than 4KB don't really have a huge benefit
in efficiency, but they do trigger larger order allocations.
This commits changes the csums to make sure they stay under
4KB.  This is not a format change, just a #define to limit
huge items.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7ca4be45

Btrfs: flush out and clean up any block device pages during mount · 3c4bb26b

Chris Mason authored Mar 27, 2012

Btrfs puts the filesystem metadata into its own address space, and
somehow the block device address space isn't getting onto disk properly
before a mount.  The end result is that a loop of mkfs and mounting the
filesystem will sometimes find stale or incorrect data.

This commit should fix it by sprinkling fdatawrites and invalidate_bdev
calls around.  This is a short term measure to make sure it is fixed.
The block devices really should be flushed and cleaned up higher in the
stack.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3c4bb26b

Merge git://git.jan-o-sch.net/btrfs-unstable into for-linus · 98961a7e
Chris Mason authored Mar 28, 2012
```
Conflicts:
	fs/btrfs/transaction.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
98961a7e
Merge branch 'for-chris' of git://github.com/idryomov/btrfs-unstable into for-linus · 1c691b33
Chris Mason authored Mar 28, 2012

1c691b33

Merge branch 'error-handling' into for-linus · 1d4284bd

Chris Mason authored Mar 28, 2012

Conflicts:
	fs/btrfs/ctree.c
	fs/btrfs/disk-io.c
	fs/btrfs/extent-tree.c
	fs/btrfs/extent_io.c
	fs/btrfs/extent_io.h
	fs/btrfs/inode.c
	fs/btrfs/scrub.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1d4284bd

btrfs: disallow unequal data/metadata blocksize for mixed block groups · 65139ed9

David Sterba authored Feb 17, 2012

With support for bigger metadata blocks, we must avoid mounting a
filesystem with different block size for mixed block groups, this causes
corruption (found by xfstests/083).
Signed-off-by: David Sterba <dsterba@suse.cz>

65139ed9

Btrfs: enhance superblock sanity checks · fcd1f065

David Sterba authored Mar 06, 2012

Validate checksum algorithm during mount and prevent BUG_ON later in
btrfs_super_csum_size.
Signed-off-by: David Sterba <dsterba@suse.cz>

fcd1f065

27 Mar, 2012 11 commits

Btrfs: change scrub to support big blocks · b5d67f64

Stefan Behrens authored Mar 27, 2012

Scrub used to be coded for nodesize == leafsize == sectorsize == PAGE_SIZE.
This is now changed to support sizes for nodesize and leafsize which are
N * PAGE_SIZE.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b5d67f64

Btrfs: minor cleanup in scrub · 1623edeb

Stefan Behrens authored Mar 27, 2012

Just a minor cleanup commit in preparation for the big block changes.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1623edeb

Btrfs: introduce common define for max number of mirrors · 94598ba8

Stefan Behrens authored Mar 27, 2012

Readahead already has a define for the max number of mirrors. Scrub
needs such a define now, the rest of the code will need something
like this soon. Therefore the define was added to ctree.h and removed
from the readahead code.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

94598ba8

Btrfs: fix infinite loop in btrfs_shrink_device() · 213e64da

Ilya Dryomov authored Mar 27, 2012

If relocate of block group 0 fails with ENOSPC we end up infinitely
looping because key.offset -= 1 statement in that case brings us back to
where we started.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

213e64da

Btrfs: fix memory leak in resolver code · 5eb56d25

Ilya Dryomov authored Mar 27, 2012

init_ipath() allocates btrfs_data_container which is never freed.  Free
it in free_ipath() and nuke the comment for init_data_container() - we
can safely free it with kfree().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5eb56d25

Btrfs: allow dup for data chunks in mixed mode · e4837f8f

Ilya Dryomov authored Mar 27, 2012

Generally we don't allow dup for data, but mixed chunks are special and
people seem to think this has its use cases.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e4837f8f

Btrfs: validate target profiles only if we are going to use them · 6728b198

Ilya Dryomov authored Mar 27, 2012

Do not run sanity checks on all target profiles unless they all will be
used.  This came up because alloc_profile_is_valid() is now more strict
than it used to be.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6728b198

Btrfs: improve the logic in btrfs_can_relocate() · 4a5e98f5

Ilya Dryomov authored Mar 27, 2012

Currently if we don't have enough space allocated we go ahead and loop
though devices in the hopes of finding enough space for a chunk of the
*same* type as the one we are trying to relocate.  The problem with that
is that if we are trying to restripe the chunk its target type can be
more relaxed than the current one (eg require less devices or less
space).  So, when restriping, run checks against the target profile
instead of the current one.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

4a5e98f5

Btrfs: add __get_block_group_index() helper · 7738a53a

Ilya Dryomov authored Mar 27, 2012

Add __get_block_group_index() helper to be able to derive block group
index from an arbitary set of flags.  Implement get_block_group_index()
in terms of it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7738a53a

Btrfs: add get_restripe_target() helper · fc67c450

Ilya Dryomov authored Mar 27, 2012

Add get_restripe_target() helper and switch everybody to use it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fc67c450

Btrfs: move alloc_profile_is_valid() to volumes.c · 0c460c0d

Ilya Dryomov authored Mar 27, 2012

Header file is not a good place to define functions.  This also moves a
call to alloc_profile_is_valid() down the stack and removes a redundant
check from __btrfs_alloc_chunk() - alloc_profile_is_valid() takes it
into account.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0c460c0d