Commits · 4b9465cb9e3859186eefa1ca3b990a5849386320 · nexedi / linux

04 Jun, 2011 9 commits

Btrfs: add mount -o inode_cache · 4b9465cb

Chris Mason authored Jun 03, 2011

This makes the inode map cache default to off until we
fix the overflow problem when the free space crcs don't fit
inside a single page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b9465cb

btrfs: scrub: add explicit plugging · e7786c3a

Arne Jansen authored May 28, 2011

With the removal of the implicit plugging, scrub ends up doing more and
smaller I/O than necessary. This patch adds explicit plugging per chunk.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e7786c3a
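
To make the effect described in e7786c3a concrete, here is a small userspace Python model of plugging: requests submitted while a plug is held are queued, and adjacent ones are merged into larger I/Os when the plug is finished. The `Plug` class and its methods are hypothetical names for this sketch; it is not the kernel block layer API and only approximates its behaviour.

```python
class Plug:
    """Toy model of a block-layer plug: queue requests, merge on finish."""

    def __init__(self):
        self.pending = []  # list of (start, length) byte ranges

    def submit(self, start, length):
        # While plugged, requests are only queued, not issued.
        self.pending.append((start, length))

    def finish(self):
        # On unplug, merge ranges that touch end-to-start, then issue them.
        merged = []
        for start, length in sorted(self.pending):
            if merged and merged[-1][0] + merged[-1][1] == start:
                prev_start, prev_length = merged[-1]
                merged[-1] = (prev_start, prev_length + length)
            else:
                merged.append((start, length))
        self.pending = []
        return merged


# One plug per chunk: two adjacent 4 KiB reads become one 8 KiB I/O.
plug = Plug()
plug.submit(0, 4096)
plug.submit(4096, 4096)
plug.submit(16384, 4096)
issued = plug.finish()
```

Without the plug, each of the three requests in this model would be issued on its own, which is the "more and smaller I/O" the commit message refers to.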

btrfs: use btrfs_ino to access inode number · a4689d2b

David Sterba authored May 31, 2011

commit 4cb5300b ("Btrfs: add mount -o auto_defrag") accesses inode
number directly while it should use the helper with the new inode
number allocator.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a4689d2b

Btrfs: don't save the inode cache if we are deleting this root · d132a538

Josef Bacik authored May 31, 2011

With xfstest 254 I can panic the box every time with the inode number caching
stuff on. This is because we clean the inodes out when we delete the subvolume,
but then we write out the inode cache which adds an inode to the subvolume inode
tree, and then when it gets evicted again the root gets added back on the dead
roots list and is deleted again, so we have a double free. To stop this from
happening just return 0 if refs is 0 (and we're not the tree root since tree
root always has refs of 0). With this fix 254 no longer panics. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d132a538

btrfs: false BUG_ON when degraded · 5f3f302a

Arne Jansen authored May 30, 2011

In degraded mode, the struct btrfs_device of a missing device doesn't have
device->name set. A kstrdup of NULL correctly returns NULL. Don't
BUG in this case.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5f3f302a

Btrfs: don't save the inode cache in non-FS roots · ca456ae2

liubo authored Jun 01, 2011

This adds extra checks to make sure the inode map we are caching really
belongs to a FS root instead of a special relocation tree.  It
prevents crashes during balancing operations.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ca456ae2

Btrfs: make sure we don't overflow the free space cache crc page · 211f96c2

Chris Mason authored Jun 03, 2011

The free space cache uses only one page for crcs right now,
which means we can't have a cache file bigger than the
crcs we can fit in the first page.  This adds a check to
enforce that restriction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

211f96c2
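
As a rough illustration of the restriction in 211f96c2, the following Python sketch models how many cache pages a single crc page can cover. The 4 KiB page size, the 4-byte crc per cache page, the 8-byte header stored alongside the crcs, and the strict comparison are all assumptions made for this example rather than values taken from the patch; the function names are hypothetical.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
CRC_SIZE = 4       # assumed: one 32-bit crc per cache page
HEADER_SIZE = 8    # assumed: a 64-bit header stored next to the crcs


def crc_area_fits(num_pages, page_size=PAGE_SIZE):
    """Return True if the crcs for num_pages pages fit in one page."""
    needed = CRC_SIZE * num_pages + HEADER_SIZE
    return needed < page_size


def max_cache_pages(page_size=PAGE_SIZE):
    """Largest page count whose crc area still fits in a single page."""
    pages = 0
    while crc_area_fits(pages + 1, page_size):
        pages += 1
    return pages


limit = max_cache_pages()
```

Under these assumptions the cache file is capped at `limit` pages; a larger cache would need a second crc page, which is the case the check rejects.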

Btrfs: fix uninit variable in the delayed inode code · 17aca1c9

Chris Mason authored Jun 03, 2011

The nitems counter needs to start at zero
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17aca1c9

btrfs: scrub: don't reuse bios and pages · 1bc87793

Arne Jansen authored May 28, 2011

The current scrub implementation reuses bios and pages as often as possible,
allocating them only on start and releasing them when finished. This leads
to more problems with the block layer than it's worth. The elevator gets
confused when there are more pages added to the bio than bi_size suggests.
This patch completely rips out the reuse of bios and pages and allocates
them freshly for each submit.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1bc87793

28 May, 2011 1 commit

Merge branch 'for-chris' of · ff5714cc

Chris Mason authored May 28, 2011

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

Conflicts:
	fs/btrfs/disk-io.c
	fs/btrfs/extent-tree.c
	fs/btrfs/free-space-cache.c
	fs/btrfs/inode.c
	fs/btrfs/transaction.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ff5714cc

27 May, 2011 1 commit

Btrfs: use the device_list_mutex during write_dev_supers · 174ba509

Chris Mason authored May 27, 2011

write_dev_supers was changed to use RCU to protect the list of
devices, but it was then sleeping while it actually wrote the supers.
This fixes it to just use the mutex, since we really don't need any
concurrency in write_dev_supers anyway.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

174ba509

26 May, 2011 4 commits

Btrfs: setup free ino caching in a more asynchronous way · a47d6b70

Li Zefan authored May 26, 2011

For a filesystem that has lots of files in it, the first time we mount
it with free ino caching support, it can take quite a long time to
setup the caching before we can create new files.

Here we fill the cache with [highest_ino, BTRFS_LAST_FREE_OBJECTID]
before we start the caching thread to search through the extent tree.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a47d6b70

btrfs scrub: don't coalesce pages that are logically discontiguous · 00d01bc1

Arne Jansen authored May 25, 2011

scrub_page collects several pages into one bio as long as they are physically
contiguous. As we only save one logical address for the whole bio, don't
collect pages that are physically contiguous but logically discontiguous.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

00d01bc1
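
The rule from 00d01bc1 can be sketched in userspace Python as follows: a page joins the current bio only if it continues both the physical and the logical run. The `coalesce` function, the tuple layout, and the 4 KiB page size are assumptions for this example; the real scrub code works on kernel bios, not lists.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def coalesce(pages):
    """Group (physical, logical) page addresses into bios.

    A page is appended to the current bio only when it is contiguous
    with the previous page both physically and logically.
    """
    bios = []
    for physical, logical in pages:
        if bios:
            last_physical, last_logical = bios[-1][-1]
            if (physical == last_physical + PAGE_SIZE
                    and logical == last_logical + PAGE_SIZE):
                bios[-1].append((physical, logical))
                continue
        # Either the first page or a discontinuity: start a new bio.
        bios.append([(physical, logical)])
    return bios


# Four physically contiguous pages with a logical jump after the second.
bios = coalesce([(0, 0), (4096, 4096), (8192, 1048576), (12288, 1052672)])
```

Because only one logical address is kept per bio, splitting at the logical jump keeps that address valid for every page in the bio.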

Btrfs: return -ENOMEM in clear_extent_bit · c309df07

Chris Mason authored May 26, 2011

The btrfs releasepage function depends on -ENOMEM coming
back when it is called in atomic context.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c309df07

Btrfs: add mount -o auto_defrag · 4cb5300b

Chris Mason authored May 24, 2011

This will detect small random writes into files and
queue them up for an auto defrag process.  It isn't well suited to
database workloads yet, but works for smaller files such as rpm, sqlite
or bdb databases.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4cb5300b

23 May, 2011 25 commits

Merge branch 'cleanups_and_fixes' into inode_numbers · d6c0cb37

Chris Mason authored May 23, 2011

Conflicts:
	fs/btrfs/tree-log.c
	fs/btrfs/volumes.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d6c0cb37

Btrfs: using rcu lock in the reader side of devices list · 1f78160c

Xiao Guangrong authored Apr 20, 2011

fs_devices->devices is only updated on the remove and add device paths, so we can
use RCU to protect it on the reader side.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1f78160c

Btrfs: drop unnecessary device lock · 46224705

Xiao Guangrong authored Apr 20, 2011

Drop device_list_mutex for the reader side on the clone_fs_devices and
btrfs_rm_device paths, since fs_info->volume_mutex already ensures the device
list is not updated.

btrfs_close_extra_devices is on the initialization path, where devices cannot
be added or removed, so we can simply drop the mutex safely, as other
initialization functions do (add_missing_dev, __find_device,
__btrfs_open_devices ...).
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

46224705

Btrfs: fix the race between remove dev and alloc chunk · 0c1daee0

Xiao Guangrong authored Apr 20, 2011

On the remove-device path, device->dev_alloc_list is updated without holding
the chunk lock.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0c1daee0

Btrfs: fix the race between reading and updating devices · c9513edb

Xiao Guangrong authored Apr 20, 2011

On the btrfs_congested_fn and __unplug_io_fn paths, we should hold
device_list_mutex to keep the remove/add device paths from
updating fs_devices->devices.

On the __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in
fs_devices->devices are updated, so we should hold
the mutex to keep the reader side from reaching them.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c9513edb

Btrfs: fix bh leak on __btrfs_open_devices path · 4f6c9328

Xiao Guangrong authored Apr 20, 2011

'bh' is not released if no error is detected.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4f6c9328

Btrfs: fix unsafe usage of merge_state · c7f895a2

Xiao Guangrong authored Apr 20, 2011

merge_state can free the current state if it can be merged with the next node,
but in set_extent_bit(), after merge_state, we still use the current extent to
get the next node and cache it into cached_state
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c7f895a2

Btrfs: allocate extent state and check the result properly · 8233767a

Xiao Guangrong authored Apr 20, 2011

The code doesn't allocate extent_state and check the result properly:
- in set_extent_bit, extent_state is not allocated if the path is not
  allowed to wait

- in clear_extent_bit, the result is not checked after an atomic allocation;
  we trigger BUG_ON() if it fails

- if the allocation fails, we trigger BUG_ON instead of returning -ENOMEM since
  the return value of clear_extent_bit() is ignored by many callers
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8233767a

fs/btrfs: Add missing btrfs_free_path · b0839166

Julia Lawall authored May 14, 2011

btrfs_alloc_path should be matched with btrfs_free_path in error-handling code.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression struct btrfs_path * x;
expression ra,rb;
position p1,p2;
@@

x = btrfs_alloc_path@p1(...)
...  when != btrfs_free_path(x,...)
     when != if (...) { ... btrfs_free_path(x,...) ...}
     when != x = ra
if(...) { ... when != x = rb
     when forall
     when != btrfs_free_path(x,...)
 \(return <+...x...+>; \| return@p2...; \) }

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

cocci.print_main("alloc",p1)
cocci.print_secs("return",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b0839166

Btrfs: check return value of btrfs_inc_extent_ref() · 37daa4f9

Tsutomu Itoh authored Apr 28, 2011

If return value of btrfs_inc_extent_ref() is not 0, BUG() is called.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

37daa4f9

Btrfs: return error to caller if read_one_inode() fails · c00e9493

Tsutomu Itoh authored Apr 28, 2011

When read_one_inode() fails, error code is returned to caller instead
of BUG_ON().
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c00e9493

Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item · 1cd30799

Tsutomu Itoh authored May 19, 2011

Currently, btrfs_truncate_item and btrfs_extend_item return only 0.
So, the check by BUG_ON in the caller is unnecessary.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1cd30799

Btrfs: return error code to caller when btrfs_del_item fails · 65a246c5

Tsutomu Itoh authored May 19, 2011

The error code is returned instead of calling BUG_ON when
btrfs_del_item returns the error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

65a246c5

Btrfs: return error code to caller when btrfs_previous_item fails · b0b802d7

Tsutomu Itoh authored May 19, 2011

The error code is returned instead of calling BUG_ON when
btrfs_previous_item returns the error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b0b802d7

btrfs: fix typo 'testeing' -> 'testing' · 27160b6b

Sergei Trofimovich authored May 20, 2011

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

27160b6b

btrfs: typo: 'btrfS' -> 'btrfs' · 9694b3fc

Sergei Trofimovich authored May 20, 2011

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9694b3fc

btrfs: don't spin in shrink_delalloc if there is nothing to free · c4f675cd

Sergei Trofimovich authored May 20, 2011

Observed as a large delay when --mixed filesystem is filled up.
Test example:
1. create tiny --mixed FS:
   $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1
   $ mkfs.btrfs --mixed 2G.img
   $ mount -oloop 2G.img /mnt/ut/
2. Try to fill it up:
   $ dd if=/dev/urandom of=10M.file bs=10240 count=1024
   $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done

Up to '200.copy' it goes fast, but once the disk fills up, _every_ -ENOSPC
message takes 3 seconds to appear (and in usermode linux it's even more:
30-60 seconds!). (The time may depend on the kernel's timer resolution.)

No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning
in shrink_delalloc.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c4f675cd

btrfs: Delete unused version.sh script. · 0f3b708c

Jamey Sharp authored May 05, 2011

In 2008, commit b4f6c45d dropped the use
of fs/btrfs/version.sh, but left the script behind. Kill it.

Commit by Jamey Sharp and Josh Triplett.
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0f3b708c

btrfs: Ensure the tree search ioctl returns the right number of records · e2156867

Hugo Mills authored May 14, 2011

Btrfs's tree search ioctl has a field to indicate that no more than a
given number of records should be returned. The ioctl doesn't honour
this, as the tested value is not incremented until the end of the
copy_to_sk function. This patch removes an unnecessary local variable,
and updates the num_found counter as each key is found in the tree.
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e2156867
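
The counting fix in e2156867 can be illustrated with a small Python model in which the counter is advanced as each key is found, so the limit is honoured inside the copy loop. The name `copy_to_sk` is borrowed from the message, but the signature, the `matches` callback, and the list-based buffer are assumptions for this sketch and do not mirror the kernel function.

```python
def copy_to_sk(leaf_keys, matches, nr_items, num_found=0):
    """Copy matching keys until nr_items records have been found.

    Returns (results, num_found). num_found carries over between calls,
    as it would across several leaves of one search.
    """
    results = []
    for key in leaf_keys:
        if not matches(key):
            continue
        if num_found >= nr_items:
            break  # limit reached: stop before copying another record
        results.append(key)
        num_found += 1  # counted per key found, not once at the end
    return results, num_found


# Ask for at most 3 even keys out of ten.
found, count = copy_to_sk(range(10), lambda k: k % 2 == 0, nr_items=3)
```

If the counter were only updated after the loop, the check inside it would never fire and a single leaf could return more records than requested.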

BTRFS: Remove unused node_lock · 0956c798

Andi Kleen authored May 18, 2011

240f62c8 replaced the node_lock with rcu_read_lock, but forgot
to remove the actual lock in the data structure. Remove it here.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0956c798

Btrfs: leave spinning on lookup and map the leaf · d90c7321

Josef Bacik authored May 17, 2011

On lookup we only want to read the inode item, so leave the path spinning.  Also
we're just wholesale reading the leaf off, so map the leaf so we don't do a
bunch of kmap/kunmaps.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

d90c7321

Btrfs: check for duplicate entries in the free space cache · 207dde82

Josef Bacik authored May 13, 2011

If there are duplicate entries in the free space cache, discard the entire cache
and load it the old fashioned way.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

207dde82
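
A minimal Python sketch of the fallback described in 207dde82, assuming cache entries are (offset, length) pairs and that a duplicate means two entries with the same offset. `load_cache`, `load_free_space`, and `slow_scan` are hypothetical names; the kernel's on-disk format and loader are more involved.

```python
def load_cache(entries):
    """Build an offset -> length map, or return None on a duplicate."""
    cache = {}
    for offset, length in entries:
        if offset in cache:
            return None  # duplicate entry: the whole cache is untrusted
        cache[offset] = length
    return cache


def load_free_space(entries, slow_scan):
    """Use the cache if it is clean, otherwise rebuild the slow way."""
    cache = load_cache(entries)
    if cache is None:
        return slow_scan()
    return cache


def rebuilt():
    # Stand-in for the old-fashioned scan that recomputes free space.
    return {0: 4096}


clean = load_free_space([(0, 4096), (8192, 4096)], rebuilt)
fallback = load_free_space([(0, 4096), (0, 8192)], rebuilt)
```

The sketch discards everything on the first duplicate rather than trying to repair the cache, matching the "discard the entire cache" wording of the message.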

Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f

Josef Bacik authored May 13, 2011

If we have a very large filesystem, we can spend a lot of time in
find_free_extent just trying to allocate from empty block groups.  So instead
check to see if the block group even has enough space for the allocation, and if
not go on to the next block group.
Signed-off-by: Josef Bacik <josef@redhat.com>

cca1c81f
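
The early check from cca1c81f might be modeled like this: before running the expensive search inside a block group, compare its free space with the requested size and skip the group if it cannot possibly satisfy the allocation. The dictionary layout and the `search` callback are assumptions for this Python sketch and are not the kernel's data structures.

```python
def find_free_extent(block_groups, num_bytes, search):
    """Return the first extent found, skipping groups that are too full.

    block_groups: list of dicts with a "free" byte count.
    search: callable(group, num_bytes) -> extent or None (the slow part).
    """
    for group in block_groups:
        if group["free"] < num_bytes:
            continue  # cheap test: not enough space, do not search
        extent = search(group, num_bytes)
        if extent is not None:
            return extent
    return None


searched = []


def record_search(group, num_bytes):
    # Stand-in for the real search; records which groups were examined.
    searched.append(group["name"])
    return (group["name"], num_bytes)


groups = [
    {"name": "a", "free": 0},
    {"name": "b", "free": 100},
    {"name": "c", "free": 8192},
]
extent = find_free_extent(groups, 4096, record_search)
```

Enough total free space does not guarantee a contiguous extent, which is why the search can still return None and the loop moves on to the next group.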

Btrfs: don't always do readahead · 026fd317

Josef Bacik authored May 13, 2011

Our readahead is sort of sloppy, and really isn't always needed. For example if
ls is doing a stating ls (which is the default) it's going to stat in non-disk
order, so if say you have a directory with a stupid amount of files, readahead
is going to do nothing but waste time in the case of doing the stat. Taking the
unconditional readahead out made my test go from 57 minutes to 36 minutes. This
means that everywhere we do loop through the tree we want to make sure we do set
path->reada properly, so I went through and found all of the places where we
loop through the path and set reada to 1. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

026fd317

Btrfs: try not to sleep as much when doing slow caching · 589d8ade

Josef Bacik authored May 11, 2011

When the fs is super full and we unmount the fs, we could get stuck in this
thing where unmount is waiting for the caching kthread to make progress and the
caching kthread keeps scheduling because we're in the middle of a commit. So
instead just let the caching kthread keep going and only yield if
need_resched(). This makes my horrible umount case go from taking up to 10
minutes to taking less than 20 seconds. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

589d8ade