- 28 Mar, 2011 14 commits
-
-
liubo authored
When we recover from crash via write-ahead log tree and process the inode refs, for each btrfs_inode_ref item, we will 1) check if we already have a perfect match in fs/file tree, if we have, then we're done. 2) search the corresponding back reference in fs/file tree, and check all the names in this back reference to see if they are also in the log to avoid conflict corners. 3) recover the logged inode refs to fs/file tree. In current btrfs, however, - for 2)'s check, once is enough, since the checked back reference will remain unchanged after processing all the inode refs belonged to the key. - it has no need to do another 1) between 2) and 3). I've made a small test to show how it improves, $dd if=/dev/zero of=foobar bs=4K count=1 $sync $make 100 hard links continuously, like ln foobar link_i $fsync foobar $echo b > /proc/sysrq-trigger after reboot $time mount DEV PATH without patch: real 0m0.285s user 0m0.001s sys 0m0.009s with patch: real 0m0.123s user 0m0.000s sys 0m0.010s Changelog v1->v2: - fix double free - pointed by David Sterba Changelog v2->v3: - adjust free order Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Li Dongyang authored
We take an free extent out from allocator, trim it, then put it back, but before we trim the block group, we should make sure the block group is cached, so plus a little change to make cache_block_group() run without a transaction. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Li Dongyang authored
Callers of btrfs_discard_extent() should check if we are mounted with -o discard, as we want to make fitrim to work even the fs is not mounted with -o discard. Also we should use REQ_DISCARD to map the free extent to get a full mapping, last we only return errors if 1. the error is not a EOPNOTSUPP 2. no device supports discard Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Li Dongyang authored
btrfs_map_block() will only return a single stripe length, but we want the full extent be mapped to each disk when we are trimming the extent, so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Li Dongyang authored
Make the function public as we should update the reserved extents calculations after taking out an extent for trimming. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Mark Fasheh authored
btrfs_link returns EPERM if a cross-subvolume link is attempted. However, in this case I believe EXDEV to be the more appropriate value. >From the link(2) man page: EXDEV oldpath and newpath are not on the same mounted file system. (Linux permits a file system to be mounted at multiple points, but link() does not work across different mount points, even if the same file system is mounted on both.) This matters because an application may have different behaviors based on return codes. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Liu Bo authored
Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to Chris's comment, there should be just one true compression method(probably LZO) stored in the super. However, before this, we would wait for that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in ram today. After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to control file and directory's datacow and compression attribute. NOTE: - The compression type is selected by such rules: If we mount btrfs with compress options, ie, zlib/lzo, the type is it. Otherwise, we'll use the default compress type (zlib today). v1->v2: - rebase to the latest btrfs. v2->v3: - fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW will be screwed by inheritance from parent directory. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
liubo authored
For datacow control, the corresponding inode flags are needed. This is for btrfs use. v1->v2: Change FS_COW_FL to another bit due to conflict with the upstream e2fsprogs Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Miao Xie authored
In the filesystem context, we must allocate memory by GFP_NOFS, or we may start another filesystem operation and make kswap thread hang up. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Tsutomu Itoh authored
This patch is checking return value of read_tree_block(), and if it is NULL, error processing. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
David Sterba authored
On Fri, Mar 18, 2011 at 11:56:53AM -0400, Chris Mason wrote: > Thanks for fielding this one. Does put_unaligned_le32 optimize away on > platforms with efficient access? It would be great if we didn't need > the #ifdef. (quicktest: assembly output is same for put_unaligned_le32 and direct assignment on my x86_64) I was originally following examples in Documentation/unaligned-memory-access.txt. From other code it seems to me that the define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is intended for larger portions of code. Macros/wrappers for {put,get}_unaligned* are chosen via arch/<arch>/include/asm/unaligned.h accordingly, therefore it's safe to use put_unaligned_le32 without the ifdef. dave Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Tsutomu Itoh authored
This patch changes some BUG_ON() to the error return. (but, most callers still use BUG_ON()) Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
liubo authored
Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The pointer to the extent buffer for the root of each tree is protected by a spinlock so that we can safely read the pointer and take a reference on the extent buffer. But now that the extent buffers are freed via RCU, we can safely use rcu_read_lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 25 Mar, 2011 3 commits
-
-
Josef Bacik authored
I noticed that dio_end_io calls the appropriate endio function with an error, but the endio functions don't actually do anything with that error, they assume that if there was an error then the bio will not be uptodate. So if we had checksum failures we would never pass back EIO. So if there is an error in our endio functions make sure to clear the uptodate flag on the bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
When doing direct writes we store the checksums in the ordered sum stuff in the ordered extent for writing them when the write completes, so we don't even use the dip->csums array. So if we're writing, don't bother allocating dip->csums since we won't use it anyway. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
This patch makes the free space cluster refilling code a little easier to understand, and fixes some things with the bitmap part of it. Currently we either want to refill a cluster with 1) All normal extent entries (those without bitmaps) 2) A bitmap entry with enough space The current code has this ugly jump around logic that will first try and fill up the cluster with extent entries and then if it can't do that it will try and find a bitmap to use. So instead split this out into two functions, one that tries to find only normal entries, and one that tries to find bitmaps. This also fixes a suboptimal thing we would do with bitmaps. If we used a bitmap we would just tell the cluster that we were pointing at a bitmap and it would do the tree search in the block group for that entry every time we tried to make an allocation. Instead of doing that now we just add it to the clusters group. I tested this with my ENOSPC tests and xfstests and it survived. Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 21 Mar, 2011 3 commits
-
-
Josef Bacik authored
We have been creating bitmaps for small extents unconditionally forever. This was great when testing to make sure the bitmap stuff was working, but is overkill normally. So instead of always adding small chunks of free space to bitmaps, only start doing it if we go past half of our extent threshold. This will keeps us from creating a bitmap for just one small free extent at the front of the block group, and will make the allocator a little faster as a result. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We do all this fun stuff with min_bytes, but either don't use it in the case of just normal extents, or use it completely wrong in the case of bitmaps. So fix this for both cases 1) In the extent case, stop looking for space with window_free >= min_bytes instead of bytes + empty_size. 2) In the bitmap case, we were looking for streches of free space that was at least min_bytes in size, which was not right at all. So instead search for stretches of free space that are at least bytes in size (this will make a difference when we have > page size blocks) and then only search for min_bytes amount of free space. Thanks, Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
The free space cluster stuff is heavy duty, so there is no sense in going through the entire song and dance if there isn't enough space in the block group to begin with. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 17 Mar, 2011 16 commits
-
-
Josef Bacik authored
We need to make sure the dir items we get are valid dir items. So any time we try and read one check it with verify_dir_item, which will do various sanity checks to make sure it looks sane. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Doing an audit of where we use btrfs_search_slot only showed one place where we don't check the return value of btrfs_search_slot properly. Just fix mark_extent_written to see if btrfs_search_slot failed and act accordingly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Currently if we have corrupted items things will blow up in spectacular ways. So as we read in blocks and they are leaves, check the entire leaf to make sure all of the items are correct and point to valid parts in the leaf for the item data the are responsible for. If the item is corrupt we will kick back EIO and not read any of the copies since they are likely to not be correct either. This will catch generic corruptions, it will be up to the individual callers of btrfs_search_slot to make sure their items are right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Currently if we have corrupt metadata map_extent_buffer will complain about it, but not return an error so the caller has no idea a problem was hit. Fix this. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Everytime I have to deal with btrfs_cont_expand I stare at it for 20 minutes trying to remember what exactly it does and why the hell we need it. So add a comment to save future-Josef some time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Mark_inode_dirty will call btrfs_dirty_inode which will take care of updating the inode. This makes setsize a little cleaner since we don't have to start a transaction and update the inode in there, we can just call mark_inode_dirty. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We don't need an orphan item when expanding files, we just need them for truncating them, so only add the orphan item in btrfs_truncate instead of in btrfs_setsize. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
This fixes a problem where if truncate fails the inode will still be on the in memory orphan list. This is will make us complain when the inode gets destroyed because it's still on the orphan list. So if we fail just remove us from the in memory list and carry on. Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
If we cannot truncate an inode for some reason we will never delete the orphan item associated with that inode, which means that we will loop forever in btrfs_orphan_cleanup. Instead of doing this just return error so we fail to mount. It sucks, but hey it's better than hanging. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Now that we can handle having errors in the truncate path lets make sure we return errors instead of doing BUG_ON() and such. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
->truncate() is going away, instead all of the work needs to be done in ->setattr(). So this converts us over to do this. It's fairly straightforward, just get rid of our .truncate inode operation and call btrfs_truncate() directly from btrfs_setsize. This works out better for us since truncate can technically return ENOSPC, and before we had no way of letting anybody know. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Since we alloc/free free space entries a whole lot, lets use a slab to keep track of them. This makes some of my tests slightly faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We track delayed allocation per inodes via 2 counters, one is outstanding_extents and reserved_extents. Outstanding_extents is already an atomic_t, but reserved_extents is not and is protected by a spinlock. So convert this to an atomic_t and instead of using a spinlock, use atomic_cmpxchg when releasing delalloc bytes. This makes our inode 72 bytes smaller, and reduces locking overhead (albiet it was minimal to begin with). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Really we don't need to memset the pages array at all, since we know how many pages we're going to use in the array and pass that around. So don't memset, just trust we're not idiots and we pass num_pages around properly. Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Our aio_write function is huge and kind of hard to follow at times. So this patch fixes this by breaking out the buffered and direct write paths out into seperate functions so it's a little clearer what's going on. I've also fixed some wrong typing that we had and added the ability to handle getting an error back from btrfs_set_extent_delalloc. Tested this with xfstests and everything came out fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Sorry, but these were bugging me. Just cleanup some of the formatting in file.c. Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 15 Mar, 2011 1 commit
-
-
Linus Torvalds authored
-
- 14 Mar, 2011 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300Linus Torvalds authored
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: MN10300: atomic_read() should ensure it emits a load MN10300: The SMP_ICACHE_INV_FLUSH_RANGE IPI command does not exist MN10300: Proper use of macros get_user() in the case of incremented pointers
-
git://git.linux-mips.org/pub/scm/upstream-linusLinus Torvalds authored
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (26 commits) MIPS: Alchemy: Fix reset for MTX-1 and XXS1500 MIPS: MTX-1: Make au1000_eth probe all PHY addresses MIPS: Jz4740: Add HAVE_CLK MIPS: Move idle task creation to work queue MIPS, Perf-events: Use unsigned delta for right shift in event update MIPS, Perf-events: Work with the new callchain interface MIPS, Perf-events: Fix event check in validate_event() MIPS, Perf-events: Work with the new PMU interface MIPS, Perf-events: Work with irq_work MIPS: Fix always CONFIG_LOONGSON_UART_BASE=y MIPS: Loongson: Fix potentially wrong string handling MIPS: Fix GCC-4.6 'set but not used' warning in arch/mips/mm/init.c MIPS: Fix GCC-4.6 'set but not used' warning in ieee754int.h MIPS: Remove unused code from arch/mips/kernel/syscall.c MIPS: Fix GCC-4.6 'set but not used' warning in signal*.c MIPS: MSP: Fix MSP71xx bpci interrupt handler return value MIPS: Select R4K timer lib for all MSP platforms MIPS: Loongson: Remove ad-hoc cmdline default MIPS: Clear the correct flag in sysmips(MIPS_FIXADE, ...). MIPS: Add an unreachable return statement to satisfy buggy GCCs. ...
-
Linus Torvalds authored
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: ce4100: Set pci ops via callback instead of module init x86/mm: Fix pgd_lock deadlock x86/mm: Handle mm_fault_error() in kernel space x86: Don't check for BIOS corruption in first 64K when there's no need to
-