- 26 Aug, 2015 1 commit
-
-
Chao Yu authored
This patch introduces a new helper, f2fs_update_extent_tree_range, which updates the extent mapping for a specified range. The main idea is:
1) punch all mapping info in the extent node(s) that fall within the specified range;
2) try to merge the new extent mapping with an adjacent node or, failing that, insert the mapping into the extent tree as a new node.

To measure the benefit, I added a function that reads the time stamp counter, as below:

    uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return (uint64_t)hi << 32 | lo;
    }

My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.

truncation path: update extent cache from truncate_data_blocks_range
non-truncation path: update extent cache from other paths
total: all update paths

a) Removing a 128MB file which has one extent node mapping the whole range of the file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M

Before:
                 total         count      average
truncation:      7651022       32768      233.49

Patched:
                 total         count      average
truncation:      3321          33         100.64

b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times: 5 times.

Before:
                 total         count      average
truncation:      5812480.6     20911.6    277.95
non-truncation:  7783845.6     13440.8    579.12
total:           13596326.2    34352.4    395.79

Patched:
                 total         count      average
truncation:      1281283.0     3041.6     421.25
non-truncation:  7355844.4     13662.8    538.38
total:           8637127.4     16704.4    517.06

1) For the updates in the truncation path:
 - updating in batches clearly reduces both the total tsc and the update count;
 - besides, a single batched update punches multiple extent nodes in a loop, which executes more operations, so the average tsc per update increases noticeably.
2) For the updates in the non-truncation path:
 - there is a small improvement, because when we only need to update the head or tail of an extent node, the new interface updates the info in the extent node directly, rather than removing the original extent node and then inserting the updated one into the cache as a new node.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
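The two steps described in this message (punch the range, then merge or insert) can be sketched in userspace C. This is only an illustration under stated assumptions: a small sorted array stands in for the rb-tree, and struct extent, struct extent_map, punch_range and update_range are hypothetical names, not the kernel's code. The new mapping is assumed to have a non-zero length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extent: file offset fofs maps to block blk for len blocks. */
struct extent { unsigned fofs, blk, len; };

#define MAX_EXTENTS 16

/* Sorted, non-overlapping array standing in for the extent rb-tree. */
struct extent_map { struct extent e[MAX_EXTENTS]; size_t n; };

static void remove_at(struct extent_map *m, size_t i)
{
    memmove(&m->e[i], &m->e[i + 1], (m->n - i - 1) * sizeof(m->e[0]));
    m->n--;
}

static int insert_at(struct extent_map *m, size_t i, struct extent x)
{
    if (m->n == MAX_EXTENTS)
        return -1;
    memmove(&m->e[i + 1], &m->e[i], (m->n - i) * sizeof(m->e[0]));
    m->e[i] = x;
    m->n++;
    return 0;
}

/* Step 1: punch [fofs, fofs + len) out of every overlapping extent. */
static int punch_range(struct extent_map *m, unsigned fofs, unsigned len)
{
    unsigned end = fofs + len;
    size_t i = 0;

    while (i < m->n) {
        struct extent *x = &m->e[i];
        unsigned xend = x->fofs + x->len;

        if (xend <= fofs) { i++; continue; }   /* entirely before the range */
        if (x->fofs >= end) break;             /* entirely after the range */
        if (x->fofs < fofs && xend > end) {    /* range splits this extent */
            struct extent tail = { end, x->blk + (end - x->fofs), xend - end };
            x->len = fofs - x->fofs;
            return insert_at(m, i + 1, tail);
        }
        if (x->fofs < fofs) {                  /* trim the tail in place */
            x->len = fofs - x->fofs;
            i++;
        } else if (xend > end) {               /* trim the head in place */
            x->blk += end - x->fofs;
            x->len = xend - end;
            x->fofs = end;
            i++;
        } else {
            remove_at(m, i);                   /* fully covered: drop it */
        }
    }
    return 0;
}

/* Step 2: merge the new mapping with a neighbour, or insert a new node. */
static int update_range(struct extent_map *m, struct extent n)
{
    size_t i = 0;

    if (punch_range(m, n.fofs, n.len))
        return -1;
    while (i < m->n && m->e[i].fofs < n.fofs)
        i++;                                   /* insertion point */

    if (i > 0 && m->e[i - 1].fofs + m->e[i - 1].len == n.fofs &&
        m->e[i - 1].blk + m->e[i - 1].len == n.blk) {
        m->e[i - 1].len += n.len;              /* back merge */
        if (i < m->n && n.fofs + n.len == m->e[i].fofs &&
            n.blk + n.len == m->e[i].blk) {
            m->e[i - 1].len += m->e[i].len;    /* and front merge */
            remove_at(m, i);
        }
        return 0;
    }
    if (i < m->n && n.fofs + n.len == m->e[i].fofs &&
        n.blk + n.len == m->e[i].blk) {
        m->e[i].fofs = n.fofs;                 /* front merge only */
        m->e[i].blk = n.blk;
        m->e[i].len += n.len;
        return 0;
    }
    return insert_at(m, i, n);
}
```

In this sketch, overwriting the middle of one extent with non-contiguous blocks splits it into three nodes, and rewriting that middle with the original contiguous blocks merges everything back into a single node.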
-
- 24 Aug, 2015 6 commits
-
-
Chao Yu authored
In the following call stack, if we unfortunately lose every chance to truncate the inode page in remove_inode_page, we eventually add the previously allocated nid to the free nid cache. This nid has NID_NEW status and NEW_ADDR in its blkaddr pointer:

 - f2fs_create
  - f2fs_add_link
   - __f2fs_add_link
    - init_inode_metadata
     - new_inode_page
      - new_node_page
       - set_node_addr(, NEW_ADDR)
     - f2fs_init_acl   failed
     - remove_inode_page  failed
  - handle_failed_inode
   - remove_inode_page  failed
   - iput
    - f2fs_evict_inode
     - remove_inode_page  failed
     - alloc_nid_failed   cache a nid with valid blkaddr: NEW_ADDR

This may not only leak the resources of the previous inode, but may also cause incorrect use of the previous blkaddr, which is stored in the node entry of that nid, when the nid is reused by others.

This patch adds the inode to the orphan list if we fail to truncate it, so that we get a second chance to release it in the orphan recovery flow.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch makes f2fs_truncate return its error number, so that callers can handle the error correctly. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When converting an inline dentry, we zero out the target dentry page before copying the inline dentry data into it. This is unnecessary overhead, since the inline dentry is not small. So this patch removes the unneeded initialization of that space in the target dentry page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Zhang Zhen authored
According to commit 5f16f322 ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()"). Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If we release the lock inside list_for_each_entry_safe, the tmp pointer can be invalidated by alloc_nid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Using __GFP_NOFAIL avoids retrying the whole path of kmem_cache_alloc and bio_alloc. This patch also corrects the use cases of GFP_ATOMIC. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 22 Aug, 2015 9 commits
-
-
Chao Yu authored
In __lookup_extent_tree_ret, we do not try to find neighbor nodes if we find the target node. In that case, we lose the chance to merge the new mapping with an existing extent node later. So the extent cache of an inode becomes fragmented after overwriting an existing file; the number of extent nodes increases sharply in the following test case:

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024

Extent Cache:
  - Hit Count: L1-1:0 L1-2:0 L2:0
  - Hit Ratio: 0% (0 / 3072)
  - Inner Struct Count: tree: 1, node: 1

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc

Extent Cache:
  - Hit Count: L1-1:2048 L1-2:0 L2:0
  - Hit Ratio: 33% (2048 / 6144)
  - Inner Struct Count: tree: 1, node: 961

This patch fixes the lookup so that it also finds the neighbors of the target node for further merging.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch splits __insert_extent_tree_ret into __try_merge_extent_node & __insert_extent_tree for code readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
After commit 0f825ee6 ("f2fs: add new interfaces for extent tree"), f2fs_init_extent_tree became the only caller of __insert_extent_tree, and f2fs_init_extent_tree only inserts an extent node into an empty tree, so __try_{back,front}_merge in __insert_extent_tree is never called. This patch removes this dead code and renames __insert_extent_tree to __init_extent_tree for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch replaces the total hit stat with the rbtree hit stat, and adjusts how the extent cache stat is shown:

Hit Count:
  L1-1: hit count of the largest node;
  L1-2: hit count of the last cached node;
  L2: hit count of extent nodes found by looking up the rbtree.
Hit Ratio: ratio (hit count / total lookup count)
Inner Struct Count: tree count, node count.

Before:
Extent Hit Ratio: 0 / 2
Extent Tree Count: 3
Extent Node Count: 2

Patched:
Extent Cache:
  - Hit Count: L1-1:4871 L1-2:2074 L2:208
  - Hit Ratio: 1% (7153 / 550751)
  - Inner Struct Count: tree: 26560, node: 11824

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
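The arithmetic behind the "Hit Ratio" line can be checked with a short sketch. It assumes the ratio is the sum of the three hit counters divided by the total lookup count, truncated to an integer percentage; struct ext_stat and both helper names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical counters mirroring the fields named in the message. */
struct ext_stat {
    uint64_t hit_largest;   /* L1-1: largest node hits */
    uint64_t hit_cached;    /* L1-2: last cached node hits */
    uint64_t hit_rbtree;    /* L2: hits found via rbtree lookup */
    uint64_t total;         /* total lookup count */
};

/* Sum of all hit counters. */
static uint64_t ext_hit_total(const struct ext_stat *s)
{
    return s->hit_largest + s->hit_cached + s->hit_rbtree;
}

/* Integer percentage, guarding against a zero lookup count. */
static unsigned ext_hit_ratio(const struct ext_stat *s)
{
    if (!s->total)
        return 0;
    return (unsigned)(ext_hit_total(s) * 100 / s->total);
}
```

With the patched figures above (4871, 2074 and 208 hits out of 550751 lookups), the hit total is 7153 and the truncated ratio is 1%.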
-
Chao Yu authored
This patch adds statistics for the hit counts of the largest node and the cached node, to be shown in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
The test steps are as below:
1. touch file
2. truncate -s $((1024*1024)) file
3. fallocate -o 0 -l $((1024*1024)) file
4. fibmap.f2fs file

The result of fibmap.f2fs shown below is not correct:

file_pos    start_blk     end_blk       blks
       0    -937166132    -937166132       1
    4096    -937166132    -937166132       1
    8192    -937166132    -937166132       1
   12288    -937166132    -937166132       1
   16384    -937166132    -937166132       1
   20480    -937166132    -937166132       1
...
 1040384    -937166132    -937166132       1
 1044480    -937166132    -937166132       1

This is because f2fs_map_blocks returns with no error when it meets a hole or a preallocated block, and the caller __get_data_block then maps the value of an uninitialized variable to bh->b_blocknr. Unfortunately, generic_block_bmap neither checks the return value of get_data() nor checks the mapping info of the buffer_head, which results in returning a random block address.

After fixing the issue, the result is correct:

file_pos    start_blk     end_blk       blks
       0    0             0              256

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
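The failure mode described here, a lookup that succeeds on a hole without filling its output, can be sketched as follows. This is a userspace illustration, not the actual fix: struct block_map, map_blocks_sketch and bmap_sketch are hypothetical, and the sketch simply zero-initializes the result and checks a mapped flag before using the address.

```c
/* Hypothetical mapping result; only valid when mapped is non-zero. */
struct block_map {
    unsigned long pblk;   /* physical block address */
    int mapped;           /* set only when a real block was found */
};

/*
 * Stand-in for the mapping routine: like the behaviour described above,
 * it returns 0 for holes too, but fills *map only for mapped blocks.
 * Blocks 0..3 are "mapped" in this toy layout; everything else is a hole.
 */
static int map_blocks_sketch(unsigned long lblk, struct block_map *map)
{
    if (lblk < 4) {
        map->pblk = 1000 + lblk;
        map->mapped = 1;
    }
    return 0;
}

/* Returns the physical block, or 0 for a hole or on error. */
static unsigned long bmap_sketch(unsigned long lblk)
{
    struct block_map map = {0};   /* never hand back stack garbage */

    if (map_blocks_sketch(lblk, &map) || !map.mapped)
        return 0;
    return map.pblk;
}
```

Reporting 0 for unmapped blocks matches the corrected output above, where the 1 MiB preallocated file (256 blocks of 4 KiB) shows start_blk 0.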
-
Chao Yu authored
Add annotations that describe the space utilization of the regular dentry and the inline dentry more clearly. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Fan Li authored
In f2fs_lookup_extent_tree, et->cached_en was read and updated with only the read lock held. This could cause __lookup_extent_tree to return an entirely wrong extent_node if another thread updates et->cached_en just before __lookup_extent_tree returns.

However, there are two things about this patch that should be noted:
1. It does nothing to order concurrent reads and writes; the result would still be random in such a case.
2. It is built on this assumption: mixing reads and writes on a single pointer never leaves the pointer partially wrong at any time.

Please let me know if I'm wrong, thx.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Junesung Lee authored
Fix typo. Signed-off-by: Junesung Lee <junesoung412@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 20 Aug, 2015 13 commits
-
-
Jaegeuk Kim authored
This patch adds a routine which checks the block address of a newly allocated nid. If a nid has already been allocated by another thread due to subtle data races, it will result in filesystem corruption. So, as a last chance, we need to check whether its block address was already allocated prior to nid allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
We should not call unlock_new_inode when insert_inode_locked failed. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If FG_GC failed to reclaim one section, let's retry with another section from the start, since we can get another good candidate. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Previously, update_inode_page was not called under f2fs_lock_op. Instead, we should call it through f2fs_write_inode. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If we reuse as many nids as possible, we can reduce the number of obsolete node pages produced in the page cache. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If node blocks were already moved, we don't need to move them again. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
According to the comment of bio_alloc_bioset quoted below, the allocation guarantee holds only when a caller allocates one bio at a time. f2fs can allocate multiple bios at the same time, so we can't guarantee that a bio is always allocated.

"
 * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
 * able to allocate a bio. This is due to the mempool guarantees. To make this
 * work, callers must never allocate more than 1 bio at a time from this pool.
 * Callers that need to allocate more than 1 bio must always submit the
 * previously allocated bio for IO before attempting to allocate a new one.
 * Failure to do so can cause deadlocks under memory pressure.
"

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch increases the number of maximum hard links for one file. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
We should avoid needless checkpoints when there are no dirty or prefree segments. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch introduces __count_free_nids/try_to_free_nids and registers them in slab shrinker for shrinking under memory pressure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
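A shrinker is typically split into a "count" step that reports how many objects could be freed and a "scan" step that frees up to a requested number. The sketch below models that split in userspace. Everything in it is an assumption made for illustration: struct nid_cache, MIN_KEEP_NIDS and both function bodies are hypothetical, and the real helpers may use a different threshold or locking.

```c
/* Hypothetical free-nid cache: only the number of cached free nids. */
struct nid_cache {
    unsigned long free_cnt;
};

/* Assumed floor of free nids that the sketch never shrinks below. */
#define MIN_KEEP_NIDS 8UL

/* "count" step: how many free nids may be released right now. */
static unsigned long count_free_nids_sketch(const struct nid_cache *c)
{
    return c->free_cnt > MIN_KEEP_NIDS ? c->free_cnt - MIN_KEEP_NIDS : 0;
}

/* "scan" step: release up to nr free nids, return how many were released. */
static unsigned long try_to_free_nids_sketch(struct nid_cache *c,
                                             unsigned long nr)
{
    unsigned long avail = count_free_nids_sketch(c);
    unsigned long n = nr < avail ? nr : avail;

    c->free_cnt -= n;
    return n;
}
```

Keeping a floor means that a shrink request under memory pressure never empties the cache completely, so the next allocation does not immediately have to rebuild it.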
-
Chao Yu authored
In f2fs_delete_entry, if the last dirent is removed from the dentry page, we try to punch that page, since it has no valid data in it. But truncate_hole, which is used for punching, can fail because of no memory or an IO error. If that happens, we'd better skip clearing this valid dentry page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
I volunteer to be a dedicated reviewer of f2fs, so add my email address to the maintainership entry of f2fs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
We should not write node pages when deleting orphan inodes. To ensure that, we can easily set the POR_DOING flag earlier, before entering the orphan inode routine. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 14 Aug, 2015 3 commits
-
-
Jaegeuk Kim authored
If F2FS_CHECK_FS is turned off, we can get a build warning for an unused variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In recover_orphan_inode, whenever f2fs_iget fails, we make the kernel panic. That is not reasonable, because f2fs_iget can fail for many reasons, including out of memory. So we change the error handling as below:
a) when no entry is found for the orphan inode, bug_on to catch bugs;
b) for other reasons, report the error to the caller.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
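The two-way error handling described here can be sketched in userspace C. This is a hedged illustration only: iget_sketch and recover_orphan_inode_sketch are hypothetical stand-ins, and a counter (bug_hits) replaces the bug_on so that the sketch can run without aborting. Returning the error after the bug check is also an assumption of this sketch.

```c
#include <errno.h>

/* Counts would-be bug_on triggers instead of stopping the program. */
static int bug_hits;

/* Hypothetical inode lookup: 0 on success, negative errno otherwise. */
static int iget_sketch(unsigned ino)
{
    if (ino == 0)
        return -ENOENT;   /* no entry for this orphan inode */
    if (ino == 1)
        return -ENOMEM;   /* transient failure, e.g. out of memory */
    return 0;
}

static int recover_orphan_inode_sketch(unsigned ino)
{
    int err = iget_sketch(ino);

    if (err) {
        if (err == -ENOENT)
            bug_hits++;   /* a) missing entry: flag it as a bug */
        return err;       /* b) report the error to the caller */
    }
    /* ... the orphan inode would be released here ... */
    return 0;
}
```

The point of the split is that a transient failure such as -ENOMEM reaches the caller as an ordinary error, while only the "no entry" case is treated as a sign of a bug.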
-
Jaegeuk Kim authored
If there are not enough free segments, we should not assign a new segment explicitly. Otherwise, we can run out of free segments. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 11 Aug, 2015 2 commits
-
-
Chao Yu authored
Previously, we used a radix tree to index all registered page entries for an atomic file, but now we only use the radix tree to see whether the current page is indexed or not, since the other user of the radix tree was removed in commit 042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages"). So this patch uses a more efficient way: it introduces a macro, ATOMIC_WRITTEN_PAGE, and sets it as the page private value to indicate the page's indexing status. This saves memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
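The idea of replacing an index lookup with a marker stored in the page itself can be sketched as below. This is a userspace illustration: struct page_sketch and the three helpers are hypothetical, and the marker value is an assumption, since the message does not state the real value of ATOMIC_WRITTEN_PAGE.

```c
#include <stdbool.h>

/* Assumed marker value, chosen only for this sketch. */
#define ATOMIC_WRITTEN_PAGE 0x0000ffffUL

/* Minimal stand-in for a page: only the private word matters here. */
struct page_sketch {
    unsigned long private_data;
};

/* Register the page: store the marker instead of inserting into a tree. */
static void mark_atomic_written(struct page_sketch *p)
{
    p->private_data = ATOMIC_WRITTEN_PAGE;
}

/* Membership test: a single compare instead of a radix tree lookup. */
static bool is_atomic_written(const struct page_sketch *p)
{
    return p->private_data == ATOMIC_WRITTEN_PAGE;
}

/* Unregister the page. */
static void clear_atomic_written(struct page_sketch *p)
{
    p->private_data = 0;
}
```

Checking registration becomes one comparison on data the caller already holds, which is where the memory and lookup time savings mentioned above would come from.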
-
Chao Yu authored
We ran the ltp testcases with f2fs and got a TFAIL in diotest4. The detailed result is as follows:

dio04
<<<test_start>>>
tag=dio04 stime=1432278894
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
diotest4    5  TPASS  :  Read beyond the file size
......

The result of ext4 in the same environment:

dio04
<<<test_start>>>
tag=dio04 stime=1432259643
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TPASS  :  Odd count of read and write
diotest4    4  TPASS  :  Read beyond the file size
......

The reason is that when DIO is triggered in f2fs, we return zero in ->direct_IO if the writer's buffer offset, file offset and transfer size are not aligned to the block size of the filesystem, which results in falling back to buffered write instead of returning -EINVAL.

This patch fixes the problem by returning the correct error number for the above case, and by removing the judgement condition in check_direct_IO so that the verification is enabled for direct readers too.

Besides, Jaegeuk Kim pointed out that there are exceptional cases where we should always make direct-io fall back to buffered write, such as dio on an encrypted file.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Chao Yu make small change and add detail description in commit message]
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
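The alignment rule behind this fix can be sketched as a small check. It is a minimal userspace illustration, assuming a power-of-two block size given as a bit shift; check_direct_io_sketch and its parameters are hypothetical and do not reproduce the kernel's check_direct_IO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 when the buffer address, file offset and transfer size are
 * all multiples of the block size (1 << blkbits), and -EINVAL otherwise.
 */
static int check_direct_io_sketch(uintptr_t buf, uint64_t offset,
                                  size_t count, unsigned blkbits)
{
    uint64_t mask = ((uint64_t)1 << blkbits) - 1;

    if ((offset & mask) || ((uint64_t)count & mask) || ((uint64_t)buf & mask))
        return -EINVAL;
    return 0;
}
```

With 4 KiB blocks (blkbits of 12), a 1-byte transfer fails with -EINVAL, which corresponds to the "odd count" case that diotest4 expects to be rejected.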
-
- 10 Aug, 2015 1 commit
-
-
Chao Yu authored
fill_zero can fail for many reasons, but previously we did not handle its return value, so callers such as punch_hole/f2fs_zero_range could report success even though an error had occurred inside fill_zero. This patch fixes them to report the correct return value of fill_zero. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 06 Aug, 2015 2 commits
-
-
Chao Yu authored
When testing with generic/101 in xfstests, the following error message was output:

    --- tests/generic/101.out
    +++ results//generic/101.out.bad
    @@ -10,10 +10,14 @@
     File foo content after log replay:
     0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
     *
    -0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    +0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
     *
     0372000
    ...
    (Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

The test flow is as below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount

After this test, we expect the recovered file to have its first 64k filled with 0xaa and the next 61k filled with 0x00, because we fsynced it before dropping writes in dm.

In f2fs, during recovery, we only recover a valid block address in a direct node page if it is marked as an fsynced dnode; a block address that means invalid/reserved (with value NULL_ADDR/NEW_ADDR) is not recovered. So the recovered file shows incorrect data 0xbb in the range [64k, 125k].

This patch fixes the recovery flow to recover invalid/reserved blocks as well.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
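Why skipping hole addresses leaves stale data behind can be shown with a small model. This sketch is an illustration, not the recovery code: recover_dnode_sketch and its recover_holes flag are hypothetical, and the NULL_ADDR and NEW_ADDR values used here (0 and all ones) are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t block_t;

#define NULL_ADDR ((block_t)0)     /* assumed: invalid block, a hole */
#define NEW_ADDR  ((block_t)-1)    /* assumed: reserved, not yet written */

/*
 * Copy block addresses from an fsynced dnode (src) over the on-disk
 * dnode (dest). With recover_holes == false, hole and reserved entries
 * are skipped, so a stale address in dest survives. Returns the number
 * of slots that changed.
 */
static size_t recover_dnode_sketch(block_t *dest, const block_t *src,
                                   size_t n, bool recover_holes)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        bool hole = src[i] == NULL_ADDR || src[i] == NEW_ADDR;

        if (dest[i] == src[i])
            continue;
        if (hole && !recover_holes)
            continue;              /* old behaviour: stale block stays */
        dest[i] = src[i];          /* fixed behaviour: the hole wins */
        changed++;
    }
    return changed;
}
```

In the generic/101 flow, the blocks holding 0xbb correspond to the stale entries: the fsynced dnode says "hole", and only the fixed behaviour makes the recovered file read back as zeroes there.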
-
Fan Li authored
In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 05 Aug, 2015 3 commits
-
-
Chao Yu authored
To avoid meeting garbage data in the next free node block at the end of the warm node chain during recovery, we try to zero out that invalid block. If the device does not support discard, our way of zeroing out the block is to grab a temporary zeroed page in the meta inode and then issue a write request with this page. But we forgot to release that temporary page, so memory usage increases without any hit ratio benefit. It is better to free the page to save memory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In the following call path, we pass a locked and referenced ipage pointer to get_new_data_page:
 - init_inode_metadata
  - make_empty_dir
   - get_new_data_page

There are two exit paths in get_new_data_page when an error occurs:
1) grab_cache_page fails, and ipage is not released;
2) f2fs_reserve_block fails, and ipage is released in the callee.

So error handling in get_new_data_page is not consistent.

For f2fs_reserve_block, it is not easy to change the rule of error handling, since it is already complicated. Here we decide to fix this issue the easy way: if any error occurs in get_new_data_page, we ensure that ipage is released in this function.

The same issue exists in f2fs_convert_inline_dir; fix that too.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
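The rule "every error path releases ipage" can be modelled with a reference count. This sketch is a userspace illustration under stated assumptions: struct page_ref, reserve_block_sketch and get_new_data_page_sketch are hypothetical, failures are injected through flags, and the error codes are chosen only for the example.

```c
#include <errno.h>

/* Minimal stand-in for a referenced page. */
struct page_ref {
    int refcount;
};

static void put_page_ref(struct page_ref *p)
{
    p->refcount--;
}

/* Stand-in for the callee that already drops ipage itself on failure. */
static int reserve_block_sketch(struct page_ref *ipage, int fail)
{
    if (fail) {
        put_page_ref(ipage);
        return -ENOSPC;
    }
    return 0;
}

static int get_new_data_page_sketch(struct page_ref *ipage,
                                    int fail_grab, int fail_reserve)
{
    int err;

    if (fail_grab) {
        put_page_ref(ipage);   /* the fix: this path used to leak ipage */
        return -ENOMEM;
    }
    err = reserve_block_sketch(ipage, fail_reserve);
    if (err)
        return err;            /* callee already released ipage */
    return 0;                  /* success: caller keeps its reference */
}
```

After either failure the reference count drops to zero exactly once, so the caller can follow one rule: on error, ipage is no longer owned.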
-
Liu Xue authored
Replace BUG_ON with f2fs_bug_on to handle failed block and segment validity checks. Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-