- 04 Aug, 2014 1 commit
-
-
Chao Yu authored
When an inode is evicted, all of the page cache belonging to this inode should be released, including the xattr node page. Previously we did not do this; this patch fixes the issue. v2: o reposition invalidate_mapping_pages() to the right place, as suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 02 Aug, 2014 2 commits
-
-
Chao Yu authored
When we recover the data of an inode in the roll-forward procedure and the inode has both inline data and inline xattr, we may skip recovering the inline xattr if we recover the inline data from the node page first. This patch fixes the loss of inline xattr data in the above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch adds a tracepoint for f2fs_direct_IO. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 31 Jul, 2014 2 commits
-
-
Chao Yu authored
We do not need to block on ->node_write among different node page writers, e.g. fsync/flush, unless there is a node page writer from write_checkpoint. So it's better to use an rw_semaphore instead of a mutex for ->node_write to improve performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes wrong coding style. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 30 Jul, 2014 7 commits
-
-
Dongho Sim authored
There are redundant lines in allocate_data_block. In this function, we call refresh_sit_entry with old seg and old curseg. After that, we call locate_dirty_segment with old curseg. But the new address is always allocated from old curseg, and we call locate_dirty_segment with old curseg in refresh_sit_entry. So we do not need to call locate_dirty_segment with old curseg again.

We've discussed it as below:

Jaegeuk said: "When considering SSR, we need to take care of the following scenario.
  - old segno : X
  - new address : Z
  - old curseg : Y
This means, a new block is supposed to be written to Z from X. And Z is newly allocated in the same path from Y. In that case, we should trigger locate_dirty_segment for Y, since it was a current_segment and can be dirty owing to SSR. But that was not included in the dirty list."

Changman said: "We already chose old curseg(Y) and then we allocate new address(Z) from old curseg(Y). After that we call refresh_sit_entry(old address, new address). In the function, we call locate_dirty_segment with old seg and old curseg. So calling locate_dirty_segment after refresh_sit_entry again is redundant."

Jaegeuk said: "Right. The new address is always allocated from old_curseg."

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Dongho Sim <dh.sim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adds a tracepoint for f2fs_issue_flush. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch eliminates the propagation of recovery errors to the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If the bit is already set, we don't need to set it again, and vice versa, because we don't need to make the caches dirty for that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes the wrongly used unlikely condition. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch enforces in-place-updates only when fdatasync is requested. If we adopt in-place-updates for fdatasync, we can skip writing the recovery information. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch intends to improve fsync performance by skipping the writing of recovery information, but only when there is no data that we should recover. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 29 Jul, 2014 5 commits
-
-
Jaegeuk Kim authored
This patch introduces an inode number list that represents inodes having appended or updated data writes since the last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch renames the orphan-related data structures so that they can be used for globally managed inode numbers. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch punches out the core functions to manage the inode numbers. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adds a mount option, nobarrier, in f2fs. The assumption here is that the file system keeps the IO ordering but does not care about cache flushes inside the storage device. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 25 Jul, 2014 4 commits
-
-
Chao Yu authored
We should put the root inode correctly in the error path of fill_super; otherwise we may leak the inode resource. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error in 3.15.0:

AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
 [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
 [<ffffffff812359af>] evict+0x15f/0x290
 [< inlined >] iput+0x196/0x280 iput_final
 [<ffffffff812369a6>] iput+0x196/0x280
 [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
 [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
 [<ffffffff812105fd>] kill_block_super+0x4d/0xb0
 [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
 [<ffffffff81211c98>] deactivate_super+0x68/0x80
 [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
 [< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount
 [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

Freed by thread T3:
 [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
 [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
 [< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
 [< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
 [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
 [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
 [<ffffffff8107cce2>] __do_softirq+0x142/0x380
 [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
 [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
 [<ffffffff810a8238>] kthread+0x148/0x160
 [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0

Allocated by thread T22276:
 [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
 [<ffffffff81235e2a>] iget_locked+0x10a/0x230
 [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
 [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
 [<ffffffff81211bce>] mount_bdev+0x1de/0x240
 [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
 [<ffffffff81212a85>] mount_fs+0x55/0x220
 [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
 [< inlined >] do_mount+0x2b4/0x1120 do_new_mount
 [<ffffffff812400d4>] do_mount+0x2b4/0x1120
 [< inlined >] SyS_mount+0xb2/0x110 SYSC_mount
 [<ffffffff812414a2>] SyS_mount+0xb2/0x110
 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

The buggy address ffff8800587866c8 is located 48 bytes inside
 of 680-byte region [ffff880058786698, ffff880058786940)

Memory state around the buggy address:
 ffff880058786100: ffffffff ffffffff ffffffff ffffffff
 ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
 ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
 ffff880058786400: ffffffff ffffffff ffffffff ffffffff
 ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                               ^
 ffff880058786700: ffffffff ffffffff ffffffff ffffffff
 ffff880058786800: ffffffff ffffffff ffffffff ffffffff
 ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
 ffff880058786a00: ........ ........ ........ ........
 ffff880058786b00: ........ ........ ........ ........
Legend:
 f - 8 freed bytes
 r - 8 redzone bytes
 . - 8 allocated bytes
 x=1..7 - x allocated bytes + (8-x) redzone bytes

Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput().

It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formerly 'node_inode' of the first filesystem is returned."

Nids for both meta_inode and node_inode are reserved, so it's not necessary for us to invalidate pages which will never be allocated. To fix this issue, let's skip needlessly invalidating pages for {meta,node}_inode in f2fs_evict_inode.

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
A new interface, ->rename2(), has been added to VFS; the related descriptions are here: https://lkml.org/lkml/2014/2/7/873 https://lkml.org/lkml/2014/2/7/758 This patch adds the function f2fs_rename2() to support ->rename2(), handling both the RENAME_EXCHANGE and RENAME_NOREPLACE flags. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
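The two flags can be illustrated with a toy model. This is only a sketch under stated assumptions, not the f2fs code: the directory is a plain dict of name to inode number, and the helper `rename2` and its error choices are hypothetical, modeled on the documented renameat2(2) behavior. The flag values are the ones defined in the Linux uapi headers.

```python
import errno

# Flag values as defined in the Linux uapi headers.
RENAME_NOREPLACE = 1 << 0
RENAME_EXCHANGE = 1 << 1


def rename2(directory, old, new, flags=0):
    """Toy rename on a dict of name -> inode number (hypothetical helper)."""
    if flags & RENAME_NOREPLACE and flags & RENAME_EXCHANGE:
        # The two flags cannot be combined.
        raise OSError(errno.EINVAL, "flags are mutually exclusive")
    if old not in directory:
        raise OSError(errno.ENOENT, "source does not exist")
    if flags & RENAME_EXCHANGE:
        # Exchange swaps two existing entries atomically.
        if new not in directory:
            raise OSError(errno.ENOENT, "exchange needs both names")
        directory[old], directory[new] = directory[new], directory[old]
        return
    if flags & RENAME_NOREPLACE and new in directory:
        # Refuse to overwrite an existing target.
        raise OSError(errno.EEXIST, "target exists")
    directory[new] = directory.pop(old)
```

With no flags the target is silently replaced, which is the classic rename behavior.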
-
Huang Ying authored
Otherwise, if a large amount of direct IO writes were done, segment allocation may fail because not enough segments have been garbage-collected. Changes: v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 15 Jul, 2014 1 commit
-
-
Chao Yu authored
In __set_test_and_free, when freeing one segment, we check whether all segments in its section are free in order to set the section to free status. But the search region of the segmap runs from the start segno to the last segno of f2fs, which is not necessary. So let's check only the segment bitmap of the target section. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
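The narrowed search can be sketched as follows. This is an illustrative model, not the kernel code: the bitmap is a Python list in which True means "segment in use" (an assumption of this sketch), and `section_is_free` and its parameters are hypothetical names.

```python
def section_is_free(free_segmap, segno, segs_per_sec):
    """Return True if every segment in the section containing segno is free.

    free_segmap[i] is True when segment i is in use (assumed convention).
    Only the slots of the target section are scanned, not the whole map.
    """
    start = (segno // segs_per_sec) * segs_per_sec
    end = min(start + segs_per_sec, len(free_segmap))
    return not any(free_segmap[start:end])
```

The scan cost is bounded by the section size instead of the total number of segments.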
-
- 11 Jul, 2014 3 commits
-
-
Gu Zheng authored
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Gu Zheng authored
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
We assume that modification by some special application could result in a zeroed name_len, or that it could be made deliberately by somebody. We will loop forever in find_in_block when the name_len of a dir entry is zero. This patch prevents the endless loop in the above scenario. change log from v1: o use f2fs_bug_on rather than breaking out from searching dir entries, as suggested by Jaegeuk Kim. Jaegeuk describes: "Well, IMO, it would be good to add f2fs_bug_on() here with a specific comment. In the current phase of f2fs, it is more important to investigate the file system bugs, rather than workarounds for any corrupted images. And, definitely it needs to stop the kernel if any corrupted image was mounted, so that we can figure out where the bugs are occurred." Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
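Why a zero name_len stalls the search can be seen in a small model. This is a sketch under assumptions, not the f2fs source: the block is a list of slots holding `(name_len, name)` tuples or None, the slot length of 8 bytes and the helper names are chosen for illustration, and the assertion stands in for f2fs_bug_on.

```python
SLOT_LEN = 8  # bytes of file name per dentry slot (assumed for illustration)


def dentry_slots(name_len):
    """Number of slots a name of name_len bytes occupies (rounded up)."""
    return (name_len + SLOT_LEN - 1) // SLOT_LEN


def find_in_block(slots, target):
    """Return the slot index of target, or -1 if it is not in the block."""
    pos = 0
    while pos < len(slots):
        entry = slots[pos]
        if entry is None:  # unused slot, move on
            pos += 1
            continue
        name_len, name = entry
        if name_len == len(target) and name == target:
            return pos
        step = dentry_slots(name_len)
        # name_len == 0 gives step == 0: without this check the loop would
        # stay at the same position forever.
        assert step > 0, "corrupted dentry: name_len is zero"
        pos += step
    return -1
```

In the model, a corrupted entry stops the search loudly instead of spinning.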
-
- 09 Jul, 2014 15 commits
-
-
Chao Yu authored
In this patch we use the internal macros and function below to clean up the code. 1. ADDRS_PER_PAGE 2. SM_I 3. f2fs_readonly Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When we fail in ->write_begin()/->direct_IO(), the node blocks we allocated on disk and the page cache are still kept, even though they may not be used again. This patch introduces f2fs_write_failed() to handle the error case of these two interfaces; it truncates the page cache and blocks of this file according to i_size. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
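The cleanup rule, dropping everything past i_size after a failed write, can be modeled roughly as below. `ToyFile` and `write_failed` are hypothetical stand-ins, and the 4096-byte page size is an assumption; the real function operates on the kernel page cache and on-disk blocks.

```python
PAGE_SIZE = 4096  # assumed page/block size for this sketch


class ToyFile:
    """Minimal stand-in for an inode: size plus cached pages and blocks."""

    def __init__(self, i_size):
        self.i_size = i_size
        self.pages = set()   # indices of cached pages
        self.blocks = set()  # indices of allocated blocks


def write_failed(f, to):
    """After a failed write ending at offset `to`, trim back to i_size."""
    if to > f.i_size:
        # First page index that lies entirely beyond i_size.
        first_invalid = (f.i_size + PAGE_SIZE - 1) // PAGE_SIZE
        f.pages = {p for p in f.pages if p < first_invalid}
        f.blocks = {b for b in f.blocks if b < first_invalid}
```

A failed write that stays within i_size leaves the file untouched, since nothing was allocated past the end.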
-
Gu Zheng authored
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Gu Zheng authored
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Gu Zheng authored
kernel side(xx_init_acl), the acl is obtained/cloned from the parent dir's acl, which is credible. So remove the redundant validation check of the acl here. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In our rename process, the region covered by f2fs_lock_op is too big, as some of the code, like f2fs_empty_dir/f2fs_find_entry, does not need to be protected by this lock. So an extreme case, such as doing a checkpoint while we rename an old inode to an existing inode in a large directory, could lower concurrency. Let's reduce the region of f2fs_lock_op to fix this. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Fabian Frederick authored
kcalloc manages count*sizeof overflow. Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
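The overflow that a checked array allocation guards against can be shown with plain arithmetic. This is a sketch assuming a 64-bit size_t; the two helper functions are hypothetical and only model the size computation, not any allocator.

```python
SIZE_MAX = 2**64 - 1  # assuming a 64-bit size_t


def unchecked_alloc_size(count, size):
    """Size requested by an open-coded count * size: wraps modulo 2**64."""
    return (count * size) & SIZE_MAX


def checked_alloc_size(count, size):
    """Size requested by an overflow-checked array allocation.

    Returns None (the allocation is refused) when count * size would not
    fit in size_t.
    """
    if size != 0 and count > SIZE_MAX // size:
        return None
    return count * size
```

With the open-coded product, a huge count can wrap to a tiny request and the caller then writes past the buffer; the checked form fails the allocation instead.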
-
Chao Yu authored
Although building the NAT journal in cursum reduces the read/write work for NAT blocks, the previous design gives lower performance when checkpoints are written frequently, in these cases:
1. if the journal in cursum is already full, it's a bit of a waste that we flush all nat entries to pages for persistence without caching any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal until it is full, then flush the remaining dirty entries to disk without merging the journaled entries, so these journaled entries may be flushed to disk at the next checkpoint, having lost the chance to be flushed last time.

In this patch we merge dirty entries located in the same NAT block into a nat entry set, and link all sets into a list, sorted in ascending order by each set's entry count. Later we flush entries in sparse sets into the journal, as many as we can, and then flush the merged entries to disk. In this way we not only gain performance but also save lifetime of the flash device.

In my testing environment, this patch clearly helps to reduce NAT block writes. In the hard disk test case, the time cost of fsstress is stably reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
          node num   cp count   nodes/cp
based     4599.6     1803.0     2.551
patched   2714.6     1829.6     1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0 -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5 -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S
          node num   cp count   nodes/cp
based     84.5       43.7       1.933
patched   49.2       40.0       1.23

The latency of the merging operation is acceptable even in an extreme case such as merging a great number of dirty nats:
latency(ns)   dirty nat count
3089219       24922
5129423       27422
4000250       24523

change log from v1:
 o fix wrong logic in add_nat_entry when grabbing a new nat entry set.
 o switch to creating the slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make the comment position more appropriate, as suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
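The grouping and ordering described above can be sketched in a few lines. This is a simplified model under assumptions, not the patch itself: `flush_nat_entries`, `entries_per_block` and `journal_free` are hypothetical names, nids are plain integers, and merging with entries already in the journal is left out.

```python
from collections import defaultdict


def flush_nat_entries(dirty_nids, entries_per_block, journal_free):
    """Split dirty NAT entries between the journal and NAT block writes.

    Returns (journaled_nids, flushed_blocks).
    """
    # Merge dirty entries that live in the same NAT block into one set.
    sets = defaultdict(list)
    for nid in dirty_nids:
        sets[nid // entries_per_block].append(nid)

    journaled, flushed_blocks = [], []
    # Visit the sparsest sets first so the journal absorbs as many sets
    # as possible; each remaining set costs exactly one NAT block write.
    ordered = sorted(sets.items(), key=lambda kv: (len(kv[1]), kv[0]))
    for block, nids in ordered:
        if len(nids) <= journal_free:
            journaled.extend(nids)
            journal_free -= len(nids)
        else:
            flushed_blocks.append(block)
    return sorted(journaled), sorted(flushed_blocks)
```

Putting sparse sets into the journal means each NAT block that does get written carries many dirty entries, which is where the reduction in block writes comes from.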
-
Jaegeuk Kim authored
This patch cleans up simple unnecessary codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata flow. Through this, we can provide consistent lock usage, e.g., fi->i_sem, and this will enable better debugging. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify logic of init_inode_metadata to enable linkat temp file. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
After we call find_data_page in truncate_partial_data_page, we cannot be sure whether the page is up to date, since an error may have occurred in a lower layer. We had better check the status of the page to avoid writing back a page that is not up to date to the device. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
We have already set the page uptodate in ->write_begin, so we should remove the redundant SetPageUptodate in ->write_end. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Linus Torvalds authored
Pull f2fs bugfixes from Jaegeuk Kim:
"This includes a couple of bug fixes found by xfstests. In addition, one critical bug was reported by Brian Chadwick, which is falling into the infinite loop in balance_dirty_pages. And it turned out due to the IO merging policy in f2fs, which was newly merged in 3.16.
  - fix normal and recovery path for fallocated regions
  - fix error case mishandling
  - recover renamed fsync inodes correctly
  - fix to get out of infinite loops in balance_dirty_pages
  - fix kernel NULL pointer error"

* tag 'f2fs-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: avoid to access NULL pointer in issue_flush_thread
  f2fs: check bdi->dirty_exceeded when trying to skip data writes
  f2fs: do checkpoint for the renamed inode
  f2fs: release new entry page correctly in error path of f2fs_rename
  f2fs: fix error path in init_inode_metadata
  f2fs: check lower bound nid value in check_nid_range
  f2fs: remove unused variables in f2fs_sm_info
  f2fs: fix not to allocate unnecessary blocks during fallocate
  f2fs: recover fallocated data and its i_size together
  f2fs: fix to report newly allocate region as extent
-
Chao Yu authored
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
Unable to handle kernel NULL pointer dereference at virtual address 00000018
...
[<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
[<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
[<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assigns cmd_control_info in sm_info before issue_flush_thread is created, which makes sure that the issue flush thread has no chance to access invalid info in fcc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-