Commits · fa528722d06ecbee9d918b9eec58c5d4c2978839 · Kirill Smelkov / linux

04 Nov, 2014 13 commits

f2fs: remove the redundant function cond_clear_inode_flag · fa528722

Gu Zheng authored 10 years ago


Use clear_inode_flag to replace the redundant cond_clear_inode_flag.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse make_empty_dir code for inline_dentry · 062a3e7b

Jaegeuk Kim authored 10 years ago


This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_dentry_ptr structure for code clean-up · 7b3cd7d6

Jaegeuk Kim authored 10 years ago


This patch introduces the f2fs_dentry_ptr structure, to be passed as a function
parameter in inline_dentry operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
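
As a rough illustration of why one pointer structure helps, the sketch below lets a single routine walk either a regular dentry block or a smaller inline area. The fields follow the f2fs_dentry_ptr idea (bitmap, dentry array, filename slots, max), but the simplified types, the slot size, and the helper count_valid_dentries are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>

#define SLOT_LEN 8                 /* assumed filename slot size for this sketch */

struct dir_entry {                 /* simplified stand-in for struct f2fs_dir_entry */
	unsigned int ino;
	unsigned short name_len;
};

/* One view over any dentry area; callers fill in where each part lives. */
struct dentry_ptr {
	const unsigned char *bitmap;   /* one bit per slot, 1 = slot in use */
	struct dir_entry *dentry;
	unsigned char (*filename)[SLOT_LEN];
	int max;                       /* number of slots in this area */
};

static int slot_in_use(const unsigned char *bitmap, int nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/* Shared helper: identical for a block-sized or an inline-sized area. */
static int count_valid_dentries(const struct dentry_ptr *d)
{
	int count = 0;

	for (int i = 0; i < d->max; i++)
		if (slot_in_use(d->bitmap, i) && d->dentry[i].ino != 0)
			count++;
	return count;
}
```

The same function serves both layouts because only the `max` value and the three pointers differ between them.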

f2fs: reuse core function in f2fs_readdir for inline_dentry · 38594de7

Jaegeuk Kim authored 10 years ago


This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add stat info for inline_dentry inodes · 3289c061

Jaegeuk Kim authored 10 years ago


This patch adds status information for inline_dentry inodes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid deadlock on init_inode_metadata · bce8d112

Jaegeuk Kim authored 10 years ago


Previously, init_inode_metadata did not hold any parent directory's inode
page, so f2fs_init_acl could grab its parent inode page without any problem.
But when we use inline_dentry, that page is already grabbed during
f2fs_add_link, so we can fall into a deadlock like the one below.

INFO: task mknod:11006 blocked for more than 120 seconds.
      Tainted: G           OE  3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod           D ffff88003fc94580     0 11006  11004 0x00000000
 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
 [<ffffffff8173dc40>] ? bit_wait+0x50/0x50
 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130
 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
 [<ffffffff811640a7>] __lock_page+0x67/0x70
 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
 [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
 [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
 [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
 [<ffffffff8122d847>] get_acl+0x47/0x70
 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150
 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
 [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
 [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
 [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
 [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
 [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
 [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
 [<ffffffff81741d7f>] tracesys+0xe1/0xe6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
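
The trace shows a task waiting on a page lock it already holds. A common way out is to hand the already-locked parent page down the call chain so the callee reuses it instead of locking it again. The sketch below models a non-recursive page lock with a flag; struct page, try_lock_page, and get_parent_page are hypothetical names used only for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
	bool locked;
};

/* Non-recursive lock: returns false where the kernel's lock_page would
 * sleep forever because the current task already holds the lock. */
static bool try_lock_page(struct page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

/* Hypothetical helper: reuse the caller's locked page when one is passed
 * in; otherwise lock the parent page ourselves. */
static struct page *get_parent_page(struct page *parent, struct page *held)
{
	if (held)
		return held;          /* already locked by our caller */
	return try_lock_page(parent) ? parent : NULL;
}
```

Passing `NULL` reproduces the old behaviour (a second lock attempt that would block), while passing the held page avoids the self-deadlock.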

f2fs: reuse find_in_block code for find_in_inline_dir · 4e6ebf6d

Jaegeuk Kim authored 10 years ago


This patch removes duplicated code in find_in_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse room_for_filename for inline dentry operation · a82afa20

Jaegeuk Kim authored 10 years ago


This patch reuses the existing room_for_filename for inline dentry
operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
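
room_for_filename answers one question for both regular and inline dentry areas: where does the first run of free slots long enough for a name start? A minimal sketch of that search over a little-endian bitmap follows. The bit helper is written by hand here as an assumption, standing in for the kernel's bit-search routines.

```c
#include <assert.h>

static int bit_is_set(const unsigned char *bitmap, int nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/* Return the first slot index that starts a run of at least `slots`
 * free slots, or max_slots if no such run exists. */
static int room_for_filename_sketch(const unsigned char *bitmap,
				    int slots, int max_slots)
{
	int start = 0;

	while (start < max_slots) {
		/* skip used slots */
		while (start < max_slots && bit_is_set(bitmap, start))
			start++;
		if (start >= max_slots)
			break;
		/* measure the free run that begins at start */
		int end = start;
		while (end < max_slots && !bit_is_set(bitmap, end))
			end++;
		if (end - start >= slots)
			return start;
		start = end;          /* end is a used slot, or max_slots */
	}
	return max_slots;
}
```

With slots 0, 2, and 5 in use (bitmap byte `0x25`), a one-slot name lands at slot 1 and a two-slot name at slot 3.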

f2fs: add key function to handle inline dir · 201a05be

Chao Yu authored 10 years ago


Add functions to implement inline dir init/lookup/insert/delete/convert operations.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove needless reserved area copy, pointed out by Dan Carpenter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: export dir operations for inline dir · dbeacf02

Chao Yu authored 10 years ago

This patch exports some dir operations for inline dir, and additionally splits
f2fs_drop_nlink out of f2fs_delete_entry so that inline dir functions can reuse it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add infra struct and helper for inline dir · 34d67deb

Chao Yu authored 10 years ago


This patch defines macros and the inline dentry structure, and adds some helpers
for the inline dir infrastructure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: invalidate inmemory page · cbcb2872

Jaegeuk Kim authored 10 years ago


If a user truncates a file's data, we should truncate its in-memory pages too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not make dirty any inmemory pages · 34ba94ba

Jaegeuk Kim authored 10 years ago


This patch keeps in-memory pages clean all the time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

07 Oct, 2014 2 commits

f2fs: support volatile operations for transient data · 02a1335f

Jaegeuk Kim authored 10 years ago


This patch adds support for volatile writes which keep data pages in memory
until f2fs_evict_inode is called by iput.

For instance, we can use this feature for an sqlite database as follows.
While atomic writes are used for the main database file, its journal data can
be kept temporarily in the page cache with the following sequence.

1. open
 -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
2. writes
 : keep all the data in the page cache.
3. flush to the database file with atomic writes
  a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
  b. writes
  c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
4. close
 -> drop the cached data
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
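
From user space, the journal sequence above might look like the sketch below. The ioctl numbers are an assumption: they use the values that later appeared in the kernel's f2fs ioctl definitions (magic 0xf5, commands 1 to 3), so check them against the headers of the kernel you target. The function journaled_update is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed values; compare with the f2fs ioctl definitions of your kernel. */
#define F2FS_IOCTL_MAGIC              0xf5
#define F2FS_IOC_START_ATOMIC_WRITE   _IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 2)
#define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3)

/* Returns 0 on success, or the number of the step (1-3) that failed. */
static int journaled_update(int journal_fd, int db_fd,
			    const char *jdata, const char *dbdata)
{
	/* 1. keep the journal only in the page cache */
	if (ioctl(journal_fd, F2FS_IOC_START_VOLATILE_WRITE) < 0)
		return 1;
	/* 2. journal writes stay in memory */
	if (write(journal_fd, jdata, strlen(jdata)) < 0)
		return 2;
	/* 3. flush to the database file with atomic writes */
	if (ioctl(db_fd, F2FS_IOC_START_ATOMIC_WRITE) < 0 ||
	    write(db_fd, dbdata, strlen(dbdata)) < 0 ||
	    ioctl(db_fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		return 3;
	/* 4. the caller closes journal_fd, which drops the cached data */
	return 0;
}
```

On a file that does not live on f2fs, the first ioctl is expected to fail (typically with ENOTTY), so the function returns 1.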

f2fs: support atomic writes · 88b88a66

Jaegeuk Kim authored 10 years ago


This patch introduces very limited support for atomic writes.
In order to support atomic write, this patch adds two ioctls:
 o F2FS_IOC_START_ATOMIC_WRITE
 o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
 -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
  : all the written data will be treated as atomic pages.
3. commit
 -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
  : this flushes all the data blocks to the disk; the f2fs recovery procedure
  will then expose them all or nothing.
4. repeat from #2.

The IO patterns should be:

  ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
 CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                      `- COMMIT_ATOMIC_WRITE
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
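
The all-or-nothing promise can be pictured with a small in-memory model: writes made after START are staged, COMMIT publishes them together, and a crash before COMMIT leaves the durable copy untouched. This is a toy simulation with invented names (atomic_file, crash), not how f2fs actually stores atomic pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4

struct atomic_file {
	char disk[NBLOCKS];      /* what recovery would see */
	char staged[NBLOCKS];    /* atomic pages held back until commit */
	bool dirty[NBLOCKS];
	bool atomic;
};

static void start_atomic(struct atomic_file *f)
{
	f->atomic = true;
	memset(f->dirty, 0, sizeof(f->dirty));
}

static void write_block(struct atomic_file *f, int blk, char v)
{
	if (f->atomic) {
		f->staged[blk] = v;      /* held back, not yet durable */
		f->dirty[blk] = true;
	} else {
		f->disk[blk] = v;
	}
}

/* Publish every staged block together; atomic mode stays on (step 4). */
static void commit_atomic(struct atomic_file *f)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (f->dirty[i])
			f->disk[i] = f->staged[i];
	memset(f->dirty, 0, sizeof(f->dirty));
}

/* A crash loses whatever was staged but not committed. */
static void crash(struct atomic_file *f)
{
	memset(f->dirty, 0, sizeof(f->dirty));
	f->atomic = false;
}
```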

30 Sep, 2014 4 commits

f2fs: call f2fs_unlock_op after error was handled · 44c16156

Jaegeuk Kim authored 10 years ago


This patch relocates f2fs_unlock_op in every directory operation so that it is
called after any error has been handled.
Otherwise, a checkpoint can be taken with valid node ids that have no dentries
when -ENOSPC occurs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
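
The rule is about ordering. While f2fs_lock_op is held, a checkpoint cannot start, so the error path must undo the half-created state before the lock is dropped. The sketch below models f2fs_lock_op as a counter and lets a hypothetical checkpoint inspect state only when no operation holds it; every name here is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fs_state {
	int op_holders;        /* stand-in for f2fs_lock_op holders */
	int allocated_nids;    /* node ids handed out */
	int linked_dentries;   /* dentries pointing at them */
};

static void lock_op(struct fs_state *s)   { s->op_holders++; }
static void unlock_op(struct fs_state *s) { s->op_holders--; }

/* A checkpoint may only run when no operation holds the lock; it reports
 * whether it would persist a nid that has no dentry. */
static bool checkpoint_sees_orphan(const struct fs_state *s)
{
	assert(s->op_holders == 0);
	return s->allocated_nids != s->linked_dentries;
}

/* Create an entry; `fail` simulates -ENOSPC while adding the dentry. */
static int create_entry(struct fs_state *s, bool fail)
{
	lock_op(s);
	s->allocated_nids++;          /* the new inode's nid */
	if (fail) {
		s->allocated_nids--;  /* handle the error first ... */
		unlock_op(s);         /* ... then allow checkpoints */
		return -ENOSPC;
	}
	s->linked_dentries++;
	unlock_op(s);
	return 0;
}
```

Swapping the two lines in the error branch would open a window in which a checkpoint sees an allocated nid without a dentry.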

f2fs: refactor flush_nat_entries to remove costly reorganizing ops · 309cc2b6

Jaegeuk Kim authored 10 years ago

Previously, f2fs reorganized the dirty nat entries into multiple sets
according to their nid ranges. This can improve flushing of nat pages; however,
if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing the dirty nat list
and adding a series of set operations when a nat entry becomes dirty.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
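
Instead of sorting all dirty entries at flush time, each entry can join the set for its NAT block at the moment it becomes dirty. A minimal sketch, assuming 455 entries per NAT block (a 4 KiB block divided by a 9-byte on-disk entry) and a fixed-size array of sets in place of the kernel's own set structures; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NAT_ENTRY_PER_BLOCK 455   /* assumed: 4096-byte block / 9-byte entry */
#define MAX_NID            (NAT_ENTRY_PER_BLOCK * 8)
#define NR_SETS            (MAX_NID / NAT_ENTRY_PER_BLOCK)

struct nat_sets {
	bool dirty[MAX_NID];
	int entry_cnt[NR_SETS];   /* dirty entries per NAT block */
};

/* Account the entry to its set when it first becomes dirty, so a flush
 * needs no reorganization pass. */
static void mark_nat_dirty(struct nat_sets *s, unsigned int nid)
{
	if (s->dirty[nid])
		return;
	s->dirty[nid] = true;
	s->entry_cnt[nid / NAT_ENTRY_PER_BLOCK]++;
}

/* Number of NAT blocks a flush has to touch. */
static int dirty_nat_blocks(const struct nat_sets *s)
{
	int n = 0;

	for (int i = 0; i < NR_SETS; i++)
		if (s->entry_cnt[i])
			n++;
	return n;
}
```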

f2fs: introduce FITRIM in f2fs_ioctl · 4b2fecc8

Jaegeuk Kim authored 10 years ago


This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue as many small discards and prefree discards as
possible for the given area.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
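
FITRIM is the generic Linux ioctl that fstrim(8) also uses; it takes a struct fstrim_range from <linux/fs.h>. A minimal user-space sketch that asks for the whole filesystem to be trimmed follows; whole_fs_range and trim_mount are illustrative helpers, and the ioctl normally requires CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Cover the whole filesystem; minlen 0 lets even small extents through. */
static struct fstrim_range whole_fs_range(uint64_t minlen)
{
	struct fstrim_range r = {
		.start = 0,
		.len = UINT64_MAX,
		.minlen = minlen,
	};
	return r;
}

/* Returns the byte count reported back in r.len, or -1 on error. */
static long long trim_mount(const char *mountpoint)
{
	struct fstrim_range r = whole_fs_range(0);
	int fd = open(mountpoint, O_RDONLY);

	if (fd < 0)
		return -1;
	int ret = ioctl(fd, FITRIM, &r);
	close(fd);
	return ret < 0 ? -1 : (long long)r.len;
}
```

For example, `trim_mount("/mnt/f2fs")` run as root on an f2fs mount point; the path is only an example.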

f2fs: introduce cp_control structure · 75ab4cb8

Jaegeuk Kim authored 10 years ago


This patch adds a new data structure to control checkpoint parameters.
Currently, it records the reason for the checkpoint, such as is_umount or a
normal sync.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
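
A sketch of what passing the reason in a structure buys: the checkpoint routine can branch on why it was called without growing its argument list each time a new parameter is needed. The enum values and the helper below are assumptions modeled on the description, not the exact kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum cp_reason { CP_UMOUNT, CP_SYNC };   /* assumed names */

struct cp_control {
	enum cp_reason reason;
	/* further parameters can be added here without changing every
	 * caller's argument list */
};

/* Hypothetical decision inside a checkpoint routine: only an unmount
 * checkpoint marks the filesystem as cleanly unmounted. */
static bool cp_sets_umount_flag(const struct cp_control *cpc)
{
	return cpc->reason == CP_UMOUNT;
}
```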

23 Sep, 2014 3 commits

f2fs: remove redundant operation during roll-forward recovery · c52e1b10

Jaegeuk Kim authored 10 years ago


If the same data is updated multiple times, we don't need to redo all of the
operations.
Let's just apply the latest one.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
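
The idea can be shown with a log of (file offset, block address) records read in order during recovery: only the last record for each offset needs replaying. This sketch uses an invented record type and a quadratic scan for clarity; it does not reflect how f2fs walks its node chain.

```c
#include <assert.h>
#include <stdbool.h>

struct log_rec {
	unsigned int ofs;       /* data offset within the file */
	unsigned int blkaddr;   /* block holding that version */
};

/* Is rec[i] superseded by a later record for the same offset? */
static bool superseded(const struct log_rec *rec, int n, int i)
{
	for (int j = i + 1; j < n; j++)
		if (rec[j].ofs == rec[i].ofs)
			return true;
	return false;
}

/* Copy only the latest record per offset into out[]; returns the count. */
static int latest_updates(const struct log_rec *rec, int n,
			  struct log_rec *out)
{
	int m = 0;

	for (int i = 0; i < n; i++)
		if (!superseded(rec, n, i))
			out[m++] = rec[i];
	return m;
}
```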

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Jaegeuk Kim authored 10 years ago


This patch revisits all the recovery information used during f2fs_sync_file.

In this patch, three pieces of information are used to make the decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

The scenarios our rule is based on are:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, in #3, the three conditions change as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC) is false.

For example, in #8:
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that the roll-forward policy should follow this rule, which means that
if there are any missing blocks, we don't need to recover that inode.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
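
The decision can be checked against the worked examples above. The predicate below is our reading of the rule, not a copy of the kernel's need_inode_block_update: the inode block can be skipped only when the node carries the latest fsync mark and recovery can also find the inode (it was checkpointed, or fsynced before). It reproduces the outcomes the text spells out for #3 and #8.

```c
#include <assert.h>
#include <stdbool.h>

struct nat_flags {
	bool is_checkpointed;    /* a) */
	bool has_fsynced_inode;  /* b) */
	bool has_last_fsync;     /* c) */
};

/* true: f2fs_sync_file must still write the inode block. */
static bool need_inode_block_update_sketch(struct nat_flags f)
{
	return !(f.has_last_fsync &&
		 (f.is_checkpointed || f.has_fsynced_inode));
}
```

For the #3 stop point (a = o, b = x, c = x) it returns true, matching the text; after inode(F) is written (all three set) it returns false.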

f2fs: use meta_inode cache to improve roll-forward speed · 4c521f49

Jaegeuk Kim authored 10 years ago


Previously, all the dnode pages had to be read during roll-forward recovery.
Even worse, the whole chain was traversed twice.
This patch removes those redundant and costly read operations by using the page
cache of meta_inode, along with readahead.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
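
Why caching helps: roll-forward walks the chain of node blocks twice. If the first pass leaves the blocks in a cache, as the meta_inode page cache does here, the second pass costs no extra device reads. A toy model with an invented chain layout and read counter (readahead is not modeled):

```c
#include <assert.h>
#include <stdbool.h>

#define NBLK 16

struct dev {
	int next[NBLK];       /* next block in the chain, -1 terminates */
	bool cached[NBLK];    /* stand-in for meta_inode's page cache */
	int disk_reads;
};

/* Read one chain block, counting a device read unless it is cached. */
static int read_next(struct dev *d, int blk, bool use_cache)
{
	if (!(use_cache && d->cached[blk])) {
		d->disk_reads++;
		d->cached[blk] = true;
	}
	return d->next[blk];
}

/* Walk the chain from `start` once. */
static void walk_chain(struct dev *d, int start, bool use_cache)
{
	for (int blk = start; blk != -1; blk = read_next(d, blk, use_cache))
		;
}
```

For a three-block chain, two uncached walks cost six reads, while two cached walks cost three.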

16 Sep, 2014 2 commits

f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02

Jaegeuk Kim authored 10 years ago


If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, only then does
f2fs_sync_file start trying in-place updates.
If the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps
using out-of-place updates; otherwise, it triggers in-place updates.

This may be useful for storage with very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is much slower than,

Rand. writes (Data)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
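
The fsync-time choice reduces to one comparison. In this sketch, ipu_policy is treated as the single value 4 named above, and the boundary is an assumption (a dirty-page count at or below the threshold counts as "not over"); the function name is hypothetical and other IPU policies are outside its scope.

```c
#include <assert.h>
#include <stdbool.h>

#define F2FS_IPU_FSYNC 4   /* value quoted in the commit text */

/* true: write the dirty data in place; false: keep out-of-place updates. */
static bool fsync_uses_ipu(int ipu_policy, unsigned long dirty_pages,
			   unsigned long min_fsync_blocks)
{
	if (ipu_policy != F2FS_IPU_FSYNC)
		return false;   /* other policies are not modeled here */
	return dirty_pages <= min_fsync_blocks;
}
```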

f2fs: expand counting dirty pages in the inode page cache · a7ffdbe2

Jaegeuk Kim authored 10 years ago


Previously, f2fs only counted dirty dentry pages, but there is no reason not to
expand the scope.

This patch renames the dirty page management functions and counts dirty pages
in each inode info as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

09 Sep, 2014 4 commits

f2fs: use lock-less list(llist) to simplify the flush cmd management · 721bd4d5

Gu Zheng authored 10 years ago


We use flush cmd control to collect many flush cmds and flush them
together. For this, we use two lists to manage the flush cmds
(collect and dispatch), protected by one spin lock.
In fact, the lock-less list (llist) is very well suited to this case,
so we use it to simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks to Yu for the suggestion.
-
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
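
The kernel's llist offers a lock-free add for producers and a take-everything operation for the single consumer. The C11 sketch below shows the same pattern: a compare-and-swap push and an atomic exchange that detaches the whole batch, then a reversal because pushes come out newest first. The names carry a _sketch suffix because they imitate the idea, not the kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
	int cmd_id;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Producer side: push one command without taking a lock. */
static void llist_add_sketch(struct lnode *n, struct lhead *h)
{
	struct lnode *old = atomic_load(&h->first);

	do {
		n->next = old;    /* old is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Consumer side: detach every queued command in one atomic step. */
static struct lnode *llist_del_all_sketch(struct lhead *h)
{
	return atomic_exchange(&h->first, NULL);
}

/* The list is LIFO; reverse it to issue commands in arrival order. */
static struct lnode *llist_reverse_sketch(struct lnode *n)
{
	struct lnode *rev = NULL;

	while (n) {
		struct lnode *next = n->next;
		n->next = rev;
		rev = n;
		n = next;
	}
	return rev;
}
```

Because the consumer takes the whole list at once, it can walk the detached batch without any lock, which is what removes the need for separate collect and dispatch lists guarded by a spin lock.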

f2fs: refactor flush_sit_entries codes for reducing SIT writes · 184a5cd2

Chao Yu authored 10 years ago

In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem with the SIT journal area.

In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space in the sit journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to one sit entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in
ascending order by the number of entries in each set. Later we flush the sets
with the fewest entries into the journal, as many as we can, and then flush the
dense sets, with merged entries, to disk.

In this way we can use the sit journal area more effectively and reduce SIT
updates, resulting in better performance and a longer lifetime for the flash
device.

In my testing environment, this patch noticeably reduces SIT block
updates.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

The latency of the merge operation stays small even when handling a large
number of dirty SIT entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
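
A simplified model of the merge step follows. It assumes 55 SIT entries per block (a 4 KiB block of 74-byte entries), a journal capacity measured in entries, and distinct dirty segment numbers; the linear set lookup and the function names are illustrative choices, not the kernel's data structures.

```c
#include <assert.h>
#include <stdlib.h>

#define SIT_ENTRY_PER_BLOCK 55   /* assumed: 4096-byte block / 74-byte entry */
#define MAX_SETS 64

struct entry_set {
	unsigned int blk;   /* SIT block index */
	int cnt;            /* dirty entries in that block */
};

static int by_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	return (x->cnt > y->cnt) - (x->cnt < y->cnt);
}

/* Group dirty segments by SIT block, journal the sparsest sets that fit,
 * and return how many SIT blocks still have to be written. */
static int sit_blocks_to_write(const unsigned int *segnos, int n,
			       int journal_space)
{
	struct entry_set sets[MAX_SETS];
	int nsets = 0;

	for (int i = 0; i < n; i++) {
		unsigned int blk = segnos[i] / SIT_ENTRY_PER_BLOCK;
		int k = 0;

		while (k < nsets && sets[k].blk != blk)
			k++;
		if (k == nsets) {             /* first entry of a new set */
			assert(nsets < MAX_SETS);
			sets[nsets].blk = blk;
			sets[nsets].cnt = 0;
			nsets++;
		}
		sets[k].cnt++;
	}

	qsort(sets, nsets, sizeof(sets[0]), by_count);   /* sparsest first */

	int blocks = 0;
	for (int k = 0; k < nsets; k++) {
		if (sets[k].cnt <= journal_space)
			journal_space -= sets[k].cnt;   /* fits in the journal */
		else
			blocks++;                       /* write the whole block */
	}
	return blocks;
}
```

With sets of 1, 3, and 4 entries and room for 4 journal entries, the two sparse sets go to the journal and only one SIT block is written.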

f2fs: need fsck.f2fs when f2fs_bug_on is triggered · 9850cf4a

Jaegeuk Kim authored 10 years ago


If any f2fs_bug_on is triggered, fsck.f2fs is needed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: retain inconsistency information to initiate fsck.f2fs · 2ae4c673

Jaegeuk Kim authored 10 years ago


This patch adds sbi->need_fsck to trigger fsck.f2fs later.
This flag can only be cleared by fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
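
Together with the neighbouring commits in this list (fsck.f2fs is needed after any f2fs_bug_on, and WARN_ON is used when f2fs_bug_on is disabled), the behaviour can be sketched as a macro that warns and records the inconsistency instead of crashing. This user-space stand-in replaces WARN_ON with a message on stderr and uses a minimal hypothetical sb_info; it is not the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct sb_info {
	bool need_fsck;   /* set on inconsistency; only fsck.f2fs clears it */
};

/* Stand-in for a non-fatal f2fs_bug_on: warn, remember, keep running. */
#define f2fs_bug_on_sketch(sbi, condition)                            \
	do {                                                          \
		if (condition) {                                      \
			fprintf(stderr, "WARNING: %s:%d: %s\n",       \
				__FILE__, __LINE__, #condition);      \
			(sbi)->need_fsck = true;                      \
		}                                                     \
	} while (0)
```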

04 Sep, 2014 1 commit

f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB · 4081363f

Jaegeuk Kim authored 10 years ago


This patch adds three inline functions to clean up messy casting code.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
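
The three helpers replace hand-written pointer chains such as `F2FS_SB(page->mapping->host->i_sb)`. The sketch below compiles in user space by mocking the few struct fields involved; the helper chain follows the names in the commit title, but the mock types are assumptions.

```c
#include <assert.h>

/* Minimal mock types standing in for the VFS and f2fs structures. */
struct f2fs_sb_info { int id; };
struct super_block { void *s_fs_info; };
struct inode { struct super_block *i_sb; };
struct address_space { struct inode *host; };
struct page { struct address_space *mapping; };

static inline struct f2fs_sb_info *F2FS_SB(struct super_block *sb)
{
	return sb->s_fs_info;
}

/* inode -> superblock -> f2fs_sb_info */
static inline struct f2fs_sb_info *F2FS_I_SB(struct inode *inode)
{
	return F2FS_SB(inode->i_sb);
}

/* mapping -> host inode -> ... */
static inline struct f2fs_sb_info *F2FS_M_SB(struct address_space *mapping)
{
	return F2FS_I_SB(mapping->host);
}

/* page -> mapping -> ... */
static inline struct f2fs_sb_info *F2FS_P_SB(struct page *page)
{
	return F2FS_M_SB(page->mapping);
}
```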

21 Aug, 2014 5 commits

f2fs: remove rewrite_node_page · 202095a7

Jaegeuk Kim authored 10 years ago


I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in place.
Then, after a successful recovery, write_checkpoint will flush all of them
through the normal write path.
This way, we can avoid potential errors in block allocation.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid double lock in truncate_blocks · 764aa3e9

Jaegeuk Kim authored 10 years ago


init_inode_metadata calls truncate_blocks when an error occurs.
Its callers hold f2fs_lock_op, so we should not take it again in
truncate_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
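
One way to express the fix is to let the caller say whether the lock is already held. The sketch assumes a boolean lock argument and models f2fs_lock_op as a per-task nesting counter that must never exceed one, since re-taking the read side can deadlock once a checkpoint is queued for the write side. truncate_blocks_sketch and init_metadata_error_path are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Nesting depth of f2fs_lock_op in the current task; depth > 1 is
 * treated as a bug in this model. */
static int lock_depth;

static void lock_op(void)
{
	lock_depth++;
	assert(lock_depth == 1 && "nested f2fs_lock_op");
}

static void unlock_op(void)
{
	lock_depth--;
}

static int truncate_blocks_sketch(unsigned long from, bool lock)
{
	if (lock)
		lock_op();
	(void)from;   /* ... free blocks beyond `from` ... */
	if (lock)
		unlock_op();
	return 0;
}

/* Error path of init_inode_metadata: the caller already holds the lock. */
static int init_metadata_error_path(void)
{
	lock_op();
	int err = truncate_blocks_sketch(0, false);
	unlock_op();
	return err;
}
```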

f2fs: add WARN_ON in f2fs_bug_on · b3fe0a0d

Jaegeuk Kim authored 10 years ago


This patch adds WARN_ON when f2fs_bug_on is disabled, so that kernel messages
are still shown.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_cp_error for readability · 1e968fdf

Jaegeuk Kim authored 10 years ago


This patch adds f2fs_cp_error for readability.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: trigger release_dirty_inode in f2fs_put_super · 6f12ac25

Jaegeuk Kim authored 10 years ago


generic_shutdown_super calls sync_filesystem, evict_inode, and then
f2fs_put_super. f2fs_evict_inode may leave some dirty inode information
behind, so we should release it in f2fs_put_super.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

19 Aug, 2014 4 commits

f2fs: fix to recover inline_xattr/data and blocks · 1c35a90e

Jaegeuk Kim authored 10 years ago


This patch fixes the skipped xattr recovery and the order of inline xattr/data
recovery.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: make clear on test condition and return types · 0342fd30

Jaegeuk Kim authored 10 years ago


This patch adds parentheses to make the condition check clear.
It also changes the return type to better convey its meaning.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should convert inline_data during the mkwrite · b067ba1f

Jaegeuk Kim authored 10 years ago

If mkwrite is called on an inode with inline_data, it can overwrite the data
index space with NEW_ADDR (e.g., when the first 4 bytes happen to be zero).
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
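
The hazard comes from the inline area and the block address array sharing the same bytes in the inode. In this model (sizes and names invented; NULL_ADDR as 0 and NEW_ADDR as all ones), reserving a block for offset 0 checks whether the first address slot is NULL_ADDR, and inline data that happens to start with four zero bytes passes that check and gets overwritten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t block_t;
#define NULL_ADDR ((block_t)0)
#define NEW_ADDR  ((block_t)-1)

#define INLINE_BYTES 16

/* In this model the same bytes hold either inline data or block addresses. */
struct inode_blk {
	unsigned char area[INLINE_BYTES];
};

/* Reserve data block 0 the non-inline way: if the address slot looks
 * empty, mark it NEW_ADDR. Returns 1 if it wrote the slot. */
static int reserve_first_block(struct inode_blk *ib)
{
	block_t addr;

	memcpy(&addr, ib->area, sizeof(addr));
	if (addr != NULL_ADDR)
		return 0;
	addr = NEW_ADDR;
	memcpy(ib->area, &addr, sizeof(addr));
	return 1;
}
```

Converting the inline data to a regular block before this check, as the commit title says, removes the ambiguity.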

f2fs: fix typo · e1c42045

arter97 authored 10 years ago


Fix typos and some grammatical errors.

Tree-wide, the words "filesystem" and "readahead" are written without a space.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

02 Aug, 2014 1 commit

f2fs: avoid skipping recover_inline_xattr after recover_inline_data · 70cfed88

Chao Yu authored 10 years ago


When we recover an inode's data in the roll-forward procedure and the inode
has both inline data and inline xattr, we may skip recovering the inline xattr
if we recover the inline data from the node page first.
This patch fixes the loss of inline xattr data in the above scenario.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

31 Jul, 2014 1 commit

f2fs: reduce competition among node page writes · b3582c68

Chao Yu authored 10 years ago


We do not need to block on ->node_write among different node page writers,
e.g. fsync/flush, unless there is a node page writer from write_checkpoint.
So it's better to use an rw_semaphore instead of a mutex for ->node_write to
improve performance.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
