1. 18 Mar, 2016 1 commit
    • fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim authored
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Michael Halcrow <mhalcrow@google.com>
      Signed-off-by: Ildar Muslukhov <ildarm@google.com>
      Signed-off-by: Uday Savagaonkar <savagaon@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 02 Mar, 2016 2 commits
    • f2fs: mutex can't be used by down_write_nest_lock() · 59692b7c
      Yang Shi authored
      f2fs_lock_all() calls down_write_nest_lock() to acquire a rw_sem and check
      a mutex, but down_write_nest_lock() is designed for two rw_sems according to
      the comment in include/linux/rwsem.h. Other than f2fs, it is only called in
      mm/mmap.c, with two rwsems.
      
      So it looks like f2fs uses it wrongly. It also causes the compile warning
      below on -rt kernels.
      
      In file included from fs/f2fs/xattr.c:25:0:
      fs/f2fs/f2fs.h: In function 'f2fs_lock_all':
      fs/f2fs/f2fs.h:962:34: warning: passing argument 2 of 'down_write_nest_lock' from incompatible pointer type [-Wincompatible-pointer-types]
        f2fs_down_write(&sbi->cp_rwsem, &sbi->cp_mutex);
                                        ^
      fs/f2fs/f2fs.h:27:55: note: in definition of macro 'f2fs_down_write'
       #define f2fs_down_write(x, y) down_write_nest_lock(x, y)
                                                             ^
      In file included from include/linux/rwsem.h:22:0,
                       from fs/f2fs/xattr.c:21:
      include/linux/rwsem_rt.h:138:20: note: expected 'struct rw_semaphore *' but argument is of type 'struct mutex *'
       static inline void down_write_nest_lock(struct rw_semaphore *sem,
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: recovery missing dot dentries in root directory · 8c2b1435
      Liu Xue authored
      If f2fs is corrupted and the dot dentries in the root directory are
      missing, f2fs needs to recover them once fsck.f2fs, having detected the
      missing dot dentries, has set the F2FS_INLINE_DOTS flag in the directory inode.
      Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
      Signed-off-by: Yong Sheng <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 26 Feb, 2016 5 commits
    • f2fs: fix to avoid deadlock when merging inline data · 19c7377b
      Chao Yu authored
      When testing with fsstress, kworker and user threads were both blocked:
      
      INFO: task kworker/u16:1:16580 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u16:1   D ffff8803f2595390     0 16580      2 0x00000000
      Workqueue: writeback bdi_writeback_workfn (flush-251:0)
       ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440
       ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440
       ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000
      Call Trace:
       [<ffffffff816fe9d9>] schedule+0x29/0x70
       [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9
       [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30
       [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs]
       [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs]
       [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs]
       [<ffffffff811473b3>] do_writepages+0x23/0x40
       [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250
       [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470
       [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0
       [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0
       [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220
       [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0
       [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0
       [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0
       [<ffffffff810750ce>] kthread+0xde/0xf0
       [<ffffffff817093f8>] ret_from_fork+0x58/0x90
      
      fsstress thread stack:
       [<ffffffff81139f0e>] sleep_on_page+0xe/0x20
       [<ffffffff81139ef7>] __lock_page+0x67/0x70
       [<ffffffff8113b100>] find_lock_page+0x50/0x80
       [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0
       [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs]
       [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs]
       [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs]
       [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs]
       [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40
       [<ffffffff811d304c>] vfs_fsync+0x1c/0x20
       [<ffffffff811d3291>] do_fsync+0x41/0x70
       [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20
       [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff
      
      The cause of this issue is:
      CPU0:					CPU1:
       - f2fs_write_data_pages
      					 - f2fs_sync_fs
      					  - write_checkpoint
      					   - block_operations
      					    - f2fs_lock_all
      					     - down_write(sbi->cp_rwsem)
        - lock_page(page)
        - f2fs_write_data_page
      					    - sync_node_pages
      					     - flush_inline_data
      					      - pagecache_get_page(page, GFP_LOCK)
         - f2fs_lock_op
          - down_read(sbi->cp_rwsem)
      
      This patch switches flush_inline_data to trylock_page to fix this ABBA
      deadlock.
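
The trylock idea can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct page_sketch` and the `*_sketch` helpers are hypothetical stand-ins, and a C11 `atomic_flag` plays the role of the page lock bit. The point shown is only that the flusher, which already holds the checkpoint lock, skips a busy page instead of sleeping on it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a lock bit. */
struct page_sketch {
    atomic_flag locked;
    bool inline_data_flushed;
};

/* Non-blocking acquire: returns true only if the lock was free. */
static bool trylock_page_sketch(struct page_sketch *page)
{
    return !atomic_flag_test_and_set_explicit(&page->locked,
                                              memory_order_acquire);
}

static void unlock_page_sketch(struct page_sketch *page)
{
    atomic_flag_clear_explicit(&page->locked, memory_order_release);
}

/*
 * Models the flusher side (CPU1 in the diagram), which already holds the
 * checkpoint lock for write. If a writer (CPU0) holds the page lock, the
 * page is skipped rather than waited on, so the two lock orders can no
 * longer block each other.
 */
static bool flush_inline_data_sketch(struct page_sketch *page)
{
    if (!trylock_page_sketch(page))
        return false;               /* page busy: skip it without blocking */
    page->inline_data_flushed = true;
    unlock_page_sketch(page);
    return true;
}
```

The trade-off of this design is that a skipped page is simply not flushed in this round; that is acceptable only if a later pass picks it up again.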
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce f2fs_flush_merged_bios for cleanup · 406657dd
      Chao Yu authored
      Add a new helper f2fs_flush_merged_bios to clean up redundant code.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce f2fs_update_data_blkaddr for cleanup · f28b3434
      Chao Yu authored
      Add a new helper f2fs_update_data_blkaddr to clean up redundant code.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: fix incorrect positioning for GCing encrypted data page · 4356e48e
      Chao Yu authored
      Currently, the flow of GCing an encrypted data page is:
      1) try to grab meta page in meta inode's mapping with index of old block
      address of that data page
      2) load data of ciphertext into meta page
      3) allocate new block address
      4) write the meta page into new block address
      5) update block address pointer in direct node page.
      
      Other readers/writers use f2fs_wait_on_encrypted_page_writeback to
      check for, and wait on writeback of, GCed encrypted data cached in the
      meta page, in order to avoid inconsistency among the data page cache,
      the meta page cache and the on-disk data when updating.
      
      However, we use the new block address updated in step 5) as the index to
      look up the meta page in the inner bio buffer. That is wrong: we will
      never find the GCing meta page, since we used the old block address as
      the index of that page in step 1).
      
      This patch fixes the issue by adjusting the order of step 1) and step 3),
      and in step 1) grabbing the page with the index generated in step 3).
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix incorrect upper bound when iterating inode mapping tree · 80dd9c0e
      Chao Yu authored
      1. The inode mapping tree can index pages in the range [0, ULONG_MAX];
      however, in some places f2fs only searches or iterates pages in the range
      [0, LONG_MAX], which results in page cache misses.
      
      2. filemap_fdatawait_range accepts range parameters in units of bytes, so
      the maximum range it covers should be [0, LLONG_MAX]. If we use
      [0, LONG_MAX] as the range for waiting on writeback, a large number of
      pages will not be covered.
      
      This patch corrects the above two issues.
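
A small worked example makes the second point concrete. It is only a sketch under stated assumptions: 4 KiB pages, and a 32-bit build where `LONG_MAX` is 2^31 - 1 and `ULONG_MAX` is 2^32 - 1, modelled here with fixed-width constants so the arithmetic does not depend on the host. The helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12    /* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [0, last_byte]. */
static uint64_t pages_covered_by_byte_range(int64_t last_byte)
{
    return ((uint64_t)last_byte >> PAGE_SHIFT_SKETCH) + 1;
}

/* Number of pages addressable with page indices [0, last_index]. */
static uint64_t pages_covered_by_index_range(uint64_t last_index)
{
    return last_index + 1;
}
```

Under these assumptions, a byte range ending at `INT32_MAX` covers 524288 pages, while page indices up to `UINT32_MAX` address 4294967296 pages, so a wait bounded by a 32-bit `LONG_MAX` in bytes would leave most of the index space uncovered. A byte range ending at `INT64_MAX` covers 2^51 pages.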
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 23 Feb, 2016 32 commits
    • f2fs: avoid hungtask problem caused by losing wake_up · 0ff21646
      Yunlei He authored
      A task in D state in wait_on_all_pages_writeback should be woken by
      f2fs_write_end_io once all writeback pages have been successfully
      written to the device. However, the wake_up may arrive between
      get_pages and io_schedule. In that case the wake_up is lost and the
      task stays in D state even though all pages have been written back
      to the device, and eventually the whole system ends up in the
      hung-task state.
      
                      if (!get_pages(sbi, F2FS_WRITEBACK))
                               break;
      					<---------  wake_up
                      io_schedule();
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Biao He <hebiao6@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: trace old block address for CoWed page · 7a9d7548
      Chao Yu authored
      This patch enables tracing the old block address of a CoWed page for
      better debugging.
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: try to flush inode after merging inline data · 9a4cbc9e
      Chao Yu authored
      When flushing node pages, if the current node page is an inline inode
      page, we try to merge inline data from the data page into the inline
      inode page and then skip flushing the current node page. This decreases
      the number of nodes flushed in a batch in this round, which may lead to
      worse performance.
      
      This patch gives just-merged inline inode pages a chance to be flushed,
      for performance.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: show more info about superblock recovery · 41214b3c
      Chao Yu authored
      This patch shows more info in the message log about the recovery of a
      corrupted superblock during ->mount, e.g. the index of the corrupted
      superblock and the result of the recovery.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix the wrong stat count of calling gc · 17d899df
      Chao Yu authored
      On a partition formatted with multiple segments per section, the count
      of GC operations was reported incorrectly.
      
      e.g., for a partition with segs_per_sec = 4
      
      cat /sys/kernel/debug/f2fs/status
      
      GC calls: 208 (BG: 7)
        - data segments : 104 (52)
        - node segments : 104 (24)
      
      GC called count should be (104 (data segs) + 104 (node segs)) / 4 = 52,
      rather than 208. Fix it.
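
The arithmetic above can be written as a one-line helper. This is a sketch of the calculation in the commit message only, not the actual stat code; the function name is hypothetical, and it assumes each counted GC call reclaims one whole section.

```c
/*
 * Convert per-segment GC counts into a GC call count, assuming one call
 * reclaims one section made of segs_per_sec segments.
 */
static unsigned int gc_call_count_sketch(unsigned int data_segs,
                                         unsigned int node_segs,
                                         unsigned int segs_per_sec)
{
    return (data_segs + node_segs) / segs_per_sec;
}
```

With the figures from the status output, `gc_call_count_sketch(104, 104, 4)` evaluates to 52.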
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remain last victim segment number ascending order · 4ce53776
      Jaegeuk Kim authored
      This patch avoids keeping an inefficient last victim segment number
      after a victim has been selected.
      
      For example, if all the dirty segments have the same number of valid
      blocks, we can get the victim segments in descending order because a
      wrong last segment number is kept.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: reuse read_inline_data for f2fs_convert_inline_page · 8060656a
      Shawn Lin authored
      f2fs_convert_inline_page duplicates what read_inline_data
      already does to copy the inline data out of inode_page.
      We can use read_inline_data instead to simplify the code.
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to delete old dirent in converted inline directory in ->rename · 993a0499
      Chao Yu authored
      When testing with fstests/generic/068 on f2fs with inline_dentry enabled,
      the following oops is reported in dmesg:
      
       ------------[ cut here ]------------
       WARNING: CPU: 5 PID: 11841 at fs/inode.c:273 drop_nlink+0x49/0x50()
       Modules linked in: f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
       CPU: 5 PID: 11841 Comm: fsstress Tainted: G           O    4.5.0-rc1 #45
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
        0000000000000111 ffff88009cdf7ae8 ffffffff813e5944 0000000000002e41
        0000000000000000 0000000000000111 0000000000000000 ffff88009cdf7b28
        ffffffff8106a587 ffff88009cdf7b58 ffff8804078fe180 ffff880374a64e00
       Call Trace:
        [<ffffffff813e5944>] dump_stack+0x48/0x64
        [<ffffffff8106a587>] warn_slowpath_common+0x97/0xe0
        [<ffffffff8106a5ea>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81231039>] drop_nlink+0x49/0x50
        [<ffffffffa07b95b4>] f2fs_rename2+0xe04/0x10c0 [f2fs]
        [<ffffffff81231ff1>] ? lock_two_nondirectories+0x81/0x90
        [<ffffffff813f454d>] ? lockref_get+0x1d/0x30
        [<ffffffff81220f70>] vfs_rename+0x2e0/0x640
        [<ffffffff8121f9db>] ? lookup_dcache+0x3b/0xd0
        [<ffffffff810b8e41>] ? update_fast_ctr+0x21/0x40
        [<ffffffff8134ff12>] ? security_path_rename+0xa2/0xd0
        [<ffffffff81224af6>] SYSC_renameat2+0x4b6/0x540
        [<ffffffff810ba8ed>] ? trace_hardirqs_off+0xd/0x10
        [<ffffffff810022ba>] ? exit_to_usermode_loop+0x7a/0xd0
        [<ffffffff817e0ade>] ? int_ret_from_sys_call+0x52/0x9f
        [<ffffffff810bdc90>] ? trace_hardirqs_on_caller+0x100/0x1c0
        [<ffffffff81224b8e>] SyS_renameat2+0xe/0x10
        [<ffffffff8121f08e>] SyS_rename+0x1e/0x20
        [<ffffffff817e0957>] entry_SYSCALL_64_fastpath+0x12/0x6f
       ---[ end trace 2b31e17995404e42 ]---
      
      This happens because, when renaming a file within one inline directory
      from a source name to a target name that does not exist yet, inline
      conversion is triggered once the inline dentry runs out of space; after
      that, all data in the inline dentry has been moved to a normal dentry page.
      
      After attaching the new entry in the converted dentry page, we still try
      to remove the old entry from the original inline dentry. Since the old
      entry has already been moved, this has no effect, and the old entry
      remains in the converted dentry page.
      
      Now we have two valid dentries pointing to the same inode, which has an
      nlink value of 1; deleting both of them triggers the warning above.
      
      This issue can be reproduced easily with the steps below:
      1. mount f2fs with inline_dentry option
      2. mkdir dir
      3. touch 180 files named [001-180] in dir
      4. rename dir/180 dir/181
      5. rm dir/180 dir/181
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: detect error of update_dent_inode in ->rename · 9def1e92
      Chao Yu authored
      Check the return value of update_dent_inode in ->rename and report it
      correctly.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move sanity checking of cp into get_valid_checkpoint · 984ec63c
      Shawn Lin authored
      Judging by its name, get_valid_checkpoint returns either a valid cp or
      NULL for the caller to check. If no valid one is found, f2fs_fill_super
      prints an error log. But if get_valid_checkpoint does return one (the
      return value indicates that it is valid, yet it turns out to be invalid
      after sanity checking), f2fs_fill_super prints another, similar error
      log. That seems strange. Let's keep the sanity checking inside the
      procedure of getting a valid cp. Another improvement gained from this
      move is that, even when large volumes are supported, we check the cp in
      advance and skip the following procedure if the sanity check fails.
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: slightly reorganize read_raw_super_block · 2b39e907
      Shawn Lin authored
      read_raw_super_block was introduced to help find the
      first valid superblock. Commit da554e48 ("f2fs:
      recovering broken superblock during mount") changed the
      behaviour to read both of them and check whether the
      recovery flag is needed. So the comment before this
      function isn't consistent with what it actually does.
      Also, the original code uses two labels to handle the
      error cases, which isn't very readable. So this patch
      amends the comment and slightly reorganizes the function.
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: reorder nat cache lock in cache_nat_entry · 1515aef0
      Chao Yu authored
      When looking up a nat entry in cache_nat_entry, if we miss in the nat
      cache, we try to load nat entries a) from the journal of the current
      segment cache or b) from NAT pages for updating. During this process the
      write lock of nat_tree_lock is held to avoid inconsistency between the
      nid cache and the nat cache caused by races among the nat entry shrinker,
      the checkpointer and the nat entry updater.
      
      But this approach makes updating the nat cache inefficient, because it
      serializes accesses to the journal cache and reads of NAT pages.
      
      Here, we reorder the locking and update flow as below to improve access
      concurrency:
      
       - get_node_info
        - down_read(nat_tree_lock)
        - lookup nat cache --- hit -> unlock & return
        - lookup journal cache --- hit -> unlock & goto update
        - up_read(nat_tree_lock)
      update:
        - down_write(nat_tree_lock)
        - cache_nat_entry
         - lookup nat cache --- nohit -> update
        - up_write(nat_tree_lock)
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: split journal cache from curseg cache · b7ad7512
      Chao Yu authored
      In the curseg cache, f2fs caches two different parts:
       - data of the current summary block, i.e. summary entries, footer info.
       - journal info, i.e. sparse nat/sit entries or io stat info.
      
      With this approach, 1) lock contention may be higher when we access or
      update both parts of the cache, since the same mutex, curseg_mutex,
      protects the whole cache. 2) the current summary block, with the last
      journal info, is written back to the device as a normal summary block
      when flushing; however, we treat journal info as valid only in the
      current summary, so most normal summary blocks contain junk journal data,
      which wastes the remaining space of the summary block.
      
      So, in order to fix the above issues, we split the curseg cache into two parts:
      a) current summary block, protected by the original mutex curseg_mutex
      b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
      
      When loading the curseg cache during ->mount, we store summary info and
      journal info in different caches; when doing a checkpoint, we combine
      the data of the two caches into the current summary block for persisting.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: enhance IO path with block plug · e9f5b8b8
      Chao Yu authored
      Try to use block plug in more places, as listed below, to let the process
      cache bios as much as possible, in order to reduce the lock overhead of
      the queue in the IO scheduler.
      1) sync_meta_pages
      2) ra_meta_pages
      3) f2fs_balance_fs_bg
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce f2fs_journal struct to wrap journal info · dfc08a12
      Chao Yu authored
      Introduce a new structure f2fs_journal to wrap journal info in struct
      f2fs_summary_block for readability.
      
      struct f2fs_journal {
      	union {
      		__le16 n_nats;
      		__le16 n_sits;
      	};
      	union {
      		struct nat_journal nat_j;
      		struct sit_journal sit_j;
      		struct f2fs_extra_info info;
      	};
      } __packed;
      
      struct f2fs_summary_block {
      	struct f2fs_summary entries[ENTRIES_IN_SUM];
      	struct f2fs_journal journal;
      	struct summary_footer footer;
      } __packed;
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink · 922ec355
      Chao Yu authored
      This patch syncs f2fs with the ext4 code: it removes an unneeded memory
      allocation in the creating/accessing paths of symlinks.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: handle unexpected lack of encryption keys · ae108668
      Chao Yu authored
      This patch syncs f2fs with commit abdd438b ("ext4 crypto: handle
      unexpected lack of encryption keys") from ext4.
      
      Fix up attempts by users to try to write to a file when they don't
      have access to the encryption key.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: make sure the encryption info is initialized on opendir(2) · ed3360ab
      Chao Yu authored
      This patch syncs f2fs with commit 6bc445e0 ("ext4 crypto: make
      sure the encryption info is initialized on opendir(2)") from ext4.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support revoking atomic written pages · 28bc106b
      Chao Yu authored
      f2fs supports atomic writes with the following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      With this flow we can avoid the file becoming corrupted on an abnormal
      power cut, because we hold the data of the transaction in referenced pages
      linked in the inmem_pages list of the inode, without setting them dirty,
      so this data won't be persisted unless we commit it in step 4.
      
      But we should still hold the journal db file in memory by using volatile
      write, because our semantics of 'atomic write support' are incomplete: in
      step 4, we could fail to submit all dirty data of the transaction, and
      once part of the dirty data has been committed to storage, then after a
      checkpoint and an abnormal power cut the db file will be corrupted forever.
      
      So this patch tries to improve the atomic write flow by adding a revoking
      flow: once an internal error occurs while committing, this gives another
      chance to revoke the partially submitted data of the current transaction,
      which makes the commit operation behave more like an atomic one.
      
      If we're not lucky and the revoking operation fails, EAGAIN is reported
      to the user to suggest either doing the recovery with the held journal
      file or retrying the current transaction.
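
From userspace, the five steps could look roughly like the sketch below. The helper name is hypothetical, and the ioctl numbers are an assumption: they are defined locally here as they are believed to appear in the f2fs headers of this period, so verify them against the kernel you build for. The sketch performs a single write for brevity.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: values taken from the f2fs headers; check your kernel tree. */
#define F2FS_IOCTL_MAGIC              0xf5
#define F2FS_IOC_START_ATOMIC_WRITE   _IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 2)

/* Returns 0 on success, -1 on failure with errno describing the failure. */
static int atomic_update_sketch(const char *path, const void *buf, size_t len)
{
    ssize_t n;
    int saved_errno;
    int fd = open(path, O_RDWR);                        /* 1. open db file */

    if (fd < 0)
        return -1;
    if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)     /* 2. start */
        goto fail;
    n = write(fd, buf, len);                            /* 3. write db file */
    if (n < 0)
        goto fail;
    if ((size_t)n != len) {
        errno = EIO;                                    /* short write */
        goto fail;
    }
    if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)    /* 4. commit */
        goto fail;  /* EAGAIN here: recover from the journal or retry */
    return close(fd);                                   /* 5. close db file */
fail:
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return -1;
}
```

A caller that sees `-1` with `errno == EAGAIN` would, per the description above, either recover from its journal file or retry the whole transaction. On a file system other than f2fs the start ioctl is expected to fail.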
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: split drop_inmem_pages from commit_inmem_pages · 29b96b54
      Chao Yu authored
      Split drop_inmem_pages from commit_inmem_pages for code readability,
      and prepare for the following modification.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid garbage lenghs in dentries · 7d9dfa1d
      Jaegeuk Kim authored
      This patch eliminates garbage name lengths in dentries so that readdir
      returns correct results.
      
      For example, if a valid dentry consists of:
       bitmap : 1   1 1 1
       len    : 32  0 x 0,
      
      readdir can start with second bit_pos having len = 0.
      Or, it can start with third bit_pos having garbage.
      
      In both cases, we should avoid trying to fill dentries.
      So this patch not only removes any garbage length, but also avoids
      entering the zero-length case in readdir.
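
The zero-length case can be illustrated with a small walk over the example layout. This is a sketch, not the f2fs readdir code: the function names are hypothetical, and the 8-byte slot size is an assumption, chosen so that a 32-byte name occupies the four slots of the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_LEN_SKETCH 8   /* assumption: bytes of name stored per slot */

/* Slots occupied by a name of the given length, rounded up. */
static size_t dentry_slots_sketch(unsigned int name_len)
{
    return (name_len + SLOT_LEN_SKETCH - 1) / SLOT_LEN_SKETCH;
}

/* Count the entries a readdir-style walk emits when started at bit_pos. */
static size_t count_dentries_sketch(const bool *bitmap,
                                    const unsigned short *name_len,
                                    size_t nslots, size_t bit_pos)
{
    size_t emitted = 0;

    while (bit_pos < nslots) {
        if (!bitmap[bit_pos] || name_len[bit_pos] == 0) {
            bit_pos++;      /* free slot, or continuation slot with len 0 */
            continue;
        }
        emitted++;
        bit_pos += dentry_slots_sketch(name_len[bit_pos]);
    }
    return emitted;
}
```

With the bitmap `1 1 1 1` and lengths `32 0 0 0` (the garbage length already cleared), a walk from position 0 emits one entry, and a walk that resumes at position 1, 2 or 3 emits none, instead of treating a continuation slot as a new dentry.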
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: sync with ext4's fname padding · a263669f
      Jaegeuk Kim authored
      This patch fixes the incorrect adoption of fname padding.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use correct errno · 60b286c4
      Jaegeuk Kim authored
      This patch fixes a misused error number.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: add missing locking for keyring_key access · 745e8490
      Jaegeuk Kim authored
      This patch adopts:
      	ext4 crypto: add missing locking for keyring_key access
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: check for too-short encrypted file names · 1dafa51d
      Jaegeuk Kim authored
      This patch adopts:
      	ext4 crypto: check for too-short encrypted file names
      
      An encrypted file name should never be shorter than 16 bytes, the
      AES block size.  The 3.10 crypto layer will oops and crash the kernel
      if ciphertext shorter than the block size is passed to it.
      
      Fortunately, in modern kernels the crypto layer will not crash the
      kernel in this scenario, but nevertheless, it represents a corrupted
      directory, and we should detect it and mark the file system as
      corrupted so that e2fsck can fix this.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: f2fs_page_crypto() doesn't need a encryption context · ce855a3b
      Jaegeuk Kim authored
      This patch adopts:
      	ext4 crypto: ext4_page_crypto() doesn't need a encryption context
      
      Since ext4_page_crypto() doesn't need an encryption context (at least
      not any more), this allows us to simplify a number of function signatures
      and also allows us to avoid needing to allocate a context in
      ext4_block_write_begin().  It also means we no longer need a separate
      ext4_decrypt_one() function.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: fix spelling typo in comment · 0fac2d50
      Jaegeuk Kim authored
      This patch adopts:
      	ext4 crypto: fix spelling typo in comment
      Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: replace some BUG_ON()'s with error checks · 66aa3e12
      Jaegeuk Kim authored
      This patch adopts:
      	ext4 crypto: replace some BUG_ON()'s with error checks
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: increase i_size to avoid missing data · 8ef2af45
      Jaegeuk Kim authored
      When finsert is working with dirty pages, we should increase i_size right away.
      Otherwise, the moved page can be dropped by the following
      filemap_write_and_wait_range before i_size is updated.
      In particular, this can happen through
      	if ((page->index >= end_index + 1) || !offset)
      		goto out;
      in f2fs_write_data_page.
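
A worked sketch of that check shows why a stale i_size loses the page. It assumes 4 KiB pages and that `end_index` and `offset` are derived from i_size in the usual way (page number and in-page offset of the end of file); the early return for pages below `end_index` is likewise an assumption about the surrounding code, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12                            /* assumption: 4 KiB */
#define PAGE_SIZE_SKETCH  (1ULL << PAGE_SHIFT_SKETCH)

/* True if writeback would skip the page as lying wholly beyond i_size. */
static bool page_skipped_beyond_eof(uint64_t page_index, uint64_t i_size)
{
    uint64_t end_index = i_size >> PAGE_SHIFT_SKETCH;
    uint64_t offset = i_size & (PAGE_SIZE_SKETCH - 1);

    if (page_index < end_index)
        return false;           /* page lies fully inside the file */
    /* The condition quoted above. */
    return (page_index >= end_index + 1) || !offset;
}
```

For example, with i_size still at 0x11e00, a dirty page at index 0x73 is skipped; once i_size has been raised to 0x7a000, the same page is written.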
      
      This should resolve the below xfstests/091 failure reported by Dave.
      
      $ diff -u tests/generic/091.out /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad
      --- tests/generic/091.out       2014-01-20 16:57:33.000000000 +1100
      +++ /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad       2016-02-08 15:21:02.701375087 +1100
      @@ -1,7 +1,18 @@
       QA output created by 091
       fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
      -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
      -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
      -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
      -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
      -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
      +mapped writes DISABLED
      +skipping insert range behind EOF
      +skipping insert range behind EOF
      +truncating to largest ever: 0x11e00
      +dowrite: write: Invalid argument
      +LOG DUMP (7 total operations):
      +1(  1 mod 256): SKIPPED (no operation)
      +2(  2 mod 256): SKIPPED (no operation)
      +3(  3 mod 256): FALLOC   0x2e0f2 thru 0x3134a  (0x3258 bytes) PAST_EOF
      +4(  4 mod 256): SKIPPED (no operation)
      +5(  5 mod 256): SKIPPED (no operation)
      +6(  6 mod 256): TRUNCATE UP    from 0x0 to 0x11e00
      +7(  7 mod 256): WRITE    0x73400 thru 0x79fff  (0x6c00 bytes) HOLE
      +Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops
      +Correct content saved for comparison
      +(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood")
      Reported-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: preallocate blocks for buffered aio writes · 24b84912
      Jaegeuk Kim authored
      This patch preallocates data blocks for buffered aio writes.
      With this patch, we can avoid redundant locking and unlocking of node pages
      for consecutive aio requests.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move dio preallocation into f2fs_file_write_iter · b439b103
      Jaegeuk Kim authored
      This patch moves preallocation code for direct IOs into f2fs_file_write_iter.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix missing skip pages info · d31c7c3f
      Yunlei He authored
      Fix the missing skip pages info in the f2fs_writepages trace event.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>