1. 18 May, 2016 6 commits
  2. 16 May, 2016 6 commits
  3. 11 May, 2016 7 commits
    • f2fs: fix deadlock when flush inline data · ab47036d
      Chao Yu authored
      The backtrace below was reported by Yunlei He:
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130
       [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x
       [<ffffffff817ab1d0>] down_read+0x20/0x30
       [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs]
       [<ffffffff81217057>] evict+0xc7/0x1a0
       [<ffffffff81217cd6>] iput+0x196/0x200
       [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0
       [<ffffffff812136f9>] dput+0x199/0x1f0
       [<ffffffff811fe77b>] __fput+0x18b/0x220
       [<ffffffff811fe84e>] ____fput+0xe/0x10
       [<ffffffff81097427>] task_work_run+0x77/0x90
       [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2
       [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0
       [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4
       [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0
       [<ffffffff8121794a>] ilookup+0x6a/0xd0
       [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs]
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs]
       [<ffffffff8137b795>] ? bio_endio+0x55/0x60
       [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs]
       [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20
       [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs]
       [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190
       [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30
       [<ffffffff812002e9>] iterate_supers+0xb9/0x110
       [<ffffffff8122e7b5>] sys_sync+0x55/0x90
       [<ffffffff81003ae9>] do_syscall_64+0x69/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      With the following execution sequence, inline_node is set in the inode
      page after the inode was unlinked, resulting in the deadlock described below:
      1. open file
      2. write file
      3. unlink file
      4. write file
      5. close file
      
      Thread A				Thread B
       - dput
        - iput_final
         - inode->i_state |= I_FREEING
         - evict
          - f2fs_evict_inode
      					 - f2fs_sync_fs
      					  - write_checkpoint
      					   - block_operations
      					    - f2fs_lock_all (down_write(cp_rwsem))
           - f2fs_lock_op (down_read(cp_rwsem))
      					    - sync_node_pages
      					     - ilookup
      					      - find_inode_fast
      					       - __wait_on_freeing_inode
      					         (wait on I_FREEING clear)
      
      To fix this, set the inline_node flag only for linked inodes.
      Reported-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: stable@vger.kernel.org # v4.6
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
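The rule stated in the deadlock fix (set the inline_node flag only while the inode is still linked) can be pictured with a small userspace model. This is a sketch under assumptions, not the f2fs code: `toy_inode` and `toy_mark_inline_node` are hypothetical names, and the link count stands in for whatever check the real patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-memory inode. */
struct toy_inode {
    unsigned int nlink;   /* link count; 0 once the file is unlinked */
    bool inline_node;     /* hint that inline data still needs flushing */
};

/*
 * Set the flag only for linked inodes. An unlinked inode may already be
 * on its way to eviction, and a checkpoint that looks it up would wait
 * on it while holding the lock that eviction needs.
 * Returns true if the flag was set.
 */
bool toy_mark_inline_node(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->nlink == 0)
        return false;     /* unlinked: leave the flag clear */
    inode->inline_node = true;
    return true;
}
```

With this guard, a write that arrives after unlink (step 4 in the sequence above) leaves the flag clear, so the checkpoint path has no reason to look up the inode being freed.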
    • f2fs: avoid f2fs_bug_on during recovery · 3b9b10f9
      Jaegeuk Kim authored
      We don't need to use f2fs_bug_on() to handle error cases when allocating
      a block during recovery.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: show # of orphan inodes · 652be551
      Jaegeuk Kim authored
      This adds debug information for # of orphan inodes.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support in batch fzero in dnode page · 6e961949
      Chao Yu authored
      This patch speeds up fzero_range by performing space preallocation and
      address removal for the blocks in one dnode page as a batch operation.
      
      In a virtual machine, with the zram driver:
      
      dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=4096
      time xfs_io -f /mnt/f2fs/file -c "fzero 0 4096M"
      
      Before:
      real	0m3.276s
      user	0m0.008s
      sys	0m3.260s
      
      After:
      real	0m1.568s
      user	0m0.000s
      sys	0m1.564s
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: consider ENOSPC case]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
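The batching idea in the fzero commit is to handle all blocks that share a dnode page in one step instead of one block at a time. A minimal sketch follows, assuming a fixed number of block addresses per dnode page; `TOY_ADDRS_PER_DNODE` and `toy_count_batches` are hypothetical names and the value 1018 is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of block addresses held by one dnode page. */
#define TOY_ADDRS_PER_DNODE 1018UL

/*
 * Split the block range [start, end) into chunks that never cross a
 * dnode page boundary and return the number of chunks. Each chunk is
 * what a batched implementation could process under one node page lock.
 */
size_t toy_count_batches(unsigned long start, unsigned long end)
{
    size_t batches = 0;

    assert(start <= end);
    while (start < end) {
        /* Blocks left in the dnode page that contains 'start'. */
        unsigned long room = TOY_ADDRS_PER_DNODE - (start % TOY_ADDRS_PER_DNODE);
        unsigned long n = (end - start < room) ? end - start : room;

        start += n;   /* one batch covers n blocks */
        batches++;
    }
    return batches;
}
```

Under these assumptions a range of 1018 aligned blocks is one batch rather than 1018 separate operations, which is the kind of saving the timings in the commit message reflect.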
    • f2fs: support in batch multi blocks preallocation · 46008c6d
      Chao Yu authored
      This patch introduces reserve_new_blocks to preallocate multiple blocks
      as a batch operation, which avoids many redundant operations and
      results in better performance.
      
      In a virtual machine, with a rotational device:
      
      time fallocate -l 32G /mnt/f2fs/file
      
      Before:
      real	0m4.584s
      user	0m0.000s
      sys	0m4.580s
      
      After:
      real	0m0.292s
      user	0m0.000s
      sys	0m0.272s
      
      On x86, with an SSD:
      
      time fallocate -l 500G $MNT/testfile
      
      Before : 24.758 s
      After  :  1.604 s
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix bugs and add performance numbers measured in x86.]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
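Reserving several blocks in one call, as the preallocation commit describes, amounts to a single space check and a single counter update for the whole request. The sketch below illustrates that shape in userspace; `toy_sb` and `toy_reserve_blocks` are hypothetical and do not reproduce the signature or internals of reserve_new_blocks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical space accounting for a filesystem instance. */
struct toy_sb {
    unsigned long total_blocks;
    unsigned long used_blocks;
};

/*
 * Reserve 'count' blocks with one check and one update.
 * Returns 0 on success or -ENOSPC if the request does not fit,
 * in which case nothing is reserved.
 */
int toy_reserve_blocks(struct toy_sb *sb, unsigned long count)
{
    assert(sb != NULL && sb->used_blocks <= sb->total_blocks);
    if (count > sb->total_blocks - sb->used_blocks)
        return -ENOSPC;
    sb->used_blocks += count;
    return 0;
}
```

A per-block loop would repeat the check and update once per block; the batch form does the same accounting once per request.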
    • f2fs: make atomic/volatile operation exclusive · 0fac558b
      Chao Yu authored
      The atomic/volatile ioctl interfaces are exposed to users like other file
      operation interfaces, so they must exclude one another to avoid
      potential conflicts when these operations run concurrently.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use mnt_{want,drop}_write_file in ioctl · 7fb17fe4
      Chao Yu authored
      In ioctl interfaces, mnt_{want,drop}_write_file should be used to:
      - get exclusion against filesystem freezing, which may be used by LVM
        snapshots.
      - tell the filesystem that a write is about to be performed on it, and
        make sure that writes are permitted.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
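The want/drop pairing described in the ioctl commit follows a simple bracket pattern: ask for write access, do the modification, then release it on every exit path. The sketch below models that pattern in userspace. `toy_file`, `toy_want_write`, `toy_drop_write` and `toy_ioctl_modify` are hypothetical stand-ins; the real kernel helpers take a `struct file *`, and waiting for a frozen filesystem to thaw is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the kernel helpers consult. */
struct toy_file {
    bool read_only;   /* mount does not permit writes */
    int writers;      /* active write sections */
};

/* Stand-in for mnt_want_write_file(): 0 on success, negative errno otherwise. */
int toy_want_write(struct toy_file *f)
{
    assert(f != NULL);
    if (f->read_only)
        return -EROFS;
    f->writers++;
    return 0;
}

/* Stand-in for mnt_drop_write_file(): ends the write section. */
void toy_drop_write(struct toy_file *f)
{
    assert(f != NULL && f->writers > 0);
    f->writers--;
}

/* Shape of an ioctl handler that modifies the file. */
int toy_ioctl_modify(struct toy_file *f)
{
    int ret = toy_want_write(f);

    if (ret)
        return ret;       /* never modify without write access */
    /* ... perform the modification here ... */
    toy_drop_write(f);    /* always paired with the successful want */
    return 0;
}
```

The property worth checking in review is that every successful want has exactly one matching drop, including on error paths inside the modification step.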
  4. 07 May, 2016 21 commits
    • f2fs: do not preallocate block unaligned to 4KB · 0080c507
      Jaegeuk Kim authored
      Previously, f2fs_preallocate_blocks() tried to allocate blocks for ranges
      not aligned to 4KB. In f2fs_write_begin(), however, prepare_write_begin()
      does not skip its own allocation when len != 4KB, so the node page was
      unexpectedly locked twice.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
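Restricting preallocation to 4KB-aligned blocks means rounding the start of the write up to a block boundary, rounding the end down, and preallocating only what lies between. A minimal sketch of that arithmetic follows; `toy_range` and `toy_full_blocks` are hypothetical names, and the 4096-byte block size is taken from the commit title.

```c
#include <assert.h>

#define TOY_BLOCK_SIZE 4096ULL

/* Block-granular range: first block index and number of blocks. */
struct toy_range {
    unsigned long long first;
    unsigned long long count;
};

/*
 * Return the blocks fully covered by the byte range [pos, pos + len).
 * Partially covered blocks at either end are excluded, so the later
 * write_begin path is the only one that allocates them.
 */
struct toy_range toy_full_blocks(unsigned long long pos, unsigned long long len)
{
    struct toy_range r;
    unsigned long long end = (pos + len) / TOY_BLOCK_SIZE;   /* round down */

    r.first = (pos + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;   /* round up */
    r.count = (end > r.first) ? end - r.first : 0;
    return r;
}
```

For example, a write of 8192 bytes at offset 100 fully covers only block 1, so one block is preallocated and the partial head and tail are left to the per-page path.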
    • f2fs: read node blocks ahead when truncating blocks · 79344efb
      Jaegeuk Kim authored
      This patch enables reading node blocks in advance when truncating a large
      number of data blocks.
      
       > time rm $MNT/testfile (500GB) after drop_caches
      Before : 9.422 s
      After  : 4.821 s
      Reported-by: Stephen Bates <stephen.bates@microsemi.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fallocate data blocks in single locked node page · e12dd7bd
      Jaegeuk Kim authored
      This patch improves expand_inode speed in fallocate by allocating as
      many data blocks as possible under a single locked node page.
      
      On an SSD:
       # time fallocate -l 500G $MNT/testfile
      
      Before : 1m 33.410 s
      After  : 24.758 s
      Reported-by: Stephen Bates <stephen.bates@microsemi.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix inode cache leak · f61cce5b
      Chao Yu authored
      When testing f2fs with inline_dentry option, generic/342 reports:
      VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...
      
      After rmmod of the f2fs module, the kernel shows the following dmesg:
       =============================================================================
       BUG f2fs_inode_cache (Tainted: G           O   ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       Disabling lock debugging due to kernel taint
       INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
        c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
        616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c11c7276>] slab_err+0x76/0x80
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
        [<c1198a38>] kmem_cache_destroy+0x158/0x1f0
        [<c176b43d>] ? mutex_unlock+0xd/0x10
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
       INFO: Object 0xd1e6d9e0 @offset=6624
       kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
        c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
        f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
      
      The reason is that the recovery flow uses a delayed iput mechanism for a
      directory that has a recovered dentry block. This means the inode
      reference is held until the last dirty dentry page is written back.
      
      But when f2fs is mounted with the inline_dentry option, a dirent may be
      recovered only into the dir inode page rather than a dentry page, so there
      is no chance to release the inode reference in ->writepage when writing
      back the last dentry page.
      
      We could call paired iget/iput explicitly for the inline_dentry case, but
      for the non-inline_dentry case iput calls writeback_single_inode to write
      all data pages synchronously, and during recovery ->writepages of f2fs
      skips writing all pages, resulting in lost dirents.
      
      This patch fixes the issue by retiring the old mechanism and introducing a
      new dir_list that holds all directory inodes with recovered data until
      recovery finishes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
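The dir_list approach from the inode cache leak fix can be modelled as a list that takes one reference per recovered directory and drops them all when recovery ends. The following is a userspace sketch under assumptions: `toy_inode_ref`, `toy_dir_list` and both functions are hypothetical, a fixed-size array replaces the kernel list, and the refcount changes stand in for iget and iput.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DIRS 16

/* Hypothetical inode with just a reference count. */
struct toy_inode_ref {
    int refcount;
};

/* Directories whose references are held until recovery finishes. */
struct toy_dir_list {
    struct toy_inode_ref *dirs[TOY_MAX_DIRS];
    size_t count;
};

/* Hold one reference per directory. Returns 0, or -1 if the list is full. */
int toy_dir_list_add(struct toy_dir_list *list, struct toy_inode_ref *dir)
{
    assert(list != NULL && dir != NULL);
    for (size_t i = 0; i < list->count; i++)
        if (list->dirs[i] == dir)
            return 0;               /* already held once */
    if (list->count == TOY_MAX_DIRS)
        return -1;
    dir->refcount++;                /* stands in for iget */
    list->dirs[list->count++] = dir;
    return 0;
}

/* Drop every held reference once recovery has finished. */
void toy_dir_list_release(struct toy_dir_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++)
        list->dirs[i]->refcount--;  /* stands in for iput */
    list->count = 0;
}
```

Because the release does not depend on a dentry page being written back, the inline_dentry and non-inline_dentry cases end with the same balanced reference count.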
    • fscrypto/f2fs: allow fs-specific key prefix for fs encryption · b5a7aef1
      Jaegeuk Kim authored
      This patch allows fscrypto to handle a second key prefix given by the
      filesystem. The main reason is to provide backward compatibility, since
      f2fs previously used "f2fs:" as a crypto prefix instead of "fscrypt:".
      Later, ext4 should also provide key_prefix() to give "ext4:".
      
      One concern described by Ted is the overhead of checking two prefixes.
      On x86, for example, validate_user_key takes 8 ms after boot-up, and it
      turns out that derive_key_aes() consumed most of that time loading the
      specific crypto module. After that cold miss, latency is almost zero, so
      the overhead is treated as negligible.
      Note that request_key() detects a wrong prefix even before derive_key_aes().
      
      Cc: Ted Tso <tytso@mit.edu>
      Cc: stable@vger.kernel.org # v4.6
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
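Handling a second key prefix can be thought of as a two-step lookup: try the common "fscrypt:" prefix, then fall back to the prefix supplied by the filesystem. The sketch below shows that order in userspace. The lookup callback, `toy_find_key_prefix` and `toy_legacy_keyring` are hypothetical; only the prefix strings come from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical keyring lookup: nonzero if a key with this description exists. */
typedef int (*toy_lookup_fn)(const char *description);

/*
 * Try "fscrypt:" first, then the filesystem-specific prefix (may be NULL).
 * Returns the prefix that matched, or NULL if no key was found.
 */
const char *toy_find_key_prefix(toy_lookup_fn lookup, const char *fs_prefix,
                                const char *descriptor_hex)
{
    const char *prefixes[2] = { "fscrypt:", fs_prefix };
    char description[64];

    assert(lookup != NULL && descriptor_hex != NULL);
    for (int i = 0; i < 2; i++) {
        int n;

        if (prefixes[i] == NULL)
            continue;
        n = snprintf(description, sizeof(description), "%s%s",
                     prefixes[i], descriptor_hex);
        if (n < 0 || (size_t)n >= sizeof(description))
            continue;               /* description would be truncated */
        if (lookup(description))
            return prefixes[i];
    }
    return NULL;
}

/* Example keyring that only holds a key added under the old f2fs prefix. */
int toy_legacy_keyring(const char *description)
{
    return strcmp(description, "f2fs:0123456789abcdef") == 0;
}
```

A key added under the old prefix is still found when the filesystem supplies "f2fs:", which is the backward compatibility the commit is after; the cost is at most one extra lookup.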
    • f2fs: avoid panic when truncating to max filesize · 09210c97
      Chao Yu authored
      The following panic occurs when truncating an inode that has inline
      xattr to the max filesize.
      
      [<ffffffffa013d3be>] get_dnode_of_data+0x4e/0x580 [f2fs]
      [<ffffffffa013aca1>] ? read_node_page+0x51/0x90 [f2fs]
      [<ffffffffa013ad99>] ? get_node_page.part.34+0xb9/0x170 [f2fs]
      [<ffffffffa01235b1>] truncate_blocks+0x131/0x3f0 [f2fs]
      [<ffffffffa01238e3>] f2fs_truncate+0x73/0x100 [f2fs]
      [<ffffffffa01239d2>] f2fs_setattr+0x62/0x2a0 [f2fs]
      [<ffffffff811a72c8>] notify_change+0x158/0x300
      [<ffffffff8118a42b>] do_truncate+0x6b/0xa0
      [<ffffffff8118e539>] ? __sb_start_write+0x49/0x100
      [<ffffffff8118a798>] do_sys_ftruncate.constprop.12+0x118/0x170
      [<ffffffff8118a82e>] SyS_ftruncate+0xe/0x10
      [<ffffffff8169efcf>] tracesys+0xe1/0xe6
      [<ffffffffa0139ae0>] get_node_path+0x210/0x220 [f2fs]
       <ffff880206a89ce8>
      --[ end trace 5fea664dfbcc6625 ]---
      
      The reason is that truncate_blocks tries to truncate all node and data
      blocks starting from the specified block offset, here (max filesize /
      block size), while the largest valid block offset is (max filesize /
      block size) - 1, so f2fs detects the invalid block offset with BUG_ON in
      the truncation path.
      
      This patch lets f2fs skip truncating data that exceeds the max
      filesize.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
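The off-by-one in the truncation panic comes down to a bounds check on the starting block offset: offsets run from 0 to (max filesize / block size) - 1, so a start equal to the block count has nothing to truncate. A minimal sketch of that check follows; `toy_should_truncate_from` is a hypothetical helper and the 4096-byte block size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCK_SIZE 4096ULL

/*
 * Decide whether block truncation should run for a new size of 'from'
 * bytes. The first block to free is 'from' rounded up to a block index;
 * if that index is not below the number of blocks a file may have,
 * there is no valid block to walk to and the walk must be skipped.
 */
bool toy_should_truncate_from(unsigned long long from,
                              unsigned long long max_filesize)
{
    unsigned long long free_from = (from + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
    unsigned long long max_blocks = max_filesize / TOY_BLOCK_SIZE;

    return free_from < max_blocks;
}
```

Truncating exactly to the max filesize gives a starting index equal to the block count, so the function returns false and the path that hit BUG_ON is never entered.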
    • f2fs: fix incorrect mapping in ->bmap · 43473f96
      Chao Yu authored
      Currently, generic_block_bmap is used in f2fs_bmap. Its semantics are: when
      a mapping is found, return the position of the target physical block;
      otherwise return zero.
      
      Previously, however, when there was no mapping info for the specified
      logical block, f2fs_bmap mapped the target physical block to an
      uninitialized variable, which is wrong. Fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
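The bmap contract described in the ->bmap fix (return the physical block when a mapping exists, zero otherwise) depends on the result being initialised before the lookup. The sketch below shows that contract on a toy mapping table; `toy_extent` and `toy_bmap` are hypothetical and unrelated to the real f2fs data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical logical-to-physical mapping entry. */
struct toy_extent {
    unsigned long lblk;   /* logical block */
    unsigned long pblk;   /* physical block */
};

/* Return the physical block for 'lblk', or 0 if no mapping exists. */
unsigned long toy_bmap(const struct toy_extent *map, size_t n,
                       unsigned long lblk)
{
    unsigned long pblk = 0;   /* initialised so "not found" reads as 0 */

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].lblk == lblk) {
            pblk = map[i].pblk;
            break;
        }
    }
    return pblk;
}
```

Without the initialiser, the not-found path would return whatever happened to be on the stack, which is the class of bug the commit describes.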
    • f2fs: remove an obsolete variable · fb58ae22
      Jaegeuk Kim authored
      This patch removes an obsolete variable used in add_free_nid.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't worry about inode leak in evict_inode · 29234b1d
      Jaegeuk Kim authored
      Even if an inode failed to release its blocks, it should be kept in an orphan
      inode list, so it will be released later.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: shrink size of struct seg_entry · f51b4ce6
      Chao Yu authored
      Restructure struct seg_entry to eliminate holes in it. On 32-bit
      machines this reduces its size from 32 bytes to 24 bytes; on 64-bit
      machines, from 56 bytes to 40 bytes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
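Holes of the kind the seg_entry commit removes appear when small fields sit between members with stricter alignment, and they disappear when the small fields are grouped. The sketch below demonstrates the effect on two hypothetical structs with identical members; these are not the fields of struct seg_entry, and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Small fields interleaved with pointers: padding after each small field. */
struct toy_padded {
    unsigned short a;
    unsigned char *p1;
    unsigned char b;
    unsigned char *p2;
    unsigned short c;
    unsigned long long t;
};

/* Same members, small fields grouped first: at most one hole remains. */
struct toy_reordered {
    unsigned short a;
    unsigned short c;
    unsigned char b;
    unsigned char *p1;
    unsigned char *p2;
    unsigned long long t;
};

/* Reordering must never make the struct larger. */
static_assert(sizeof(struct toy_reordered) <= sizeof(struct toy_padded),
              "grouping small fields should not grow the struct");

/* Bytes saved per object on the current ABI. */
size_t toy_bytes_saved(void)
{
    return sizeof(struct toy_padded) - sizeof(struct toy_reordered);
}
```

On a typical 64-bit ABI with 8-byte pointers the first layout has three holes and the second has one, so the saving multiplies across every in-memory segment entry.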
    • f2fs: reuse get_extent_info · bd933d4f
      Chao Yu authored
      Reuse get_extent_info for readability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unneeded memset when updating xattr · e3bc808c
      Chao Yu authored
      Each field in struct f2fs_xattr_entry is assigned later, so the
      earlier memset of the struct is unnecessary.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unneeded readahead in find_fsync_dnodes · ae8d1db3
      Chao Yu authored
      In find_fsync_dnodes, get_tmp_page reads the dnode page synchronously.
      The preceding ra_meta_page call did the same work, which is redundant,
      so remove it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: retry to truncate blocks in -ENOMEM case · 4c0c2949
      Jaegeuk Kim authored
      This patch retries truncating node blocks in the -ENOMEM case.
      Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
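Retrying on -ENOMEM treats memory pressure as a transient failure: the operation is repeated while it keeps failing for that reason, and any other result is returned as is. The sketch below is a userspace illustration under assumptions. `toy_truncate_with_retry` and `toy_flaky_truncate` are hypothetical, and the retry limit is an addition for the example; the commit message does not say whether the kernel code bounds its retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation that may fail transiently with -ENOMEM. */
typedef int (*toy_truncate_fn)(void *ctx);

/*
 * Call 'truncate' until it returns something other than -ENOMEM or
 * 'max_tries' attempts have been made. Returns the last result.
 */
int toy_truncate_with_retry(toy_truncate_fn truncate, void *ctx, int max_tries)
{
    int err = -ENOMEM;

    assert(truncate != NULL);
    for (int i = 0; i < max_tries && err == -ENOMEM; i++)
        err = truncate(ctx);
    return err;
}

/* Example operation: fails with -ENOMEM until the counter reaches zero. */
int toy_flaky_truncate(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENOMEM;
    }
    return 0;
}
```

Only -ENOMEM triggers another attempt, so permanent errors still surface immediately to the caller.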
    • f2fs: fix leak of orphan inode objects · 74ef9241
      Jaegeuk Kim authored
      When unmounting filesystem, we should release all the ino entries.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: revisit error handling flows · 221149c0
      Jaegeuk Kim authored
      This patch fixes a couple of bugs regarding orphan inodes when handling
      errors.
      
      This tries to
       - call alloc_nid_done with add_orphan_inode in handle_failed_inode
       - allow truncating blocks in f2fs_evict_inode
       - not make a bad inode due to i_mode change
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: inject ENOSPC failures · cb78942b
      Jaegeuk Kim authored
      This patch injects ENOSPC failures.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: inject page allocation failures · c41f3cc3
      Jaegeuk Kim authored
      This patch injects page allocation failures.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: inject kmalloc failure · 2c63fead
      Jaegeuk Kim authored
      This patch injects kmalloc failure given a fault injection rate.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
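Injecting failures at a given rate can be done with a simple operation counter: count every eligible call and fail each time the count reaches a multiple of the rate. The sketch below shows one such scheme. It is an assumption about how a rate might be interpreted, not a description of the f2fs implementation; `toy_fault_info` and `toy_should_fail` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault injection state. */
struct toy_fault_info {
    unsigned int rate;   /* 0 disables injection; N fails every N-th call */
    unsigned int ops;    /* eligible calls seen so far */
};

/* Return true when the current call should be made to fail. */
bool toy_should_fail(struct toy_fault_info *fi)
{
    assert(fi != NULL);
    if (fi->rate == 0)
        return false;
    fi->ops++;
    return fi->ops % fi->rate == 0;
}
```

An allocation wrapper would consult this check first and return NULL (or an error such as -ENOSPC) when it fires, which exercises error paths that rarely run under normal conditions.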
    • f2fs: add mount option to select fault injection ratio · 73faec4d
      Jaegeuk Kim authored
      This patch adds a mount option to select fault ratio.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use f2fs_grab_cache_page instead of grab_cache_page · 300e129c
      Jaegeuk Kim authored
      This patch converts grab_cache_page to f2fs_grab_cache_page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>