1. 09 May, 2019 33 commits
    • Randall Huang's avatar
      f2fs: fix to avoid accessing xattr across the boundary · 2777e654
      Randall Huang authored
      When we traverse xattr entries via __find_xattr(),
      if the raw filesystem content is faked or any hardware failure occurs,
      out-of-bound error can be detected by KASAN.
      Fix the issue by introducing boundary check.
      
      [   38.402878] c7   1827 BUG: KASAN: slab-out-of-bounds in f2fs_getxattr+0x518/0x68c
      [   38.402891] c7   1827 Read of size 4 at addr ffffffc0b6fb35dc by task
      [   38.402935] c7   1827 Call trace:
      [   38.402952] c7   1827 [<ffffff900809003c>] dump_backtrace+0x0/0x6bc
      [   38.402966] c7   1827 [<ffffff9008090030>] show_stack+0x20/0x2c
      [   38.402981] c7   1827 [<ffffff900871ab10>] dump_stack+0xfc/0x140
      [   38.402995] c7   1827 [<ffffff9008325c40>] print_address_description+0x80/0x2d8
      [   38.403009] c7   1827 [<ffffff900832629c>] kasan_report_error+0x198/0x1fc
      [   38.403022] c7   1827 [<ffffff9008326104>] kasan_report_error+0x0/0x1fc
      [   38.403037] c7   1827 [<ffffff9008325000>] __asan_load4+0x1b0/0x1b8
      [   38.403051] c7   1827 [<ffffff90085fcc44>] f2fs_getxattr+0x518/0x68c
      [   38.403066] c7   1827 [<ffffff90085fc508>] f2fs_xattr_generic_get+0xb0/0xd0
      [   38.403080] c7   1827 [<ffffff9008395708>] __vfs_getxattr+0x1f4/0x1fc
      [   38.403096] c7   1827 [<ffffff9008621bd0>] inode_doinit_with_dentry+0x360/0x938
      [   38.403109] c7   1827 [<ffffff900862d6cc>] selinux_d_instantiate+0x2c/0x38
      [   38.403123] c7   1827 [<ffffff900861b018>] security_d_instantiate+0x68/0x98
      [   38.403136] c7   1827 [<ffffff9008377db8>] d_splice_alias+0x58/0x348
      [   38.403149] c7   1827 [<ffffff900858d16c>] f2fs_lookup+0x608/0x774
      [   38.403163] c7   1827 [<ffffff900835eacc>] lookup_slow+0x1e0/0x2cc
      [   38.403177] c7   1827 [<ffffff9008367fe0>] walk_component+0x160/0x520
      [   38.403190] c7   1827 [<ffffff9008369ef4>] path_lookupat+0x110/0x2b4
      [   38.403203] c7   1827 [<ffffff900835dd38>] filename_lookup+0x1d8/0x3a8
      [   38.403216] c7   1827 [<ffffff900835eeb0>] user_path_at_empty+0x54/0x68
      [   38.403229] c7   1827 [<ffffff9008395f44>] SyS_getxattr+0xb4/0x18c
      [   38.403241] c7   1827 [<ffffff9008084200>] el0_svc_naked+0x34/0x38
      Signed-off-by: default avatarRandall Huang <huangrandall@google.com>
      [Jaegeuk Kim: Fix wrong ending boundary]
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2777e654
    • Chao Yu's avatar
      f2fs: fix to avoid potential race on sbi->unusable_block_count access/update · c9c8ed50
      Chao Yu authored
      Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in
      order to avoid potential race on it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c9c8ed50
    • Chao Yu's avatar
      f2fs: add tracepoint for f2fs_filemap_fault() · d7648343
      Chao Yu authored
      This patch adds tracepoint for f2fs_filemap_fault().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d7648343
    • Chao Yu's avatar
      f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu authored
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
      whether @blkaddr locates in main area or not.
      
      That check is weak, since the block address in range of main area can
      point to the address which is not valid in segment info table, and we
      can not detect such condition, we may suffer worse corruption as system
      continues running.
      
      So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
      which trigger SIT bitmap check rather than only range check.
      
      This patch did below changes as wel:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix bug reported from bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But, xfstest/generic/446 compalins some generated kernel messages saying invalid
      bitmap was detected when reading a block. The reaons is, when we get the
      block addresses from extent_cache, there is no lock to synchronize it from
      truncating the blocks in parallel.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      93770ab7
    • Chao Yu's avatar
      f2fs: fix to handle error in f2fs_disable_checkpoint() · 896285ad
      Chao Yu authored
      In f2fs_disable_checkpoint(), it needs to detect and propagate error
      number returned from f2fs_write_checkpoint().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      896285ad
    • Chengguang Xu's avatar
      f2fs: remove redundant check in f2fs_file_write_iter() · d5d5f0c0
      Chengguang Xu authored
      We have already checked flag IOCB_DIRECT in the sanity
      check of flag IOCB_NOWAIT, so don't have to check it
      again here.
      Signed-off-by: default avatarChengguang Xu <cgxu519@gmx.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d5d5f0c0
    • Chao Yu's avatar
      f2fs: fix to be aware of readonly device in write_checkpoint() · f5a131bb
      Chao Yu authored
      As Park Ju Hyung reported:
      
      Probably unrelated but a similar issue:
      Warning appears upon unmounting a corrupted R/O f2fs loop image.
      
      Should be a trivial issue to fix as well :)
      
      [ 2373.758424] ------------[ cut here ]------------
      [ 2373.758428] generic_make_request: Trying to write to read-only
      block-device loop1 (partno 0)
      [ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
      generic_make_request_checks+0x590/0x630
      [ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G           O
         4.19.35-zen+ #1
      [ 2373.758558] Hardware name: System manufacturer System Product
      Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
      [ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
      [ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
      36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
      9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
      16 89
      [ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
      [ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
      [ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
      [ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
      [ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
      [ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
      [ 2373.758584] FS:  00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
      knlGS:0000000000000000
      [ 2373.758586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
      [ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2373.758593] Call Trace:
      [ 2373.758602]  ? generic_make_request+0x46/0x3d0
      [ 2373.758608]  ? wait_woken+0x80/0x80
      [ 2373.758612]  ? mempool_alloc+0xb7/0x1a0
      [ 2373.758618]  ? submit_bio+0x30/0x110
      [ 2373.758622]  ? bvec_alloc+0x7c/0xd0
      [ 2373.758628]  ? __submit_merged_bio+0x68/0x390
      [ 2373.758633]  ? f2fs_submit_page_write+0x1bb/0x7f0
      [ 2373.758638]  ? f2fs_do_write_meta_page+0x7f/0x160
      [ 2373.758642]  ? __f2fs_write_meta_page+0x70/0x140
      [ 2373.758647]  ? f2fs_sync_meta_pages+0x140/0x250
      [ 2373.758653]  ? f2fs_write_checkpoint+0x5c5/0x17b0
      [ 2373.758657]  ? f2fs_sync_fs+0x9c/0x110
      [ 2373.758664]  ? sync_filesystem+0x66/0x80
      [ 2373.758667]  ? generic_shutdown_super+0x1d/0x100
      [ 2373.758670]  ? kill_block_super+0x1c/0x40
      [ 2373.758674]  ? kill_f2fs_super+0x64/0xb0
      [ 2373.758678]  ? deactivate_locked_super+0x2d/0xb0
      [ 2373.758682]  ? cleanup_mnt+0x65/0xa0
      [ 2373.758688]  ? task_work_run+0x7f/0xa0
      [ 2373.758693]  ? exit_to_usermode_loop+0x9c/0xa0
      [ 2373.758698]  ? do_syscall_64+0xc7/0xf0
      [ 2373.758703]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 2373.758706] ---[ end trace 5d3639907c56271b ]---
      [ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
      [ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
      [ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
      [ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272
      
      This patch adds to detect readonly device in write_checkpoint() to avoid
      trigger write IOs on it.
      Reported-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f5a131bb
    • Chao Yu's avatar
      f2fs: fix to skip recovery on readonly device · b61af314
      Chao Yu authored
      As Park Ju Hyung reported in mailing list:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/
      
      generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
      WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630
      
       generic_make_request+0x46/0x3d0
       submit_bio+0x30/0x110
       __submit_merged_bio+0x68/0x390
       f2fs_submit_page_write+0x1bb/0x7f0
       f2fs_do_write_meta_page+0x7f/0x160
       __f2fs_write_meta_page+0x70/0x140
       f2fs_sync_meta_pages+0x140/0x250
       f2fs_write_checkpoint+0x5c5/0x17b0
       f2fs_sync_fs+0x9c/0x110
       sync_filesystem+0x66/0x80
       f2fs_recover_fsync_data+0x790/0xa30
       f2fs_fill_super+0xe4e/0x1980
       mount_bdev+0x518/0x610
       mount_fs+0x34/0x13f
       vfs_kern_mount.part.11+0x4f/0x120
       do_mount+0x2d1/0xe40
       __x64_sys_mount+0xbf/0xe0
       do_syscall_64+0x4a/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      print_req_error: I/O error, dev loop0, sector 4096
      
      If block device is readonly, we should never trigger write IO from
      filesystem layer, but previously, orphan and journal recovery didn't
      consider such condition, result in triggering above warning, fix it.
      Reported-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Tested-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b61af314
    • Chao Yu's avatar
      f2fs: fix to consider multiple device for readonly check · f824deb5
      Chao Yu authored
      This patch introduce f2fs_hw_is_readonly() to check whether lower
      device is readonly or not, it adapts multiple device scenario.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f824deb5
    • Chao Yu's avatar
      f2fs: relocate chksum_offset for large_nat_bitmap feature · b471eb99
      Chao Yu authored
      For large_nat_bitmap feature, there is a design flaw:
      
      Previous:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----|-------+
      | ......                   |      |       |
      | checksum_value           |<-----+       |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Obviously, if nat_bitmap size + sit_bitmap size is larger than
      MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap
      checkpoint checksum's position, once checkpoint() is triggered
      from kernel, nat or sit bitmap will be damaged by checksum field.
      
      In order to fix this, let's relocate checksum_value's position
      to the head of sit_nat_version_bitmap as below, then nat/sit
      bitmap and chksum value update will become safe.
      
      After:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----+
      | ......                   |<-------------+
      |                          |              |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Related report and discussion:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36642346/Reported-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b471eb99
    • Chao Yu's avatar
      f2fs: allow unfixed f2fs_checkpoint.checksum_offset · d7eb8f1c
      Chao Yu authored
      Previously, f2fs_checkpoint.checksum_offset points fixed position of
      f2fs_checkpoint structure:
      
      "#define CP_CHKSUM_OFFSET	4092"
      
      It is unnecessary, and it breaks the consecutiveness of nat and sit
      bitmap stored across checkpoint park block and payload blocks.
      
      This patch allows f2fs to handle unfixed .checksum_offset.
      
      In addition, for the case checksum value is stored in the middle of
      checkpoint park, calculating checksum value with superposition method
      like we did for inode_checksum.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d7eb8f1c
    • Youngjun Yoo's avatar
      f2fs: Replace spaces with tab · 3a912b77
      Youngjun Yoo authored
      Modify coding style
      ERROR: code indent should use tabs where possible
      Signed-off-by: default avatarYoungjun Yoo <youngjun.willow@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      3a912b77
    • Youngjun Yoo's avatar
      f2fs: insert space before the open parenthesis '(' · c456362b
      Youngjun Yoo authored
      Modify coding style
      ERROR: space required before the open parenthesis '('
      Signed-off-by: default avatarYoungjun Yoo <youngjun.willow@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c456362b
    • Chao Yu's avatar
      f2fs: allow address pointer number of dnode aligning to specified size · d02a6e61
      Chao Yu authored
      This patch expands scalability of dnode layout, it allows address pointer
      number of dnode aligning to specified size (now, the size is one byte by
      default), and later the number can align to compress cluster size
      (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making
      design of compress meta layout simple.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d02a6e61
    • Chao Yu's avatar
      f2fs: introduce f2fs_read_single_page() for cleanup · 2df0ab04
      Chao Yu authored
      This patch introduces f2fs_read_single_page() to wrap core operations
      of reading one page in f2fs_mpage_readpages().
      
      In addition, if we failed in f2fs_mpage_readpages(), propagate error
      number to f2fs_read_data_page(), for f2fs_read_data_pages() path,
      always return success.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2df0ab04
    • Park Ju Hyung's avatar
      f2fs: mark is_extension_exist() inline · 5c533b19
      Park Ju Hyung authored
      The caller set_file_temperature() is marked as inline as well.
      It doesn't make much sense to leave is_extension_exist() un-inlined.
      Signed-off-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5c533b19
    • Chao Yu's avatar
      f2fs: fix to set FI_UPDATE_WRITE correctly · cd23ffa9
      Chao Yu authored
      This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      cd23ffa9
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · 05573d6c
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_inplace_write_data() will trigger kernel panic due
      to data block locates in node type segment.
      
      To avoid panic, let's just return error code and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      05573d6c
    • Chao Yu's avatar
      f2fs: fix to do sanity check on valid block count of segment · e95bcdb2
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e95bcdb2
    • Chao Yu's avatar
      f2fs: fix to do sanity check on valid node/block count · 7b63f72f
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203229
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Kernel message
       kernel BUG at fs/f2fs/recovery.c:591!
       RIP: 0010:recover_data+0x12d8/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      With corrupted image wihch has out-of-range valid node/block count, during
      recovery, once we failed due to no free space, it will trigger kernel
      panic.
      
      Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt()
      to detect such condition, so that potential panic can be avoided.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7b63f72f
    • Chao Yu's avatar
      f2fs: fix to avoid panic in do_recover_data() · 22d61e28
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203227
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/recovery.c:549!
       RIP: 0010:recover_data+0x167a/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During recovery, if ofs_of_node is inconsistent in between recovered
      node page and original checkpointed node page, let's just fail recovery
      instead of making kernel panic.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      22d61e28
    • Chao Yu's avatar
      f2fs: fix to do sanity check on free nid · 626bcf2b
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      626bcf2b
    • Chao Yu's avatar
      f2fs: fix to do checksum even if inode page is uptodate · b42b179b
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b42b179b
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · 8b6810f8
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8b6810f8
    • Chao Yu's avatar
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 546d22f0
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      546d22f0
    • Chao Yu's avatar
      f2fs: remove new blank line of f2fs kernel message · bda52397
      Chao Yu authored
      Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bda52397
    • Chao Yu's avatar
      f2fs: fix wrong __is_meta_io() macro · 6dc3a126
      Chao Yu authored
      This patch changes codes as below:
      - don't use is_read_io() as a condition to judge the meta IO.
      - use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6dc3a126
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_node_count() · ea6d7e72
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203213
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:2012!
       RIP: 0010:truncate_node+0x2c9/0x2e0
       Call Trace:
        f2fs_truncate_xattr_node+0xa1/0x130
        f2fs_remove_inode_page+0x82/0x2d0
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_node_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ea6d7e72
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_block_count() · 5e159cd3
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5e159cd3
    • Chao Yu's avatar
      f2fs: fix to use inline space only if inline_xattr is enable · 622927f3
      Chao Yu authored
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      622927f3
    • Chao Yu's avatar
      f2fs: fix to retrieve inline xattr space · 45a74688
      Chao Yu authored
      With below mkfs and mount option, generic/339 of fstest will report that
      scratch image becomes corrupted.
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
      MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
      
      [ASSERT] (f2fs_check_dirent_position:1315)  --> Wrong position of dirent pino:1970, name: (...)
      level:8, dir_level:0, pgofs:951, correct range:[900, 901]
      
      In old kernel, inline data and directory always reserved 200 bytes in
      inode layout, even if inline_xattr is disabled, then new kernel tries
      to retrieve that space for non-inline xattr inode, but for inline dentry,
      its layout size should be fixed, so we just keep that reserved space.
      
      But the problem here is that, after inline dentry conversion, inline
      dentry layout no longer exists, if we still reserve inline xattr space,
      after dents updates, there will be a hole in inline xattr space, which
      can break hierarchy hash directory structure.
      
      This patch fixes this issue by retrieving inline xattr space after
      inline dentry conversion.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      45a74688
    • Chao Yu's avatar
      f2fs: fix error path of recovery · 98838579
      Chao Yu authored
      There are some places in where we missed to unlock page or unlock page
      incorrectly, fix them.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      98838579
    • Chao Yu's avatar
      f2fs: fix to avoid deadloop in foreground GC · 793ab1c8
      Chao Yu authored
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203211
      
      - Overview
      When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cd test
      touch t
      
      - Messages
      [   58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      ... (repeating)
      
      During GC, if segment type stored in SSA and SIT is inconsistent, we just
      skip migrating current segment directly, since we need to know the exact
      type to decide the migration function we use.
      
      So in foreground GC, we will easily run into a infinite loop as we may
      select the same victim segment which has inconsistent type due to greedy
      policy. In order to end up this, we choose to shutdown filesystem. For
      backgrond GC, we need to do that as well, so that we can avoid latter
      potential infinite looped foreground GC.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      793ab1c8
  2. 16 Apr, 2019 2 commits
  3. 05 Apr, 2019 5 commits
    • Chao Yu's avatar
      f2fs: add comment for conditional compilation statement · e1074d4b
      Chao Yu authored
      Commit af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      added function is_journalled_quota() in f2fs.h, but it located outside of
      _LINUX_F2FS_H macro coverage, it has been fixed with commit 0af725fc
      ("f2fs: fix wrong #endif").
      
      But anyway, in order to avoid making same mistake latter, let's add single
      line comment to notice which #if the last #endif is corresponding to.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Remove unnecessary empty EOL]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e1074d4b
    • Chao Yu's avatar
      f2fs: fix potential recursive call when enabling data_flush · 186857c5
      Chao Yu authored
      As Hagbard Celine reported:
      
      Hi, this is a long standing bug that I've hit before on older kernels,
      but I was not able to get the syslog saved because of the nature of
      the bug. This time I had booted form a pen-drive, and was able to save
      the log to it's efi-partition.
      What i did to trigger it was to create a partition and format it f2fs,
      then mount it with options:
      "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
      Then I unpacked a big .tar.xz to the partition (I used a
      gentoo-stage3-tarball as I was in process of installing Gentoo).
      
      Same options just without data_flush gives no problems.
      
      Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
      properly unmounted. Some data may be corrupt. Please run fsck.
      Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
      stack depth: 12064 bytes left
      Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
      00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
      Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel: Call Trace:
      Mar 20 21:06:40 usbgentoo kernel:  read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  ? xas_load+0x8/0x50
      Mar 20 21:06:40 usbgentoo kernel:  __get_node_page+0x73/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_get_dnode_of_data+0x34e/0x580
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_inline_data+0x5e/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x421/0x690
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_cache_pages+0x1cf/0x460
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_data_pages+0x2b3/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_chksum_verify+0x1d/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  ? read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  do_writepages+0x3c/0xd0
      Mar 20 21:06:40 usbgentoo kernel:  __filemap_fdatawrite_range+0x7c/0xb0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_sync_dirty_inodes+0xf2/0x200
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs_bg+0x2a3/0x2c0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_dirtied+0x21/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs+0xd6/0x2b0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x4fb/0x690
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_single_inode+0x2a1/0x340
      Mar 20 21:06:40 usbgentoo kernel:  ? soft_cursor+0x1b4/0x220
      Mar 20 21:06:40 usbgentoo kernel:  writeback_sb_inodes+0x1d5/0x3e0
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_inodes_wb+0x58/0xa0
      Mar 20 21:06:40 usbgentoo kernel:  wb_writeback+0x250/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? 0xffffffff8c000000
      Mar 20 21:06:40 usbgentoo kernel:  ? cpumask_next+0x16/0x20
      Mar 20 21:06:40 usbgentoo kernel:  wb_workfn+0x2f6/0x3b0
      Mar 20 21:06:40 usbgentoo kernel:  ? __switch_to_asm+0x40/0x70
      Mar 20 21:06:40 usbgentoo kernel:  process_one_work+0x1f5/0x3f0
      Mar 20 21:06:40 usbgentoo kernel:  worker_thread+0x28/0x3c0
      Mar 20 21:06:40 usbgentoo kernel:  ? rescuer_thread+0x330/0x330
      Mar 20 21:06:40 usbgentoo kernel:  kthread+0x10e/0x130
      Mar 20 21:06:40 usbgentoo kernel:  ? kthread_create_on_node+0x60/0x60
      Mar 20 21:06:40 usbgentoo kernel:  ret_from_fork+0x35/0x40
      
      The root cause is that we run into an infinite recursive calling in
      between f2fs_balance_fs_bg and writepage() as described below:
      
      - f2fs_write_data_pages		--- A
       - __write_data_page
        - f2fs_balance_fs
         - f2fs_balance_fs_bg		--- B
          - f2fs_sync_dirty_inodes
           - filemap_fdatawrite
            - f2fs_write_data_pages	--- A
      ...
                - f2fs_balance_fs_bg	--- B
      ...
      
      In order to fix this issue, let's detect such condition in __write_data_page()
      and just skip calling f2fs_balance_fs() recursively.
      Reported-by: default avatarHagbard Celine <hagbardcelin@gmail.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      186857c5
    • Damien Le Moal's avatar
      f2fs: improve discard handling with multi-device volumes · 7f3d7719
      Damien Le Moal authored
      f2fs_hw_support_discard() only tests if the super block device supports
      discard. However, for a multi-device volume, not all disks used may
      support discard. Improve the check performed to test all devices of
      the volume and report discard as supported if at least one device of
      the volume supports discard. To implement this, introduce the helper
      function f2fs_bdev_support_discard(), which returns true for zoned block
      devices (where discard is processed as a zone reset) and for regular
      disks supporting the discard command.
      
      f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
      handle discard command issuing for a particular device of the volume.
      That is, prevent issuing a discard command for block devices that do
      not support it.
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7f3d7719
    • Damien Le Moal's avatar
      f2fs: Reduce zoned block device memory usage · 95175daf
      Damien Le Moal authored
      For zoned block devices, an array of zone types for each device is
      allocated and initialized in order to determine if a section is stored
      on a sequential zone (zone reset needed) or a conventional zone (no
      zone reset needed and regular discard applies). Considering this usage,
      the zone types stored in memory can be replaced with a bitmap to
      indicate an equivalent information, that is, if a zone is sequential or
      not. This reduces the memory usage for each zoned device by roughly 8:
      on a 14TB disk with zones of 256 MB, the zone type array consumes
      13x4KB pages while the bitmap uses only 2x4KB pages.
      
      This patch changes the f2fs_dev_info structure blkz_type field to the
      bitmap blkz_seq. Access to this bitmap is done using the helper
      function f2fs_blkz_is_seq(), which is a rewrite of the function
      get_blkz_type().
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      95175daf
    • Damien Le Moal's avatar
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal authored
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0916878d