1. 12 May, 2022 2 commits
    • f2fs: change the current atomic write way · 3db1de0e
      Daeho Jeong authored
      The current atomic write implementation has three major issues:
       - It keeps the updates in non-reclaimable memory, and those pages are
         hard to migrate, which is not good for contiguous memory
         allocation.
       - Disk space used by atomic files cannot be garbage collected, which
         makes it difficult to defragment the filesystem.
       - If atomic write operations hit the threshold of either memory usage
         or garbage collection failure count, all atomic write operations
         fail immediately.
      
      To resolve these issues, keep a COW inode internally for all the
      updates, so that they can be flushed from memory when needed, for
      example under high memory pressure. These COW inodes are tagged
      as orphan inodes so that they are reclaimed after a sudden power cut
      or system failure during atomic writes.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't need inode lock for system hidden quota · 6213f5d4
      Jaegeuk Kim authored
      Let's avoid a false-positive lockdep warning.
      
      [   58.914674] [T1501146] -> #2 (&sb->s_type->i_mutex_key#20){+.+.}-{3:3}:
      [   58.915975] [T1501146] system_server:        down_write+0x7c/0xe0
      [   58.916738] [T1501146] system_server:        f2fs_quota_sync+0x60/0x1a8
      [   58.917563] [T1501146] system_server:        block_operations+0x16c/0x43c
      [   58.918410] [T1501146] system_server:        f2fs_write_checkpoint+0x114/0x318
      [   58.919312] [T1501146] system_server:        f2fs_issue_checkpoint+0x178/0x21c
      [   58.920214] [T1501146] system_server:        f2fs_sync_fs+0x48/0x6c
      [   58.920999] [T1501146] system_server:        f2fs_do_sync_file+0x334/0x738
      [   58.921862] [T1501146] system_server:        f2fs_sync_file+0x30/0x48
      [   58.922667] [T1501146] system_server:        __arm64_sys_fsync+0x84/0xf8
      [   58.923506] [T1501146] system_server:        el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
      [   58.924604] [T1501146] system_server:        do_el0_svc+0x28/0xa0
      [   58.925366] [T1501146] system_server:        el0_svc+0x24/0x38
      [   58.926094] [T1501146] system_server:        el0_sync_handler+0x88/0xec
      [   58.926920] [T1501146] system_server:        el0_sync+0x1b4/0x1c0
      
      [   58.927681] [T1501146] -> #1 (&sbi->cp_global_sem){+.+.}-{3:3}:
      [   58.928889] [T1501146] system_server:        down_write+0x7c/0xe0
      [   58.929650] [T1501146] system_server:        f2fs_write_checkpoint+0xbc/0x318
      [   58.930541] [T1501146] system_server:        f2fs_issue_checkpoint+0x178/0x21c
      [   58.931443] [T1501146] system_server:        f2fs_sync_fs+0x48/0x6c
      [   58.932226] [T1501146] system_server:        sync_filesystem+0xac/0x130
      [   58.933053] [T1501146] system_server:        generic_shutdown_super+0x38/0x150
      [   58.933958] [T1501146] system_server:        kill_block_super+0x24/0x58
      [   58.934791] [T1501146] system_server:        kill_f2fs_super+0xcc/0x124
      [   58.935618] [T1501146] system_server:        deactivate_locked_super+0x90/0x120
      [   58.936529] [T1501146] system_server:        deactivate_super+0x74/0xac
      [   58.937356] [T1501146] system_server:        cleanup_mnt+0x128/0x168
      [   58.938150] [T1501146] system_server:        __cleanup_mnt+0x18/0x28
      [   58.938944] [T1501146] system_server:        task_work_run+0xb8/0x14c
      [   58.939749] [T1501146] system_server:        do_notify_resume+0x114/0x1e8
      [   58.940595] [T1501146] system_server:        work_pending+0xc/0x5f0
      
      [   58.941375] [T1501146] -> #0 (&sbi->gc_lock){+.+.}-{3:3}:
      [   58.942519] [T1501146] system_server:        __lock_acquire+0x1270/0x2868
      [   58.943366] [T1501146] system_server:        lock_acquire+0x114/0x294
      [   58.944169] [T1501146] system_server:        down_write+0x7c/0xe0
      [   58.944930] [T1501146] system_server:        f2fs_issue_checkpoint+0x13c/0x21c
      [   58.945831] [T1501146] system_server:        f2fs_sync_fs+0x48/0x6c
      [   58.946614] [T1501146] system_server:        f2fs_do_sync_file+0x334/0x738
      [   58.947472] [T1501146] system_server:        f2fs_ioc_commit_atomic_write+0xc8/0x14c
      [   58.948439] [T1501146] system_server:        __f2fs_ioctl+0x674/0x154c
      [   58.949253] [T1501146] system_server:        f2fs_ioctl+0x54/0x88
      [   58.950018] [T1501146] system_server:        __arm64_sys_ioctl+0xa8/0x110
      [   58.950865] [T1501146] system_server:        el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
      [   58.951965] [T1501146] system_server:        do_el0_svc+0x28/0xa0
      [   58.952727] [T1501146] system_server:        el0_svc+0x24/0x38
      [   58.953454] [T1501146] system_server:        el0_sync_handler+0x88/0xec
      [   58.954279] [T1501146] system_server:        el0_sync+0x1b4/0x1c0
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 09 May, 2022 2 commits
  3. 06 May, 2022 13 commits
    • f2fs: give priority to select unpinned section for foreground GC · 71419129
      Chao Yu authored
      Previously, during foreground GC, if a victim section contained data of a
      pinned file, migration of that data failed and i_gc_failures of the
      pinned file could increase. Once it exceeded the threshold, GC unpinned
      the file, breaking pinfile semantics.
      
      To mitigate this, let's record and skip sections that hold pinned file
      data, and give priority to selecting unpinned ones.
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on total_data_blocks · 6b8beca0
      Chao Yu authored
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215916
      
      The kernel message is shown below:
      
      kernel BUG at fs/f2fs/segment.c:2560!
      Call Trace:
       allocate_segment_by_default+0x228/0x440
       f2fs_allocate_data_block+0x13d1/0x31f0
       do_write_page+0x18d/0x710
       f2fs_outplace_write_data+0x151/0x250
       f2fs_do_write_data_page+0xef9/0x1980
       move_data_page+0x6af/0xbc0
       do_garbage_collect+0x312f/0x46f0
       f2fs_gc+0x6b0/0x3bc0
       f2fs_balance_fs+0x921/0x2260
       f2fs_write_single_data_page+0x16be/0x2370
       f2fs_write_cache_pages+0x428/0xd00
       f2fs_write_data_pages+0x96e/0xd50
       do_writepages+0x168/0x550
       __writeback_single_inode+0x9f/0x870
       writeback_sb_inodes+0x47d/0xb20
       __writeback_inodes_wb+0xb2/0x200
       wb_writeback+0x4bd/0x660
       wb_workfn+0x5f3/0xab0
       process_one_work+0x79f/0x13e0
       worker_thread+0x89/0xf60
       kthread+0x26a/0x300
       ret_from_fork+0x22/0x30
      RIP: 0010:new_curseg+0xe8d/0x15f0
      
      The root cause is that ckpt.valid_block_count is inconsistent with the SIT
      table: stat info indicates the filesystem has free blocks, but the SIT
      table indicates the filesystem has no free segment.
      
      As a result, during garbage collection, the kernel panics when the LFS
      allocator fails to find a free segment.
      
      This patch tries to fix this issue by checking consistency between
      ckpt.valid_block_count and the blocks accounted from SIT.
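
A minimal sketch of the kind of consistency check described above, not the f2fs code itself. It assumes a simplified SIT in which each entry only records a per-segment valid block count, and it treats any mismatch with the checkpoint counter as corruption. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one SIT entry: the number of
 * valid blocks recorded for a single segment. */
struct sit_entry_sketch {
    uint32_t valid_blocks;
};

/* Sum the valid blocks recorded across all segments in the SIT. */
static uint64_t sit_valid_block_total(const struct sit_entry_sketch *sit,
                                      size_t nsegs)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nsegs; i++)
        total += sit[i].valid_blocks;
    return total;
}

/* Return true when the checkpoint counter agrees with the SIT-derived
 * total. A caller would refuse to mount, or flag the image for fsck,
 * when this returns false (an assumption of this sketch). */
static bool block_counts_consistent(uint64_t ckpt_valid_block_count,
                                    const struct sit_entry_sketch *sit,
                                    size_t nsegs)
{
    return ckpt_valid_block_count == sit_valid_block_total(sit, nsegs);
}
```

Running such a check once, while the SIT is being loaded, lets a corrupted image be rejected early instead of failing later inside the allocator.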
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix deadloop in foreground GC · cfd66bb7
      Chao Yu authored
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215914
      
      The root cause is: in a very small image, it is very easy to exceed
      the threshold of foreground GC. If we calculate free space and dirty
      data at section granularity, then in a corner case
      has_not_enough_free_secs() always returns true, resulting in a
      deadloop in f2fs_gc().
      
      So this patch refactors has_not_enough_free_secs() as below to fix
      this issue:
      1. calculate needed space at block granularity, and split all blocks
      into two parts, a section part and a block part, comparing the
      section part to free sections and the block part to free space
      in the opened log;
      2. account F2FS_DIRTY_NODES, F2FS_DIRTY_IMETA and F2FS_DIRTY_DENTS
      as node block consumers;
      3. account F2FS_DIRTY_DENTS as data block consumer.
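
Step 1 above can be sketched roughly as follows. This is a simplified illustration with a single open log and no reserved sections, not the refactored has_not_enough_free_secs(). All names and the exact comparison rules are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of splitting a block count into whole sections
 * plus a remainder of blocks. */
struct space_need {
    uint32_t secs;  /* whole sections needed */
    uint32_t blks;  /* leftover blocks, fewer than one section */
};

/* Split needed_blocks by section size. blocks_per_sec must be nonzero. */
static struct space_need split_need(uint32_t needed_blocks,
                                    uint32_t blocks_per_sec)
{
    struct space_need need = {
        .secs = needed_blocks / blocks_per_sec,
        .blks = needed_blocks % blocks_per_sec,
    };
    return need;
}

/* Return true when the demand cannot be met: the section part is compared
 * to free sections, and the block part to free space in the open log. */
static bool not_enough_free_space(uint32_t needed_blocks,
                                  uint32_t blocks_per_sec,
                                  uint32_t free_secs,
                                  uint32_t open_log_free_blks)
{
    struct space_need need = split_need(needed_blocks, blocks_per_sec);

    if (free_secs > need.secs)
        return false;   /* a spare section covers any remainder */
    if (free_secs < need.secs)
        return true;    /* whole sections alone do not fit */
    /* Section counts match: the remainder must fit in the open log. */
    return need.blks > open_log_free_blks;
}
```

Rounding every demand up to a whole section is what makes a tiny image look permanently full. Comparing the remainder against the open log avoids that rounding.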
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on block address in f2fs_do_zero_range() · 25f82362
      Chao Yu authored
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215894
      
      I have encountered a bug in F2FS file system in kernel v5.17.
      
      I have uploaded the system call sequence as case.c, and a fuzzed image can
      be found in google net disk
      
      The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can
      reproduce the bug by running the following commands:
      
      kernel BUG at fs/f2fs/segment.c:2291!
      Call Trace:
       f2fs_invalidate_blocks+0x193/0x2d0
       f2fs_fallocate+0x2593/0x4a70
       vfs_fallocate+0x2a5/0xac0
       ksys_fallocate+0x35/0x70
       __x64_sys_fallocate+0x8e/0xf0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The root cause is that, after the image was fuzzed, block mapping info in
      the inode is inconsistent with the SIT table, so f2fs_fallocate() panics
      when updating SIT with an invalid blkaddr.
      
      Let's fix the issue by adding a sanity check on the block address before
      updating the SIT table with it.
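
A minimal sketch of a range check of this kind, assuming data blocks live in a half-open address range [main_blkaddr, max_blkaddr). The function and parameter names are hypothetical and do not reproduce the f2fs helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when blkaddr falls inside the main data area, that is,
 * main_blkaddr <= blkaddr < max_blkaddr. Addresses outside this range
 * must not be used to index per-segment metadata. */
static bool data_blkaddr_in_range(uint32_t blkaddr,
                                  uint32_t main_blkaddr,
                                  uint32_t max_blkaddr)
{
    return blkaddr >= main_blkaddr && blkaddr < max_blkaddr;
}
```

A caller would run this check before invalidating the block and return an error for the corrupted mapping rather than touch the SIT.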
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count() · 4d17e6fe
      Chao Yu authored
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215897
      
      I have encountered a bug in F2FS file system in kernel v5.17.
      
      The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can
      reproduce the bug by running the following commands:
      
      The kernel message is shown below:
      
      kernel BUG at fs/f2fs/f2fs.h:2511!
      Call Trace:
       f2fs_remove_inode_page+0x2a2/0x830
       f2fs_evict_inode+0x9b7/0x1510
       evict+0x282/0x4e0
       do_unlinkat+0x33a/0x540
       __x64_sys_unlinkat+0x8e/0xd0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The root cause is: .total_valid_block_count or .total_valid_node_count
      could be fuzzed to zero, so once dec_valid_node_count() was called, it
      would trigger BUG_ON(). This patch prints warning info and sets
      SBI_NEED_FSCK into CP instead of panicking.
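
The behaviour described above can be sketched in user space as follows. This is an illustration only: the struct, the need_fsck field standing in for SBI_NEED_FSCK, and the function name are hypothetical. It also assumes that releasing a node decrements both counters by one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified counters mirroring the fields named above. */
struct counters_sketch {
    uint32_t total_valid_block_count;
    uint32_t total_valid_node_count;
    bool need_fsck;  /* stands in for the SBI_NEED_FSCK flag */
};

/* Decrement both counters for one released node. If either counter is
 * already zero, warn and flag the filesystem for fsck instead of
 * underflowing or aborting. Returns true when the decrement happened. */
static bool dec_valid_node_count_sketch(struct counters_sketch *c)
{
    if (c->total_valid_block_count == 0 || c->total_valid_node_count == 0) {
        fprintf(stderr,
                "inconsistent counters: blocks=%u nodes=%u, run fsck\n",
                (unsigned)c->total_valid_block_count,
                (unsigned)c->total_valid_node_count);
        c->need_fsck = true;
        return false;
    }
    c->total_valid_block_count--;
    c->total_valid_node_count--;
    return true;
}
```

Degrading to a warning keeps a corrupted image from taking the whole system down, while the flag ensures the inconsistency is repaired later.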
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: write checkpoint during FG_GC · a9163b94
      Byungki Lee authored
      If there are not enough free sections, each of which consists of large segments,
      we can run out of free sections for an upcoming section allocation. Let's reclaim some
      prefree segments by writing checkpoints.
      Signed-off-by: Byungki Lee <dominicus79@gmail.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to clear dirty inode in f2fs_evict_inode() · f2db7105
      Chao Yu authored
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215904
      
      The kernel message is shown below:
      
      kernel BUG at fs/f2fs/inode.c:825!
      Call Trace:
       evict+0x282/0x4e0
       __dentry_kill+0x2b2/0x4d0
       shrink_dentry_list+0x17c/0x4f0
       shrink_dcache_parent+0x143/0x1e0
       do_one_tree+0x9/0x30
       shrink_dcache_for_umount+0x51/0x120
       generic_shutdown_super+0x5c/0x3a0
       kill_block_super+0x90/0xd0
       kill_f2fs_super+0x225/0x310
       deactivate_locked_super+0x78/0xc0
       cleanup_mnt+0x2b7/0x480
       task_work_run+0xc8/0x150
       exit_to_user_mode_prepare+0x14a/0x150
       syscall_exit_to_user_mode+0x1d/0x40
       do_syscall_64+0x48/0x90
      
      The root cause is: the inode node and a dnode node share the same nid,
      so during f2fs_evict_inode(), dnode node truncation invalidates
      its NAT entry. Truncating the inode node then fails due to the
      invalid NAT entry, which leaves the inode marked as dirty. Fix
      this issue by clearing the dirty state of the inode and setting the
      SBI_NEED_FSCK flag in the filesystem.
      
      output from dump.f2fs:
      [print_node_info: 354] Node ID [0xf:15] is inode
      i_nid[0]                      		[0x       f : 15]
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: ensure only power of 2 zone sizes are allowed · 7f262f73
      Luis Chamberlain authored
      F2FS zoned support assumes a power of 2 zone size in many places,
      such as in __f2fs_issue_discard_zone and init_blkz_info. As the power of 2
      requirement has been removed from the block layer, explicitly add a
      condition in f2fs to allow only power of 2 zone size devices.
      
      This condition will be relaxed once those calculations based on power of
      2 are made generic.
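
For illustration, a power of 2 test and one calculation that depends on it can be sketched as below. The function names are hypothetical. In the kernel itself the test would normally go through is_power_of_2() from <linux/log2.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when zone_sectors is a nonzero power of 2. A power of 2
 * has exactly one bit set, so clearing its lowest set bit yields 0. */
static bool zone_size_is_pow2(uint64_t zone_sectors)
{
    return zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0;
}

/* Offset of a sector within its zone, computed with a mask. This is
 * only correct when zone_sectors is a power of 2, which is the kind of
 * assumption the commit message refers to. */
static uint64_t zone_offset_pow2(uint64_t sector, uint64_t zone_sectors)
{
    return sector & (zone_sectors - 1);
}
```

With a zone size such as 786432 sectors (3 * 262144), the mask no longer matches the modulo result, so rejecting such devices up front keeps those calculations valid.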
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: call bdev_zone_sectors() only once on init_blkz_info() · d46db459
      Luis Chamberlain authored
      Instead of calling bdev_zone_sectors() multiple times, call
      it once and cache the value locally. This will make the
      subsequent change easier to read.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: extend stat_lock to avoid potential race in statfs · 4de85145
      Niels Dossche authored
      There are multiple calculations and reads of fields of sbi that should
      be protected by stat_lock. As stat_lock is not used to read these
      values in statfs, this can lead to inconsistent results.
      Extend the locking to prevent this issue.
      Commit c9c8ed50 ("f2fs: fix to avoid potential race on
      sbi->unusable_block_count access/update")
      already added the use of sbi->stat_lock in statfs in
      order to make the calculation of multiple, different fields atomic so
      that results are consistent. This is similar to that patch regarding the
      change in statfs.
      Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid infinite loop to flush node pages · a7b8618a
      Jaegeuk Kim authored
      xfstests/generic/475 can return EIO all the time, which leads to an infinite
      loop when flushing node pages, as shown below. Let's avoid it.
      
      [16418.518551] Call Trace:
      [16418.518553]  ? dm_submit_bio+0x48/0x400
      [16418.518574]  ? submit_bio_checks+0x1ac/0x5a0
      [16418.525207]  __submit_bio+0x1a9/0x230
      [16418.525210]  ? kmem_cache_alloc+0x29e/0x3c0
      [16418.525223]  submit_bio_noacct+0xa8/0x2b0
      [16418.525226]  submit_bio+0x4d/0x130
      [16418.525238]  __submit_bio+0x49/0x310 [f2fs]
      [16418.525339]  ? bio_add_page+0x6a/0x90
      [16418.525344]  f2fs_submit_page_bio+0x134/0x1f0 [f2fs]
      [16418.525365]  read_node_page+0x125/0x1b0 [f2fs]
      [16418.525388]  __get_node_page.part.0+0x58/0x3f0 [f2fs]
      [16418.525409]  __get_node_page+0x2f/0x60 [f2fs]
      [16418.525431]  f2fs_get_dnode_of_data+0x423/0x860 [f2fs]
      [16418.525452]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [16418.525458]  ? __mod_memcg_state.part.0+0x2a/0x30
      [16418.525465]  ? __mod_memcg_lruvec_state+0x27/0x40
      [16418.525467]  ? __xa_set_mark+0x57/0x70
      [16418.525472]  f2fs_do_write_data_page+0x10e/0x7b0 [f2fs]
      [16418.525493]  f2fs_write_single_data_page+0x555/0x830 [f2fs]
      [16418.525514]  ? sysvec_apic_timer_interrupt+0x4e/0x90
      [16418.525518]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [16418.525523]  f2fs_write_cache_pages+0x303/0x880 [f2fs]
      [16418.525545]  ? blk_flush_plug_list+0x47/0x100
      [16418.525548]  f2fs_write_data_pages+0xfd/0x320 [f2fs]
      [16418.525569]  do_writepages+0xd5/0x210
      [16418.525648]  filemap_fdatawrite_wbc+0x7d/0xc0
      [16418.525655]  filemap_fdatawrite+0x50/0x70
      [16418.525658]  f2fs_sync_dirty_inodes+0xa4/0x230 [f2fs]
      [16418.525679]  f2fs_write_checkpoint+0x16d/0x1720 [f2fs]
      [16418.525699]  ? ttwu_do_wakeup+0x1c/0x160
      [16418.525709]  ? ttwu_do_activate+0x6d/0xd0
      [16418.525711]  ? __wait_for_common+0x11d/0x150
      [16418.525715]  kill_f2fs_super+0xca/0x100 [f2fs]
      [16418.525733]  deactivate_locked_super+0x3b/0xb0
      [16418.525739]  deactivate_super+0x40/0x50
      [16418.525741]  cleanup_mnt+0x139/0x190
      [16418.525747]  __cleanup_mnt+0x12/0x20
      [16418.525749]  task_work_run+0x6d/0xa0
      [16418.525765]  exit_to_user_mode_prepare+0x1ad/0x1b0
      [16418.525771]  syscall_exit_to_user_mode+0x27/0x50
      [16418.525774]  do_syscall_64+0x48/0xc0
      [16418.525776]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use flush command instead of FUA for zoned device · c550e25b
      Jaegeuk Kim authored
      The block layer for zoned disks can reorder FUA'ed IOs. Let's use the flush
      command to keep the write order.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove WARN_ON in f2fs_is_valid_blkaddr · dc2f78e2
      Dongliang Mu authored
      Syzbot triggers two WARNs in f2fs_is_valid_blkaddr and
      __is_bitmap_valid. For example, in f2fs_is_valid_blkaddr,
      if type is DATA_GENERIC_ENHANCE or DATA_GENERIC_ENHANCE_READ,
      it invokes WARN_ON if blkaddr is not in the right range.
      The call trace is as follows:
      
       f2fs_get_node_info+0x45f/0x1070
       read_node_page+0x577/0x1190
       __get_node_page.part.0+0x9e/0x10e0
       __get_node_page
       f2fs_get_node_page+0x109/0x180
       do_read_inode
       f2fs_iget+0x2a5/0x58b0
       f2fs_fill_super+0x3b39/0x7ca0
      
      Fix these two WARNs by replacing WARN_ON with dump_stack.
      
      Reported-by: syzbot+763ae12a2ede1d99d4dc@syzkaller.appspotmail.com
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 25 Apr, 2022 10 commits
  5. 24 Apr, 2022 8 commits
  6. 23 Apr, 2022 5 commits