1. 04 Oct, 2022 12 commits
    • f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file · a834aa3e
      Zhang Qilong authored
      trace_f2fs_update_extent_tree_range could not record the compressed
      block length of a cluster in a compressed file, so add it.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on summary info · c6ad7fd1
      Chao Yu authored
      As Wenqing Liu reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=216456
      
      BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs]
      Read of size 4 at addr ffff8881464dcd80 by task mount/1013
      
      CPU: 3 PID: 1013 Comm: mount Tainted: G        W          6.0.0-rc4 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      Call Trace:
       dump_stack_lvl+0x45/0x5e
       print_report.cold+0xf3/0x68d
       kasan_report+0xa8/0x130
       recover_data+0x63ae/0x6ae0 [f2fs]
       f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
       f2fs_fill_super+0x4665/0x61e0 [f2fs]
       mount_bdev+0x2cf/0x3b0
       legacy_get_tree+0xed/0x1d0
       vfs_get_tree+0x81/0x2b0
       path_mount+0x47e/0x19d0
       do_mount+0xce/0xf0
       __x64_sys_mount+0x12c/0x1a0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause is that in the fuzzed image the SSA table is corrupted:
      ofs_in_node is larger than ADDRS_PER_PAGE(), which results in an
      out-of-range access on a 4k-size page.
      
      - recover_data
       - do_recover_data
        - check_index_in_prev_nodes
         - f2fs_data_blkaddr
      
      This patch adds a sanity check on summary info in the recovery and GC
      flows, wherever those flows rely on it.
      
      After patch:
      [   29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018
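      A minimal userspace sketch of the bounds check described above follows. It is not the f2fs code: struct demo_summary, demo_check_summary() and the EUCLEAN return value are hypothetical stand-ins, and the example values are taken from the log line above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified summary entry: only the fields needed
 * for the bounds check are modelled. */
struct demo_summary {
	unsigned int nid;         /* node id the data block belongs to */
	unsigned int ofs_in_node; /* index of the block address in that node */
};

/* Return 0 if ofs_in_node can safely index a node page holding
 * max_addrs block addresses, or -EUCLEAN (assumed corruption code)
 * if the summary entry points past the end of the page. */
static int demo_check_summary(const struct demo_summary *sum,
			      unsigned int ino, unsigned int max_addrs)
{
	if (sum->ofs_in_node >= max_addrs) {
		fprintf(stderr,
			"Inconsistent ofs_in_node:%u in summary, ino:%u, nid:%u, max:%u\n",
			sum->ofs_in_node, ino, sum->nid, max_addrs);
		return -EUCLEAN;
	}
	return 0;
}
```

      With ofs_in_node = 65286 and max_addrs = 1018, the entry is rejected before it can be used as an index into the node page.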
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: port to vfs{g,u}id_t and associated helpers · 1e8a9191
      Christian Brauner authored
      A while ago we introduced a dedicated vfs{g,u}id_t type in commit
      1e5267cd ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
      over a good part of the VFS. Ultimately we will remove all legacy
      idmapped mount helpers that operate only on k{g,u}id_t in favor of the
      new type safe helpers that operate on vfs{g,u}id_t.
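      Why a dedicated type helps can be sketched in plain C: wrapping the same integer in two distinct structs makes the compiler reject mixing them. The names below (demo_kuid_t, demo_vfsuid_t, demo_make_vfsuid()) and the fixed-offset mapping are hypothetical and only loosely mirror the kernel's k{g,u}id_t / vfs{g,u}id_t split.

```c
#include <stdbool.h>

typedef unsigned int demo_uid_t;

/* Two distinct wrapper types around the same integer. Because they are
 * different struct types, one cannot be passed where the other is expected. */
typedef struct { demo_uid_t val; } demo_kuid_t;   /* filesystem-level id (hypothetical) */
typedef struct { demo_uid_t val; } demo_vfsuid_t; /* id seen through an idmapped mount (hypothetical) */

/* Hypothetical idmapping: shift every id by a fixed offset. */
static demo_vfsuid_t demo_make_vfsuid(demo_kuid_t kuid, demo_uid_t offset)
{
	demo_vfsuid_t v = { kuid.val + offset };
	return v;
}

/* Typed comparison helper: the argument types document which side is which. */
static bool demo_vfsuid_eq_kuid(demo_vfsuid_t vfsuid, demo_kuid_t kuid)
{
	return vfsuid.val == kuid.val;
}
```

      A call such as demo_vfsuid_eq_kuid(owner, owner), where owner is a demo_kuid_t, fails to compile, which is the class of mix-up that type-safe helpers are meant to catch.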
      
      Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Chao Yu <chao@kernel.org>
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on destination blkaddr during recovery · 0ef4ca04
      Chao Yu authored
      As Wenqing Liu reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=216456
      
      loop5: detected capacity change from 0 to 131072
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): Bitmap was wrongly set, blk:5634
      ------------[ cut here ]------------
      WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198
      RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs]
      Call Trace:
       <TASK>
       f2fs_do_replace_block+0xa98/0x1890 [f2fs]
       f2fs_replace_block+0xeb/0x180 [f2fs]
       recover_data+0x1a69/0x6ae0 [f2fs]
       f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
       f2fs_fill_super+0x4665/0x61e0 [f2fs]
       mount_bdev+0x2cf/0x3b0
       legacy_get_tree+0xed/0x1d0
       vfs_get_tree+0x81/0x2b0
       path_mount+0x47e/0x19d0
       do_mount+0xce/0xf0
       __x64_sys_mount+0x12c/0x1a0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      If CONFIG_F2FS_CHECK_FS is enabled, this triggers a kernel panic
      instead of a warning.
      
      The root cause is that in the fuzzed image the SIT table is inconsistent
      with the inode mapping table, which triggers this warning during the SIT
      table update.
      
      This patch introduces a new flag, DATA_GENERIC_ENHANCE_UPDATE. With this
      flag, the data block recovery flow can check the validity of the
      destination blkaddr in the SIT table and skip f2fs_replace_block() to
      avoid an inconsistent state.
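      As a rough illustration of the idea, the sketch below tests a destination block's bit in a validity bitmap before claiming it, and skips the replace step if the bit is already set. The bitmap helpers and demo_recover_block() are hypothetical; the actual f2fs validation is more involved and is not reproduced here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap helpers standing in for the SIT valid map. */
static bool demo_test_bit(const unsigned long *map, unsigned int nr)
{
	return (map[nr / DEMO_BITS_PER_WORD] >> (nr % DEMO_BITS_PER_WORD)) & 1UL;
}

static void demo_set_bit(unsigned long *map, unsigned int nr)
{
	map[nr / DEMO_BITS_PER_WORD] |= 1UL << (nr % DEMO_BITS_PER_WORD);
}

/* Claim dest for a recovered data block. If the bitmap already marks it
 * valid, the metadata is inconsistent: report it and skip the replace
 * instead of updating the bitmap a second time. */
static bool demo_recover_block(unsigned long *valid_map, unsigned int dest)
{
	if (demo_test_bit(valid_map, dest)) {
		fprintf(stderr, "Inconsistent blkaddr:%u, already valid in bitmap\n", dest);
		return false; /* skip the replace step */
	}
	demo_set_bit(valid_map, dest);
	return true;
}
```

      Checking before writing turns the "Bitmap was wrongly set" condition into a recoverable error rather than a warning (or a panic with CONFIG_F2FS_CHECK_FS).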
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT · f3b23c78
      Weichao Guo authored
      Cold files may become fragmented due to SSR, and defragmentation is
      needed because sequential reads dominate access to these files.
      FI_OPU_WRITE should override FADVISE_COLD_BIT so that defragmentation
      does not fail.
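      The intended priority can be sketched as a small decision function. This assumes, as a simplification, that a cold file is otherwise updated in place to limit fragmentation; struct demo_inode and demo_should_update_inplace() are hypothetical names, not the f2fs functions.

```c
#include <stdbool.h>

/* Hypothetical per-inode state, reduced to the two flags in question. */
struct demo_inode {
	bool fadvise_cold; /* stands in for FADVISE_COLD_BIT */
	bool opu_write;    /* stands in for FI_OPU_WRITE */
};

/* Decide whether a write overwrites the existing block in place.
 * A cold file would normally stay in place, but an explicit
 * out-of-place (defragmentation) write takes priority. */
static bool demo_should_update_inplace(const struct demo_inode *inode)
{
	if (inode->fadvise_cold && !inode->opu_write)
		return true;
	return false;
}
```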
      Signed-off-by: Weichao Guo <guoweichao@oppo.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix race condition on setting FI_NO_EXTENT flag · 07725adc
      Zhang Qilong authored
      The following scenario can occur:
      process A:               process B:
      ->f2fs_drop_extent_tree  ->f2fs_update_extent_cache_range
                                ->f2fs_update_extent_tree_range
                                 ->write_lock
       ->set_inode_flag
                                 ->is_inode_flag_set
                                 ->__free_extent_tree // Shouldn't
                                                      // have been
                                                      // cleaned up
                                                      // here
        ->write_lock
      
      In this case, the "FI_NO_EXTENT" flag is set by another process between
      f2fs_update_extent_tree_range and is_inode_flag_set. This clears the
      whole extent tree, which should not have happened. Fix it by moving the
      flag setting inside the write_lock critical section.
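      The ordering the fix establishes can be sketched with a simple lock: the flag is set and read only while the lock is held, so an updater either finishes before the drop or sees the flag afterwards. In this hypothetical sketch a C11 atomic_flag spinlock stands in for the extent tree's write_lock, and nr_extents stands in for the cached extents.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified extent tree. */
struct demo_extent_tree {
	atomic_flag lock;        /* stands in for et->lock (write_lock) */
	bool no_extent;          /* stands in for FI_NO_EXTENT */
	unsigned int nr_extents; /* stands in for the cached extents */
};

static void demo_lock(struct demo_extent_tree *et)
{
	while (atomic_flag_test_and_set_explicit(&et->lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void demo_unlock(struct demo_extent_tree *et)
{
	atomic_flag_clear_explicit(&et->lock, memory_order_release);
}

/* Drop path: set the flag and free the tree under the same lock. */
static void demo_drop_extent_tree(struct demo_extent_tree *et)
{
	demo_lock(et);
	et->no_extent = true;
	et->nr_extents = 0;
	demo_unlock(et);
}

/* Update path: the flag is read under the lock, so it cannot change
 * between the start of the update and the check. */
static void demo_update_extent_range(struct demo_extent_tree *et)
{
	demo_lock(et);
	if (et->no_extent)
		et->nr_extents = 0; /* extent cache disabled: keep it empty */
	else
		et->nr_extents++;   /* otherwise record the new extent */
	demo_unlock(et);
}
```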
      
      Fixes: 5f281fab ("f2fs: disable extent_cache for fcollapse/finsert inodes")
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove redundant check in f2fs_sanity_check_cluster · 9df6d6f9
      Zhang Qilong authored
      f2fs_sanity_check_cluster already checks "compressed" at its entry,
      so remove the redundant check here for better performance.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add static init_idisk_time function to reduce the code · 049ea86c
      Zhang Qilong authored
      Use an internal helper function to initialize the disk time fields
      of f2fs_inode_info and remove the redundant code.
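      A sketch of the kind of helper described, using hypothetical userspace structs; the field names and the order of the four timestamps are illustrative and are not taken from f2fs.

```c
#include <time.h>

/* Hypothetical, reduced inode info: four in-memory timestamps plus the
 * cached copy of what was last written to disk. */
struct demo_inode_info {
	struct timespec atime, ctime, mtime, crtime;
	struct timespec disk_time[4];
};

/* One static helper replaces the same four assignments that would
 * otherwise be repeated at every call site. */
static void demo_init_idisk_time(struct demo_inode_info *fi)
{
	fi->disk_time[0] = fi->atime;
	fi->disk_time[1] = fi->ctime;
	fi->disk_time[2] = fi->mtime;
	fi->disk_time[3] = fi->crtime;
}
```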
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix typo · d382e369
      Yonggil Song authored
      Fix a typo in f2fs.h.
      Detected by Jaeyoon Choi.
      Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong dirty page count when race between mmap and fallocate. · 9b7eadd9
      Shuqi Zhang authored
      The following BUG_ON issue occurs when running xfstests generic/503:
      WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0
      Modules linked in:
      CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
      
      Call Trace:
      evict+0x129/0x2d0
      dispose_list+0x4f/0xb0
      evict_inodes+0x204/0x230
      generic_shutdown_super+0x5b/0x1e0
      kill_block_super+0x29/0x80
      kill_f2fs_super+0xe6/0x140
      deactivate_locked_super+0x44/0xc0
      deactivate_super+0x79/0x90
      cleanup_mnt+0x114/0x1a0
      __cleanup_mnt+0x16/0x20
      task_work_run+0x98/0x100
      exit_to_user_mode_prepare+0x3d0/0x3e0
      syscall_exit_to_user_mode+0x12/0x30
      do_syscall_64+0x42/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Function flow analysis when BUG occurs:
      f2fs_fallocate                    mmap
                                        do_page_fault
                                          pte_spinlock  // ---lock_pte
                                          do_wp_page
                                            wp_page_shared
                                              pte_unmap_unlock   // unlock_pte
                                                do_page_mkwrite
                                                f2fs_vm_page_mkwrite
                                                  down_read(invalidate_lock)
                                                  lock_page
                                                  if (PageMappedToDisk(page))
                                                    goto out;
                                                  // set_page_dirty  --NOT RUN
                                                  out: up_read(invalidate_lock);
                                              finish_mkwrite_fault // unlock_pte
      f2fs_collapse_range
        down_write(i_mmap_sem)
        truncate_pagecache
          unmap_mapping_pages
            i_mmap_lock_write // down_write(i_mmap_rwsem)
              ......
              zap_pte_range
                pte_offset_map_lock // ---lock_pte
                 set_page_dirty
                  f2fs_dirty_data_folio
                    if (!folio_test_dirty(folio)) {
                                              fault_dirty_shared_page
                                                set_page_dirty
                                                  f2fs_dirty_data_folio
                                                    if (!folio_test_dirty(folio)) {
                                                      filemap_dirty_folio
                                                      f2fs_update_dirty_folio // ++
                                                    }
                                                  unlock_page
                      filemap_dirty_folio
                      f2fs_update_dirty_folio // page count++
                    }
                pte_unmap_unlock  // --unlock_pte
            i_mmap_unlock_write  // up_write(i_mmap_rwsem)
        truncate_inode_pages
        up_write(i_mmap_sem)
      
      When a race happens between mmap-do_page_fault-wp_page_shared and
      fallocate-truncate_pagecache-zap_pte_range, zap_pte_range calls
      set_page_dirty without holding the page lock. Moreover, although
      truncate_pagecache holds the i_mmap and pte locks, wp_page_shared calls
      fault_dirty_shared_page without holding either. In this case, the two
      threads race in f2fs_dirty_data_folio: the page is set dirty only ONCE,
      but the count is incremented TWICE because both threads go on to call
      filemap_dirty_folio and f2fs_update_dirty_folio. As a result, the dirty
      page count no longer matches the actual number of dirty pages.
      
      The fix handles this race even when no lock is held. Since
      folio_test_set_dirty in filemap_dirty_folio is atomic, checking its
      return value before updating the count is not at risk of the race.
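      The fix relies on an atomic test-and-set reporting whether this caller was the one that actually dirtied the folio. A minimal sketch with C11 atomics, where demo_dirty_folio() stands in for filemap_dirty_folio() and the counter for f2fs's dirty page count (both hypothetical):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical folio with a single atomic dirty bit. */
struct demo_folio {
	atomic_bool dirty;
};

/* Hypothetical inode-side dirty page counter. */
struct demo_inode_counts {
	atomic_int dirty_pages;
};

/* Atomic test-and-set: returns true only for the caller that
 * flips the bit from clean to dirty. */
static bool demo_dirty_folio(struct demo_folio *folio)
{
	return !atomic_exchange(&folio->dirty, true);
}

/* Count the page only when this caller actually dirtied it, so two
 * racing callers cannot both increment the counter. */
static bool demo_f2fs_dirty_data_folio(struct demo_inode_counts *inode,
				       struct demo_folio *folio)
{
	if (demo_dirty_folio(folio)) {
		atomic_fetch_add(&inode->dirty_pages, 1);
		return true;
	}
	return false;
}
```

      If two threads call demo_f2fs_dirty_data_folio() on the same clean folio, only one atomic_exchange() returns the old value false, so the counter ends up at 1 rather than 2.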
      Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use COMPRESS_MAPPING to get compress cache mapping · 173cdf2c
      Zhang Qilong authored
      Use the existing COMPRESS_MAPPING definition to get the compress cache
      mapping instead of accessing the field name directly.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: return the tmp_ptr directly in __bitmap_ptr · 280dfeae
      Zhang Qilong authored
      Return tmp_ptr directly here; there is no need to dereference the
      checkpoint pointer again.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 13 Sep, 2022 5 commits
  3. 30 Aug, 2022 5 commits
  4. 29 Aug, 2022 2 commits
  5. 28 Aug, 2022 16 commits