1. 18 Apr, 2023 2 commits
  2. 17 Apr, 2023 2 commits
  3. 13 Apr, 2023 11 commits
  4. 10 Apr, 2023 10 commits
    • f2fs: fix to drop all dirty pages during umount() if cp_error is set · c9b3649a
      Chao Yu authored
      xfstest generic/361 reports a bug as below:
      
      f2fs_bug_on(sbi, sbi->fsync_node_num);
      
      kernel BUG at fs/f2fs/super.c:1627!
      RIP: 0010:f2fs_put_super+0x3a8/0x3b0
      Call Trace:
       generic_shutdown_super+0x8c/0x1b0
       kill_block_super+0x2b/0x60
       kill_f2fs_super+0x87/0x110
       deactivate_locked_super+0x39/0x80
       deactivate_super+0x46/0x50
       cleanup_mnt+0x109/0x170
       __cleanup_mnt+0x16/0x20
       task_work_run+0x65/0xa0
       exit_to_user_mode_prepare+0x175/0x190
       syscall_exit_to_user_mode+0x25/0x50
       do_syscall_64+0x4c/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      During umount(), if cp_error is set, f2fs_wait_on_all_pages() should
      not stop waiting for all F2FS_WB_CP_DATA pages to be written back;
      otherwise, fsync_node_num can be non-zero after
      f2fs_wait_on_all_pages() returns, causing this bug.
      
      In this case, to avoid an endless loop in f2fs_wait_on_all_pages(),
      all dirty pages must be dropped rather than redirtied.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
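The reasoning in this commit message (redirtying pages under cp_error keeps the wait loop from ever finishing, while dropping them lets it terminate) can be illustrated with a small userspace simulation. This is only a sketch and not the kernel code: `struct sim_sb`, `sim_write_page()`, `sim_wait_on_all_pages()` and the `max_passes` bound are hypothetical names invented for the illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the superblock state discussed above. */
struct sim_sb {
    bool cp_error;    /* a checkpoint error has been recorded */
    int dirty_pages;  /* pages that still have to be handled */
};

/* One write attempt for a single page; returns true if the page is gone. */
bool sim_write_page(const struct sim_sb *sb, bool drop_on_error)
{
    if (!sb->cp_error)
        return true;       /* healthy case: the write succeeds */
    return drop_on_error;  /* error case: drop it, or redirty and keep it */
}

/*
 * Keep flushing until no page remains. A real wait loop has no pass limit;
 * max_passes only exists so the simulation itself always terminates.
 * Returns the number of pages still pending.
 */
int sim_wait_on_all_pages(struct sim_sb *sb, bool drop_on_error, int max_passes)
{
    for (int pass = 0; pass < max_passes && sb->dirty_pages > 0; pass++) {
        int remaining = 0;

        for (int i = 0; i < sb->dirty_pages; i++)
            if (!sim_write_page(sb, drop_on_error))
                remaining++;  /* page was redirtied, so it stays pending */
        sb->dirty_pages = remaining;
    }
    return sb->dirty_pages;
}
```

With `cp_error` set, calling `sim_wait_on_all_pages()` with `drop_on_error == false` never makes progress, whereas `drop_on_error == true` reaches zero pending pages in a single pass.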
    • f2fs: fix to avoid use-after-free for cached IPU bio · 5cdb422c
      Chao Yu authored
      xfstest generic/019 reports a bug:
      
      kernel BUG at mm/filemap.c:1619!
      RIP: 0010:folio_end_writeback+0x8a/0x90
      Call Trace:
       end_page_writeback+0x1c/0x60
       f2fs_write_end_io+0x199/0x420
       bio_endio+0x104/0x180
       submit_bio_noacct+0xa5/0x510
       submit_bio+0x48/0x80
       f2fs_submit_write_bio+0x35/0x300
       f2fs_submit_merged_ipu_write+0x2a0/0x2b0
       f2fs_write_single_data_page+0x838/0x8b0
       f2fs_write_cache_pages+0x379/0xa30
       f2fs_write_data_pages+0x30c/0x340
       do_writepages+0xd8/0x1b0
       __writeback_single_inode+0x44/0x370
       writeback_sb_inodes+0x233/0x4d0
       __writeback_inodes_wb+0x56/0xf0
       wb_writeback+0x1dd/0x2d0
       wb_workfn+0x367/0x4a0
       process_one_work+0x21d/0x430
       worker_thread+0x4e/0x3c0
       kthread+0x103/0x130
       ret_from_fork+0x2c/0x50
      
      The root cause: after cp_error is set, f2fs_submit_merged_ipu_write()
      in f2fs_write_single_data_page() tries to flush the cached IPU bio.
      However, f2fs_submit_merged_ipu_write() fails to check the validity of
      its @bio parameter, so it can submit an arbitrary cached bio that belongs
      to another IO context, which causes a use-after-free. Fix it by adding
      the missing validity check.
      
      Fixes: 0b20fcec ("f2fs: cache global IPU bio")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
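The general idea of validating a caller-supplied bio before flushing it can be sketched as follows. This is a userspace illustration under stated assumptions, not the actual patch: `struct sim_bio`, `struct sim_bio_cache` and `sim_submit_cached_bio()` are hypothetical, and the fixed-size slot array merely stands in for whatever structure tracks cached bios.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_CACHE_SLOTS 4

/* Hypothetical stand-ins for a bio and the per-filesystem bio cache. */
struct sim_bio {
    int owner;
};

struct sim_bio_cache {
    struct sim_bio *slot[SIM_CACHE_SLOTS];
};

/*
 * Submit *bio only if it is non-NULL and really tracked by this cache.
 * Returns true if the bio was submitted, false if it was rejected.
 */
bool sim_submit_cached_bio(struct sim_bio_cache *cache, struct sim_bio **bio)
{
    if (bio == NULL || *bio == NULL)
        return false;  /* nothing valid to flush */

    for (size_t i = 0; i < SIM_CACHE_SLOTS; i++) {
        if (cache->slot[i] == *bio) {
            cache->slot[i] = NULL;  /* the cache gives up ownership */
            *bio = NULL;            /* the caller must not reuse it */
            return true;
        }
    }
    return false;  /* unknown bio: it may belong to another IO context */
}
```

Rejecting a pointer that the cache does not track is what keeps a stale or foreign bio from being completed twice in this sketch.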
    • f2fs: remove unneeded in-memory i_crtime copy · c277991d
      Chao Yu authored
      i_crtime never changes after inode creation, so there is no need to
      copy it into f2fs_inode_info.i_disk_time[3] and monitor it for changes
      to decide whether to update the inode page. Remove the related code.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only() · 68f0453d
      Chao Yu authored
      f2fs supports the multi-device feature, so to check the devices' rw
      status it should use f2fs_hw_is_readonly() rather than bdev_read_only().
      Fix it.
      
      Meanwhile, remove the f2fs_hw_is_readonly() check in:
      - f2fs_write_checkpoint()
      - f2fs_convert_inline_inode()
      These paths already check f2fs_readonly(), and if f2fs' devices are
      read-only, f2fs_readonly() must be true.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
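Why a single-device check is not enough on a multi-device filesystem can be shown with a small sketch. It assumes that the multi-device check means "any member device is read-only"; the types `struct sim_dev` and `struct sim_fs` and both helper functions are hypothetical and only model the idea in userspace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a filesystem spanning several block devices. */
struct sim_dev {
    bool read_only;
};

struct sim_fs {
    const struct sim_dev *devs;
    size_t ndevs;
};

/* Looks at the first device only, as a single-device check would. */
bool sim_first_dev_read_only(const struct sim_fs *fs)
{
    return fs->ndevs > 0 && fs->devs[0].read_only;
}

/* Looks at every device; one read-only member makes the whole fs read-only. */
bool sim_any_dev_read_only(const struct sim_fs *fs)
{
    for (size_t i = 0; i < fs->ndevs; i++)
        if (fs->devs[i].read_only)
            return true;
    return false;
}
```

For a two-device filesystem whose second device is read-only, `sim_first_dev_read_only()` reports false while `sim_any_dev_read_only()` reports true.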
    • f2fs: use common implementation of file type · 0c9f4521
      Weizhao Ouyang authored
      Use common implementation of file type conversion helpers.
      Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: merge lz4hc_compress_pages() to lz4_compress_pages() · 3094e557
      Yangtao Li authored
      Remove unnecessary lz4hc_compress_pages().
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      [Jaegeuk Kim: clean up]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: convert to use sysfs_emit · 084e15ea
      Yangtao Li authored
      Let's use sysfs_emit.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: set default compress option only when sb_has_compression · c2c14ca5
      Yangtao Li authored
      If the compress feature is not enabled, there is no need to set
      compress-related parameters.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix system crash due to lack of free space in LFS · d11cef14
      Yonggil Song authored
      When f2fs tries to checkpoint during foreground gc in LFS mode, a system
      crash occurs due to lack of free space if the amount of dirty node and
      dentry pages generated by data migration exceeds the free space.
      The reproduction sequence is as follows.
      
       - 20GiB capacity block device (null_blk)
       - format and mount with LFS mode
       - create a file and write 20,000MiB
       - 4k random write on full range of the file
      
       RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
       Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
       RSP: 0018:ffff977bc397b218 EFLAGS: 00010246
       RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0
       RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8
       RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40
       R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000
       R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000
       FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000
       CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
       <TASK>
       allocate_segment_by_default+0x9c/0x110 [f2fs]
       f2fs_allocate_data_block+0x243/0xa30 [f2fs]
       ? __mod_lruvec_page_state+0xa0/0x150
       do_write_page+0x80/0x160 [f2fs]
       f2fs_do_write_node_page+0x32/0x50 [f2fs]
       __write_node_page+0x339/0x730 [f2fs]
       f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
       block_operations+0x257/0x340 [f2fs]
       f2fs_write_checkpoint+0x102/0x1050 [f2fs]
       f2fs_gc+0x27c/0x630 [f2fs]
       ? folio_mark_dirty+0x36/0x70
       f2fs_balance_fs+0x16f/0x180 [f2fs]
      
      This patch adds a check for whether there are enough free sections before
      checkpointing during gc.
      Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
      [Jaegeuk Kim: code clean-up]
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
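The kind of check described in this commit (compare the space that dirty node and dentry pages will need against the free sections before checkpointing) can be sketched with simple arithmetic. The sketch below is not the kernel's accounting: `sim_blocks_to_sections()` and `sim_can_checkpoint()` are hypothetical helpers, and rounding each page class up to whole sections is an assumption made for the illustration.

```c
#include <stdbool.h>

/* Round a block count up to whole sections. blocks_per_sec must be non-zero. */
unsigned int sim_blocks_to_sections(unsigned int blocks,
                                    unsigned int blocks_per_sec)
{
    return (blocks + blocks_per_sec - 1) / blocks_per_sec;
}

/*
 * Decide whether a checkpoint may start: writing out the dirty node and
 * dentry blocks must not require more sections than are currently free.
 */
bool sim_can_checkpoint(unsigned int free_secs,
                        unsigned int dirty_node_blocks,
                        unsigned int dirty_dent_blocks,
                        unsigned int blocks_per_sec)
{
    unsigned int needed =
        sim_blocks_to_sections(dirty_node_blocks, blocks_per_sec) +
        sim_blocks_to_sections(dirty_dent_blocks, blocks_per_sec);

    return free_secs >= needed;
}
```

With 512 blocks per section, 1000 dirty node blocks and 100 dirty dentry blocks need 2 + 1 = 3 sections, so the sketch allows the checkpoint with 3 free sections and refuses it with 2.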
    • f2fs: remove struct victim_selection default_v_ops · 19e0e21a
      Yangtao Li authored
      There is only a single instance of these ops, and Jaegeuk pointed out that:
      
          Originally this was intended to give a chance to provide other
          allocation option. Anyway, it seems quite hard to do it anymore.
      
      So remove the indirection and call f2fs_get_victim() directly.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 04 Apr, 2023 5 commits
  6. 29 Mar, 2023 10 commits
    • f2fs: fix scheduling while atomic in decompression path · 1aa161e4
      Jaegeuk Kim authored
      [   16.945668][    C0] Call trace:
      [   16.945678][    C0]  dump_backtrace+0x110/0x204
      [   16.945706][    C0]  dump_stack_lvl+0x84/0xbc
      [   16.945735][    C0]  __schedule_bug+0xb8/0x1ac
      [   16.945756][    C0]  __schedule+0x724/0xbdc
      [   16.945778][    C0]  schedule+0x154/0x258
      [   16.945793][    C0]  bit_wait_io+0x48/0xa4
      [   16.945808][    C0]  out_of_line_wait_on_bit+0x114/0x198
      [   16.945824][    C0]  __sync_dirty_buffer+0x1f8/0x2e8
      [   16.945853][    C0]  __f2fs_commit_super+0x140/0x1f4
      [   16.945881][    C0]  f2fs_commit_super+0x110/0x28c
      [   16.945898][    C0]  f2fs_handle_error+0x1f4/0x2f4
      [   16.945917][    C0]  f2fs_decompress_cluster+0xc4/0x450
      [   16.945942][    C0]  f2fs_end_read_compressed_page+0xc0/0xfc
      [   16.945959][    C0]  f2fs_handle_step_decompress+0x118/0x1cc
      [   16.945978][    C0]  f2fs_read_end_io+0x168/0x2b0
      [   16.945993][    C0]  bio_endio+0x25c/0x2c8
      [   16.946015][    C0]  dm_io_dec_pending+0x3e8/0x57c
      [   16.946052][    C0]  clone_endio+0x134/0x254
      [   16.946069][    C0]  bio_endio+0x25c/0x2c8
      [   16.946084][    C0]  blk_update_request+0x1d4/0x478
      [   16.946103][    C0]  scsi_end_request+0x38/0x4cc
      [   16.946129][    C0]  scsi_io_completion+0x94/0x184
      [   16.946147][    C0]  scsi_finish_command+0xe8/0x154
      [   16.946164][    C0]  scsi_complete+0x90/0x1d8
      [   16.946181][    C0]  blk_done_softirq+0xa4/0x11c
      [   16.946198][    C0]  _stext+0x184/0x614
      [   16.946214][    C0]  __irq_exit_rcu+0x78/0x144
      [   16.946234][    C0]  handle_domain_irq+0xd4/0x154
      [   16.946260][    C0]  gic_handle_irq.33881+0x5c/0x27c
      [   16.946281][    C0]  call_on_irq_stack+0x40/0x70
      [   16.946298][    C0]  do_interrupt_handler+0x48/0xa4
      [   16.946313][    C0]  el1_interrupt+0x38/0x68
      [   16.946346][    C0]  el1h_64_irq_handler+0x20/0x30
      [   16.946362][    C0]  el1h_64_irq+0x78/0x7c
      [   16.946377][    C0]  finish_task_switch+0xc8/0x3d8
      [   16.946394][    C0]  __schedule+0x600/0xbdc
      [   16.946408][    C0]  preempt_schedule_common+0x34/0x5c
      [   16.946423][    C0]  preempt_schedule+0x44/0x48
      [   16.946438][    C0]  process_one_work+0x30c/0x550
      [   16.946456][    C0]  worker_thread+0x414/0x8bc
      [   16.946472][    C0]  kthread+0x16c/0x1e0
      [   16.946486][    C0]  ret_from_fork+0x10/0x20
      
      Fixes: bff139b4 ("f2fs: handle decompress only post processing in softirq")
      Fixes: 95fa90c9 ("f2fs: support recording errors into superblock")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: preserve direct write semantics when buffering is forced · 92318f20
      Hans Holmberg authored
      In some cases, e.g. for zoned block devices, direct writes are
      forced into buffered writes that will populate the page cache
      and be written out just like buffered io.
      
      Direct reads, on the other hand, are supported for the zoned
      block device case. This has the effect that applications
      built for direct io will fill up the page cache with data
      that will never be read, and that is a waste of resources.
      
      If we agree that this is a problem, how do we fix it?
      
      A) Supporting proper direct writes for zoned block devices would
      be the best, but it is currently not supported (probably for
      a good but non-obvious reason). Would it be feasible to
      implement proper direct IO?
      
      B) Avoid the cost of keeping unwanted data by syncing and throwing
      out the cached pages for buffered O_DIRECT writes before completion.
      
      This patch implements B) by reusing the code for how partial
      block writes are flushed out on the "normal" direct write path.
      
      Note that this changes the performance characteristics of f2fs
      quite a bit.
      
      Direct IO performance for zoned block devices is lower for
      small writes after this patch, but this should be expected
      with direct IO and in line with how f2fs behaves on top of
      conventional block devices.
      
      Another open question is if the flushing should be done for
      all cases where buffered writes are forced.
      Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
      Reviewed-by: Yonggil Song <yonggil.song@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: compress: fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages() · babedcba
      Yangtao Li authored
      BUG_ON() will be triggered when writing files concurrently,
      because the same page is written back multiple times.
      
      1597 void folio_end_writeback(struct folio *folio)
      1598 {
      		......
      1618     if (!__folio_end_writeback(folio))
      1619         BUG();
      		......
      1625 }
      
      kernel BUG at mm/filemap.c:1619!
      Call Trace:
       <TASK>
       f2fs_write_end_io+0x1a0/0x370
       blk_update_request+0x6c/0x410
       blk_mq_end_request+0x15/0x130
       blk_complete_reqs+0x3c/0x50
       __do_softirq+0xb8/0x29b
       ? sort_range+0x20/0x20
       run_ksoftirqd+0x19/0x20
       smpboot_thread_fn+0x10b/0x1d0
       kthread+0xde/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
       </TASK>
      
      Below is the concurrency scenario:
      
      [Process A]		[Process B]		[Process C]
      f2fs_write_raw_pages()
        - redirty_page_for_writepage()
        - unlock page()
      			f2fs_do_write_data_page()
      			  - lock_page()
      			  - clear_page_dirty_for_io()
      			  - set_page_writeback() [1st writeback]
      			    .....
      			    - unlock page()
      
      						generic_perform_write()
      						  - f2fs_write_begin()
      						    - wait_for_stable_page()
      
      						  - f2fs_write_end()
      						    - set_page_dirty()
      
        - lock_page()
          - f2fs_do_write_data_page()
            - set_page_writeback() [2nd writeback]
      
      This problem was introduced by the previous commit 7377e853 ("f2fs:
      compress: fix potential deadlock of compress file"). All page locks were
      released in f2fs_write_raw_pages(), but the subsequent write path ignored
      whether the page was still under writeback.
      Let's fix it by waiting for writeback on the page to finish before writing.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 4c8ff709 ("f2fs: support data compression")
      Fixes: 7377e853 ("f2fs: compress: fix potential deadlock of compress file")
      Signed-off-by: Qi Han <hanqi@vivo.com>
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
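The scenario in this commit (a second writeback started on a page that is still under its first writeback) can be modelled in a few lines. This is a simplified userspace sketch, not the kernel's page state machine: `struct sim_page` and the `sim_*` functions are hypothetical, and "waiting" is modelled as the first IO completing immediately.

```c
#include <stdbool.h>

/* Hypothetical model of the two page flags involved in the race. */
struct sim_page {
    bool dirty;
    bool writeback;
};

/* Returns 0 on success, -1 if the page is already under writeback. */
int sim_set_page_writeback(struct sim_page *page)
{
    if (page->writeback)
        return -1;  /* stands in for the BUG() seen in the report */
    page->writeback = true;
    return 0;
}

/* Model of waiting: the in-flight IO completes and clears the flag. */
void sim_wait_on_page_writeback(struct sim_page *page)
{
    page->writeback = false;
}

/* Write one page, optionally waiting for an earlier writeback first. */
int sim_write_raw_page(struct sim_page *page, bool wait_first)
{
    if (wait_first)
        sim_wait_on_page_writeback(page);
    page->dirty = false;
    return sim_set_page_writeback(page);
}
```

Starting from a page that is both dirty and under writeback, `sim_write_raw_page(page, false)` fails while `sim_write_raw_page(page, true)` succeeds.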
    • f2fs: remove else in f2fs_write_cache_pages() · c948be79
      Yangtao Li authored
      As Christoph Hellwig pointed out:
      
      	Please avoid the else by doing the goto in the branch.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: apply zone capacity to all zone type · 0b37ed21
      Jaegeuk Kim authored
      If we manage the zone capacity per zone type, it'll break the GC assumption.
      Also, the current logic complains about a valid block count mismatch.
      Let's apply the zone capacity to all zone types, if specified.
      
      Fixes: de881df9 ("f2fs: support zone capacity less than zone size")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to handle filemap_fdatawrite() error in f2fs_ioc_decompress_file/f2fs_ioc_compress_file · b822dc91
      Yangtao Li authored
      It seems inappropriate that the current logic does not handle
      filemap_fdatawrite() errors, so let's fix it.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show() · 5bb9c111
      Yangtao Li authored
      BTW, reduce the s_flag array size and make s_flag constant.
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix discard bug on zoned block devices with 2MiB zone size · 6797ebc4
      Yonggil Song authored
      When using f2fs on a zoned block device with 2MiB zone size, IO errors
      occur because f2fs tries to write data to a zone that has not been reset.
      
      The cause is that f2fs tries to discard multiple zones at once: a
      condition in f2fs_clear_prefree_segments does not check for zoned block
      devices when setting the discard range. This leads to invalid reset
      commands and write pointer mismatches.
      
      This patch makes f2fs reset one zone at a time on zoned block devices
      with 2MiB zone size.
      Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
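Resetting one zone at a time, rather than issuing a single command that spans several zones, can be sketched as a loop that splits a block range on zone boundaries. This is an illustration only: `struct sim_zone_reset` and `sim_reset_zones()` are hypothetical, and the 512-block zone used in the usage note assumes 4KiB blocks, so that 512 blocks make up a 2MiB zone.

```c
#include <stddef.h>

/* Hypothetical description of one zone reset command. */
struct sim_zone_reset {
    unsigned int start;  /* first block of the zone */
    unsigned int len;    /* always exactly one zone, in blocks */
};

/*
 * Split [start, start + nr_blocks) into one reset command per zone.
 * Returns the number of commands written to out, or 0 if the range is
 * not zone aligned. At most max_out commands are produced.
 */
size_t sim_reset_zones(unsigned int start, unsigned int nr_blocks,
                       unsigned int zone_blocks,
                       struct sim_zone_reset *out, size_t max_out)
{
    size_t n = 0;

    if (zone_blocks == 0 || start % zone_blocks != 0 ||
        nr_blocks % zone_blocks != 0)
        return 0;  /* a reset must cover whole zones only */

    while (nr_blocks > 0 && n < max_out) {
        out[n].start = start;
        out[n].len = zone_blocks;  /* never more than one zone */
        n++;
        start += zone_blocks;
        nr_blocks -= zone_blocks;
    }
    return n;
}
```

For a range of 1536 blocks starting at block 0 with 512-block zones, the sketch produces three commands starting at blocks 0, 512 and 1024.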
    • f2fs: remove entire rb_entry sharing · bf21acf9
      Jaegeuk Kim authored
      This is the last part of removing the memory sharing for rb_tree in extent_cache.
      
      This should also fix the arm32 memory alignment issue.
      
      [struct extent_node]               [struct rb_entry]
      [0] struct rb_node rb_node;        [0] struct rb_node rb_node;
        union {                              union {
          struct {                             struct {
      [16]  unsigned int fofs;           [12]    unsigned int ofs;
            unsigned int len;                    unsigned int len;
                                               };
                                               unsigned long long key;
                                             } __packed;
      
      Cc: <stable@vger.kernel.org>
      Fixes: 13054c54 ("f2fs: introduce infra macro and data structure of rb-tree extent cache")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
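The offset mismatch shown in the layout diagram ([16] versus [12]) can be reproduced with `offsetof`. The sketch below makes several assumptions: `struct sim_rb_node` mimics a 12-byte arm32 `rb_node` with three 32-bit words, the `wide` member is a hypothetical 64-bit field that forces 8-byte alignment, the struct names are invented, `__attribute__((packed))` is the GCC/Clang extension, and the ABI aligns 64-bit integers to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Mimics a 12-byte rb_node as found on a 32-bit target. */
struct sim_rb_node {
    uint32_t parent_color;
    uint32_t right;
    uint32_t left;
};

/* Shared view: the packed union needs no padding and starts at offset 12. */
struct sim_rb_entry {
    struct sim_rb_node rb_node;
    union {
        struct { uint32_t ofs; uint32_t len; } range;
        uint64_t key;
    } __attribute__((packed)) u;
};

/* Concrete view: the unpacked union is 8-byte aligned and starts at 16. */
struct sim_extent_node {
    struct sim_rb_node rb_node;
    union {
        struct { uint32_t fofs; uint32_t len; } range;
        uint64_t wide;  /* hypothetical member forcing 8-byte alignment */
    } u;
};

size_t sim_rb_entry_payload_offset(void)
{
    return offsetof(struct sim_rb_entry, u);
}

size_t sim_extent_node_payload_offset(void)
{
    return offsetof(struct sim_extent_node, u);
}
```

Because the two payloads sit at different offsets, code that casts one struct to the other would read `ofs` where `fofs` is not stored, which is the kind of mismatch that giving each user its own rb-tree node type avoids.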
    • f2fs: factor out discard_cmd usage from general rb_tree use · f69475dd
      Jaegeuk Kim authored
      This is the second part of removing the mixed use of rb_tree in discard_cmd from
      extent_cache.
      
      This should also fix the arm32 memory alignment issue caused by shared rb_entry.
      
      [struct discard_cmd]               [struct rb_entry]
      [0] struct rb_node rb_node;        [0] struct rb_node rb_node;
        union {                              union {
          struct {                             struct {
      [16]  block_t lstart;              [12]    unsigned int ofs;
            block_t len;                         unsigned int len;
                                               };
                                               unsigned long long key;
                                             } __packed;
      
      Cc: <stable@vger.kernel.org>
      Fixes: 004b6862 ("f2fs: use rb-tree to track pending discard commands")
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>