• Shuqi Zhang's avatar
    f2fs: fix wrong dirty page count when race between mmap and fallocate. · 9b7eadd9
    Shuqi Zhang authored
    This is a BUG_ON issue as follows when running xfstest-generic-503:
    WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0
    Modules linked in:
    CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
    
    Call Trace:
    evict+0x129/0x2d0
    dispose_list+0x4f/0xb0
    evict_inodes+0x204/0x230
    generic_shutdown_super+0x5b/0x1e0
    kill_block_super+0x29/0x80
    kill_f2fs_super+0xe6/0x140
    deactivate_locked_super+0x44/0xc0
    deactivate_super+0x79/0x90
    cleanup_mnt+0x114/0x1a0
    __cleanup_mnt+0x16/0x20
    task_work_run+0x98/0x100
    exit_to_user_mode_prepare+0x3d0/0x3e0
    syscall_exit_to_user_mode+0x12/0x30
    do_syscall_64+0x42/0x80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0
    
    Function flow analysis when BUG occurs:
    f2fs_fallocate                    mmap
                                      do_page_fault
                                        pte_spinlock  // ---lock_pte
                                        do_wp_page
                                          wp_page_shared
                                            pte_unmap_unlock   // unlock_pte
                                              do_page_mkwrite
                                              f2fs_vm_page_mkwrite
                                                down_read(invalidate_lock)
                                                lock_page
                                                if (PageMappedToDisk(page))
                                                  goto out;
                                                // set_page_dirty  --NOT RUN
                                                out: up_read(invalidate_lock);
                                            finish_mkwrite_fault // unlock_pte
    f2fs_collapse_range
      down_write(i_mmap_sem)
      truncate_pagecache
        unmap_mapping_pages
          i_mmap_lock_write // down_write(i_mmap_rwsem)
            ......
            zap_pte_range
              pte_offset_map_lock // ---lock_pte
               set_page_dirty
                f2fs_dirty_data_folio
                  if (!folio_test_dirty(folio)) {
                                            fault_dirty_shared_page
                                              set_page_dirty
                                                f2fs_dirty_data_folio
                                                  if (!folio_test_dirty(folio)) {
                                                    filemap_dirty_folio
                                                    f2fs_update_dirty_folio // ++
                                                  }
                                                unlock_page
                    filemap_dirty_folio
                    f2fs_update_dirty_folio // page count++
                  }
              pte_unmap_unlock  // --unlock_pte
          i_mmap_unlock_write  // up_write(i_mmap_rwsem)
      truncate_inode_pages
      up_write(i_mmap_sem)
    
    When race happens between mmap-do_page_fault-wp_page_shared and
    fallocate-truncate_pagecache-zap_pte_range, the zap_pte_range calls
    function set_page_dirty without page lock. Besides, though
    truncate_pagecache has immap and pte lock, wp_page_shared calls
    fault_dirty_shared_page without any. In this case, two threads race
    in f2fs_dirty_data_folio function. Page is set to dirty only ONCE,
    but the count is added TWICE by calling filemap_dirty_folio.
    Thus the count of dirty page cannot accord with the real dirty pages.
    
    Following is the solution to in case of race happens without any lock.
    Since folio_test_set_dirty in filemap_dirty_folio is atomic, judge return
    value will not be at risk of race.
    Signed-off-by: default avatarShuqi Zhang <zhangshuqi3@huawei.com>
    Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
    9b7eadd9
data.c 99.9 KB