1. 06 Sep, 2019 6 commits
    • Jaegeuk Kim's avatar
      f2fs: fix flushing node pages when checkpoint is disabled · 100c0655
      Jaegeuk Kim authored
      This patch fixes skipping node page writes when checkpoint is disabled.
      In this period, we can't rely on checkpoint to flush node pages.
      
      Fixes: fd8c8caf ("f2fs: let checkpoint flush dnode page of regular")
      Fixes: 4354994f ("f2fs: checkpoint disabling")
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      100c0655
    • Chao Yu's avatar
      f2fs: enhance f2fs_is_checkpoint_ready()'s readability · 00e09c0b
      Chao Yu authored
      This patch changes sematics of f2fs_is_checkpoint_ready()'s return
      value as: return true when checkpoint is ready, other return false,
      it can improve readability of below conditions.
      
      f2fs_submit_page_write()
      ...
      	if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
      				!f2fs_is_checkpoint_ready(sbi))
      		__submit_merged_bio(io);
      
      f2fs_balance_fs()
      ...
      	if (!f2fs_is_checkpoint_ready(sbi))
      		return;
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      00e09c0b
    • Chao Yu's avatar
      f2fs: clean up __bio_alloc()'s parameter · b757f6ed
      Chao Yu authored
      Just cleanup, no logic change.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b757f6ed
    • Chao Yu's avatar
      f2fs: fix wrong error injection path in inc_valid_block_count() · 9ea2f0be
      Chao Yu authored
      If FAULT_BLOCK type error injection is on, in inc_valid_block_count()
      we may decrease sbi->alloc_valid_block_count percpu stat count
      incorrectly, fix it.
      
      Fixes: 36b877af ("f2fs: Keep alloc_valid_block_count in sync")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      9ea2f0be
    • Chao Yu's avatar
      f2fs: fix to writeout dirty inode during node flush · 052a82d8
      Chao Yu authored
      As Eric reported:
      
      On xfstest generic/204 on f2fs, I'm getting a kernel BUG.
      
       allocate_segment_by_default+0x9d/0x100 [f2fs]
       f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
       do_write_page+0x62/0x110 [f2fs]
       f2fs_do_write_node_page+0x2b/0xa0 [f2fs]
       __write_node_page+0x2ec/0x590 [f2fs]
       f2fs_sync_node_pages+0x756/0x7e0 [f2fs]
       block_operations+0x25b/0x350 [f2fs]
       f2fs_write_checkpoint+0x104/0x1150 [f2fs]
       f2fs_sync_fs+0xa2/0x120 [f2fs]
       f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
       f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
       do_writepages+0x1c/0x70
       __writeback_single_inode+0x45/0x320
       writeback_sb_inodes+0x273/0x5c0
       wb_writeback+0xff/0x2e0
       wb_workfn+0xa1/0x370
       process_one_work+0x138/0x350
       worker_thread+0x4d/0x3d0
       kthread+0x109/0x140
      
      The root cause of this issue is, in a very small partition, e.g.
      in generic/204 testcase of fstest suit, filesystem's free space
      is 50MB, so at most we can write 12800 inline inode with command:
      `echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i`,
      then filesystem will have:
      - 12800 dirty inline data page
      - 12800 dirty inode page
      - and 12800 dirty imeta (dirty inode)
      
      When we flush node-inode's page cache, we can also flush inline
      data with each inode page, however it will run out-of-free-space
      in device, then once it triggers checkpoint, there is no room for
      huge number of imeta, at this time, GC is useless, as there is no
      dirty segment at all.
      
      In order to fix this, we try to recognize inode page during
      node_inode's page flushing, and update inode page from dirty inode,
      so that later another imeta (dirty inode) flush can be avoided.
      Reported-and-tested-by: default avatarEric Biggers <ebiggers@kernel.org>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      052a82d8
    • Chao Yu's avatar
      f2fs: optimize case-insensitive lookups · 950d47f2
      Chao Yu authored
      This patch ports below casefold enhancement patch from ext4 to f2fs
      
      commit 3ae72562 ("ext4: optimize case-insensitive lookups")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      950d47f2
  2. 23 Aug, 2019 30 commits
    • Chao Yu's avatar
      f2fs: introduce f2fs_match_name() for cleanup · fe76a166
      Chao Yu authored
      This patch introduces f2fs_match_name() for cleanup.
      
      BTW, it avoids to fallback to normal comparison once it doesn't
      match casefolded name.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fe76a166
    • Sahitya Tummala's avatar
      f2fs: Fix indefinite loop in f2fs_gc() · bbf9f7d9
      Sahitya Tummala authored
      Policy - Foreground GC, LFS and greedy GC mode.
      
      Under this policy, f2fs_gc() loops forever to GC as it doesn't have
      enough free segements to proceed and thus it keeps calling gc_more
      for the same victim segment.  This can happen if the selected victim
      segment could not be GC'd due to failed blkaddr validity check i.e.
      is_alive() returns false for the blocks set in current validity map.
      
      Fix this by keeping track of such invalid segments and skip those
      segments for selection in get_victim_by_default() to avoid endless
      GC loop under such error scenarios. Currently, add this logic under
      CONFIG_F2FS_CHECK_FS to be able to root cause the issue in debug
      version.
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix wrong bitmap size]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bbf9f7d9
    • Chao Yu's avatar
      f2fs: allocate memory in batch in build_sit_info() · 2fde3dd1
      Chao Yu authored
      build_sit_info() allocate all bitmaps for each segment one by one,
      it's quite low efficiency, this pach changes to allocate large
      continuous memory at a time, and divide it and assign for each bitmaps
      of segment. For large size image, it can expect improving its mount
      speed.
      Signed-off-by: default avatarChen Gong <gongchen4@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2fde3dd1
    • Chao Yu's avatar
      f2fs: support FS_IOC_{GET,SET}FSLABEL · 4507847c
      Chao Yu authored
      Support two generic fs ioctls FS_IOC_{GET,SET}FSLABEL, letting
      f2fs pass generic/492 testcase.
      
      Fixes were made by Eric where:
       - f2fs: fix buffer overruns in FS_IOC_{GET, SET}FSLABEL
         utf16s_to_utf8s() and utf8s_to_utf16s() take the number of characters,
         not the number of bytes.
      
       - f2fs: fix copying too many bytes in FS_IOC_SETFSLABEL
         Userspace provides a null-terminated string, so don't assume that the
         full FSLABEL_MAX bytes can always be copied.
      
       - f2fs: add missing authorization check in FS_IOC_SETFSLABEL
         FS_IOC_SETFSLABEL modifies the filesystem superblock, so it shouldn't be
         allowed to regular users.  Require CAP_SYS_ADMIN, like xfs and btrfs do.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4507847c
    • Chao Yu's avatar
      f2fs: fix to avoid data corruption by forbidding SSR overwrite · 899fee36
      Chao Yu authored
      There is one case can cause data corruption.
      
      - write 4k to fileA
      - fsync fileA, 4k data is writebacked to lbaA
      - write 4k to fileA
      - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
      - write 4k to fileB
      - kworker flush 4k to lbaA due to SSR
      - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
      data
      
      One solution is tracking all fsynced file's block history, and disallow
      SSR overwrite on newly invalidated block on that file.
      
      However, during recovery, no matter the dnode is flushed or fsynced, all
      previous dnodes until last fsynced one in node chain can be recovered,
      that means we need to record all block change in flushed dnode, which
      will cause heavy cost, so let's just use simple fix by forbidding SSR
      overwrite directly.
      
      Fixes: 5b6c6be2 ("f2fs: use SSR for warm node as well")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      899fee36
    • YueHaibing's avatar
      f2fs: Fix build error while CONFIG_NLS=m · aabc172b
      YueHaibing authored
      If CONFIG_F2FS_FS=y but CONFIG_NLS=m, building fails:
      
      fs/f2fs/file.o: In function `f2fs_ioctl':
      file.c:(.text+0xb86f): undefined reference to `utf16s_to_utf8s'
      file.c:(.text+0xe651): undefined reference to `utf8s_to_utf16s'
      
      Select CONFIG_NLS to fix this.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: 61a3da4d5ef8 ("f2fs: support FS_IOC_{GET,SET}FSLABEL")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      aabc172b
    • Chao Yu's avatar
      Revert "f2fs: avoid out-of-range memory access" · a37d0862
      Chao Yu authored
      As Pavel Machek reported:
      
      "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
      good idea to report it to the syslog and mark filesystem as "needing
      fsck" if filesystem can do that."
      
      Still we need improve the original patch with:
      - use unlikely keyword
      - add message print
      - return EUCLEAN
      
      However, after rethink this patch, I don't think we should add such
      condition check here as below reasons:
      - We have already checked the field in f2fs_sanity_check_ckpt(),
      - If there is fs corrupt or security vulnerability, there is nothing
      to guarantee the field is integrated after the check, unless we do
      the check before each of its use, however no filesystem does that.
      - We only have similar check for bitmap, which was added due to there
      is bitmap corruption happened on f2fs' runtime in product.
      - There are so many key fields in SB/CP/NAT did have such check
      after f2fs_sanity_check_{sb,cp,..}.
      
      So I propose to revert this unneeded check.
      
      This reverts commit 56f3ce67.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a37d0862
    • Lihong Kou's avatar
      f2fs: cleanup the code in build_sit_entries. · 290c30d4
      Lihong Kou authored
      We do not need to set the SBI_NEED_FSCK flag in the error paths, if we
      return error here, we will not update the checkpoint flag, so the code
      is useless, just remove it.
      Signed-off-by: default avatarLihong Kou <koulihong@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      290c30d4
    • Chao Yu's avatar
      f2fs: fix wrong available node count calculation · 27cae0bc
      Chao Yu authored
      In mkfs, we have counted quota file's node number in cp.valid_node_count,
      so we have to avoid wrong substraction of quota node number in
      .available_nid/.avail_node_count calculation.
      
      f2fs_write_check_point_pack()
      {
      ..
      	set_cp(valid_node_count, 1 + c.quota_inum + c.lpf_inum);
      
      Fixes: 292c196a ("f2fs: reserve nid resource for quota sysfile")
      Fixes: 7b63f72f ("f2fs: fix to do sanity check on valid node/block count")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      27cae0bc
    • Lihong Kou's avatar
      f2fs: remove duplicate code in f2fs_file_write_iter · 0b86f789
      Lihong Kou authored
      We will do the same check in generic_write_checks.
      if (iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)
              return -EINVAL;
      just remove the same check in f2fs_file_write_iter.
      Signed-off-by: default avatarLihong Kou <koulihong@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0b86f789
    • Chao Yu's avatar
      f2fs: fix to migrate blocks correctly during defragment · d3a1a0e1
      Chao Yu authored
      During defragment, we missed to trigger fragmented blocks migration
      for below condition:
      
      In defragment region:
      - total number of valid blocks is smaller than 512;
      - the tail part of the region are all holes;
      
      In addtion, return zero to user via range->len if there is no
      fragmented blocks.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d3a1a0e1
    • Chao Yu's avatar
      f2fs: use wrapped f2fs_cp_error() · 33ac18a1
      Chao Yu authored
      Just cleanup, no logic change.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      33ac18a1
    • Chao Yu's avatar
      f2fs: fix to use more generic EOPNOTSUPP · fd114ab2
      Chao Yu authored
      EOPNOTSUPP is widely used as error number indicating operation is
      not supported in syscall, and ENOTSUPP was defined and only used
      for NFSv3 protocol, so use EOPNOTSUPP instead.
      
      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fd114ab2
    • Chao Yu's avatar
      f2fs: use wrapped IS_SWAPFILE() · 3ee0c5d3
      Chao Yu authored
      Just cleanup, no logic change.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      3ee0c5d3
    • Daniel Rosenberg's avatar
      f2fs: Support case-insensitive file name lookups · 2c2eb7a3
      Daniel Rosenberg authored
      Modeled after commit b886ee3e ("ext4: Support case-insensitive file
      name lookups")
      
      """
      This patch implements the actual support for case-insensitive file name
      lookups in f2fs, based on the feature bit and the encoding stored in the
      superblock.
      
      A filesystem that has the casefold feature set is able to configure
      directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
      to succeed in that directory in a case-insensitive fashion, i.e: match
      a directory entry even if the name used by userspace is not a byte per
      byte match with the disk name, but is an equivalent case-insensitive
      version of the Unicode string.  This operation is called a
      case-insensitive file name lookup.
      
      The feature is configured as an inode attribute applied to directories
      and inherited by its children.  This attribute can only be enabled on
      empty directories for filesystems that support the encoding feature,
      thus preventing collision of file names that only differ by case.
      
      * dcache handling:
      
      For a +F directory, F2Fs only stores the first equivalent name dentry
      used in the dcache. This is done to prevent unintentional duplication of
      dentries in the dcache, while also allowing the VFS code to quickly find
      the right entry in the cache despite which equivalent string was used in
      a previous lookup, without having to resort to ->lookup().
      
      d_hash() of casefolded directories is implemented as the hash of the
      casefolded string, such that we always have a well-known bucket for all
      the equivalencies of the same string. d_compare() uses the
      utf8_strncasecmp() infrastructure, which handles the comparison of
      equivalent, same case, names as well.
      
      For now, negative lookups are not inserted in the dcache, since they
      would need to be invalidated anyway, because we can't trust missing file
      dentries.  This is bad for performance but requires some leveraging of
      the vfs layer to fix.  We can live without that for now, and so does
      everyone else.
      
      * on-disk data:
      
      Despite using a specific version of the name as the internal
      representation within the dcache, the name stored and fetched from the
      disk is a byte-per-byte match with what the user requested, making this
      implementation 'name-preserving'. i.e. no actual information is lost
      when writing to storage.
      
      DX is supported by modifying the hashes used in +F directories to make
      them case/encoding-aware.  The new disk hashes are calculated as the
      hash of the full casefolded string, instead of the string directly.
      This allows us to efficiently search for file names in the htree without
      requiring the user to provide an exact name.
      
      * Dealing with invalid sequences:
      
      By default, when a invalid UTF-8 sequence is identified, ext4 will treat
      it as an opaque byte sequence, ignoring the encoding and reverting to
      the old behavior for that unique file.  This means that case-insensitive
      file name lookup will not work only for that file.  An optional bit can
      be set in the superblock telling the filesystem code and userspace tools
      to enforce the encoding.  When that optional bit is set, any attempt to
      create a file name using an invalid UTF-8 sequence will fail and return
      an error to userspace.
      
      * Normalization algorithm:
      
      The UTF-8 algorithms used to compare strings in f2fs is implemented
      in fs/unicode, and is based on a previous version developed by
      SGI.  It implements the Canonical decomposition (NFD) algorithm
      described by the Unicode specification 12.1, or higher, combined with
      the elimination of ignorable code points (NFDi) and full
      case-folding (CF) as documented in fs/unicode/utf8_norm.c.
      
      NFD seems to be the best normalization method for F2FS because:
      
        - It has a lower cost than NFC/NFKC (which requires
          decomposing to NFD as an intermediary step)
        - It doesn't eliminate important semantic meaning like
          compatibility decompositions.
      
      Although:
      
      - This implementation is not completely linguistic accurate, because
      different languages have conflicting rules, which would require the
      specialization of the filesystem to a given locale, which brings all
      sorts of problems for removable media and for users who use more than
      one language.
      """
      Signed-off-by: default avatarDaniel Rosenberg <drosen@google.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2c2eb7a3
    • Daniel Rosenberg's avatar
      f2fs: include charset encoding information in the superblock · 5aba5430
      Daniel Rosenberg authored
      Add charset encoding to f2fs to support casefolding. It is modeled after
      the same feature introduced in commit c83ad55e ("ext4: include charset
      encoding information in the superblock")
      
      Currently this is not compatible with encryption, similar to the current
      ext4 imlpementation. This will change in the future.
      
      >From the ext4 patch:
      """
      The s_encoding field stores a magic number indicating the encoding
      format and version used globally by file and directory names in the
      filesystem.  The s_encoding_flags defines policies for using the charset
      encoding, like how to handle invalid sequences.  The magic number is
      mapped to the exact charset table, but the mapping is specific to ext4.
      Since we don't have any commitment to support old encodings, the only
      encoding I am supporting right now is utf8-12.1.0.
      
      The current implementation prevents the user from enabling encoding and
      per-directory encryption on the same filesystem at the same time.  The
      incompatibility between these features lies in how we do efficient
      directory searches when we cannot be sure the encryption of the user
      provided fname will match the actual hash stored in the disk without
      decrypting every directory entry, because of normalization cases.  My
      quickest solution is to simply block the concurrent use of these
      features for now, and enable it later, once we have a better solution.
      """
      Signed-off-by: default avatarDaniel Rosenberg <drosen@google.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5aba5430
    • Daniel Rosenberg's avatar
      fs: Reserve flag for casefolding · 71e90b46
      Daniel Rosenberg authored
      In preparation for including the casefold feature within f2fs, elevate
      the EXT4_CASEFOLD_FL flag to FS_CASEFOLD_FL.
      Signed-off-by: default avatarDaniel Rosenberg <drosen@google.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      71e90b46
    • Chao Yu's avatar
      f2fs: fix to avoid call kvfree under spinlock · 0921835c
      Chao Yu authored
      vfree() don't wish to be called from interrupt context, move it
      out of spin_lock_irqsave() coverage.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0921835c
    • Jia-Ju Bai's avatar
      fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status() · 280fd422
      Jia-Ju Bai authored
      In fill_super() and put_super(), f2fs_destroy_stats() is called
      in prior to f2fs_destroy_segment_manager(), so if current
      sbi can still be visited in global stat list, SM_I(sbi) should be
      released yet.
      For this reason, SM_I(sbi) does not need to be checked in
      update_general_status().
      Thank Chao Yu for advice.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      280fd422
    • Chao Yu's avatar
      f2fs: disallow direct IO in atomic write · 038d0698
      Chao Yu authored
      Atomic write needs page cache to cache data of transaction,
      direct IO should never be allowed in atomic write, detect
      and deny it when open atomic write file.
      Signed-off-by: default avatarGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      038d0698
    • Chao Yu's avatar
      f2fs: fix to handle quota_{on,off} correctly · fe973b06
      Chao Yu authored
      With quota_ino feature on, generic/232 reports an inconsistence issue
      on the image.
      
      The root cause is that the testcase tries to:
      - use quotactl to shutdown journalled quota based on sysfile;
      - and then use quotactl to enable/turn on quota based on specific file
      (aquota.user or aquota.group).
      
      Eventually, quota sysfile will be out-of-update due to following specific
      file creation.
      
      Change as below to fix this issue:
      - deny enabling quota based on specific file if quota sysfile exists.
      - set SBI_QUOTA_NEED_REPAIR once sysfile based quota shutdowns via
      ioctl.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fe973b06
    • Chao Yu's avatar
      f2fs: fix to detect cp error in f2fs_setxattr() · a25c2cdc
      Chao Yu authored
      It needs to return -EIO if filesystem has been shutdown, fix the
      miss case in f2fs_setxattr().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a25c2cdc
    • Chao Yu's avatar
      f2fs: fix to spread f2fs_is_checkpoint_ready() · 955ebcd3
      Chao Yu authored
      We missed to call f2fs_is_checkpoint_ready() in several places, it may
      allow space allocation even when free space was exhausted during
      checkpoint is disabled, fix to add them.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      955ebcd3
    • Chao Yu's avatar
      f2fs: support fiemap() for directory inode · 7975f349
      Chao Yu authored
      Adjust f2fs_fiemap() to support fiemap() on directory inode.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7975f349
    • Chao Yu's avatar
      f2fs: fix to avoid discard command leak · 04f9287a
      Chao Yu authored
       =============================================================================
       BUG discard_cmd (Tainted: G    B      OE  ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
       Call Trace:
        dump_stack+0x63/0x87
        slab_err+0xa1/0xb0
        __kmem_cache_shutdown+0x183/0x390
        shutdown_cache+0x14/0x110
        kmem_cache_destroy+0x195/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        ? vtime_user_exit+0x29/0x70
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
       INFO: Object 0xffff936b4748b000 @offset=0
       INFO: Object 0xffff936b4748b070 @offset=112
       kmem_cache_destroy discard_cmd: Slab cache still has objects
       Call Trace:
        dump_stack+0x63/0x87
        kmem_cache_destroy+0x1b4/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
      Recovery can cache discard commands, so in error path of fill_super(),
      we need give a chance to handle them, otherwise it will lead to leak
      of discard_cmd slab cache.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      04f9287a
    • Chao Yu's avatar
      f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly · 0f1898f9
      Chao Yu authored
      On a quota disabled image, with fault injection, SBI_QUOTA_NEED_REPAIR
      will be set incorrectly in error path of f2fs_evict_inode(), fix it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0f1898f9
    • Chao Yu's avatar
      f2fs: fix to drop meta/node pages during umount · a8933b6b
      Chao Yu authored
      As reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=204193
      
      A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.
      
       kasan_report.cold+0x5/0x32
       f2fs_write_end_io+0x215/0x650
       bio_endio+0x26e/0x320
       blk_update_request+0x209/0x5d0
       blk_mq_end_request+0x2e/0x230
       lo_complete_rq+0x12c/0x190
       blk_done_softirq+0x14a/0x1a0
       __do_softirq+0x119/0x3e5
       irq_exit+0x94/0xe0
       call_function_single_interrupt+0xf/0x20
      
      During umount, we will access NULL sbi->node_inode pointer in
      f2fs_write_end_io():
      
      	f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
      				page->index != nid_of_node(page));
      
      The reason is if disable_checkpoint mount option is on, meta dirty
      pages can remain during umount, and then be flushed by iput() of
      meta_inode, however node_inode has been iput()ed before
      meta_inode's iput().
      
      Since checkpoint is disabled, all meta/node datas are useless and
      should be dropped in next mount, so in umount, let's adjust
      drop_inode() to give a hint to iput_final() to drop all those dirty
      datas correctly.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a8933b6b
    • Chao Yu's avatar
      f2fs: disallow switching io_bits option during remount · 1f78adfa
      Chao Yu authored
      If IO alignment feature is turned on after remount, we didn't
      initialize mempool of it, it turns out we will encounter panic
      during IO submission due to access NULL mempool pointer.
      
      This feature should be set only at mount time, so simply deny
      configuring during remount.
      
      This fixes bug reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=204135Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      1f78adfa
    • Chao Yu's avatar
      f2fs: fix panic of IO alignment feature · c72db71e
      Chao Yu authored
      Since 07173c3e ("block: enable multipage bvecs"), one bio vector
      can store multi pages, so that we can not calculate max IO size of
      bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of
      f2fs always has that assumption, so finally, it may cause panic during
      IO submission as below stack.
      
       kernel BUG at fs/f2fs/data.c:317!
       RIP: 0010:__submit_merged_bio+0x8b0/0x8c0
       Call Trace:
        f2fs_submit_page_write+0x3cd/0xdd0
        do_write_page+0x15d/0x360
        f2fs_outplace_write_data+0xd7/0x210
        f2fs_do_write_data_page+0x43b/0xf30
        __write_data_page+0xcf6/0x1140
        f2fs_write_cache_pages+0x3ba/0xb40
        f2fs_write_data_pages+0x3dd/0x8b0
        do_writepages+0xbb/0x1e0
        __writeback_single_inode+0xb6/0x800
        writeback_sb_inodes+0x441/0x910
        wb_writeback+0x261/0x650
        wb_workfn+0x1f9/0x7a0
        process_one_work+0x503/0x970
        worker_thread+0x7d/0x820
        kthread+0x1ad/0x210
        ret_from_fork+0x35/0x40
      
      This patch adds one extra condition to check left space in bio while
      trying merging page to bio, to avoid panic.
      
      This bug was reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=204043Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c72db71e
    • Chao Yu's avatar
      f2fs: introduce {page,io}_is_mergeable() for readability · 8896cbdf
      Chao Yu authored
      Wrap merge condition into function for readability, no logic change.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8896cbdf
  3. 16 Aug, 2019 4 commits
    • Jaegeuk Kim's avatar
      f2fs: fix livelock in swapfile writes · 75a037f3
      Jaegeuk Kim authored
      This patch fixes livelock in the below call path when writing swap pages.
      
      [46374.617256] c2    701  __switch_to+0xe4/0x100
      [46374.617265] c2    701  __schedule+0x80c/0xbc4
      [46374.617273] c2    701  schedule+0x74/0x98
      [46374.617281] c2    701  rwsem_down_read_failed+0x190/0x234
      [46374.617291] c2    701  down_read+0x58/0x5c
      [46374.617300] c2    701  f2fs_map_blocks+0x138/0x9a8
      [46374.617310] c2    701  get_data_block_dio_write+0x74/0x104
      [46374.617320] c2    701  __blockdev_direct_IO+0x1350/0x3930
      [46374.617331] c2    701  f2fs_direct_IO+0x55c/0x8bc
      [46374.617341] c2    701  __swap_writepage+0x1d0/0x3e8
      [46374.617351] c2    701  swap_writepage+0x44/0x54
      [46374.617360] c2    701  shrink_page_list+0x140/0xe80
      [46374.617371] c2    701  shrink_inactive_list+0x510/0x918
      [46374.617381] c2    701  shrink_node_memcg+0x2d4/0x804
      [46374.617391] c2    701  shrink_node+0x10c/0x2f8
      [46374.617400] c2    701  do_try_to_free_pages+0x178/0x38c
      [46374.617410] c2    701  try_to_free_pages+0x348/0x4b8
      [46374.617419] c2    701  __alloc_pages_nodemask+0x7f8/0x1014
      [46374.617429] c2    701  pagecache_get_page+0x184/0x2cc
      [46374.617438] c2    701  f2fs_new_node_page+0x60/0x41c
      [46374.617449] c2    701  f2fs_new_inode_page+0x50/0x7c
      [46374.617460] c2    701  f2fs_init_inode_metadata+0x128/0x530
      [46374.617472] c2    701  f2fs_add_inline_entry+0x138/0xd64
      [46374.617480] c2    701  f2fs_do_add_link+0xf4/0x178
      [46374.617488] c2    701  f2fs_create+0x1e4/0x3ac
      [46374.617497] c2    701  path_openat+0xdc0/0x1308
      [46374.617507] c2    701  do_filp_open+0x78/0x124
      [46374.617516] c2    701  do_sys_open+0x134/0x248
      [46374.617525] c2    701  SyS_openat+0x14/0x20
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      75a037f3
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b7e7c85d
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Don't taint the kernel if CPUs have different sets of page sizes
         supported (other than the one in use).
      
       - Issue I-cache maintenance for module ftrace trampoline.
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
        arm64: cpufeature: Don't treat granule sizes as strict
      b7e7c85d
    • Will Deacon's avatar
      arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side · b6143d10
      Will Deacon authored
      The initial support for dynamic ftrace trampolines in modules made use
      of an indirect branch which loaded its target from the beginning of
      a special section (e71a4e1b ("arm64: ftrace: add support for far
      branches to dynamic ftrace")). Since no instructions were being patched,
      no cache maintenance was needed. However, later in be0f272b ("arm64:
      ftrace: emit ftrace-mod.o contents through code") this code was reworked
      to output the trampoline instructions directly into the PLT entry but,
      unfortunately, the necessary cache maintenance was overlooked.
      
      Add a call to __flush_icache_range() after writing the new trampoline
      instructions but before patching in the branch to the trampoline.
      
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: <stable@vger.kernel.org>
      Fixes: be0f272b ("arm64: ftrace: emit ftrace-mod.o contents through code")
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      b6143d10
    • Linus Torvalds's avatar
      Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2d63ba3e
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These add a check to avoid recent suspend-to-idle power regression on
        systems with NVMe drives where the PCIe ASPM policy is "performance"
        (or when the kernel is built without ASPM support), fix an issue
        related to frequency limits in the schedutil cpufreq governor and fix
        a mistake related to the PM QoS usage in the cpufreq core introduced
        recently.
      
        Specifics:
      
         - Disable NVMe power optimization related to suspend-to-idle added
           recently on systems where PCIe ASPM is not able to put PCIe links
           into low-power states to prevent excess power from being drawn by
           the system while suspended (Rafael Wysocki).
      
         - Make the schedutil governor handle frequency limits changes
           properly in all cases (Viresh Kumar).
      
         - Prevent the cpufreq core from treating positive values returned by
           dev_pm_qos_update_request() as errors (Viresh Kumar)"
      
      * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
        PCI/ASPM: Add pcie_aspm_enabled()
        cpufreq: schedutil: Don't skip freq update when limits change
        cpufreq: dev_pm_qos_update_request() can return 1 on success
      2d63ba3e