1. 03 Nov, 2019 8 commits
  2. 31 Oct, 2019 1 commit
    • Dave Chinner's avatar
      xfs: properly serialise fallocate against AIO+DIO · 249bd908
      Dave Chinner authored
      AIO+DIO can extend the file size on IO completion, and it holds
      no inode locks while the IO is in flight. Therefore, a race
      condition exists in file size updates if we do something like this:
      
      aio-thread			fallocate-thread
      
      lock inode
      submit IO beyond inode->i_size
      unlock inode
      .....
      				lock inode
      				break layouts
      				if (off + len > inode->i_size)
      					new_size = off + len
      				.....
      				inode_dio_wait()
      				<blocks>
      .....
      completes
      inode->i_size updated
      inode_dio_done()
      ....
      				<wakes>
      				<does stuff no long beyond EOF>
      				if (new_size)
      					xfs_vn_setattr(inode, new_size)
      
      
      Yup, that attempt to extend the file size in the fallocate code
      turns into a truncate - it removes the whatever the aio write
      allocated and put to disk, and reduced the inode size back down to
      where the fallocate operation ends.
      
      Fundamentally, xfs_file_fallocate()  not compatible with racing
      AIO+DIO completions, so we need to move the inode_dio_wait() call
      up to where the lock the inode and break the layouts.
      
      Secondly, storing the inode size and then using it unchecked without
      holding the ILOCK is not safe; we can only do such a thing if we've
      locked out and drained all IO and other modification operations,
      which we don't do initially in xfs_file_fallocate.
      
      It should be noted that some of the fallocate operations are
      compound operations - they are made up of multiple manipulations
      that may zero data, and so we may need to flush and invalidate the
      file multiple times during an operation. However, we only need to
      lock out IO and other space manipulation operations once, as that
      lockout is maintained until the entire fallocate operation has been
      completed.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      249bd908
  3. 29 Oct, 2019 19 commits
  4. 28 Oct, 2019 8 commits
  5. 24 Oct, 2019 4 commits
    • Jan Kara's avatar
      xfs: Sanity check flags of Q_XQUOTARM call · 3dd4d40b
      Jan Kara authored
      Flags passed to Q_XQUOTARM were not sanity checked for invalid values.
      Fix that.
      
      Fixes: 9da93f9b ("xfs: fix Q_XQUOTARM ioctl")
      Reported-by: default avatarYang Xu <xuyang2018.jy@cn.fujitsu.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      3dd4d40b
    • Ben Dooks (Codethink)'s avatar
      xfs: add mising include of xfs_pnfs.h for missing declarations · 1aa63006
      Ben Dooks (Codethink) authored
      The xfs_pnfs.c file is missing an include of xfs_pnfs.h to
      add the prototypes of the functions it exports. Include this
      file to fix the following sparse warnings:
      
      fs/xfs/xfs_pnfs.c:27:1: warning: symbol 'xfs_break_leased_layouts' was not declared. Should it be static?
      fs/xfs/xfs_pnfs.c:52:1: warning: symbol 'xfs_fs_get_uuid' was not declared. Should it be static?
      fs/xfs/xfs_pnfs.c:77:1: warning: symbol 'xfs_fs_map_blocks' was not declared. Should it be static?
      fs/xfs/xfs_pnfs.c:226:1: warning: symbol 'xfs_fs_commit_blocks' was not declared. Should it be static?
      Signed-off-by: default avatarBen Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      1aa63006
    • Brian Foster's avatar
      xfs: don't set bmapi total block req where minleft is · da781e64
      Brian Foster authored
      xfs_bmapi_write() takes a total block requirement parameter that is
      passed down to the block allocation code and is used to specify the
      total block requirement of the associated transaction. This is used
      to try and select an AG that can not only satisfy the requested
      extent allocation, but can also accommodate subsequent allocations
      that might be required to complete the transaction. For example,
      additional bmbt block allocations may be required on insertion of
      the resulting extent to an inode data fork.
      
      While it's important for callers to calculate and reserve such extra
      blocks in the transaction, it is not necessary to pass the total
      value to xfs_bmapi_write() in all cases. The latter automatically
      sets minleft to ensure that sufficient free blocks remain after the
      allocation attempt to expand the format of the associated inode
      (i.e., such as extent to btree conversion, btree splits, etc).
      Therefore, any callers that pass a total block requirement of the
      bmap mapping length plus worst case bmbt expansion essentially
      specify the additional reservation requirement twice. These callers
      can pass a total of zero to rely on the bmapi minleft policy.
      
      Beyond being superfluous, the primary motivation for this change is
      that the total reservation logic in the bmbt code is dubious in
      scenarios where minlen < maxlen and a maxlen extent cannot be
      allocated (which is more common for data extent allocations where
      contiguity is not required). The total value is based on maxlen in
      the xfs_bmapi_write() caller. If the bmbt code falls back to an
      allocation between minlen and maxlen, that allocation will not
      succeed until total is reset to minlen, which essentially throws
      away any additional reservation included in total by the caller. In
      addition, the total value is not reset until after alignment is
      dropped, which means that such callers drop alignment far too
      aggressively than necessary.
      
      Update all callers of xfs_bmapi_write() that pass a total block
      value of the mapping length plus bmbt reservation to instead pass
      zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
      requirement. This trades off slightly less conservative AG selection
      for the ability to preserve alignment in more scenarios.
      xfs_bmapi_write() callers that incorporate unrelated or additional
      reservations in total beyond what is already included in minleft
      must continue to use the former.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      da781e64
    • Dave Chinner's avatar
      xfs: cap longest free extent to maximum allocatable · 1c743574
      Dave Chinner authored
      Cap longest extent to the largest we can allocate based on limits
      calculated at mount time. Dynamic state (such as finobt blocks)
      can result in the longest free extent exceeding the size we can
      allocate, and that results in failure to align full AG allocations
      when the AG is empty.
      
      Result:
      
      xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff
      
      minlen and maxlen are now separated by the alignment size, and
      allocation fails because args.total > free space in the AG.
      
      [bfoster: Added xfs_bmap_btalloc() changes.]
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      1c743574