1. 04 Sep, 2019 2 commits
    • kaixuxia's avatar
      xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT · bc56ad8c
      kaixuxia authored
      When performing rename operation with RENAME_WHITEOUT flag, we will
      hold AGF lock to allocate or free extents in manipulating the dirents
      firstly, and then doing the xfs_iunlink_remove() call last to hold
      AGI lock to modify the tmpfile info, so we the lock order AGI->AGF.
      
      The big problem here is that we have an ordering constraint on AGF
      and AGI locking - inode allocation locks the AGI, then can allocate
      a new extent for new inodes, locking the AGF after the AGI. Hence
      the ordering that is imposed by other parts of the code is AGI before
      AGF. So we get an ABBA deadlock between the AGI and AGF here.
      
      Process A:
      Call trace:
       ? __schedule+0x2bd/0x620
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       __down_common+0xef/0x125
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agf+0xa6/0x180 [xfs]
       ? schedule_timeout+0x17d/0x290
       xfs_alloc_read_agf+0x52/0x1f0 [xfs]
       xfs_alloc_fix_freelist+0x432/0x590 [xfs]
       ? down+0x3b/0x50
       ? xfs_buf_lock+0x34/0xf0 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_alloc_vextent+0x301/0x6c0 [xfs]
       xfs_ialloc_ag_alloc+0x182/0x700 [xfs]
       ? _xfs_trans_bjoin+0x72/0xf0 [xfs]
       xfs_dialloc+0x116/0x290 [xfs]
       xfs_ialloc+0x6d/0x5e0 [xfs]
       ? xfs_log_reserve+0x165/0x280 [xfs]
       xfs_dir_ialloc+0x8c/0x240 [xfs]
       xfs_create+0x35a/0x610 [xfs]
       xfs_generic_create+0x1f1/0x2f0 [xfs]
       ...
      
      Process B:
      Call trace:
       ? __schedule+0x2bd/0x620
       ? xfs_bmapi_allocate+0x245/0x380 [xfs]
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       ? xfs_buf_find+0x1fd/0x6c0 [xfs]
       __down_common+0xef/0x125
       ? xfs_buf_get_map+0x37/0x230 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agi+0xa8/0x160 [xfs]
       xfs_iunlink_remove+0x6f/0x2a0 [xfs]
       ? current_time+0x46/0x80
       ? xfs_trans_ichgtime+0x39/0xb0 [xfs]
       xfs_rename+0x57a/0xae0 [xfs]
       xfs_vn_rename+0xe4/0x150 [xfs]
       ...
      
      In this patch we move the xfs_iunlink_remove() call to
      before acquiring the AGF lock to preserve correct AGI/AGF locking
      order.
      Signed-off-by: default avatarkaixuxia <kaixuxia@tencent.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      bc56ad8c
    • Darrick J. Wong's avatar
      xfs: define a flags field for the AG geometry ioctl structure · 76f17933
      Darrick J. Wong authored
      Define a flags field for the AG geometry ioctl structure.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      76f17933
  2. 03 Sep, 2019 1 commit
  3. 31 Aug, 2019 17 commits
  4. 30 Aug, 2019 1 commit
  5. 28 Aug, 2019 9 commits
  6. 27 Aug, 2019 5 commits
    • Darrick J. Wong's avatar
      xfs: bmap scrub should only scrub records once · 519e5869
      Darrick J. Wong authored
      The inode block mapping scrub function does more work for btree format
      extent maps than is absolutely necessary -- first it will walk the bmbt
      and check all the entries, and then it will load the incore tree and
      check every entry in that tree, possibly for a second time.
      
      Simplify the code and decrease check runtime by separating the two
      responsibilities.  The bmbt walk will make sure the incore extent
      mappings are loaded, check the shape of the bmap btree (via xchk_btree)
      and check that every bmbt record has a corresponding incore extent map;
      and the incore extent map walk takes all the responsibility for checking
      the mapping records and cross referencing them with other AG metadata.
      
      This enables us to clean up some messy parameter handling and reduce
      redundant code.  Rename a few functions to make the split of
      responsibilities clearer.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      519e5869
    • zhengbin's avatar
      xfs: remove excess function parameter description in 'xfs_btree_sblock_v5hdr_verify' · 71912e08
      zhengbin authored
      Fixes gcc warning:
      
      fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify'
      fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify'
      
      Fixes: c5ab131b ("libxfs: refactor short btree block verification")
      Signed-off-by: default avatarzhengbin <zhengbin13@huawei.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      71912e08
    • Dave Chinner's avatar
      xfs: add kmem_alloc_io() · f8f9ee47
      Dave Chinner authored
      Memory we use to submit for IO needs strict alignment to the
      underlying driver contraints. Worst case, this is 512 bytes. Given
      that all allocations for IO are always a power of 2 multiple of 512
      bytes, the kernel heap provides natural alignment for objects of
      these sizes and that suffices.
      
      Until, of course, memory debugging of some kind is turned on (e.g.
      red zones, poisoning, KASAN) and then the alignment of the heap
      objects is thrown out the window. Then we get weird IO errors and
      data corruption problems because drivers don't validate alignment
      and do the wrong thing when passed unaligned memory buffers in bios.
      
      TO fix this, introduce kmem_alloc_io(), which will guaranteeat least
      512 byte alignment of buffers for IO, even if memory debugging
      options are turned on. It is assumed that the minimum allocation
      size will be 512 bytes, and that sizes will be power of 2 mulitples
      of 512 bytes.
      
      Use this everywhere we allocate buffers for IO.
      
      This no longer fails with log recovery errors when KASAN is enabled
      due to the brd driver not handling unaligned memory buffers:
      
      # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      f8f9ee47
    • Dave Chinner's avatar
      xfs: get allocation alignment from the buftarg · d916275a
      Dave Chinner authored
      Needed to feed into the allocation routine to guarantee the memory
      buffers we add to bios are correctly aligned to the underlying
      device.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      d916275a
    • Dave Chinner's avatar
      xfs: add kmem allocation trace points · 0ad95687
      Dave Chinner authored
      When trying to correlate XFS kernel allocations to memory reclaim
      behaviour, it is useful to know what allocations XFS is actually
      attempting. This information is not directly available from
      tracepoints in the generic memory allocation and reclaim
      tracepoints, so these new trace points provide a high level
      indication of what the XFS memory demand actually is.
      
      There is no per-filesystem context in this code, so we just trace
      the type of allocation, the size and the allocation constraints.
      The kmem code also doesn't include much of the common XFS headers,
      so there are a few definitions that need to be added to the trace
      headers and a couple of types that need to be made common to avoid
      needing to include the whole world in the kmem code.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      0ad95687
  7. 26 Aug, 2019 1 commit
  8. 25 Aug, 2019 4 commits