1. 04 May, 2021 1 commit
    • xfs: don't allow log writes if the data device is readonly · 8e9800f9
      Darrick J. Wong authored
      While running generic/050 with an external log, I observed this warning
      in dmesg:
      
      Trying to write to read-only block-device sda4 (partno 4)
      WARNING: CPU: 2 PID: 215677 at block/blk-core.c:704 submit_bio_checks+0x256/0x510
      Call Trace:
       submit_bio_noacct+0x2c/0x430
       _xfs_buf_ioapply+0x283/0x3c0 [xfs]
       __xfs_buf_submit+0x6a/0x210 [xfs]
       xfs_buf_delwri_submit_buffers+0xf8/0x270 [xfs]
       xfsaild+0x2db/0xc50 [xfs]
       kthread+0x14b/0x170
      
      I think this happened because we tried to cover the log after a readonly
      mount, and the AIL tried to write the primary superblock to the data
      device.  The test marks the data device readonly, but it doesn't do the
      same to the external log device.  Therefore, XFS thinks that the log is
      writable, even though AIL writes whine to dmesg because the data device
      is read only.
      
      Fix this by amending xfs_log_writable to prevent writes when the AIL
      can't possibly write anything into the filesystem.
      
      Note: As for the external log or the rt devices being readonly--
      xfs_blkdev_get will complain about that if we aren't doing a norecovery
      mount.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
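The check described in this commit can be sketched as a small user-space model. This is an illustration only, not the kernel's xfs_log_writable: the struct and its field names are hypothetical, and the norecovery, log-device, and shutdown checks are assumed to sit alongside the data-device check that the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical, simplified mount state; not the kernel's struct xfs_mount. */
struct mount_state {
	bool norecovery;	/* assumed: mounted with norecovery */
	bool data_dev_ro;	/* data device is read-only */
	bool log_dev_ro;	/* assumed: log device is read-only */
	bool shut_down;		/* assumed: filesystem has been shut down */
};

/* Returns true only when a log write could actually reach stable storage. */
bool log_writable(const struct mount_state *m)
{
	if (m->norecovery)
		return false;
	/* The case from the commit message: the AIL cannot write to the data device. */
	if (m->data_dev_ro)
		return false;
	if (m->log_dev_ro)
		return false;
	if (m->shut_down)
		return false;
	return true;
}
```

In the generic/050 scenario from the message (read-only data device, writable external log), this model reports the log as not writable, so nothing would try to cover the log.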
  2. 29 Apr, 2021 8 commits
    • xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range · d4f74e16
      Darrick J. Wong authored
      The final parameter of filemap_write_and_wait_range is the end of the
      range to flush, not the length of the range to flush.
      
      Fixes: 46afb062 ("xfs: only flush the unshared range in xfs_reflink_unshare")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
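The end-versus-length distinction is easy to get wrong, so here is a minimal user-space sketch of the conversion. It assumes, as the kernel documents for filemap_write_and_wait_range, that the end offset is inclusive; the helper name range_last_byte is hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert (offset, len) into the inclusive last byte of the range,
 * which is the form an "end of range" parameter expects.
 */
int64_t range_last_byte(int64_t offset, int64_t len)
{
	assert(offset >= 0 && len > 0);
	return offset + len - 1;
}
```

For a 4096-byte range starting at offset 4096, the last byte is 8191. Passing the length (4096) as the end would describe a range that ends before it starts.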
    • xfs: set aside allocation btree blocks from block reservation · fd43cf60
      Brian Foster authored
      The blocks used for allocation btrees (bnobt and countbt) are
      technically considered free space. This is because as free space is
      used, allocbt blocks are removed and naturally become available for
      traditional allocation. However, this means that a significant
      portion of free space may consist of in-use btree blocks if free
      space is severely fragmented.
      
      On large filesystems with large perag reservations, this can lead to
      a rare but nasty condition where a significant amount of physical
      free space is available, but the majority of actual usable blocks
      consist of in-use allocbt blocks. We have a record of a (~12TB, 32
      AG) filesystem with multiple AGs in a state with ~2.5GB or so free
      blocks tracked across ~300 total allocbt blocks, but effectively at
      100% full because the free space is entirely consumed by
      refcountbt perag reservation.
      
      Such a large perag reservation is by design on large filesystems.
      The problem is that because the free space is so fragmented, this AG
      contributes the 300 or so allocbt blocks to the global counters as
      free space. If this pattern repeats across enough AGs, the
      filesystem lands in a state where global block reservation can
      outrun physical block availability. For example, a streaming
      buffered write on the affected filesystem continues to allow delayed
      allocation beyond the point where writeback starts to fail due to
      physical block allocation failures. The expected behavior is for the
      delalloc block reservation to fail gracefully with -ENOSPC before
      physical block allocation failure is a possibility.
      
      To address this problem, set aside in-use allocbt blocks at
      reservation time and thus ensure they cannot be reserved until truly
      available for physical allocation. This allows alloc btree metadata
      to continue to reside in free space, but dynamically adjusts
      reservation availability based on internal state. Note that the
      logic requires that the allocbt counter is fully populated at
      reservation time before it is fully effective. We currently rely on
      the mount time AGF scan in the perag reservation initialization code
      for this dependency on filesystems where it's most important (i.e.
      with active perag reservations).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
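The set-aside idea can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the kernel's reservation code: struct fs_counters, its fields, and reserve_blocks are hypothetical names, and the real counters are updated concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the global counters discussed above. */
struct fs_counters {
	int64_t fdblocks;	/* global free blocks; includes in-use allocbt blocks */
	int64_t allocbt_blks;	/* blocks currently used by bnobt/cntbt */
	int64_t set_aside;	/* assumed: any other blocks already held back */
};

/*
 * Reserve 'want' blocks. In-use allocbt blocks count as free space but are
 * not allocatable, so they are excluded from what can be reserved.
 */
int reserve_blocks(struct fs_counters *c, int64_t want)
{
	int64_t unavailable = c->set_aside + c->allocbt_blks;

	assert(want > 0);
	if (c->fdblocks - want < unavailable)
		return -ENOSPC;	/* fail before physical allocation could fail */
	c->fdblocks -= want;
	return 0;
}
```

With 1000 free blocks of which 300 are in-use allocbt blocks, a 700-block reservation succeeds and the next single-block reservation fails with -ENOSPC, even though fdblocks still reads 300.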
    • xfs: introduce in-core global counter of allocbt blocks · 16eaab83
      Brian Foster authored
      Introduce an in-core counter to track the sum of all allocbt blocks
      used by the filesystem. This value is currently tracked per-ag via
      the ->agf_btreeblks field in the AGF, which also happens to include
      rmapbt blocks. A global, in-core count of allocbt blocks is required
      to identify the subset of global ->m_fdblocks that consists of
      unavailable blocks currently used for allocation btrees. To support
      this calculation at block reservation time, construct a similar
      global counter for allocbt blocks, populate it on first read of each
      AGF and update it as allocbt blocks are used and released.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
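A rough model of the counter lifecycle described here (populate on first AGF read, then adjust as allocbt blocks are used and released) might look like the following. All names are hypothetical, the fixed MAX_AGS bound exists only to keep the sketch self-contained, and the way the rmapbt share is subtracted from the per-AG btree block count is an assumption. A real implementation would also need atomic updates; this model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_AGS 8	/* hypothetical bound for this sketch */

struct allocbt_tracker {
	int64_t allocbt_blks;		/* global in-core sum of allocbt blocks */
	bool agf_seen[MAX_AGS];		/* has this AG's AGF been read yet? */
};

/* On the first read of an AG's AGF, fold its allocbt usage into the global sum. */
void tracker_agf_read(struct allocbt_tracker *t, unsigned int ag,
		      int64_t btreeblks, int64_t rmapbt_blks)
{
	assert(ag < MAX_AGS);
	if (t->agf_seen[ag])
		return;		/* later reads must not double count */
	t->agf_seen[ag] = true;
	/* The per-AG count also covers rmapbt blocks, so remove that share. */
	t->allocbt_blks += btreeblks - rmapbt_blks;
}

/* An allocbt block was taken into use, e.g. by a btree split. */
void tracker_block_used(struct allocbt_tracker *t)
{
	t->allocbt_blks++;
}

/* An allocbt block was released back to free space. */
void tracker_block_released(struct allocbt_tracker *t)
{
	t->allocbt_blks--;
}
```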
    • xfs: unconditionally read all AGFs on mounts with perag reservation · 2675ad38
      Brian Foster authored
      perag reservation is enabled at mount time on a per AG basis. The
      upcoming change to set aside allocbt blocks from block reservation
      requires a populated allocbt counter as soon as possible after mount
      to be fully effective against large perag reservations. Therefore as
      a preparation step, initialize the pagf on all mounts where at least
      one reservation is active. Note that this already occurs to some
      degree on most default format filesystems as reservation requirement
      calculations already depend on the AGF or AGI, depending on the
      reservation type.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: count free space btree blocks when scrubbing pre-lazysbcount fses · e147a756
      Darrick J. Wong authored
      Since agf_btreeblks didn't exist before the lazysbcount feature, the fs
      summary count scrubber needs to walk the free space btrees to determine
      the amount of space being used by those btrees.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
    • xfs: update superblock counters correctly for !lazysbcount · 6543990a
      Dave Chinner authored
      Keep the mount superblock counters up to date for !lazysbcount
      filesystems so that when we log the superblock they do not need
      updating in any way because they are already correct.
      
      This was found via the following reproducer reported by Zorro:
      1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
      2. mount $dev $mnt
      3. fsstress -d $mnt -p 100 -n 1000 (may need more or less I/O load)
      4. umount $mnt
      5. xfs_repair -n $dev
      With this patch applied, I've seen no problems.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reported-by: Zorro Lang <zlang@redhat.com>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: don't check agf_btreeblks on pre-lazysbcount filesystems · e6c01077
      Darrick J. Wong authored
      The AGF free space btree block counter wasn't added until the
      lazysbcount feature was added to XFS midway through the life of the V4
      format, so ignore the field when checking.  Online AGF repair requires
      rmapbt, so it doesn't need the feature check.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
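The feature gate described in this commit amounts to skipping a comparison when the field is not meaningful. A minimal sketch, with a hypothetical function name and parameters rather than the scrubber's real interface:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the on-disk AGF btree block counter looks consistent.
 * On filesystems without lazysbcount the field was never maintained,
 * so there is nothing to compare and the check passes trivially.
 */
bool agf_btreeblks_ok(bool has_lazysbcount, uint32_t on_disk, uint32_t counted)
{
	if (!has_lazysbcount)
		return true;
	return on_disk == counted;
}
```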
    • xfs: remove obsolete AGF counter debugging · 1aec7c3d
      Darrick J. Wong authored
      In commit f8f2835a we changed the behavior of XFS to use EFIs to
      remove blocks from an overfilled AGFL because there were complaints
      about transaction overruns that stemmed from trying to free multiple
      blocks in a single transaction.
      
      Unfortunately, that commit missed a subtlety in the debug-mode
      transaction accounting when a realtime volume is attached.  If a
      realtime file undergoes a data fork mapping change such that realtime
      extents are allocated (or freed) in the same transaction that a data
      device block is also allocated (or freed), we can trip a debugging
      assertion.  This can happen (for example) if a realtime extent is
      allocated and it is necessary to reshape the bmbt to hold the new
      mapping.
      
      When we go to allocate a bmbt block from an AG, the first thing the data
      device block allocator does is ensure that the freelist is the proper
      length.  If the freelist is too long, it will trim the freelist to the
      proper length.
      
      In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
      record the decrement in the AG free list count.  Prior to f8f28 we would
      put the free block back in the free space btrees in the same
      transaction, which calls xfs_trans_agblocks_delta() to record the
      increment in the AG free block count.  Since AGFL blocks are included in
      the global free block count (fdblocks), there is no corresponding
      fdblocks update, so the AGFL free satisfies the following condition in
      xfs_trans_apply_sb_deltas:
      
      	/*
      	 * Check that superblock mods match the mods made to AGF counters.
      	 */
      	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
      	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
      		tp->t_ag_btree_delta));
      
      The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
      the number of blocks that were allocated.
      
      After commit f8f28 we defer the block freeing to the next chained
      transaction, which means that the calls to xfs_trans_agflist_delta and
      xfs_trans_agblocks_delta occur in separate transactions.  The (first)
      transaction that shortens the free list trips on the comparison, which
      has now become:
      
      (X + 0) == ((X) + -1 + 0)
      
      because we haven't freed the AGFL block yet; we've only logged an
      intention to free it.  When the second transaction (the deferred free)
      commits, it will evaluate the expression as:
      
      (0 + 0) == (1 + 0 + 0)
      
      and trip over that in turn.
      
      At this point, the astute reader may note that the two commits tagged by
      this patch have been in the kernel for a long time but haven't generated
      any bug reports.  How is it that the author became aware of this bug?
      
      This originally surfaced as an intermittent failure when I was testing
      realtime rmap, but a different bug report by Zorro Lang reveals the same
      assertion occurring on !lazysbcount filesystems.
      
      The common factor to both reports (and why this problem wasn't
      previously reported) becomes apparent if we consider when
      xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():
      
      	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
      		xfs_trans_apply_sb_deltas(tp);
      
      With a modern lazysbcount filesystem, transactions update only the
      percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
      xfs_trans_apply_sb_deltas is rarely called.
      
      However, updates to the count of free realtime extents are not part of
      lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
      removing data fork mappings to realtime files; similarly,
      XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.
      
      Dave mentioned in response to an earlier version of this patch:
      
      "IIUC, what you are saying is that this debug code is simply not
      exercised in normal testing and hasn't been for the past decade?  And it
      still won't be exercised on anything other than realtime device testing?
      
      "...it was debugging code from 1994 that was largely turned into dead
      code when lazysbcounters were introduced in 2007. Hence I'm not sure it
      holds any value anymore."
      
      This debugging code isn't especially helpful - you can modify the
      flcount on one AG and the freeblks of another AG, and it won't trigger.
      Add the fact that nobody noticed for a decade, and let's just get rid of
      it (and start testing realtime :P).
      
      This bug was found by running generic/051 on either a V4 filesystem
      lacking lazysbcount or a V5 filesystem with a realtime volume.
      
      Cc: bfoster@redhat.com, zlang@redhat.com
      Fixes: f8f2835a ("xfs: defer agfl block frees when dfops is available")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
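The three evaluations of the debug assertion walked through in this commit message can be replayed with a small user-space model. The struct and helper names are hypothetical; the fields mirror the deltas named in the quoted ASSERT, and the values are exactly the ones given in the message, with X as the number of blocks allocated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the per-transaction deltas used by the ASSERT. */
struct tx_deltas {
	int64_t fdblocks;	/* models tp->t_fdblocks_delta */
	int64_t res_fdblocks;	/* models tp->t_res_fdblocks_delta */
	int64_t ag_freeblks;	/* models tp->t_ag_freeblks_delta */
	int64_t ag_flist;	/* models tp->t_ag_flist_delta */
	int64_t ag_btree;	/* models tp->t_ag_btree_delta */
};

/* The condition the removed debugging code asserted. */
bool sb_mods_match_agf(const struct tx_deltas *d)
{
	return (d->fdblocks + d->res_fdblocks) ==
	       (d->ag_freeblks + d->ag_flist + d->ag_btree);
}

/* Old behavior: AGFL trim and block free in one transaction. */
struct tx_deltas tx_trim_and_free(int64_t x)
{
	struct tx_deltas d = { x, 0, x + 1, -1, 0 };
	return d;
}

/* New behavior, first transaction: trim only, free is merely logged as an intent. */
struct tx_deltas tx_trim_only(int64_t x)
{
	struct tx_deltas d = { x, 0, x, -1, 0 };
	return d;
}

/* New behavior, second transaction: the deferred free on its own. */
struct tx_deltas tx_deferred_free(void)
{
	struct tx_deltas d = { 0, 0, 1, 0, 0 };
	return d;
}
```

Only the first case satisfies the condition. Both transactions of the deferred variant fail it, which matches the two assertion trips described in the message.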
  3. 23 Apr, 2021 2 commits
  4. 16 Apr, 2021 1 commit
  5. 15 Apr, 2021 7 commits
  6. 09 Apr, 2021 7 commits
  7. 07 Apr, 2021 14 commits