1. 08 Jan, 2018 15 commits
    • xfs: catch a few more error codes when scrubbing secondary sb · e5b37faa
      Darrick J. Wong authored
      The superblock validation routines return a variety of error codes to
      reject a mount request.  For scrub we can assume that the mount
      succeeded, so if we see these things appear when scrubbing secondary sb
      X, we can treat them all like corruption.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
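      A minimal userspace sketch of the idea. It assumes the codes a mount-time superblock check might use to reject a mount are -EINVAL, -ENOSYS and -EFBIG, and it uses a locally defined stand-in for the corruption code. The names sketch_scrub_sb_error() and SKETCH_EFSCORRUPTED are hypothetical; this is not the kernel's code.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for XFS's corruption error code; the value is
 * arbitrary and only used for illustration.
 */
#define SKETCH_EFSCORRUPTED 1117

/*
 * Fold the codes a mount-time superblock check might use to reject a
 * mount into a single "this secondary superblock is corrupt" result.
 * The exact list is an assumption; other errors (e.g. -EIO) pass through.
 */
int sketch_scrub_sb_error(int error)
{
	switch (error) {
	case -EINVAL:
	case -ENOSYS:
	case -EFBIG:
		return -SKETCH_EFSCORRUPTED;
	default:
		return error;
	}
}
```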
    • xfs: ignore agfl read errors when not scrubbing agfl · 5a0f4337
      Darrick J. Wong authored
      In xfs_scrub_ag_read_headers, if we're not scrubbing the AGFL but
      hit a read error reading the AGFL, we should reset the error code
      so that it doesn't propagate up into the caller.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
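      A simplified model of the header-read logic described above. It assumes the AGI, AGF and AGFL are read in that order, with each read's result simulated by an array entry. Only a failure on the header being scrubbed is returned. The enum and sketch_read_ag_headers() are hypothetical, not xfs_scrub_ag_read_headers() itself.

```c
#include <errno.h>

/* Hypothetical AG header identifiers, in the order they are read. */
enum sketch_ag_hdr { SKETCH_AGI, SKETCH_AGF, SKETCH_AGFL, SKETCH_NR_HDRS };

/*
 * read_err[i] is the simulated result (0 or -errno) of reading header i.
 * A failure is only returned when it belongs to the header being scrubbed.
 */
int sketch_read_ag_headers(enum sketch_ag_hdr scrubbing,
			   const int read_err[SKETCH_NR_HDRS])
{
	int error = 0;

	for (int i = SKETCH_AGI; i < SKETCH_NR_HDRS; i++) {
		error = read_err[i];
		if (error && (int)scrubbing == i)
			return error;
	}

	/*
	 * The AGFL is read last, so without this reset a failed AGFL read
	 * would leak out even when the caller is scrubbing the AGI or AGF.
	 */
	error = 0;
	return error;
}
```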
    • iomap: report collisions between directio and buffered writes to userspace · 5a9d929d
      Darrick J. Wong authored
      If two programs simultaneously try to write to the same part of a file
      via direct IO and buffered IO, there's a chance that the post-diowrite
      pagecache invalidation will fail on the dirty page.  When this happens,
      the dio write succeeded, which means that the page cache is no longer
      coherent with the disk!
      
      Programs are not supposed to mix IO types and this is a clear case of
      data corruption, so store an EIO which will be reflected to userspace
      during the next fsync.  Replace the WARN_ON with a ratelimited pr_crit
      so that the developers have /some/ kind of breadcrumb to track down the
      offending program(s) and file(s) involved.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
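      A userspace simulation of the behaviour described above, not the iomap code. Page cache invalidation after a direct write is modeled as failing when the page is dirty. The failure records EIO for the next fsync and prints a message. Rate limiting is omitted, and the report-once semantics are a simplification of the kernel's per-file error tracking. All sketch_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one file's page cache state. */
struct sketch_mapping {
	bool page_dirty;	/* a buffered writer dirtied the page */
	int wb_err;		/* error to hand back at the next fsync */
};

/* Simulated invalidation: a dirty page cannot be dropped. */
static int sketch_invalidate_range(const struct sketch_mapping *m)
{
	return m->page_dirty ? -EBUSY : 0;
}

/* Called after a direct write has already reached the disk. */
void sketch_dio_write_end(struct sketch_mapping *m)
{
	if (sketch_invalidate_range(m) != 0) {
		/* Cache and disk now disagree: record EIO and leave a trace. */
		m->wb_err = -EIO;
		fprintf(stderr, "sketch: stale page cache after direct write\n");
	}
}

/* Report a recorded error once, then clear it (simplified semantics). */
int sketch_fsync(struct sketch_mapping *m)
{
	int err = m->wb_err;

	m->wb_err = 0;
	return err;
}
```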
    • xfs: eliminate duplicate icreate tx reservation functions · c017cb5d
      Brian Foster authored
      The create transaction reservation calculation has two different
      branches of code depending on whether the filesystem is a v5 format
      fs or older. Each branch considers the max reservation between the
      allocation case (new chunk allocation + record insert) and the
      modify case (chunk exists, record modification) of inode allocation.
      
      The modify case is the same for both superblock versions with the
      exception of the finobt. The finobt helper checks the feature bit,
      however, and so the modify case already shares the same code.
      
      Now that inode chunk allocation has been refactored into a helper
      that checks the superblock version to calculate the appropriate
      reservation for the create transaction, the only remaining
      difference between the create and icreate branches is the call to
      the finobt helper. As noted above, the finobt helper is a no-op when
      the feature is not enabled. Therefore, these branches are
      effectively duplicates and can be condensed.
      
      Remove the xfs_calc_create_*() branch of functions and update the
      various callers to use the xfs_calc_icreate_*() variant. The latter
      creates the same reservation size for v4 create transactions as the
      removed branch. As such, this patch does not result in transaction
      reservation changes.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
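      A sketch of why the two branches collapse, under simplified assumptions. Each reservation is reduced to precomputed byte counts: an allocation case, a modify case already shared by both paths, and a finobt term. The removed variant's allocation case lacks the finobt term. The surviving variant adds it through a helper that returns zero when the feature is off, so the two agree on v4. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-filesystem reservation inputs, in bytes. */
struct sketch_fs {
	bool has_finobt;
	unsigned int alloc_res;		/* chunk alloc + inobt insert */
	unsigned int modify_res;	/* existing chunk; shared by both paths */
	unsigned int finobt_res;	/* finobt update cost when enabled */
};

static unsigned int sketch_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* The finobt helper is a no-op when the feature bit is clear. */
unsigned int sketch_calc_finobt_res(const struct sketch_fs *fs)
{
	return fs->has_finobt ? fs->finobt_res : 0;
}

/* Removed "create" variant: the alloc case has no finobt term. */
unsigned int sketch_calc_create_res(const struct sketch_fs *fs)
{
	return sketch_max(fs->alloc_res, fs->modify_res);
}

/* Surviving "icreate" variant: the alloc case adds the finobt helper. */
unsigned int sketch_calc_icreate_res(const struct sketch_fs *fs)
{
	return sketch_max(fs->alloc_res + sketch_calc_finobt_res(fs),
			  fs->modify_res);
}
```

      On a filesystem without the finobt the two functions return the same value, which is the "no reservation change for v4" property the commit relies on.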
    • xfs: refactor inode chunk alloc/free tx reservation · 57af33e4
      Brian Foster authored
      The reservation for the various forms of inode allocation is
      scattered across several different functions. This includes two
      variants of chunk allocation (v5 icreate transactions vs. older
      create transactions) and the inode free transaction.
      
      To clean up some of this code and clarify the purpose of specific
      allocfree reservations, continue the pattern of defining helper
      functions for smaller operational units of broader transactions.
      Refactor the reservation into an inode chunk alloc/free helper that
      considers the various conditions based on filesystem format.
      
      An inode chunk free involves an extent free and buffer
      invalidations. The latter requires reservation for log headers only.
      An inode chunk allocation modifies the free space btrees and logs
      the chunk on v4 supers. v5 supers initialize the inode chunk using
      ordered buffers and so do not log the chunk.
      
      As a side effect of this refactoring, add one more allocfree res to
      the ifree transaction. Technically this does not serve a specific
      purpose because inode chunks are freed via deferred operations and
      thus occur after a transaction roll. tr_ifree has a bit of a history
      of tx overruns caused by too many agfl fixups during sustained file
      deletion workloads, so add this extra reservation as a form of
      padding nonetheless.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
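      A hedged sketch of a combined chunk alloc/free helper following the rules described above:
      - both directions pay for one extent allocation or free;
      - a free adds buffer invalidations, which need log headers only;
      - a v4 allocation logs the whole chunk;
      - a v5 allocation logs nothing extra because it uses ordered buffers.
      The per-buffer overhead constant and all function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Assumed per-buffer log header overhead in bytes (illustrative value). */
#define SKETCH_BUF_OVERHEAD 256u

/* Log space for nbufs buffers of size bytes each, plus their headers. */
unsigned int sketch_calc_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + SKETCH_BUF_OVERHEAD);
}

/*
 * Hypothetical combined helper for inode chunk allocation and free.
 * allocfree_bufs: buffers one extent alloc/free may dirty
 * chunk_blks:     filesystem blocks in one inode chunk
 */
unsigned int sketch_calc_inode_chunk_res(bool is_v5, bool alloc,
					 unsigned int blocksize,
					 unsigned int allocfree_bufs,
					 unsigned int chunk_blks)
{
	/* Both directions allocate or free one extent. */
	unsigned int res = sketch_calc_buf_res(allocfree_bufs, blocksize);
	/* Free: buffer invalidations need log headers only. */
	unsigned int size = 0;

	if (alloc) {
		if (is_v5)
			return res;	/* chunk initialized via ordered buffers */
		size = blocksize;	/* v4 logs the full chunk */
	}
	return res + sketch_calc_buf_res(chunk_blks, size);
}
```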
    • xfs: include an allocfree res for inobt modifications · f03c78f3
      Brian Foster authored
      Analysis of recent reports of log reservation overruns and code
      inspection have uncovered that the reservations associated with inode
      operations may not cover the worst case scenarios. In particular,
      many cases only include one allocfree res. for a particular
      operation even though said operations may also entail AGFL fixups
      and inode btree block allocations in addition to the actual inode
      chunk allocation. This can easily turn into two or three block
      allocations (or frees) per operation.
      
      In theory, the only way to define the worst case reservation is to
      include an allocfree res for each individual allocation in a
      transaction. Since that is impractical (we can perform multiple agfl
      fixups per tx and not every allocation results in a full tree
      operation), we need to find a reasonable compromise that addresses
      the deficiency in practice without blowing out the size of the
      transactions.
      
      Since the inode btrees are not filled by the AGFL, record insertion
      and removal can directly result in block allocations and frees
      depending on the shape of the tree. These allocations and frees
      occur in the same transaction context as the inobt update itself,
      but are separate from the allocation/free that might be required for
      an inode chunk. Therefore, it makes sense to assume that an [f]inobt
      insert/remove can directly result in one or more block allocations
      on behalf of the tree.
      
      Refactor the inode transaction reservations to include one allocfree
      res. per inode btree modification to cover allocations required by
      the tree itself. This separates the reservation required to allocate
      the inode chunk from the reservation required for inobt record
      insertion/removal. Apply the same logic to the finobt. This results
      in killing off the finobt modify condition because we no longer
      assume that the broader transaction reservation will cover finobt
      block allocations and finobt shape changes can occur in either of
      the inobt allocation or modify situations.
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
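      A minimal sketch of the per-tree reservation described above. One inobt record insert or removal is charged for a full root-to-leaf path, plus one allocfree res for blocks the tree itself may allocate or free. The finobt gets the same charge only when the feature is enabled. The overhead constant and every name here are hypothetical.

```c
#include <stdbool.h>

/* Assumed per-buffer log header overhead in bytes (illustrative value). */
#define SKETCH_BUF_OVERHEAD 256u

unsigned int sketch_calc_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + SKETCH_BUF_OVERHEAD);
}

/*
 * One inobt record insert or removal: every block on a root-to-leaf path
 * (maxlevels), plus one allocfree res for blocks the tree itself may
 * allocate or free as it changes shape.
 */
unsigned int sketch_calc_inobt_res(unsigned int maxlevels,
				   unsigned int allocfree_bufs,
				   unsigned int blocksize)
{
	return sketch_calc_buf_res(maxlevels, blocksize) +
	       sketch_calc_buf_res(allocfree_bufs, blocksize);
}

/* The finobt is treated the same way, but only when the feature is on. */
unsigned int sketch_calc_finobt_res(bool has_finobt, unsigned int maxlevels,
				    unsigned int allocfree_bufs,
				    unsigned int blocksize)
{
	if (!has_finobt)
		return 0;
	return sketch_calc_inobt_res(maxlevels, allocfree_bufs, blocksize);
}
```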
    • xfs: truncate transaction does not modify the inobt · a606ebdb
      Brian Foster authored
      The truncate transaction never modifies the inode btree, but its
      log reservation includes space for inobt updates. Update
      xfs_calc_itruncate_reservation() to remove the reservation
      associated with inobt updates.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: fix up agi unlinked list reservations · e8341d9f
      Brian Foster authored
      The current AGI unlinked list addition and removal reservations do
      not reflect the worst case log usage. An unlinked list removal can
      log up to two on-disk inode clusters but only includes reservation
      for one. An unlinked list addition logs the on-disk cluster but
      includes reservation for an in-core inode.
      
      Update the AGI unlinked list reservation helpers to calculate the
      correct worst case reservation for the associated operations.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
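      A hedged sketch of the corrected worst cases, following the commit text. Both operations log the AGI sector. A removal logs up to two on-disk inode clusters, for example when both the removed inode and its predecessor on the list must be updated. An addition logs one on-disk cluster rather than an in-core inode. The sketch assumes a cluster buffer is at least one filesystem block and charges header overhead on every buffer. The constant and names are illustrative only.

```c
/* Assumed per-buffer log header overhead in bytes (illustrative value). */
#define SKETCH_BUF_OVERHEAD 256u

unsigned int sketch_calc_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + SKETCH_BUF_OVERHEAD);
}

/* An inode cluster buffer is assumed to be at least one block. */
static unsigned int sketch_cluster_bytes(unsigned int blocksize,
					 unsigned int cluster_size)
{
	return cluster_size > blocksize ? cluster_size : blocksize;
}

/* Removal: the AGI sector plus up to two on-disk inode clusters. */
unsigned int sketch_iunlink_remove_res(unsigned int sectsize,
				       unsigned int blocksize,
				       unsigned int cluster_size)
{
	return sketch_calc_buf_res(1, sectsize) +
	       sketch_calc_buf_res(2, sketch_cluster_bytes(blocksize,
							   cluster_size));
}

/* Addition: the AGI sector plus one on-disk inode cluster. */
unsigned int sketch_iunlink_add_res(unsigned int sectsize,
				    unsigned int blocksize,
				    unsigned int cluster_size)
{
	return sketch_calc_buf_res(1, sectsize) +
	       sketch_calc_buf_res(1, sketch_cluster_bytes(blocksize,
							   cluster_size));
}
```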
    • xfs: include inobt buffers in ifree tx log reservation · a6f48590
      Brian Foster authored
      The tr_ifree transaction handles inode unlinks and inode chunk
      frees. The current transaction calculation does not accurately
      reflect worst case changes to the inode btree, however. The inobt
      portion of the current transaction reservation only covers
      modification of a single inobt buffer (for the particular inode
      record). This is a historical artifact from the days before XFS
      supported full inode chunk removal.
      
      When support for inode chunk removal was added in commit
      254f6311ed1b ("Implement deletion of inode clusters in XFS."), the
      additional log reservation required for chunk removal was not added
      correctly. The new reservation only considered the header overhead
      of associated buffers rather than the full contents of the btrees
      and AGF and AGFL buffers affected by the transaction. The
      reservation for the free space btrees was subsequently fixed up in
      commit 5fe6abb82f76 ("Add space for inode and allocation btrees to
      ITRUNCATE log reservation"), but the res. for full inobt joins has
      never been added.
      
      Further review of the ifree reservation uncovered a couple more
      problems:
      
      - The undocumented +2 blocks are intended for the AGF and AGFL, but
        are also not sized correctly and should be logged as full sectors
        (not FSBs).
      - The additional single block header is undocumented and serves no
        apparent purpose.
      
      Update xfs_calc_ifree_reservation() to include a full inobt join in
      the reservation calculation. Refactor the undocumented blocks
      appropriately and fix up the comments to reflect the current
      calculation.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: print transaction log reservation on overrun · 2c8f6265
      Brian Foster authored
      The transaction dump code displays the content and reservation
      consumption of a particular transaction in the event of an overrun.
      It currently displays the reservation associated with the
      transaction ticket, but not the original reservation attached to the
      transaction.
      
      The latter value reflects the original transaction reservation
      calculation before additional reservation overhead is assigned, such
      as for the CIL context header and potential split region headers.
      
      Update xlog_print_trans() to also print the original transaction
      reservation in the event of overrun. This provides a reference point
      to identify how much reservation overhead was added to a particular
      ticket by xfs_log_calc_unit_res().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
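      A small sketch of the kind of report described above, not xlog_print_trans() itself. It assumes the transaction holds the reservation as originally calculated, while the ticket holds that value plus overhead and the amount still remaining. Printing both makes the added overhead visible. The struct and field names are illustrative.

```c
#include <stdio.h>

/* Hypothetical stand-in: reservation as originally calculated. */
struct sketch_trans {
	int t_log_res;
};

/* Hypothetical stand-in: reservation plus overhead, and what is left. */
struct sketch_ticket {
	int t_unit_res;
	int t_curr_res;
};

/* Print an overrun report; return the overhead added to the ticket. */
int sketch_print_overrun(const struct sketch_trans *tp,
			 const struct sketch_ticket *tic)
{
	int overhead = tic->t_unit_res - tp->t_log_res;

	printf("transaction reservation: %d bytes\n", tp->t_log_res);
	printf("ticket unit reservation: %d bytes (overhead %d)\n",
	       tic->t_unit_res, overhead);
	printf("ticket remaining:        %d bytes\n", tic->t_curr_res);
	return overhead;
}
```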
    • xfs: scrub inode nsec fields · 29c1c123
      Darrick J. Wong authored
      Check that the nanosecond fields in each timestamp are less than
      one billion.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
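      A minimal sketch of the check. It assumes a decoded timestamp with 32-bit seconds and nanoseconds fields and checks atime, mtime and ctime. Whether other timestamps are covered is not stated in the commit. Casting to unsigned makes a negative value fail the same bound. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000U

/* Hypothetical decoded timestamp. */
struct sketch_timestamp {
	int32_t t_sec;
	int32_t t_nsec;
};

/* Valid values are 0..999,999,999; negatives wrap to huge and fail. */
bool sketch_nsec_ok(const struct sketch_timestamp *ts)
{
	return (uint32_t)ts->t_nsec < SKETCH_NSEC_PER_SEC;
}

/* An inode passes only if atime, mtime and ctime all pass. */
bool sketch_inode_times_ok(const struct sketch_timestamp *atime,
			   const struct sketch_timestamp *mtime,
			   const struct sketch_timestamp *ctime)
{
	return sketch_nsec_ok(atime) && sketch_nsec_ok(mtime) &&
	       sketch_nsec_ok(ctime);
}
```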
    • xfs: move all scrub input checking to xfs_scrub_validate · 8e630837
      Eric Sandeen authored
      There were ad-hoc checks for some scrub types but not others;
      mark each scrub type with ... its type, and use that to validate
      the allowed and/or required input fields.
      
      Moving these checks out of xfs_scrub_setup_ag_header makes it
      a thin wrapper, so unwrap it in the process.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      [darrick: add xfs_ prefix to enum, check scrub args after checking type]
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
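      A hedged sketch of type-driven input validation. Each scrub type is assumed to be tagged with an input category, and one function checks the request's inode, generation and AG fields against it. The categories, field names and exact rules below are illustrative assumptions, not the kernel's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical input categories a scrub type can be tagged with. */
enum sketch_scrub_input {
	SKETCH_ST_NONE = 1,	/* no inputs allowed */
	SKETCH_ST_PERAG,	/* needs an AG number */
	SKETCH_ST_FS,		/* whole filesystem, no inputs */
	SKETCH_ST_INODE,	/* optional inode number + generation */
};

/* Simplified request; field names are illustrative. */
struct sketch_scrub_req {
	uint64_t sm_ino;
	uint32_t sm_gen;
	uint32_t sm_agno;
};

/* Validate the request fields against the category of its scrub type. */
int sketch_scrub_validate(const struct sketch_scrub_req *sm,
			  enum sketch_scrub_input kind, uint32_t agcount)
{
	switch (kind) {
	case SKETCH_ST_NONE:
	case SKETCH_ST_FS:
		if (sm->sm_ino || sm->sm_gen || sm->sm_agno)
			return -EINVAL;
		return 0;
	case SKETCH_ST_PERAG:
		if (sm->sm_ino || sm->sm_gen || sm->sm_agno >= agcount)
			return -EINVAL;
		return 0;
	case SKETCH_ST_INODE:
		/* A generation without an inode number makes no sense. */
		if (sm->sm_agno || (sm->sm_gen && !sm->sm_ino))
			return -EINVAL;
		return 0;
	}
	return -EINVAL;	/* unknown category */
}
```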
    • xfs: factor out scrub input checking · 0a085ddf
      Eric Sandeen authored
      Do this before adding more core checks.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: explicitly initialize meta_scrub_ops array by type · bfb3e9b9
      Eric Sandeen authored
      An implicit mapping to type by order of initialization seems
      error-prone, and doesn't lend itself to cscope-ing.
      
      Also add sanity checks on the size of the array vs. the number of
      types, and a defensive check that ->scrub exists before using it.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
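      A minimal sketch of the three ideas above:
      - C99 designated initializers tie each table entry to its enum value rather than its position;
      - a compile-time size check catches a table that does not cover every type;
      - a NULL check guards the dispatch.
      The enum, the table and the -ENOENT return for a missing handler are hypothetical choices for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical scrub types; SKETCH_TYPE_NR counts them. */
enum sketch_scrub_type {
	SKETCH_TYPE_PROBE,
	SKETCH_TYPE_SB,
	SKETCH_TYPE_AGF,
	SKETCH_TYPE_AGFL,
	SKETCH_TYPE_NR,
};

struct sketch_scrub_ops {
	int (*scrub)(void);
};

static int sketch_scrub_probe(void) { return 0; }
static int sketch_scrub_sb(void) { return 0; }
static int sketch_scrub_agf(void) { return 0; }

/* Designated initializers tie each entry to its type, not its position. */
static const struct sketch_scrub_ops sketch_ops[] = {
	[SKETCH_TYPE_PROBE] = { .scrub = sketch_scrub_probe },
	[SKETCH_TYPE_SB]    = { .scrub = sketch_scrub_sb },
	[SKETCH_TYPE_AGF]   = { .scrub = sketch_scrub_agf },
	[SKETCH_TYPE_AGFL]  = { .scrub = NULL },	/* not implemented here */
};

/* Compile-time check: one table entry per scrub type. */
_Static_assert(sizeof(sketch_ops) / sizeof(sketch_ops[0]) == SKETCH_TYPE_NR,
	       "sketch_ops must cover every scrub type");

int sketch_dispatch(unsigned int type)
{
	if (type >= SKETCH_TYPE_NR)
		return -EINVAL;
	if (!sketch_ops[type].scrub)	/* defensive: no handler */
		return -ENOENT;
	return sketch_ops[type].scrub();
}
```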
    • xfs: Show realtime device stats on statfs calls if realtime flags set · a0158315
      Richard Wareing authored
      - Report realtime device free blocks in statfs calls if the realtime
        inheritance bit is set on a directory inode, or the realtime flag is
        set on a file.  This is a bit more intuitive, especially for
        use-cases which use a much larger device for the realtime device.
      - Add an XFS_IS_REALTIME_MOUNT macro to gate this on the existence of
        a realtime device on the mount, similar to XFS_IS_REALTIME_INODE.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Richard Wareing <rwareing@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
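      A hedged sketch of the gating described above. statfs reports data-device numbers by default and switches to the realtime device only when two conditions hold: the mount has a realtime device, and the inode has the realtime or realtime-inherit flag. Modeling realtime free space as free extents times the extent size in blocks is an assumption of this sketch, as are all flag values and names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode flag bits for this sketch. */
#define SKETCH_DIFLAG_REALTIME	(1u << 0)	/* file data on the rt device */
#define SKETCH_DIFLAG_RTINHERIT	(1u << 1)	/* new files in dir get REALTIME */

struct sketch_mount {
	bool has_rtdev;			/* stands in for a realtime-mount test */
	uint64_t data_blocks, data_free;
	uint64_t rt_blocks, rt_free_extents, rt_extsize;
};

struct sketch_statfs {
	uint64_t f_blocks, f_bfree, f_bavail;
};

/* Fill statfs for the inode the call was made on (di_flags). */
void sketch_fill_statfs(const struct sketch_mount *mp, unsigned int di_flags,
			struct sketch_statfs *st)
{
	st->f_blocks = mp->data_blocks;
	st->f_bfree = st->f_bavail = mp->data_free;

	/* Only switch to rt numbers if the mount actually has an rt device. */
	if (mp->has_rtdev &&
	    (di_flags & (SKETCH_DIFLAG_REALTIME | SKETCH_DIFLAG_RTINHERIT))) {
		st->f_blocks = mp->rt_blocks;
		st->f_bfree = st->f_bavail = mp->rt_free_extents * mp->rt_extsize;
	}
}
```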
  2. 07 Jan, 2018 8 commits
  3. 06 Jan, 2018 7 commits
  4. 05 Jan, 2018 10 commits