  1. 29 Dec, 2018 1 commit
  2. 18 Oct, 2018 3 commits
    • xfs: clear ail delwri queued bufs on unmount of shutdown fs · efc3289c
      Brian Foster authored
      In the typical unmount case, the AIL is forced out by the unmount
      sequence before the xfsaild task is stopped. Since AIL items are
      removed on writeback completion, this means that the AIL
      ->ail_buf_list delwri queue has been drained. This is not always
      true in the shutdown case, however.
      
      It's possible for buffers to sit on a delwri queue for a period of
      time across submission attempts if said items are locked or have
      been relogged and pinned since first added to the queue. If the
      attempt to log such an item results in a log I/O error, the error
      processing can shut down the fs, remove the item from the AIL, stale
      the buffer (dropping the LRU reference) and clear its delwri queue
      state. The latter bit means the buffer will be released from a
      delwri queue on the next submission attempt, but this might never
      occur if the filesystem has shut down and the AIL is empty.
      
      This means that such buffers are held indefinitely by the AIL delwri
      queue across destruction of the AIL. Aside from being a memory leak,
      these buffers can also hold references to in-core perag structures.
      The latter problem manifests as a generic/475 failure, reproducing
      the following asserts at unmount time:
      
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 151
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 132
      
      To prevent this problem, clear the AIL delwri queue as a final step
      before xfsaild() exit. The !empty state should never occur in the
      normal case, so add an assert to catch unexpected problems going
      forward.
      
      [dgc: add comment explaining need for xfs_buf_delwri_cancel() after
       calling xfs_buf_delwri_submit_nowait().]
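
      A minimal sketch of the shape of the fix at xfsaild() exit,
      assuming the ail_ field names used above (illustrative, not the
      committed diff):

      	/*
      	 * Any buffers still on the AIL delwri queue here can only be
      	 * left over from a shutdown. Try to write them back, then
      	 * cancel the rest: xfs_buf_delwri_submit_nowait() leaves
      	 * locked/pinned buffers on the queue, so a final
      	 * xfs_buf_delwri_cancel() is needed to drop their references.
      	 */
      	ASSERT(list_empty(&ailp->ail_buf_list) ||
      	       XFS_FORCED_SHUTDOWN(ailp->ail_mount));
      	xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list);
      	xfs_buf_delwri_cancel(&ailp->ail_buf_list);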
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: fix use-after-free race in xfs_buf_rele · 37fd1678
      Dave Chinner authored
      When looking at a 4.18 based KASAN use-after-free report, I noticed
      that concurrent xfs_buf_rele() calls may race on dropping the last
      reference to the buffer and taking the buffer lock. This was the
      symptom
      displayed by the KASAN report, but the actual issue that was
      reported had already been fixed in 4.19-rc1 by commit e339dd8d
      ("xfs: use sync buffer I/O for sync delwri queue submission").
      
      Despite this, I think there is still an issue with xfs_buf_rele()
      in this code:
      
              release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
              spin_lock(&bp->b_lock);
              if (!release) {
      .....
      
      If two threads race on the b_lock after both dropping a reference,
      and one of them drops the last reference so that release = true, we
      end up with:
      
      CPU 0				CPU 1
      atomic_dec_and_lock()
      				atomic_dec_and_lock()
      				spin_lock(&bp->b_lock)
      spin_lock(&bp->b_lock)
      <spins>
      				<release = true bp->b_lru_ref = 0>
      				<remove from lists>
      				freebuf = true
      				spin_unlock(&bp->b_lock)
      				xfs_buf_free(bp)
      <gets lock, reading and writing freed memory>
      <accesses freed memory>
      spin_unlock(&bp->b_lock) <reads/writes freed memory>
      
      IOWs, we can't safely take bp->b_lock after dropping the hold
      reference because the buffer may go away at any time after we
      drop that reference. However, this can be fixed simply by taking the
      bp->b_lock before we drop the reference.
      
      It is safe to nest the pag_buf_lock inside bp->b_lock as the
      pag_buf_lock is only used to serialise against lookup in
      xfs_buf_find() and no other locks are held over or under the
      pag_buf_lock there. Make this clear by documenting the buffer lock
      orders at the top of the file.
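
      Sketched against the fragment quoted above, the fixed ordering is
      simply:

      	spin_lock(&bp->b_lock);
      	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
      	/*
      	 * Holding bp->b_lock across the final reference drop means a
      	 * racing releaser spins on b_lock and cannot free the buffer
      	 * until we are done with it, closing the window shown above.
      	 */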
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: always assign buffer verifiers when one is provided · 1aff5696
      Darrick J. Wong authored
      If a caller supplies buffer ops when trying to read a buffer and the
      buffer doesn't already have buf ops assigned, ensure that the ops are
      assigned to the buffer and the verifier is run on that buffer.
      
      Note that current XFS code is careful to assign buffer ops after an
      xfs_{trans_,}buf_read call in which ops were not supplied.  However, we
      should apply ops defensively in case there is ever a coding mistake; and
      an upcoming repair patch will need to be able to read a buffer without
      assigning buf ops.
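
      The defensive assignment could take roughly this shape (a sketch;
      the surrounding read path and error handling are assumed):

      	if (ops && !bp->b_ops) {
      		bp->b_ops = ops;
      		/* run the read verifier now; sets bp->b_error on failure */
      		bp->b_ops->verify_read(bp);
      	}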
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  3. 12 Aug, 2018 1 commit
  4. 12 Jul, 2018 4 commits
  5. 08 Jun, 2018 1 commit
  6. 06 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner authored
      Remove the verbose license text from XFS files and replace it with
      SPDX tags. This does not change the license of any of the code; it
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  7. 09 May, 2018 2 commits
  8. 09 Apr, 2018 1 commit
  9. 12 Mar, 2018 1 commit
  10. 29 Jan, 2018 1 commit
  11. 09 Jan, 2018 1 commit
  12. 08 Jan, 2018 2 commits
  13. 28 Nov, 2017 1 commit
  14. 01 Nov, 2017 1 commit
  15. 27 Oct, 2017 1 commit
  16. 26 Oct, 2017 1 commit
    • xfs: buffer lru reference count error injection tag · 7561d27e
      Brian Foster authored
      XFS uses a fixed reference count for certain types of buffers in the
      internal LRU cache. These reference counts dictate how aggressively
      certain buffers are reclaimed vs. others. While the reference counts
      implement priority across different buffer types, all buffers
      (other than uncached buffers) are typically cached for at least one
      reclaim cycle.
      
      We've had at least one bug recently that has been hidden by a
      released buffer sitting around in the LRU. Users hitting the problem
      were able to reproduce under enough memory pressure to cause
      aggressive reclaim in a particular window of time.
      
      To support future xfstests cases, add an error injection tag to
      hardcode the buffer reference count to zero. When enabled, this
      bypasses caching of associated buffers and facilitates test cases
      that depend on this behavior.
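
      The natural injection point is where the LRU reference count is
      assigned; a sketch, assuming an XFS_ERRTAG_BUF_LRU_REF tag name:

      	/* zero the LRU reference so the buffer is reclaimed on the
      	 * next reclaim cycle instead of being cached */
      	if (XFS_TEST_ERROR(false, bp->b_target->bt_mount,
      			   XFS_ERRTAG_BUF_LRU_REF))
      		lru_ref = 0;
      	atomic_set(&bp->b_lru_ref, lru_ref);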
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  17. 26 Sep, 2017 1 commit
  18. 31 Aug, 2017 1 commit
  19. 23 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
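
      For bio submitters the conversion is mechanical; sketched before
      and after, assuming the bio_set_dev() helper this change introduces:

      	/* before: the bio carried a full block_device pointer */
      	bio->bi_bdev = bdev;

      	/* after: only the gendisk and partition index are cached */
      	bio_set_dev(bio, bdev);	/* fills bio->bi_disk, bio->bi_partno */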
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  20. 19 Jun, 2017 2 commits
    • xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong authored
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: push buffer of flush locked dquot to avoid quotacheck deadlock · 7912e7fe
      Brian Foster authored
      Reclaim during quotacheck can lead to deadlocks on the dquot flush
      lock:
      
       - Quotacheck populates a local delwri queue with the physical dquot
         buffers.
       - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
         dirties all of the dquots.
       - Reclaim kicks in and attempts to flush a dquot whose buffer is
         already queued on the quotacheck queue. The flush succeeds but
         queueing to the reclaim delwri queue fails as the backing buffer is
         already queued. The flush unlock is now deferred to I/O completion
         of the buffer from the quotacheck queue.
       - The dqadjust bulkstat continues and dirties the recently flushed
         dquot once again.
       - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
         the flush lock to update the backing buffers with the in-core
         recalculated values. It deadlocks on the redirtied dquot as the
         flush lock was already acquired by reclaim, but the buffer resides
         on the local delwri queue which isn't submitted until the end of
         quotacheck.
      
      This is reproduced by running quotacheck on a filesystem with a
      couple million inodes in low memory (512MB-1GB) situations. This is
      a regression as of commit 43ff2122 ("xfs: on-stack delayed write
      buffer lists"), which removed a trylock and buffer I/O submission
      from the quotacheck dquot flush sequence.
      
      Quotacheck first resets and collects the physical dquot buffers in a
      delwri queue. Then, it traverses the filesystem inodes via bulkstat,
      updates the in-core dquots, flushes the corrected dquots to the
      backing buffers and finally submits the delwri queue for I/O. Since
      the backing buffers are queued across the entire quotacheck
      operation, dquot reclaim cannot possibly complete a dquot flush
      before quotacheck completes.
      
      Therefore, quotacheck must submit the buffer for I/O in order to
      cycle the flush lock and flush the dirty in-core dquot to the
      buffer. Add a delwri queue buffer push mechanism to submit an
      individual buffer for I/O without losing the delwri queue status and
      use it from quotacheck to avoid the deadlock. This restores
      quotacheck behavior to what it was before the regression was
      introduced.
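
      In sketch form, pushing a single buffer without losing its delwri
      queue status (illustrative; the committed helper may differ in
      detail):

      	LIST_HEAD(submit_list);

      	xfs_buf_hold(bp);			/* keep bp across submission */
      	list_move(&bp->b_list, &submit_list);	/* isolate the buffer */
      	xfs_buf_delwri_submit(&submit_list);	/* sync I/O cycles the flush lock */
      	xfs_buf_delwri_queue(bp, buffer_list);	/* restore queue status */
      	xfs_buf_rele(bp);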
      Reported-by: Martin Svec <martin.svec@zoner.cz>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  21. 09 Jun, 2017 1 commit
  22. 08 Jun, 2017 1 commit
  23. 31 May, 2017 1 commit
    • xfs: use ->b_state to fix buffer I/O accounting release race · 63db7c81
      Brian Foster authored
      We've had user reports of unmount hangs in xfs_wait_buftarg() that
      analysis shows is due to btp->bt_io_count == -1. bt_io_count
      represents the count of in-flight asynchronous buffers and thus
      should always be >= 0. xfs_wait_buftarg() waits for this value to
      stabilize to zero in order to ensure that all untracked (with
      respect to the lru) buffers have completed I/O processing before
      unmount proceeds to tear down in-core data structures.
      
      The value of -1 implies an I/O accounting decrement race. Indeed,
      the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
      (where the buffer lock is no longer held) means that bp->b_flags can
      be updated from an unsafe context. While a user-level reproducer is
      currently not available, some intrusive hacks to run racing buffer
      lookups/ioacct/releases from multiple threads were used to
      successfully manufacture this problem.
      
      Existing callers do not expect to acquire the buffer lock from
      xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
      this context. It turns out that we already have separate buffer
      state bits and associated serialization for dealing with buffer LRU
      state in the form of ->b_state and ->b_lock. Therefore, replace the
      _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
      accounting wrappers appropriately and make sure they are used with
      the correct locking. This ensures that buffer in-flight state can be
      modified at buffer release time without racing with modifications
      from a buffer lock holder.
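
      A sketch of the resulting accounting pattern, assuming an
      XFS_BSTATE_IN_FLIGHT state bit (names illustrative):

      	/* in-flight accounting serialised by ->b_lock via ->b_state */
      	spin_lock(&bp->b_lock);
      	if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) {
      		bp->b_state |= XFS_BSTATE_IN_FLIGHT;
      		percpu_counter_inc(&bp->b_target->bt_io_count);
      	}
      	spin_unlock(&bp->b_lock);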
      
      Fixes: 9c7504aa ("xfs: track and serialize in-flight async buffers against unmount")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Libor Pechacek <lpechacek@suse.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  24. 03 May, 2017 1 commit
  25. 25 Apr, 2017 1 commit
  26. 02 Mar, 2017 1 commit
  27. 02 Feb, 2017 1 commit
  28. 26 Jan, 2017 1 commit
    • xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b
      Darrick J. Wong authored
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      available memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
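
      The fix itself is small; a sketch of the readahead failure path
      (function and label names assumed):

      out_free_pages:
      	for (i = 0; i < bp->b_page_count; i++)
      		__free_page(bp->b_pages[i]);
      	bp->b_flags &= ~_XBF_PAGES;	/* pages are gone; avoid double free */
      	return error;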
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
  29. 09 Dec, 2016 1 commit
  30. 07 Dec, 2016 1 commit
    • xfs: use rhashtable to track buffer cache · 6031e73a
      Lucas Stach authored
      On filesystems with a lot of metadata and in metadata-intensive workloads,
      xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of
      the CPU time is spent on CPU cache misses while traversing the rbtree.
      
      As the buffer cache does not need any kind of ordering, but does
      need fast lookups, a hashtable is the natural data structure to use.
      The rhashtable
      infrastructure provides a self-scaling hashtable implementation and
      allows lookups to proceed while the table is going through a resize
      operation.
      
      This reduces the CPU time spent on lookups to 1/3 even for small
      filesystems with a relatively small number of cached buffers, with
      possibly much larger gains on higher loaded filesystems.
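
      A sketch of what the table parameters might look like (key choice
      and sizing here are assumptions, not the committed values):

      	static const struct rhashtable_params xfs_buf_hash_params = {
      		.min_size		= 32,	/* keep small for many-AG filesystems */
      		.key_len		= sizeof(xfs_daddr_t),
      		.key_offset		= offsetof(struct xfs_buf, b_bn),
      		.head_offset		= offsetof(struct xfs_buf, b_rhash_head),
      		.automatic_shrinking	= true,
      	};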
      
      [dchinner: reduce minimum hash size to an acceptable size for large
      	   filesystems with many AGs with no active use.]
      [dchinner: remove stale rbtree asserts.]
      [dchinner: use xfs_buf_map for compare function argument.]
      [dchinner: make functions static.]
      [dchinner: remove redundant comments.]
      Signed-off-by: Lucas Stach <dev@lynxeye.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  31. 01 Nov, 2016 1 commit
  32. 26 Aug, 2016 1 commit
    • xfs: prevent dropping ioend completions during buftarg wait · 800b2694
      Brian Foster authored
      xfs_wait_buftarg() waits for all pending I/O, drains the ioend
      completion workqueue and walks the LRU until all buffers in the cache
      have been released. This is traditionally an unmount operation, but the
      mechanism is also reused during filesystem freeze.
      
      xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce,
      which is intended more for a shutdown sequence in that it indicates to
      the queue that new operations are not expected once the drain has begun.
      New work jobs after this point result in a WARN_ON_ONCE() and are
      otherwise dropped.
      
      With filesystem freeze, however, read operations are allowed and can
      proceed during or after the workqueue drain. If such a read occurs
      during the drain sequence, the workqueue infrastructure complains about
      the queued ioend completion work item and drops it on the floor. As a
      result, the buffer remains on the LRU and the freeze never completes.
      
      Despite the fact that the overall buffer cache cleanup is not necessary
      during freeze, fix up this operation such that it is safe to invoke
      during non-unmount quiesce operations. Replace the drain_workqueue()
      call with flush_workqueue(), which runs a similar serialization on
      pending workqueue jobs without causing new jobs to be dropped. This is
      safe for unmount as unmount independently locks out new operations by
      the time xfs_wait_buftarg() is invoked.
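
      The change itself is a one-line substitution in xfs_wait_buftarg()
      (sketched; the workqueue name is assumed):

      	/* before: ioend work queued by a read during freeze is dropped */
      	drain_workqueue(btp->bt_mount->m_buf_workqueue);

      	/* after: waits out pending ioend work, accepts new jobs */
      	flush_workqueue(btp->bt_mount->m_buf_workqueue);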
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>