An error occurred fetching the project authors.
- 29 Dec, 2018 1 commit
-
-
Julia Lawall authored
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 26f1fe85 ("xfs: reduce lock hold times in buffer writeback") Signed-off-by:
Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 18 Oct, 2018 3 commits
-
-
Brian Foster authored
In the typical unmount case, the AIL is forced out by the unmount sequence before the xfsaild task is stopped. Since AIL items are removed on writeback completion, this means that the AIL ->ail_buf_list delwri queue has been drained. This is not always true in the shutdown case, however. It's possible for buffers to sit on a delwri queue for a period of time across submission attempts if said items are locked or have been relogged and pinned since first added to the queue. If the attempt to log such an item results in a log I/O error, the error processing can shutdown the fs, remove the item from the AIL, stale the buffer (dropping the LRU reference) and clear its delwri queue state. The latter bit means the buffer will be released from a delwri queue on the next submission attempt, but this might never occur if the filesystem has shutdown and the AIL is empty. This means that such buffers are held indefinitely by the AIL delwri queue across destruction of the AIL. Aside from being a memory leak, these buffers can also hold references to in-core perag structures. The latter problem manifests as a generic/475 failure, reproducing the following asserts at unmount time: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 151 XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 132 To prevent this problem, clear the AIL delwri queue as a final step before xfsaild() exit. The !empty state should never occur in the normal case, so add an assert to catch unexpected problems going forward. [dgc: add comment explaining need for xfs_buf_delwri_cancel() after calling xfs_buf_delwri_submit_nowait().] Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
When looking at a 4.18 based KASAN use after free report, I noticed that racing xfs_buf_rele() may race on dropping the last reference to the buffer and taking the buffer lock. This was the symptom displayed by the KASAN report, but the actual issue that was reported had already been fixed in 4.19-rc1 by commit e339dd8d ("xfs: use sync buffer I/O for sync delwri queue submission"). Despite this, I think there is still an issue with xfs_buf_rele() in this code: release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock); spin_lock(&bp->b_lock); if (!release) { ..... If two threads race on the b_lock after both dropping a reference and one getting dropping the last reference so release = true, we end up with: CPU 0 CPU 1 atomic_dec_and_lock() atomic_dec_and_lock() spin_lock(&bp->b_lock) spin_lock(&bp->b_lock) <spins> <release = true bp->b_lru_ref = 0> <remove from lists> freebuf = true spin_unlock(&bp->b_lock) xfs_buf_free(bp) <gets lock, reading and writing freed memory> <accesses freed memory> spin_unlock(&bp->b_lock) <reads/writes freed memory> IOWs, we can't safely take bp->b_lock after dropping the hold reference because the buffer may go away at any time after we drop that reference. However, this can be fixed simply by taking the bp->b_lock before we drop the reference. It is safe to nest the pag_buf_lock inside bp->b_lock as the pag_buf_lock is only used to serialise against lookup in xfs_buf_find() and no other locks are held over or under the pag_buf_lock there. Make this clear by documenting the buffer lock orders at the top of the file. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
If a caller supplies buffer ops when trying to read a buffer and the buffer doesn't already have buf ops assigned, ensure that the ops are assigned to the buffer and the verifier is run on that buffer. Note that current XFS code is careful to assign buffer ops after a xfs_{trans_,}buf_read call in which ops were not supplied. However, we should apply ops defensively in case there is ever a coding mistake; and an upcoming repair patch will need to be able to read a buffer without assigning buf ops. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 12 Aug, 2018 1 commit
-
-
Eric Sandeen authored
The old lock tracking infrastructure in xfs using the b_last_holder field seems to only be useful if you can get into the system with a debugger; it seems that the existing tracepoints would be the way to go these days, and this old infrastructure can be removed. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 12 Jul, 2018 4 commits
-
-
Brian Foster authored
Now that there is only one caller, fold the common submission helper into __xfs_buf_submit(). Suggested-by:
Christoph Hellwig <hch@infradead.org> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
Brian Foster authored
The buffer I/O submission path consists of separate function calls per type. The buffer I/O type is already controlled via buffer state (XBF_ASYNC), however, so there is no real need for separate submission functions. Combine the buffer submission functions into a single function that processes the buffer appropriately based on XBF_ASYNC. Retain an internal helper with a conditional wait parameter to continue to support batched !XBF_ASYNC submission/completion required by delwri queues. Suggested-by:
Christoph Hellwig <hch@infradead.org> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
Brian Foster authored
If a delwri queue occurs of a buffer that sits on a delwri queue wait list, the queue sets _XBF_DELWRI_Q without changing the state of ->b_list. This occurs, for example, if another thread beats the current delwri waiter thread to the buffer lock after I/O completion. Once the waiter acquires the lock, it removes the buffer from the wait list and leaves a buffer with _XBF_DELWRI_Q set but not populated on a list. This results in a lost buffer submission and in turn can result in assert failures due to _XBF_DELWRI_Q being set on buffer reclaim or filesystem lockups if the buffer happens to cover an item in the AIL. This problem has been reproduced by repeated iterations of xfs/305 on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty dquot reclaim races with an xfsaild push of a separate dquot backed by the same buffer such that the buffer sits on the reclaim wait list at the time xfsaild attempts to queue it. Since the latter dquot has been flush locked but the underlying buffer not submitted for I/O, the dquot pins the AIL and causes the filesystem to livelock. This race is essentially made possible by the buffer lock cycle involved with waiting on a synchronous delwri queue submission. Close the race by using synchronous buffer I/O for respective delwri queue submission. This means the buffer remains locked across the I/O and so is inaccessible from other contexts while in the intermediate wait list state. The sync buffer I/O wait mechanism is factored into a helper such that sync delwri buffer submission and serialization are batched operations. Designed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
Brian Foster authored
Sync and async buffer submission both do generally similar things with a couple odd exceptions. Refactor the core buffer submission code into a common helper to isolate buffer submission from completion handling of synchronous buffer I/O. This patch does not change behavior. It is a step towards support for using synchronous buffer I/O via synchronous delwri queue submission. Designed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 08 Jun, 2018 1 commit
-
-
Dave Chinner authored
xfs_reflink_convert_cow() manipulates the incore extent list in GFP_KERNEL context in the IO submission path whilst holding locked pages under writeback. This is a memory reclaim deadlock vector. This code is not in a transaction, so any memory allocations it makes aren't protected via the memalloc_nofs_save() context that transactions carry. Hence we need to run this call under memalloc_nofs_save() context to prevent potential memory allocations from being run as GFP_KERNEL and deadlocking. Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Allison Henderson <allison.henderson@oracle.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 06 Jun, 2018 1 commit
-
-
Dave Chinner authored
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 May, 2018 2 commits
-
-
Dave Chinner authored
When looking at an event trace recently, I noticed that non-blocking buffer lookup attempts would fail on cached locked buffers and then run the slow cache-miss path. This means we are doing an xfs_buf allocation, lookup and free unnecessarily every time we avoid blocking on a locked buffer. Fix this by changing _xfs_buf_find() to return an error status to the caller to indicate that we failed the lock attempt rather than just returning a NULL. This allows the higher level code to discriminate between a cache miss and an cache hit that we failed to lock. This also allows us to return a -EFSCORRUPTED state if we are asked to look up a block number outside the range of the filesystem in _xfs_buf_find(), which moves us one step closer to being able to handle such errors in a more graceful manner at the higher levels. Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
Dave Chinner authored
Move xfs_buf_incore out of line and make it the only way to look up a buffer in the buffer cache from outside the buffer cache. Convert the external users of _xfs_buf_find() to xfs_buf_incore() and make _xfs_buf_find() static. Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> [darrick: actually rename xfs_incore -> xfs_buf_incore] Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 Apr, 2018 1 commit
-
-
Eric Sandeen authored
Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 12 Mar, 2018 1 commit
-
-
Vratislav Bendel authored
Due to an inverted logic mistake in xfs_buftarg_isolate() the xfs_buffers with zero b_lru_ref will take another trip around LRU, while isolating buffers with non-zero b_lru_ref. Additionally those isolated buffers end up right back on the LRU once they are released, because b_lru_ref remains elevated. Fix that circuitous route by leaving them on the LRU as originally intended. Signed-off-by:
Vratislav Bendel <vbendel@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 29 Jan, 2018 1 commit
-
-
Carlos Maiolino authored
Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Bill O'Donnell <billodo@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 Jan, 2018 1 commit
-
-
Darrick J. Wong authored
If a metadata IO error happens, we report the location of the failed IO request in units of daddrs. However, the printk message misleads people into thinking that the units are fs blocks, so fix the reported units. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com>
-
- 08 Jan, 2018 2 commits
-
-
Darrick J. Wong authored
Increase the corrupt buffer dump to the first 128 bytes since v5 filesystems have larger block headers than before. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com>
-
- 28 Nov, 2017 1 commit
-
-
Michal Hocko authored
percpu_counter_init failure path doesn't clean up &btp->bt_lru list. Call list_lru_destroy in that error path. Similarly register_shrinker error path is not handled. While it is unlikely to trigger these error path, it is not impossible especially the later might fail with large NUMAs. Let's handle the failure to make the code more robust. Noticed-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 01 Nov, 2017 1 commit
-
-
Darrick J. Wong authored
Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com>
-
- 27 Oct, 2017 1 commit
-
-
Brian Foster authored
Fix an unused variable warning on non-DEBUG builds introduced by commit 7561d27e ("xfs: buffer lru reference count error injection tag"). Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 26 Oct, 2017 1 commit
-
-
Brian Foster authored
XFS uses a fixed reference count for certain types of buffers in the internal LRU cache. These reference counts dictate how aggressively certain buffers are reclaimed vs. others. While the reference counts implements priority across different buffer types, all buffers (other than uncached buffers) are typically cached for at least one reclaim cycle. We've had at least one bug recently that has been hidden by a released buffer sitting around in the LRU. Users hitting the problem were able to reproduce under enough memory pressure to cause aggressive reclaim in a particular window of time. To support future xfstests cases, add an error injection tag to hardcode the buffer reference count to zero. When enabled, this bypasses caching of associated buffers and facilitates test cases that depend on this behavior. Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 26 Sep, 2017 1 commit
-
-
Colin Ian King authored
Variable total_nr_pages is being initialized and then updated with the same value, this latter assignment is redundant and can be removed. Cleans up clang build warning: Value stored to 'total_nr_pages' during its initialization is never read Signed-off-by:
Colin Ian King <colin.king@canonical.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 31 Aug, 2017 1 commit
-
-
Dan Williams authored
The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result at mount time to avoid the incurring the hash lookup overhead on a per-i/o basis. Reported-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- 23 Aug, 2017 1 commit
-
-
Christoph Hellwig authored
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 19 Jun, 2017 2 commits
-
-
Darrick J. Wong authored
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Brian Foster authored
Reclaim during quotacheck can lead to deadlocks on the dquot flush lock: - Quotacheck populates a local delwri queue with the physical dquot buffers. - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and dirties all of the dquots. - Reclaim kicks in and attempts to flush a dquot whose buffer is already queud on the quotacheck queue. The flush succeeds but queueing to the reclaim delwri queue fails as the backing buffer is already queued. The flush unlock is now deferred to I/O completion of the buffer from the quotacheck queue. - The dqadjust bulkstat continues and dirties the recently flushed dquot once again. - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires the flush lock to update the backing buffers with the in-core recalculated values. It deadlocks on the redirtied dquot as the flush lock was already acquired by reclaim, but the buffer resides on the local delwri queue which isn't submitted until the end of quotacheck. This is reproduced by running quotacheck on a filesystem with a couple million inodes in low memory (512MB-1GB) situations. This is a regression as of commit 43ff2122 ("xfs: on-stack delayed write buffer lists"), which removed a trylock and buffer I/O submission from the quotacheck dquot flush sequence. Quotacheck first resets and collects the physical dquot buffers in a delwri queue. Then, it traverses the filesystem inodes via bulkstat, updates the in-core dquots, flushes the corrected dquots to the backing buffers and finally submits the delwri queue for I/O. Since the backing buffers are queued across the entire quotacheck operation, dquot reclaim cannot possibly complete a dquot flush before quotacheck completes. Therefore, quotacheck must submit the buffer for I/O in order to cycle the flush lock and flush the dirty in-core dquot to the buffer. Add a delwri queue buffer push mechanism to submit an individual buffer for I/O without losing the delwri queue status and use it from quotacheck to avoid the deadlock. This restores quotacheck behavior to as before the regression was introduced. Reported-by:
Martin Svec <martin.svec@zoner.cz> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 Jun, 2017 1 commit
-
-
Christoph Hellwig authored
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 08 Jun, 2017 1 commit
-
-
Brian Foster authored
The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration. To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection. Reported-by:
kbuild test robot <fengguang.wu@intel.com> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 31 May, 2017 1 commit
-
-
Brian Foster authored
We've had user reports of unmount hangs in xfs_wait_buftarg() that analysis shows is due to btp->bt_io_count == -1. bt_io_count represents the count of in-flight asynchronous buffers and thus should always be >= 0. xfs_wait_buftarg() waits for this value to stabilize to zero in order to ensure that all untracked (with respect to the lru) buffers have completed I/O processing before unmount proceeds to tear down in-core data structures. The value of -1 implies an I/O accounting decrement race. Indeed, the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele() (where the buffer lock is no longer held) means that bp->b_flags can be updated from an unsafe context. While a user-level reproducer is currently not available, some intrusive hacks to run racing buffer lookups/ioacct/releases from multiple threads was used to successfully manufacture this problem. Existing callers do not expect to acquire the buffer lock from xfs_buf_rele(). Therefore, we can not safely update ->b_flags from this context. It turns out that we already have separate buffer state bits and associated serialization for dealing with buffer LRU state in the form of ->b_state and ->b_lock. Therefore, replace the _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O accounting wrappers appropriately and make sure they are used with the correct locking. This ensures that buffer in-flight state can be modified at buffer release time without racing with modifications from a buffer lock holder. Fixes: 9c7504aa ("xfs: track and serialize in-flight async buffers against unmount") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Tested-by:
Libor Pechacek <lpechacek@suse.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 03 May, 2017 1 commit
-
-
Michal Hocko authored
kmem_zalloc_large and _xfs_buf_map_pages use memalloc_noio_{save,restore} API to prevent from reclaim recursion into the fs because vmalloc can invoke unconditional GFP_KERNEL allocations and these functions might be called from the NOFS contexts. The memalloc_noio_save will enforce GFP_NOIO context which is even weaker than GFP_NOFS and that seems to be unnecessary. Let's use memalloc_nofs_{save,restore} instead as it should provide exactly what we need here - implicit GFP_NOFS context. Link: http://lkml.kernel.org/r/20170306131408.9828-6-mhocko@kernel.orgSigned-off-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Apr, 2017 1 commit
-
-
Brian Foster authored
The quotacheck error handling of the delwri buffer list assumes the resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag on the buffers that are dequeued. This can lead to assert failures on buffer release and possibly other locking problems. Move this code to a delwri queue cancel helper function to encapsulate the logic required to properly release buffers from a delwri queue. Update the helper to clear the delwri queue flag and call it from quotacheck. Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 02 Mar, 2017 1 commit
-
-
Ingo Molnar authored
Update the .c files that depend on these APIs. Acked-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- 02 Feb, 2017 1 commit
-
-
Jan Kara authored
blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 26 Jan, 2017 1 commit
-
-
Darrick J. Wong authored
If we try to allocate memory pages to back an xfs_buf that we're trying to read, it's possible that we'll be so short on memory that the page allocation fails. For a blocking read we'll just wait, but for readahead we simply dump all the pages we've collected so far. Unfortunately, after dumping the pages we neglect to clear the _XBF_PAGES state, which means that the subsequent call to xfs_buf_free thinks that b_pages still points to pages we own. It then double-frees the b_pages pages. This results in screaming about negative page refcounts from the memory manager, which xfs oughtn't be triggering. To reproduce this case, mount a filesystem where the size of the inodes far outweighs the availalble memory (a ~500M inode filesystem on a VM with 300MB memory did the trick here) and run bulkstat in parallel with other memory eating processes to put a huge load on the system. The "check summary" phase of xfs_scrub also works for this purpose. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Eric Sandeen <sandeen@redhat.com>
-
- 09 Dec, 2016 1 commit
-
-
Dave Chinner authored
There is no reason anymore for not issuing device integrity operations when teh filesystem requires ordering or data integrity guarantees. We should always issue cache flushes and FUA writes where necessary and let the underlying storage optimise them as necessary for correct integrity operation. Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 07 Dec, 2016 1 commit
-
-
Lucas Stach authored
On filesystems with a lot of metadata and in metadata intensive workloads xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of the CPU time is spent on CPU cache misses while traversing the rbtree. As the buffer cache does not need any kind of ordering, but fast lookups a hashtable is the natural data structure to use. The rhashtable infrastructure provides a self-scaling hashtable implementation and allows lookups to proceed while the table is going through a resize operation. This reduces the CPU-time spent for the lookups to 1/3 even for small filesystems with a relatively small number of cached buffers, with possibly much larger gains on higher loaded filesystems. [dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.] [dchinner: remove stale rbtree asserts.] [dchinner: use xfs_buf_map for compare function argument.] [dchinner: make functions static.] [dchinner: remove redundant comments.] Signed-off-by:
Lucas Stach <dev@lynxeye.de> Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 01 Nov, 2016 1 commit
-
-
Christoph Hellwig authored
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 26 Aug, 2016 1 commit
-
-
Brian Foster authored
xfs_wait_buftarg() waits for all pending I/O, drains the ioend completion workqueue and walks the LRU until all buffers in the cache have been released. This is traditionally an unmount operation` but the mechanism is also reused during filesystem freeze. xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce, which is intended more for a shutdown sequence in that it indicates to the queue that new operations are not expected once the drain has begun. New work jobs after this point result in a WARN_ON_ONCE() and are otherwise dropped. With filesystem freeze, however, read operations are allowed and can proceed during or after the workqueue drain. If such a read occurs during the drain sequence, the workqueue infrastructure complains about the queued ioend completion work item and drops it on the floor. As a result, the buffer remains on the LRU and the freeze never completes. Despite the fact that the overall buffer cache cleanup is not necessary during freeze, fix up this operation such that it is safe to invoke during non-unmount quiesce operations. Replace the drain_workqueue() call with flush_workqueue(), which runs a similar serialization on pending workqueue jobs without causing new jobs to be dropped. This is safe for unmount as unmount independently locks out new operations by the time xfs_wait_buftarg() is invoked. cc: <stable@vger.kernel.org> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-