- 02 Oct, 2016 7 commits
-
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
-
Christoph Hellwig authored
After the call to ->direct_IO the final reference to the file might have been dropped by aio_complete already, and the call to file_accessed might cause a use after free. Instead update the access time before the I/O, similar to how we update the time stamps before writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
After the call to __blkdev_direct_IO the final reference to the file might have been dropped by aio_complete already, and the call to file_accessed might cause a use after free. Instead update the access time before the I/O, similar to how we update the time stamps before writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
For 32-bit architectures we need to cast first_block to u64 before shifting it left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 19 Sep, 2016 24 commits
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Another users of buffer_heads bytes the dust. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Rename the current function to __xfs_setfilesize and add a non-static wrapper that also takes care of creating the transaction. This new helper will be used by the new iomap-based DAX path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
We always just read the extent first, and will later lock exlusively after first dropping the lock in case we actually allocate blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
So far DAX writes inherited the locking from direct I/O writes, but the direct I/O model of using shared locks for writes is actually wrong for DAX. For direct I/O we're out of any standards and don't have to provide the Posix required exclusion between writers, but for DAX which gets transparently enable on applications without any knowledge of it we can't simply drop the requirement. Even worse this only happens for aligned writes and thus doesn't show up for many typical use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Very similar to the existing dax_fault function, but instead of using the get_block callback we rely on the iomap_ops vector from iomap.c. That also avoids having to do two calls into the file system for write faults. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
This is a much simpler implementation of the DAX read/write path that makes use of the iomap infrastructure. It does not try to mirror the direct I/O calling conventions and thus doesn't have to deal with i_dio_count or the end_io handler, but instead leaves locking and filesystem-specific I/O completion to the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
This way we can use this helper for the iomap based DAX implementation as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
This way we can use this helper for the iomap based DAX implementation as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
This allows the DAX code to use it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Currently xfs_iomap_write_delay does up to lookups in the inode extent tree, which is rather costly especially with the new iomap based write path and small write sizes. But it turns out that the low-level xfs_bmap_search_extents gives us all the information we need in the regular delalloc buffered write path: - it will return us an extent covering the block we are looking up if it exists. In that case we can simply return that extent to the caller and are done - it will tell us if we are beyoned the last current allocated block with an eof return parameter. In that case we can create a delalloc reservation and use the also returned information about the last extent in the file as the hint to size our delalloc reservation. - it can tell us that we are writing into a hole, but that there is an extent beyoned this hole. In this case we can create a delalloc reservation that covers the requested size (possible capped to the next existing allocation). All that can be done in one single routine instead of bouncing up and down a few layers. This reduced the CPU overhead of the block mapping routines and also simplified the code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
For long growing file writes we will usually already have the eofblocks tag set when adding more speculative preallocations. Add a flag in the inode to allow us to skip the the fairly expensive AG-wide spinlocks and multiple radix tree operations in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
And drop the pointless mp argument to xfs_iomap_eof_align_last_fsb, while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
We'll need it earlier in the file soon, so the unchanged function to the top of xfs_iomap.c Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
When xfs_defer_finish calls ->finish_item, it's possible that (refcount) won't be able to finish all the work in a single transaction. When this happens, the ->finish_item handler should shorten the log done item's list count, update the work item to reflect where work should continue, and return -EAGAIN so that defer_finish knows to retain the pending item on the pending list, roll the transaction, and restart processing where we left off. Plumb in the code and document how this mechanism is supposed to work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Darrick J. Wong authored
Provide a helper method to count the number of blocks in a short form btree. The refcount and rmap btrees need to know the number of blocks already in use to set up their per-AG block reservations during mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Create a helper to generate AG btree height calculator functions. This will be used (much) later when we get to the refcount btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Remove the xfs_btree_bigkey mess and simply make xfs_btree_key big enough to hold both keys in-core. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Use variable length array declarations for RUI log items, and replace the open coded sizeof formulae with a single function. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Originally-From: Christoph Hellwig <hch@lst.de> This function uses the iomap infrastructure to re-write all pages in a given range. This is useful for doing a copy-up of COW ranges, and might be useful for scrubbing in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 18 Sep, 2016 1 commit
-
-
Carlos Maiolino authored
Document the implementation of error handlers into sysfs. [dchinner: Added lots more detail.] Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>
-
- 13 Sep, 2016 5 commits
-
-
Eric Sandeen authored
As it stands today, the "fail immediately" vs. "retry forever" values for max_retries and retry_timeout_seconds in the xfs metadata error configurations are not consistent. A retry_timeout_seconds of 0 means "retry forever," but a max_retries of 0 means "fail immediately." retry_timeout_seconds < 0 is disallowed, while max_retries == -1 means "retry forever." Make this consistent across the error configs, such that a value of 0 means "fail immediately" (i.e. wait 0 seconds, or retry 0 times), and a value of -1 always means "retry forever." This makes retry_timeout a signed long to accommodate the -1, even though it stores jiffies. Given our limit of a 1 day maximum timeout, this should be sufficient even at much higher HZ values than we have available today. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Xie XiuQi authored
Use 1U for unsigned int to avoid a overflow warning from UBSAN. [ 31.910858] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:889:25 [ 31.911252] signed integer overflow: [ 31.911478] -2147483648 - 1 cannot be represented in type 'int' [ 31.911846] CPU: 1 PID: 1011 Comm: tuned Tainted: G B ---- ------- 3.10.0-327.28.3.el7.x86_64 #1 [ 31.911857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011 [ 31.911866] 1ffff1004069cd3b 0000000076bec3fd ffff8802034e69a0 ffffffff81ee3140 [ 31.911883] ffff8802034e69b8 ffffffff81ee31fd ffffffffa0ad79e0 ffff8802034e6b20 [ 31.911898] ffffffff81ee46e2 0000002d515470c0 0000000000000001 0000000041b58ab3 [ 31.911913] Call Trace: [ 31.911932] [<ffffffff81ee3140>] dump_stack+0x1e/0x20 [ 31.911947] [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55 [ 31.911964] [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215 [ 31.912083] [<ffffffff81ee4798>] __ubsan_handle_sub_overflow+0x2a/0x31 [ 31.912204] [<ffffffffa08676fb>] xfs_buf_item_log+0x34b/0x3f0 [xfs] [ 31.912314] [<ffffffffa0880490>] xfs_trans_log_buf+0x120/0x260 [xfs] [ 31.912402] [<ffffffffa079a890>] xfs_btree_log_recs+0x80/0xc0 [xfs] [ 31.912490] [<ffffffffa07a29f8>] xfs_btree_delrec+0x11a8/0x2d50 [xfs] [ 31.913589] [<ffffffffa07a86f9>] xfs_btree_delete+0xc9/0x260 [xfs] [ 31.913762] [<ffffffffa075b5cf>] xfs_free_ag_extent+0x63f/0xe20 [xfs] [ 31.914339] [<ffffffffa075ec0f>] xfs_free_extent+0x2af/0x3e0 [xfs] [ 31.914641] [<ffffffffa0801b2b>] xfs_bmap_finish+0x32b/0x4b0 [xfs] [ 31.914841] [<ffffffffa083c2e7>] xfs_itruncate_extents+0x3b7/0x740 [xfs] [ 31.915216] [<ffffffffa08342fa>] xfs_setattr_size+0x60a/0x860 [xfs] [ 31.915471] [<ffffffffa08345ea>] xfs_vn_setattr+0x9a/0xe0 [xfs] [ 31.915590] [<ffffffff8149ad38>] notify_change+0x5c8/0x8a0 [ 31.915607] [<ffffffff81450f22>] do_truncate+0x122/0x1d0 [ 31.915640] [<ffffffff8147beee>] do_last+0x15de/0x2c80 [ 31.915707] [<ffffffff8147d777>] path_openat+0x1e7/0xcc0 [ 31.915802] [<ffffffff81480824>] do_filp_open+0xa4/0x160 [ 31.915848] [<ffffffff81453127>] do_sys_open+0x1b7/0x3f0 [ 31.915879] [<ffffffff81453392>] SyS_open+0x32/0x40 [ 31.915897] [<ffffffff81f08989>] system_call_fastpath+0x16/0x1b [ 240.086809] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:866:34 [ 240.086820] signed integer overflow: [ 240.086830] -2147483648 - 1 cannot be represented in type 'int' [ 240.086846] CPU: 1 PID: 12969 Comm: rm Tainted: G B ---- ------- 3.10.0-327.28.3.el7.x86_64 #1 [ 240.086857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011 [ 240.086868] 1ffff10040491def 00000000e2ea59c1 ffff88020248ef40 ffffffff81ee3140 [ 240.086885] ffff88020248ef58 ffffffff81ee31fd ffffffffa0ad79e0 ffff88020248f0c0 [ 240.086901] ffffffff81ee46e2 0000002d02488000 0000000000000001 0000000041b58ab3 [ 240.086915] Call Trace: [ 240.086938] [<ffffffff81ee3140>] dump_stack+0x1e/0x20 [ 240.086953] [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55 [ 240.086971] [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215 ... Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Artem Savkov authored
Commit 2a6fba6d "xfs: only return -errno or success from attr ->put_listent" changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient space in the buffer assuming that setting context->count to -1 would be enough, but all of the ->put_listent callers only check seen_enough. This results in a failed assertion: XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175 in insufficient buffer size case. This is only reproducible with at least 2 xattrs and only when the buffer gets depleted before the last one. Furthermore if buffersize is such that it is enough to hold the last xattr's name, but not enough to hold the sum of preceeding xattr names listxattr won't fail with ERANGE, but will suceed returning last xattr's name without the first character. The first character end's up overwriting data stored at (context->alist - 1). Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
oss.sgi.com is going away, move contact details over to vger. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eryu Guan authored
"blocks" should be added back to fdblocks at undo time, not taken away, i.e. the minus sign should not be used. This is a regression introduced by commit 0d485ada ("xfs: use generic percpu counters for free block counter"). And it's found by code inspection, I didn't it in real world, so there's no reproducer. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 30 Aug, 2016 1 commit
-
-
Darrick J. Wong authored
Christoph reports slab corruption when a deferred refcount update aborts during _defer_finish(). The cause of this was broken log item state tracking in xfs_defer_pending -- upon an abort, _defer_trans_abort() will call abort_intent on all intent items, including the ones that have already had a done item attached. This is incorrect because each intent item has 2 refcount: the first is released when the intent item is committed to the log; and the second is released when the _done_ item is committed to the log, or by the intent creator if there is no done item. In other words, once we log the done item, responsibility for releasing the intent item's second refcount is transferred to the done item and /must not/ be performed by anything else. The dfp_committed flag should have been tracking whether or not we had a done item so that _defer_trans_abort could decide if it needs to abort the intent item, but due to a thinko this was not the case. Rip it out and track the done item directly so that we do the right thing w.r.t. intent item freeing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 29 Aug, 2016 1 commit
-
-
Christoph Hellwig authored
Filesystems like XFS that use extents should not set the FIEMAP_EXTENT_MERGED flag in the fiemap extent structures. To allow for both behaviors for the upcoming gfs2 usage split the iomap type field into type and flags, and only set FIEMAP_EXTENT_MERGED if the IOMAP_F_MERGED flag is set. The flags field will also come in handy for future features such as shared extents on reflink-enabled file systems. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 26 Aug, 2016 1 commit
-
-
Brian Foster authored
xfs_wait_buftarg() waits for all pending I/O, drains the ioend completion workqueue and walks the LRU until all buffers in the cache have been released. This is traditionally an unmount operation` but the mechanism is also reused during filesystem freeze. xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce, which is intended more for a shutdown sequence in that it indicates to the queue that new operations are not expected once the drain has begun. New work jobs after this point result in a WARN_ON_ONCE() and are otherwise dropped. With filesystem freeze, however, read operations are allowed and can proceed during or after the workqueue drain. If such a read occurs during the drain sequence, the workqueue infrastructure complains about the queued ioend completion work item and drops it on the floor. As a result, the buffer remains on the LRU and the freeze never completes. Despite the fact that the overall buffer cache cleanup is not necessary during freeze, fix up this operation such that it is safe to invoke during non-unmount quiesce operations. Replace the drain_workqueue() call with flush_workqueue(), which runs a similar serialization on pending workqueue jobs without causing new jobs to be dropped. This is safe for unmount as unmount independently locks out new operations by the time xfs_wait_buftarg() is invoked. cc: <stable@vger.kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-