- 09 Nov, 2017 8 commits
-
-
Christoph Hellwig authored
Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Fix to check the correct value, and remove a duplicate handling of the uneven record number split algorith, Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Tim Hansen authored
kmem_cache_destroy already checks for null values. Signed-off-by: Tim Hansen <devtimhansen@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Darrick J. Wong authored
It turns out that we only started zeroing a new da btree node's block header on v5 filesystems. Prior to that, we just wouldn't set anything at all, which means that the pad field never got set and would retain whatever happened to be in memory. Therefore, we can only check the pad for zeroness on v5 filesystems. shared/006 on a v4 filesystem exposes this scrub bug. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
The btree scrubber has some custom code to retrieve and check a btree block via xfs_btree_lookup_get_block. This function will either return an error code (verifiers failed) or a *pblock will be untouched (bad pointer). Since we previously set *pblock to NULL, we need to check *pblock, not pblock, to trigger the early bailout. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
Fix smatch complaints about uninitialized return codes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
There are two ways to scrub an inode -- calling xfs_iget and checking the raw inode core, or by loading the inode cluster buffer and checking the on-disk contents directly. The second method is only useful if _iget fails the verifiers; when this is the case, sc->ip is NULL and calling the tracepoint will cause a system crash. Therefore, pass the raw inode number directly into the _preen and _warning functions. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
In a directory data block, the zeroth bestfree item must point to the longest free space. Therefore, when we check the bestfree block's records against the data blocks, we only need to compare with bf[0] and don't need the loop. The weird loop was most probably the result of an earlier refactoring gone bad. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
- 06 Nov, 2017 29 commits
-
-
Christoph Hellwig authored
We already did it in the forward declaration, but not for the function body itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
We already did it in the forward declaration, but not for the function body itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
[darrick: fix broken initializer in xfs_scrub_xattr] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Ever since we added the noinline tag there is no good reason to define away the static for debug builds - we'll get just as good debug information with our without it, so don't mess up sparse and other checkers due to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Neither defines an on-disk format, so move them out of xfs_format.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
This removed an unaligned load per extent, as well as the manual poking into the on-disk extent format. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
We only have two places that remove 2 extents at the same time, so unroll the loop there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
We only have two places that insert 2 extents at the same time, so unroll the loop there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Replace the current linear list and the indirection array for the in-core extent list with a b+tree to avoid the need for larger memory allocations for the indirection array when lots of extents are present. The current extent list implementations leads to heavy pressure on the memory allocator when modifying files with a high extent count, and can lead to high latencies because of that. The replacement is a b+tree with a few quirks. The leaf nodes directly store the extent record in two u64 values. The encoding is a little bit different from the existing in-core extent records so that the start offset and length which are required for lookups can be retreived with simple mask operations. The inner nodes store a 64-bit key containing the start offset in the first half of the node, and the pointers to the next lower level in the second half. In either case we walk the node from the beginninig to the end and do a linear search, as that is more efficient for the low number of cache lines touched during a search (2 for the inner nodes, 4 for the leaf nodes) than a binary search. We store termination markers (zero length for the leaf nodes, an otherwise impossible high bit for the inner nodes) to terminate the key list / records instead of storing a count to use the available cache lines as efficiently as possible. One quirk of the algorithm is that while we normally split a node half and half like usual btree implementations we just spill over entries added at the very end of the list to a new node on its own. This means we get a 100% fill grade for the common cases of bulk insertion when reading an inode into memory, and when only sequentially appending to a file. The downside is a slightly higher chance of splits on the first random insertions. Both insert and removal manually recurse into the lower levels, but the bulk deletion of the whole tree is still implemented as a recursive function call, although one limited by the overall depth and with very little stack usage in every iteration. For the first few extents we dynamically grow the list from a single extent to the next powers of two until we have a first full leaf block and that building the actual tree. The code started out based on the generic lib/btree.c code from Joern Engel based on earlier work from Peter Zijlstra, but has since been rewritten beyond recognition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
To make life a little simpler make xfs_bmbt_set_all unaligned access aware so that we can use it directly on the destination buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Supporting a small bit of data inside the inode fork blows up the fork size a lot, removing the 32 bytes of inline data halves the effective size of the inode fork (and it still has a lot of unused padding left), and the performance of a single kmalloc doesn't show up compared to the size to read an inode or create one. It also simplifies the fork management code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Instead of looking up extents to convert and calling xfs_bmapi_write on each of them just let xfs_bmapi_write handle the full range. To make this robust add a new XFS_BMAPI_CONVERT_ONLY that only converts ranges and never allocates blocks. [darrick: shorten the stringified CONVERT_ONLY trace flag] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Match the iteration order for extent deletion in the truncate and reflink I/O completion path. This also happens to make implementing the new incore extent list a lot easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Add a new xfs_iext_cursor structure to hide the direct extent map index manipulations. In addition to the existing lookup/get/insert/ remove and update routines new primitives to get the first and last extent cursor, as well as moving up and down by one extent are provided. Also new are convenience to increment/decrement the cursor and retreive the new extent, as well as to peek into the previous/next extent without updating the cursor and last but not least a macro to iterate over all extents in a fork. [darrick: rename for_each_iext to for_each_xfs_iext] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
This actually makes the function very slightly less efficient for now as we detour through the expanded irect format between the in-core extent format and the on-disk one instead of just endian swapping them. But with the incore extent btree the in-core one will use a different format and the representation will be entirely hidden. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
This actually makes the function very slightly less efficient for now as we detour through the expanded irect format between the in-core extent format and the on-disk one instead of just endian swapping them. But with the incore extent btree the in-core one will use a different format and the representation will be entirely hidden. It also happens to make the function a whole more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
This prepares for getting rid of the current in-memory extent format. At the end of the series we will change the calling convention again to pass the xfs_bmbt_irec structure once it is available everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Two cases in xfs_bmap_add_extent_delay_real currently insert a new extent before updating the existing one that is being split. While this works fine with a simple extent list, a more complex tree can't easily cope with overlapping extent. Reshuffle the code a bit to update the slot of the existing delalloc extent to the new real extent before inserting the shortened delalloc extent before or after it. This avoids the overlapping extents while still allowing to update the br_startblock field of the delalloc extent with the updated indirect block reservation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 03 Nov, 2017 2 commits
-
-
Darrick J. Wong authored
The newly added xfs_scrub_da_btree_block() function has one code path that returns the 'error' variable without initializing it first, as shown by this compiler warning: fs/xfs/scrub/dabtree.c: In function 'xfs_scrub_da_btree_block': fs/xfs/scrub/dabtree.c:462:9: error: 'error' may be used uninitialized in this function [-Werror=maybe-uninitialized] Return zero since the caller will exit the scrub code if we don't produce a buffer pointer. Fixes: 7c4a07a4 ("xfs: scrub directory/attribute btrees") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Eryu Guan authored
On truncate down, if new size is not block size aligned, we zero the rest of block to avoid exposing stale data to user, and iomap_truncate_page() skips zeroing if the range is already in unwritten state or a hole. Then we writeback from on-disk i_size to the new size if this range hasn't been written to disk yet, and truncate page cache beyond new EOF and set in-core i_size. The problem is that we could write data between di_size and newsize before removing the page cache beyond newsize, as the extents may still be in unwritten state right after a buffer write. As such, the page of data that newsize lies in has not been zeroed by page cache invalidation before it is written, and xfs_do_writepage() hasn't triggered it's "zero data beyond EOF" case because we haven't updated in-core i_size yet. Then a subsequent mmap read could see non-zeros past EOF. I occasionally see this in fsx runs in fstests generic/112, a simplified fsx operation sequence is like (assuming 4k block size xfs): fallocate 0x0 0x1000 0x0 keep_size write 0x0 0x1000 0x0 truncate 0x0 0x800 0x1000 punch_hole 0x0 0x800 0x800 mapread 0x0 0x800 0x800 where fallocate allocates unwritten extent but doesn't update i_size, buffer write populates the page cache and extent is still unwritten, truncate skips zeroing page past new EOF and writes the page to disk, punch_hole invalidates the page cache, at last mapread reads the block back and sees non-zero beyond EOF. Fix it by moving truncate_setsize() to before writeback so the page cache invalidation zeros the partial page at the new EOF. This also triggers "zero data beyond EOF" in xfs_do_writepage() at writeback time, because newsize has been set and page straddles the newsize. Also fixed the wrong 'end' param of filemap_write_and_wait_range() call while we're at it, the 'end' is inclusive and should be 'newsize - 1'. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eryu Guan <eguan@redhat.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 01 Nov, 2017 1 commit
-
-
Dave Chinner authored
Some were missed in the pass that converted the function return values from int to bool. Update the remaining ones for consistency. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-