- 13 Apr, 2015 7 commits
-
-
Dave Chinner authored
Conflicts: fs/xfs/xfs_iops.c
-
Christoph Hellwig authored
We want to drop all I/O path locks when recalling layouts, and that includes i_mutex for the write path. Without this we get stuck processe when recalls take too long. [dchinner: fix build with !CONFIG_PNFS] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
xfs_attr3_leaf_remove() removes an attribute from an attr leaf block. If the attribute nameval data happens to be at the start of the nameval region, a new start offset (firstused) for the region is calculated (since the region grows from the tail of the block to the start). Once the new firstused is calculated, it is checked for zero in an apparent overflow check. Now that the in-core firstused is 32-bit, overflow is not possible and this check can be removed. Since the purpose for this check is not documented and appears to exist since the port to Linux, be conservative and replace it with an assert. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
The on-disk xfs_attr3_leaf_hdr structure firstused field is 16-bit and subject to overflow when fs block size is 64k. The field is typically initialized to block size when an attr leaf block is initialized. This problem is demonstrated by assert failures when running xfstests generic/117 on an fs with 64k blocks. To support the existing attr leaf block algorithms for insertion, rebalance and entry movement, increase the size of the in-core firstused field to 32-bit and handle the potential overflow on conversion to/from the on-disk structure. If the overflow condition occurs, set a special value in the firstused field that is translated back on header read. The special value is only required in the case of an empty 64k attr block. A value of zero is used because firstused is initialized to the block size and grows backwards from there. Furthermore, the attribute block header occupies the first bytes of the block. Thus, a value of zero has no other legitimate meaning for this structure. Two new conversion helpers are created to manage the conversion of firstused to and from disk. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
The firstused field of the xfs_attr3_leaf_hdr structure is subject to an overflow when fs blocksize is 64k. In preparation to handle this overflow in the header conversion functions, pass the attribute geometry to the functions that convert the in-core structure to and from the on-disk structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
There's a bit of a loophole in norecovery mount handling right now: an initial mount must be readonly, but nothing prevents a mount -o remount,rw from producing a writable, unrecovered xfs filesystem. It might be possible to try to perform a log recovery when this is requested, but I'm not sure it's worth the effort. For now, simply disallow this sort of transition. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
kbuild test robot authored
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 25 Mar, 2015 17 commits
-
-
Dave Chinner authored
-
Dave Chinner authored
Conflicts: fs/xfs/libxfs/xfs_bmap.c fs/xfs/xfs_inode.c
-
Namjae Jeon authored
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying bewteen [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Namjae Jeon authored
FALLOC_FL_INSERT_RANGE command is the opposite command of FALLOC_FL_COLLAPSE_RANGE that is needed for someone who wants to add some data in the middle of file. FALLOC_FL_INSERT_RANGE will create space for writing new data within a file after shifting extents to right as given length. This command also has same limitations as FALLOC_FL_COLLAPSE_RANGE in that operations need to be filesystem block boundary aligned and cannot cross the current EOF. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Joe Perches authored
added a positive error return value. This value filters up through the return layers and should be negative as the other return values are in the same function. Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Byoungyoung Lee authored
xfs_mru_cache_insert() can be called from within transaction context during block allocation like so: write(2) .... xfs_get_blocks xfs_iomap_write_direct start transaction xfs_bmapi_write xfs_bmapi_allocate xfs_bmap_btalloc xfs_bmap_btalloc_filestreams xfs_filestream_new_ag xfs_filestream_pick_ag xfs_mru_cache_insert radix_tree_preload(GFP_KERNEL) In this case, GFP_KERNEL is incorrect and can potentially lead to deadlocks in memory reclaim. It should use GFP_NOFS allocations to avoid lock recursion problems. [dchinner: rewrote commit message] Signed-off-by: Byoungyoung Lee <blee@gatech.edu> Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Scott Wood authored
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Signed-off-by: Scott Wood <scottwood@freescale.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Fabian Frederick authored
Use icnodehdr for struct xfs_da3_icnode_hdr instead of nodehdr (already declared above). Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Fabian Frederick authored
new_parent and src_is_directory are only used in 0/1 context. Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
If xfs_filestream_get_parent() fails, we have a null pip, goto out, and attempt to IRELE(NULL). This causes a null pointer dereference and BUG(). Fix this by directly returning NULLAGNUMBER in this case. Reported-by: Adrien Nader <adrien@notk.org> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
This code is redundant now that we have verifiers that sanity check the buffers as they are read from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Conflicts: fs/xfs/xfs_inode.c
-
Dave Chinner authored
Whiteouts are used by overlayfs - it has a crazy convention that a whiteout is a character device inode with a major:minor of 0:0. Because it's not documented anywhere, here's an example of what RENAME_WHITEOUT does on ext4: # echo foo > /mnt/scratch/foo # echo bar > /mnt/scratch/bar # ls -l /mnt/scratch total 24 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar -rw-r--r-- 1 root root 4 Feb 11 20:22 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar # ls -l /mnt/scratch total 20 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar c--------- 1 root root 0, 0 Feb 11 20:23 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # cat /mnt/scratch/bar foo # In XFS rename terms, the operation that has been done is that source (foo) has been moved to the target (bar), which is like a nomal rename operation, but rather than the source being removed, it have been replaced with a whiteout. We can't allocate whiteout inodes within the rename transaction due to allocation being a multi-commit transaction: rename needs to be a single, atomic commit. Hence we have several options here, form most efficient to least efficient: - use DT_WHT in the target dirent and do no whiteout inode allocation. The main issue with this approach is that we need hooks in lookup to create a virtual chardev inode to present to userspace and in places where we might need to modify the dirent e.g. unlink. Overlayfs also needs to be taught about DT_WHT. Most invasive change, lowest overhead. - create a special whiteout inode in the root directory (e.g. a ".wino" dirent) and then hardlink every new whiteout to it. This means we only need to create a single whiteout inode, and rename simply creates a hardlink to it. We can use DT_WHT for these, though using DT_CHR means we won't have to modify overlayfs, nor anything in userspace. Downside is we have to look up the whiteout inode on every operation and create it if it doesn't exist. - copy ext4: create a special whiteout chardev inode for every whiteout. This is more complex than the above options because of the lack of atomicity between inode creation and the rename operation, requiring us to create a tmpfile inode and then linking it into the directory structure during the rename. At least with a tmpfile inode crashes between the create and rename doesn't leave unreferenced inodes or directory pollution around. By far the simplest thing to do in the short term is to copy ext4. While it is the most inefficient way of supporting whiteouts, but as an initial implementation we can simply reuse existing functions and add a small amount of extra code the the rename operation. When we get full whiteout support in the VFS (via the dentry cache) we can then look to supporting DT_WHT method outlined as the first method of supporting whiteouts. But until then, we'll stick with what overlayfs expects us to be: dumb and stupid. Signed-off-by: Dave Chinner <dchinner@redhat.com>
-
Dave Chinner authored
Now that xfs_finish_rename() exists, there is no reason for xfs_cross_rename() to return to xfs_rename() to finish off the rename transaction. Drive the completion code into xfs_cross_rename() and handle all errors there so as to simplify the xfs_rename() code. Further, push the rename exchange target_ip check to early in the rename code so as to make the error handling easy and obviously correct. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Rather than use a jump label for the final transaction commit in the rename, factor it into a simple helper function and call it appropriately. This slightly reduces the spaghetti nature of xfs_rename. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
The jump labels are ambiguous and unclear and some of the error paths are used inconsistently. Rules for error jumps are: - use out_trans_cancel for unmodified transaction context - use out_bmap_cancel on ENOSPC errors - use out_trans_abort when transaction is likely to be dirty. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
When doing RENAME_WHITEOUT, we now have to lock 5 inodes into the rename transaction. This means we need to update xfs_sort_for_rename() and xfs_lock_inodes() to handle up to 5 inodes. Because of the vagaries of rename, this means we could have anywhere between 3 and 5 inodes locked into the transaction.... While xfs_lock_inodes() does not need anything other than an assert telling us we are passing more inodes that we ever thought we should see, it could do with a logic rework to remove all the indenting. This is not a functional change - it just makes the code a lot easier to read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 23 Feb, 2015 16 commits
-
-
Dave Chinner authored
-
Dave Chinner authored
Conflicts: fs/xfs/xfs_super.c
-
Dave Chinner authored
-
Eric Sandeen authored
We recently removed deprecated sysctls; may as well remove deprecated mount options as well, we've stated that they'd be gone by now in the docs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Test generic/224 is failing with a corruption being detected on one of Michael's test boxes. Debug that Michael added is indicating that the minleft trimming is resulting in an underflow: ..... before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 before goto out_nominleft: rlen 1 args->len 0 before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 after fixup: rlen 1 args->len 1 before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 after fixup: rlen 4294967295 args->len 4294967295 XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 1424 The "goto out_nominleft:" indicates that we are getting close to ENOSPC in the AG, and a couple of allocations later we underflow and the corruption check fires in xfs_alloc_ag_vextent_size(). The issue is that the extent length fixups comaprisons are done with variables of xfs_extlen_t types. These are unsigned so an underflow looks like a really big value and hence is not detected as being smaller than the minimum length allowed for the extent. Hence the corruption check fires as it is noticing that the returned length is longer than the original extent length passed in. This can be easily fixed by ensuring we do the underflow test on signed values, the same way xfs_alloc_fix_len() prevents underflow. So we realise in future that these casts prevent underflows from going undetected, add comments to the code indicating this. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Tested-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
If xfs_trans_reserve fails we don't cancel the transaction, and we'll leak the allocated transaction pointer. Spotted by Coverity. Signed-off-by: Eric Sandeen <ssandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Wang Sheng-Hui authored
The error messages document the reason for the checks better than the comment and the comments about volume mounts date back to Irix and so aren't relevant any more. So just remove the old and redundant comment. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
Today, when the "failing async writes" get ratelimited, we see: XFS:: 62836 callbacks suppressed Aside from the extra ":" it's not entirely clear which message is being suppressed, especially if other messages or ratelimits are happening at the same time. Clarify this as i.e.: XFS (dm-11): Failing async write on buffer block 0x140090. Retrying async write. XFS: Failing async write: 62836 callbacks suppressed Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
There are times, when doing triage and forensics, that we would like to know whether a filesystem was unmounted, or if the plug was pulled without a clean unmount. Log unmounts at the same level (NOTICE) as we log mounts. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
We shouldn't get here with RENAME_EXCHANGE set and no target_ip, but let's be defensive, because xfs_cross_rename() will dereference it. Spotted by Coverity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Al Viro noticed a generic set of issues to do with filehandle lookup racing with dentry cache setup. They involve a filehandle lookup occurring while an inode is being created and the filehandle lookup racing with the dentry creation for the real file. This can lead to multiple dentries for the one path being instantiated. There are a host of other issues around this same set of paths. The underlying cause is that file handle lookup only waits on inode cache instantiation rather than full dentry cache instantiation. XFS is mostly immune to the problems discovered due to it's own internal inode cache, but there are a couple of corner cases where races can happen. We currently clear the XFS_INEW flag when the inode is fully set up after insertion into the cache. Newly allocated inodes are inserted locked and so aren't usable until the allocation transaction commits. This, however, occurs before the dentry and security information is fully initialised and hence the inode is unlocked and available for lookups to find too early. To solve the problem, only clear the XFS_INEW flag for newly created inodes once the dentry is fully instantiated. This means lookups will retry until the XFS_INEW flag is removed from the inode and hence avoids the race conditions in questions. THis also means that xfs_create(), xfs_create_tmpfile() and xfs_symlink() need to finish the setup of the inode in their error paths if we had allocated the inode but failed later in the creation process. xfs_symlink(), in particular, needed a lot of help to make it's error handling match that of xfs_create(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
A new fsync vs power fail test in xfstests indicated that XFS can have unreliable data consistency when doing extending truncates that require block zeroing. The blocks beyond EOF get zeroed in memory, but we never force those changes to disk before we run the transaction that extends the file size and exposes those blocks to userspace. This can result in the blocks not being correctly zeroed after a crash. Because in-memory behaviour is correct, tools like fsx don't pick up any coherency problems - it's not until the filesystem is shutdown or the system crashes after writing the truncate transaction to the journal but before the zeroed data in the page cache is flushed that the issue is exposed. Fix this by also flushing the dirty data in memory region between the old size and new size when we've found blocks that need zeroing in the truncate process. Reported-by: Liu Bo <bo.li.liu@oracle.com> cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Jan Kara authored
For filesystems without separate project quota inode field in the superblock we just reuse project quota file for group quotas (and vice versa) if project quota file is allocated and we need group quota file. When we reuse the file, quota structures on disk suddenly have wrong type stored in d_flags though. Nobody really cares about this (although structure type reported to userspace was wrong as well) except that after commit 14bf61ff (quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units) assertion in xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I was testing without XFS_DEBUG so I didn't notice when submitting the above commit). Fix the problem by properly resetting ddq->d_flags when running quotacheck for a quota file. CC: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Extent swap operations are another extent manipulation operation that we need to ensure does not race against mmap page faults. The current code returns if the file is mapped prior to the swap being done, but it could potentially race against new page faults while the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL for this operation, too. While there, fix the error path handling that can result in double unlocks of the inodes when cancelling the swapext transaction. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-