- 30 Oct, 2008 40 commits
-
-
David Chinner authored
If we get a race looking up a reclaimable inode, we can end up with the winner proceeding to use the inode before it has been completely re-initialised. This is a Bad Thing. Fix the race by checking whether we are still initialising the inod eonce we have a reference to it, and if so wait for the initialisation to complete before continuing. While there, fix a leaked reference count in the same code when encountering an unlinked inode and we are not doing a lookup for a create operation. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32429a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Tim Shimmin authored
On Linux all filesystems are supposed to be operating under Posix' restricted chown. Restricted chown means it restricts chown to the owner unless you have CAP_FOWNER. NOTE: that 2 files outside of fs/xfs have been modified too for this change. Reviewed-by: Dave Chinner <david@fromorbit.com> SGI-PV: 988919 SGI-Modid: xfs-linux-melb:xfs-kern:32413a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Tim Shimmin authored
On Linux all filesystems are supposed to be operating under Posix' restricted chown. Restricted chown means it restricts chown to the owner unless you have CAP_FOWNER. NOTE: that 2 files outside of fs/xfs have been modified too for this change. Reviewed-by: Dave Chinner <david@fromorbit.com> SGI-PV: 988919 SGI-Modid: 2.6.x-xfs-melb:linux:32413b Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
capable_cred has been unused for a while so we can kill it and sys_cred. That also means the cred argument to xfs_setattr and xfs_change_file_space can be removed now. SGI-PV: 988918 SGI-Modid: xfs-linux-melb:xfs-kern:32412a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Under heavy metadata load we are seeing log hangs. The AIL has items in it ready to be pushed, and they are within the push target window. However, we are not pushing them when the last pushed LSN is less than the LSN of the first log item on the AIL. This is a regression introduced by the AIL push cursor modifications. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32409a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
-
Christoph Hellwig authored
To make sure we free the security data inodes need to be freed using the proper VFS helper (which we also need to export for this). We mark these inodes bad so we can skip the flush path for them. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32398a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
To make sure we free the security data inodes need to be freed using the proper VFS helper (which we also need to export for this). We mark these inodes bad so we can skip the flush path for them. Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
xfs_bulkstat only wants the dinode, offset and buffer from a given inode number. Instead of using xfs_itobp on a fake inode which is complicated and currently leads to leaks of the security data just use xfs_inotobp which is designed to do exactly the kind of lookup xfs_bulkstat wants. The only thing that's missing in xfs_inotobp is a flags paramter that let's us pass down XFS_IMAP_BULKSTAT, but that can easily added. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32397a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
-
David Chinner authored
If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still be referencable as the dirty data state is carried on the VFS inode. hence if we can't get a reference via igrab(), the inode must be in reclaim which implies that it has no dirty data attached. Leave such inodes to the reclaim code to flush the dirty inode state to disk and so avoid attempting to access the VFS inode when it may not exist in xfs_sync_inodes_ag(). Version 4: o don't reference linux inode until after igrab() succeeds Version 3: o converted unlock/rele to an xfs_iput() call. Version 2: o change igrab logic to be more linear o remove initial reclaimable inode check now that we are using igrab() failure to find reclaimable inodes o assert that igrab failure occurs only on reclaimable inodes o clean up inode locking - only grab the iolock if we are doing a SYNC_DELWRI call and we have a dirty inode. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32391a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
When we are inside a radix tree preload region, we cannot sleep. Recently we moved the inode locking inside the preload region for the inode radix tree. Fix that, and fix a missed unlock in another error path in the same code at the same time. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32385a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Christoph Hellwig authored
The dp to ip comment should be for the unconditional xfs_droplink call, and the "." link obviously only exists for directories, so it should be in the is_dir conditional. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32374a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
iosizelog shouldn't be the same as iosize but the logarithm of it. Then again the current biosize option doesn't make much sense to me as it doesn't set the preferred I/O size as mentioned in the comment next to it but rather the allocation size and thus is identical to the allocsize option (except for the missing logarithm). It's also not documented in Documentation/filesystems/xfs.txt or the mount manpage. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32373a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
Noquota should clear all mount options, and not just user and group quota. Probably doesn't matter very much in real life. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32372a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
No need to parse the mount option into a structure before applying it to struct xfs_mount. The content of xfs_start_flags gets merged into xfs_parseargs. Calls inbetween don't care and can use mount members instead of the args struct. This patch uncovered that the mount option for shared filesystems wasn't ever exposed on Linux. The code to handle it is #if 0'ed in this patch pending a decision on this feature. I'll send a writeup about it to the list soon. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32371a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
When we are about to add a new item to a transaction in recovery, we need to check that it is valid first. Currently we just assert that header magic number matches, but in production systems that is not present and we add a corrupted transaction to the list to be processed. This results in a kernel oops later when processing the corrupted transaction. Instead, if we detect a corrupted transaction, abort recovery and leave the user to clean up the mess that has occurred. SGI-PV: 988145 SGI-Modid: xfs-linux-melb:xfs-kern:32356a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Change all the remaining AIL API functions that are passed struct xfs_mount pointers to pass pointers directly to the struct xfs_ail being used. With this conversion, all external access to the AIL is via the struct xfs_ail. Hence the operation and referencing of the AIL is almost entirely independent of the xfs_mount that is using it - it is now much more tightly tied to the log and the items it is tracking in the log than it is tied to the xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32353a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Add an xfs_ail pointer to log items so that the log items can reference the AIL directly during callbacks without needed a struct xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32352a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
When we need to go from the log to the AIL, we have to go via the xfs_mount. Add a xfs_ail pointer to the log so we can go directly to the AIL associated with the log. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32351a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Bring the ail lock inside the struct xfs_ail. This means the AIL can be entirely manipulated via the struct xfs_ail rather than needing both the struct xfs_mount and the struct xfs_ail. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32350a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
When copying lsn's from the log item to the inode or dquot flush lsn, we currently grab the AIL lock. We do this because the LSN is a 64 bit quantity and it needs to be read atomically. The lock is used to guarantee atomicity for 32 bit platforms. Make the LSN copying a small function, and make the function used conditional on BITS_PER_LONG so that 64 bit machines don't need to take the AIL lock in these places. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32349a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
With the new cursor interface, it makes sense to make all the traversing code use the cursor interface and make the old one go away. This means more of the AIL interfacing is done by passing struct xfs_ail pointers around the place instead of struct xfs_mount pointers. We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered() as it is only checking if the AIL is empty. We can do that with a call to xfs_trans_ail_tail() instead, where a zero LSN returned indicates and empty AIL... SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32348a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To replace the current generation number ensuring sanity of the AIL traversal, replace it with an external cursor that is linked to the AIL. Basically, we store the next item in the cursor whenever we want to drop the AIL lock to do something to the current item. When we regain the lock. the current item may already be free, so we can't reference it, but the next item in the traversal is already held in the cursor. When we move or delete an object, we search all the active cursors and if there is an item match we clear the cursor(s) that point to the object. This forces the traversal to restart transparently. We don't invalidate the cursor on insert because the cursor still points to a valid item. If the intem is inserted between the current item and the cursor it does not matter; the traversal is considered to be past the insertion point so it will be picked up in the next traversal. Hence traversal restarts pretty much disappear altogether with this method of traversal, which should substantially reduce the overhead of pushing on a busy AIL. Version 2 o add restart logic o comment cursor interface o minor cleanups SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32347a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Rather than embedding the struct xfs_ail in the struct xfs_mount, allocate it during AIL initialisation. Add a back pointer to the struct xfs_ail so that we can pass around the xfs_ail and still be able to access the xfs_mount if need be. This is th first step involved in isolating the AIL implementation from the surrounding filesystem code. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32346a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
When we create a directory, we reserve a number of blocks for the maximum possible expansion of of the directory due to various btree splits, freespace allocation, etc. Unfortunately, each allocation is not reflected in the total number of blocks still available to the transaction, so the maximal reservation is used over and over again. This leads to problems where an allocation group has only enough blocks for *some* of the allocations required for the directory modification. After the first N allocations, the remaining blocks in the allocation group drops below the total reservation, and subsequent allocations fail because the allocator will not allow the allocation to proceed if the AG does not have the enough blocks available for the entire allocation total. This results in an ENOSPC occurring after an allocation has already occurred. This results in aborting the directory operation (leaving the directory in an inconsistent state) and cancelling a dirty transaction, which results in a filesystem shutdown. Avoid the problem by reflecting the number of blocks allocated in any directory expansion in the total number of blocks available to the modification in progress. This prevents a directory modification from being aborted part way through with an ENOSPC. SGI-PV: 988144 SGI-Modid: xfs-linux-melb:xfs-kern:32340a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
If the last block of the AG has inodes in it and the AG is an exactly power-of-2 size then the last inode in the AG points to the last block in the AG. If we try to find the next inode in the AG by adding one to the inode number, we increment the inode number past the size of the AG. The result is that the macro XFS_INO_TO_AGINO() will strip the AG portion of the inode number and return an inode number of zero. That is, instead of terminating the lookup loop because we hit the inode number went outside the valid range for the AG, the search index returns to zero and we start traversing the radix tree from the start again. This results in an endless loop in xfs_sync_inodes_ag(). Fix it be detecting if the new search index decreases as a result of incrementing the current inode number. That indicate an overflow and hence that we have finished processing the AG so we can terminate the loop. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32335a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Now that the deleted inodes list is unused, kill it. This also removes the i_reclaim list head from the xfs_inode, shrinking it by two pointers. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32334a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Use the reclaim tag to walk the radix tree and find the inodes under reclaim. This was the only user of the deleted inode list. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32333a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Prepare for removing the deleted inode list by marking inodes for reclaim in the inode radix trees so that we can use the radix trees to find reclaimable inodes. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32331a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
The function names xfs_finish_reclaim and xfs_finish_reclaim_all are not very descriptive of what they are reclaiming. Rename to xfs_reclaim_inode[s] to match the xfs_sync_inodes() function. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32330a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Background inode reclaim is run by the xfssyncd. Move the reclaim worker functions to be close to the sync code as the are very similar in structure and are both run from the same background thread. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32329a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Lachlan McIlroy authored
SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32325a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
With the combined linux and XFS inode, we need to ensure that the combined structure is not freed before the generic code is finished with the inode. As it turns out, there is a case where the XFS inode is freed before the linux inode - when xfs_reclaim() is called from ->clear_inode() on a clean inode, the xfs inode is freed during that call. The generic code references the inode after the ->clear_inode() call, so this is a use after free situation. Fix the problem by moving the xfs_reclaim() call to ->destroy_inode() instead of in ->clear_inode(). This ensures the combined inode structure is not freed until after the generic code has finished with it. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32324a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 3 o kill xfs_icount() Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32323a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. After inode allocation and initialisation, we need to add the inode to the superblock list, the in-use list, hash it and do some accounting. This all needs to be done with the inode_lock held and there are already several places in fs/inode.c that do this list manipulation. Factor out the common code, add a locking wrapper and export the function so ti can be called from XFS. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. Factor and export the struct inode initialisation code from alloc_inode() to inode_init_always() as a counterpart to inode_init_once(). i.e. we have to call this init function for each inode instantiation (always), as opposed inode_init_once() which is only called on slab object instantiation (once). Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Once the Linux inode and the XFS inode are combined, we cannot rely on just check if the linux inode exists as a method of determining if it is valid or not. Hence we should always call xfs_mark_inode_dirty_sync() instead as it does the correct checks to determine if the liinux inode is in a valid state or not. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32318a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
There are really two cases in xfs_iget_core(). The first is the cache hit case, the second is the miss case. They share very little code, and hence can easily be factored out into separate functions. This makes the code much easier to understand and subsequently modify. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32317a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Christoph Hellwig authored
We can only read inode->i_count if the inode is actually there and not a NULL pointer. This was introduced in one of the recent sync patches. SGI-PV: 988255 SGI-Modid: xfs-linux-melb:xfs-kern:32315a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
With all the other filesystem sync code it in xfs_sync.c including the data quiesce code, it makes sense to move the remaining quiesce code to the same place. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32312a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
There are no more callers to xfs_sync() now, so remove the function altogther. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32311a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-