Commit 309c8480 authored by Dave Chinner, committed by Alex Elder

xfs: delayed alloc blocks beyond EOF are valid after writeback

There is an assumption in parts of XFS that flushing a dirty
file will make all the delayed allocation blocks disappear from an
inode. That is, that after calling xfs_flush_pages(),
ip->i_delayed_blks will be zero.

This is an invalid assumption, as we may have speculative
preallocation beyond EOF, and those blocks are recorded in
ip->i_delayed_blks. A flush of the dirty pages of an inode will not
change the state of these blocks beyond EOF, so a non-zero
delalloc block count after a flush is valid.

The bmap code has an invalid ASSERT() that needs to be removed, and
the swapext code has a bug in that while it swaps the data forks
around, it fails to swap the i_delayed_blks counter associated with
the fork and hence can get the block accounting wrong.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
parent 90810b9e
@@ -5471,8 +5471,13 @@ xfs_getbmap(
 			if (error)
 				goto out_unlock_iolock;
 		}
-
-		ASSERT(ip->i_delayed_blks == 0);
+		/*
+		 * even after flushing the inode, there can still be delalloc
+		 * blocks on the inode beyond EOF due to speculative
+		 * preallocation. These are not removed until the release
+		 * function is called or the inode is inactivated. Hence we
+		 * cannot assert here that ip->i_delayed_blks == 0.
+		 */
 	}
 
 	lock = xfs_ilock_map_shared(ip);
......
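The reasoning in the xfs_getbmap() comment can be illustrated with a deliberately simplified toy model. Everything here is hypothetical and is not kernel code: `struct toy_inode`, `toy_add_delalloc()` and `toy_flush()` are made-up names, and the model assumes that speculative preallocation is tracked as a separate extent lying wholly beyond EOF, which the real extent tree does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the real XFS structures. */
struct toy_extent {
	unsigned long start;	/* first block of the extent */
	unsigned long len;	/* length in blocks */
	int delalloc;		/* 1 while still a delayed allocation */
};

struct toy_inode {
	unsigned long eof_block;	/* first block at or beyond EOF */
	unsigned long delayed_blks;	/* stands in for ip->i_delayed_blks */
	struct toy_extent ext[4];
	size_t next;			/* number of extents in use */
};

/* Record a delalloc extent and account for it in the counter. */
static void toy_add_delalloc(struct toy_inode *ip, unsigned long start,
			     unsigned long len)
{
	assert(ip->next < 4);
	ip->ext[ip->next++] = (struct toy_extent){ start, len, 1 };
	ip->delayed_blks += len;
}

/*
 * Model of a flush: only delalloc extents that end at or before EOF
 * back dirty pages, so only those are converted to real allocations.
 * Extents beyond EOF are left untouched.
 */
static void toy_flush(struct toy_inode *ip)
{
	for (size_t i = 0; i < ip->next; i++) {
		struct toy_extent *e = &ip->ext[i];

		if (e->delalloc && e->start + e->len <= ip->eof_block) {
			e->delalloc = 0;
			ip->delayed_blks -= e->len;
		}
	}
}
```

In this model, an inode with 8 dirty delalloc blocks below EOF and 16 speculatively preallocated blocks beyond EOF still has a counter of 16 after `toy_flush()`, which is why an assertion that the counter is zero cannot hold.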
@@ -377,6 +377,19 @@ xfs_swap_extents(
 	ip->i_d.di_format = tip->i_d.di_format;
 	tip->i_d.di_format = tmp;
 
+	/*
+	 * The extents in the source inode could still contain speculative
+	 * preallocation beyond EOF (e.g. the file is open but not modified
+	 * while defrag is in progress). In that case, we need to copy over the
+	 * number of delalloc blocks the data fork in the source inode is
+	 * tracking beyond EOF so that when the fork is truncated away when the
+	 * temporary inode is unlinked we don't underrun the i_delayed_blks
+	 * counter on that inode.
+	 */
+	ASSERT(tip->i_delayed_blks == 0);
+	tip->i_delayed_blks = ip->i_delayed_blks;
+	ip->i_delayed_blks = 0;
+
 	ilf_fields = XFS_ILOG_CORE;
 
 	switch(ip->i_d.di_format) {
......
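The accounting problem described in the xfs_swap_extents() comment can be sketched with a small standalone model. The names `struct swap_inode`, `toy_swap_forks()` and `toy_truncate_fork()` are hypothetical, and the model reduces a data fork to a single number (the delalloc blocks it holds); it only illustrates why the counter has to travel with the fork.

```c
#include <assert.h>

/* Hypothetical toy inode; this is NOT the real struct xfs_inode. */
struct swap_inode {
	unsigned long fork_delalloc;	/* delalloc blocks held in the data fork */
	unsigned long delayed_blks;	/* stands in for i_delayed_blks */
};

/*
 * Swap the data forks of ip and tip. When move_counter is non-zero,
 * also move the delalloc counter to the temporary inode, mirroring
 * the three lines the patch adds.
 */
static void toy_swap_forks(struct swap_inode *ip, struct swap_inode *tip,
			   int move_counter)
{
	unsigned long tmp = ip->fork_delalloc;

	ip->fork_delalloc = tip->fork_delalloc;
	tip->fork_delalloc = tmp;

	if (move_counter) {
		assert(tip->delayed_blks == 0);
		tip->delayed_blks = ip->delayed_blks;
		ip->delayed_blks = 0;
	}
}

/*
 * Truncate the data fork away, as happens when the temporary inode is
 * unlinked. Returns 0 on success, or -1 if removing the fork's
 * delalloc blocks would underrun the counter.
 */
static int toy_truncate_fork(struct swap_inode *ip)
{
	if (ip->delayed_blks < ip->fork_delalloc)
		return -1;
	ip->delayed_blks -= ip->fork_delalloc;
	ip->fork_delalloc = 0;
	return 0;
}
```

With a source inode holding 16 delalloc blocks beyond EOF, swapping without moving the counter leaves the temporary inode with a fork that owns 16 delalloc blocks but a counter of zero, so `toy_truncate_fork()` reports an underrun. Moving the counter along with the fork keeps both inodes consistent.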