Commit b3998900 authored by Dave Chinner's avatar Dave Chinner

xfs: fix data corruption w/ unaligned reflink ranges

When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.

XFS only supports whole block sharing, but we still need to
support whole file reflink correctly.  Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.

This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.
Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
parent dceeb47b
...@@ -1262,22 +1262,32 @@ xfs_reflink_zero_posteof( ...@@ -1262,22 +1262,32 @@ xfs_reflink_zero_posteof(
/* /*
* Prepare two files for range cloning. Upon a successful return both inodes * Prepare two files for range cloning. Upon a successful return both inodes
* will have the iolock and mmaplock held, the page cache of the out file * will have the iolock and mmaplock held, the page cache of the out file will
* will be truncated, and any leases on the out file will have been broken. * be truncated, and any leases on the out file will have been broken. This
* This function borrows heavily from xfs_file_aio_write_checks. * function borrows heavily from xfs_file_aio_write_checks.
* *
* The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't
* checked that the bytes beyond EOF physically match. Hence we cannot use the * checked that the bytes beyond EOF physically match. Hence we cannot use the
* EOF block in the source dedupe range because it's not a complete block match, * EOF block in the source dedupe range because it's not a complete block match,
* hence can introduce a corruption into the file that has it's * hence can introduce a corruption into the file that has it's block replaced.
* block replaced.
* *
* Despite this issue, we still need to report that range as successfully * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
* deduped to avoid confusing userspace with EINVAL errors on completely * "block aligned" for the purposes of cloning entire files. However, if the
* matching file data. The only time that an unaligned length will be passed to * source file range includes the EOF block and it lands within the existing EOF
* us is when it spans the EOF block of the source file, so if we simply mask it * of the destination file, then we can expose stale data from beyond the source
* down to be block aligned here the we will dedupe everything but that partial * file EOF in the destination file.
* EOF block. *
* XFS doesn't support partial block sharing, so in both cases we have check
* these cases ourselves. For dedupe, we can simply round the length to dedupe
* down to the previous whole block and ignore the partial EOF block. While this
* means we can't dedupe the last block of a file, this is an acceptible
* tradeoff for simplicity on implementation.
*
* For cloning, we want to share the partial EOF block if it is also the new EOF
* block of the destination file. If the partial EOF block lies inside the
* existing destination EOF, then we have to abort the clone to avoid exposing
* stale data in the destination file. Hence we reject these clone attempts with
* -EINVAL in this case.
*/ */
STATIC int STATIC int
xfs_reflink_remap_prep( xfs_reflink_remap_prep(
...@@ -1293,6 +1303,7 @@ xfs_reflink_remap_prep( ...@@ -1293,6 +1303,7 @@ xfs_reflink_remap_prep(
struct inode *inode_out = file_inode(file_out); struct inode *inode_out = file_inode(file_out);
struct xfs_inode *dest = XFS_I(inode_out); struct xfs_inode *dest = XFS_I(inode_out);
bool same_inode = (inode_in == inode_out); bool same_inode = (inode_in == inode_out);
u64 blkmask = i_blocksize(inode_in) - 1;
ssize_t ret; ssize_t ret;
/* Lock both files against IO */ /* Lock both files against IO */
...@@ -1325,8 +1336,18 @@ xfs_reflink_remap_prep( ...@@ -1325,8 +1336,18 @@ xfs_reflink_remap_prep(
* from the source file so we don't try to dedupe the partial * from the source file so we don't try to dedupe the partial
* EOF block. * EOF block.
*/ */
if (is_dedupe) if (is_dedupe) {
*len &= ~((u64)i_blocksize(inode_in) - 1); *len &= ~blkmask;
} else if (*len & blkmask) {
/*
* The user is attempting to share a partial EOF block,
* if it's inside the destination EOF then reject it.
*/
if (pos_out + *len < i_size_read(inode_out)) {
ret = -EINVAL;
goto out_unlock;
}
}
/* Attach dquots to dest inode before changing block map */ /* Attach dquots to dest inode before changing block map */
ret = xfs_qm_dqattach(dest); ret = xfs_qm_dqattach(dest);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment