Commit f783529b authored by Darrick J. Wong

docs: update swapext -> exchmaps language

Start reworking the atomic swapext design documentation to refer to its
new file contents/mapping exchange name.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
parent 14f19991
...@@ -2167,7 +2167,7 @@ The ``xfblob_free`` function frees a specific blob, and the ``xfblob_truncate`` ...@@ -2167,7 +2167,7 @@ The ``xfblob_free`` function frees a specific blob, and the ``xfblob_truncate``
function frees them all because compaction is not needed. function frees them all because compaction is not needed.
The details of repairing directories and extended attributes will be discussed The details of repairing directories and extended attributes will be discussed
in a subsequent section about atomic extent swapping. in a subsequent section about atomic file content exchanges.
However, it should be noted that these repair functions only use blob storage However, it should be noted that these repair functions only use blob storage
to cache a small number of entries before adding them to a temporary ondisk to cache a small number of entries before adding them to a temporary ondisk
file, which is why compaction is not required. file, which is why compaction is not required.
...@@ -2802,7 +2802,8 @@ follows this format: ...@@ -2802,7 +2802,8 @@ follows this format:
Repairs for file-based metadata such as extended attributes, directories, Repairs for file-based metadata such as extended attributes, directories,
symbolic links, quota files and realtime bitmaps are performed by building a symbolic links, quota files and realtime bitmaps are performed by building a
new structure attached to a temporary file and swapping the forks. new structure attached to a temporary file and exchanging all mappings in the
file forks.
Afterward, the mappings in the old file fork are the candidate blocks for Afterward, the mappings in the old file fork are the candidate blocks for
disposal. disposal.
...@@ -3851,8 +3852,8 @@ Because file forks can consume as much space as the entire filesystem, repairs ...@@ -3851,8 +3852,8 @@ Because file forks can consume as much space as the entire filesystem, repairs
cannot be staged in memory, even when a paging scheme is available. cannot be staged in memory, even when a paging scheme is available.
Therefore, online repair of file-based metadata createas a temporary file in Therefore, online repair of file-based metadata createas a temporary file in
the XFS filesystem, writes a new structure at the correct offsets into the the XFS filesystem, writes a new structure at the correct offsets into the
temporary file, and atomically swaps the fork mappings (and hence the fork temporary file, and atomically exchanges all file fork mappings (and hence the
contents) to commit the repair. fork contents) to commit the repair.
Once the repair is complete, the old fork can be reaped as necessary; if the Once the repair is complete, the old fork can be reaped as necessary; if the
system goes down during the reap, the iunlink code will delete the blocks system goes down during the reap, the iunlink code will delete the blocks
during log recovery. during log recovery.
...@@ -3862,10 +3863,11 @@ consistent to use a temporary file safely! ...@@ -3862,10 +3863,11 @@ consistent to use a temporary file safely!
This dependency is the reason why online repair can only use pageable kernel This dependency is the reason why online repair can only use pageable kernel
memory to stage ondisk space usage information. memory to stage ondisk space usage information.
Swapping metadata extents with a temporary file requires the owner field of the Exchanging metadata file mappings with a temporary file requires the owner
block headers to match the file being repaired and not the temporary file. The field of the block headers to match the file being repaired and not the
directory, extended attribute, and symbolic link functions were all modified to temporary file.
allow callers to specify owner numbers explicitly. The directory, extended attribute, and symbolic link functions were all
modified to allow callers to specify owner numbers explicitly.
There is a downside to the reaping process -- if the system crashes during the There is a downside to the reaping process -- if the system crashes during the
reap phase and the fork extents are crosslinked, the iunlink processing will reap phase and the fork extents are crosslinked, the iunlink processing will
...@@ -3974,8 +3976,8 @@ The proposed patches are in the ...@@ -3974,8 +3976,8 @@ The proposed patches are in the
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_ <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
series. series.
Atomic Extent Swapping Logged File Content Exchanges
---------------------- -----------------------------
Once repair builds a temporary file with a new data structure written into Once repair builds a temporary file with a new data structure written into
it, it must commit the new changes into the existing file. it, it must commit the new changes into the existing file.
...@@ -4010,17 +4012,21 @@ e. Old blocks in the file may be cross-linked with another structure and must ...@@ -4010,17 +4012,21 @@ e. Old blocks in the file may be cross-linked with another structure and must
These problems are overcome by creating a new deferred operation and a new type These problems are overcome by creating a new deferred operation and a new type
of log intent item to track the progress of an operation to exchange two file of log intent item to track the progress of an operation to exchange two file
ranges. ranges.
The new deferred operation type chains together the same transactions used by The new exchange operation type chains together the same transactions used by
the reverse-mapping extent swap code. the reverse-mapping extent swap code, but records intermedia progress in the
log so that operations can be restarted after a crash.
This new functionality is called the file contents exchange (xfs_exchrange)
code.
The underlying implementation exchanges file fork mappings (xfs_exchmaps).
The new log item records the progress of the exchange to ensure that once an The new log item records the progress of the exchange to ensure that once an
exchange begins, it will always run to completion, even there are exchange begins, it will always run to completion, even there are
interruptions. interruptions.
The new ``XFS_SB_FEAT_INCOMPAT_LOG_ATOMIC_SWAP`` log-incompatible feature flag The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag
in the superblock protects these new log item records from being replayed on in the superblock protects these new log item records from being replayed on
old kernels. old kernels.
The proposed patchset is the The proposed patchset is the
`atomic extent swap `file contents exchange
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_ <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
series. series.
...@@ -4061,72 +4067,73 @@ series. ...@@ -4061,72 +4067,73 @@ series.
| The feature bit will not be cleared from the superblock until the log | | The feature bit will not be cleared from the superblock until the log |
| becomes clean. | | becomes clean. |
| | | |
| Log-assisted extended attribute updates and atomic extent swaps both use | | Log-assisted extended attribute updates and file content exchanges bothe |
| log incompat features and provide convenience wrappers around the | | use log incompat features and provide convenience wrappers around the |
| functionality. | | functionality. |
+--------------------------------------------------------------------------+ +--------------------------------------------------------------------------+
Mechanics of an Atomic Extent Swap Mechanics of a Logged File Content Exchange
`````````````````````````````````` ```````````````````````````````````````````
Swapping entire file forks is a complex task. Exchanging contents between file forks is a complex task.
The goal is to exchange all file fork mappings between two file fork offset The goal is to exchange all file fork mappings between two file fork offset
ranges. ranges.
There are likely to be many extent mappings in each fork, and the edges of There are likely to be many extent mappings in each fork, and the edges of
the mappings aren't necessarily aligned. the mappings aren't necessarily aligned.
Furthermore, there may be other updates that need to happen after the swap, Furthermore, there may be other updates that need to happen after the exchange,
such as exchanging file sizes, inode flags, or conversion of fork data to local such as exchanging file sizes, inode flags, or conversion of fork data to local
format. format.
This is roughly the format of the new deferred extent swap work item: This is roughly the format of the new deferred exchange-mapping work item:
.. code-block:: c .. code-block:: c
struct xfs_swapext_intent { struct xfs_exchmaps_intent {
/* Inodes participating in the operation. */ /* Inodes participating in the operation. */
struct xfs_inode *sxi_ip1; struct xfs_inode *xmi_ip1;
struct xfs_inode *sxi_ip2; struct xfs_inode *xmi_ip2;
/* File offset range information. */ /* File offset range information. */
xfs_fileoff_t sxi_startoff1; xfs_fileoff_t xmi_startoff1;
xfs_fileoff_t sxi_startoff2; xfs_fileoff_t xmi_startoff2;
xfs_filblks_t sxi_blockcount; xfs_filblks_t xmi_blockcount;
/* Set these file sizes after the operation, unless negative. */ /* Set these file sizes after the operation, unless negative. */
xfs_fsize_t sxi_isize1; xfs_fsize_t xmi_isize1;
xfs_fsize_t sxi_isize2; xfs_fsize_t xmi_isize2;
/* XFS_SWAP_EXT_* log operation flags */ /* XFS_EXCHMAPS_* log operation flags */
uint64_t sxi_flags; uint64_t xmi_flags;
}; };
The new log intent item contains enough information to track two logical fork The new log intent item contains enough information to track two logical fork
offset ranges: ``(inode1, startoff1, blockcount)`` and ``(inode2, startoff2, offset ranges: ``(inode1, startoff1, blockcount)`` and ``(inode2, startoff2,
blockcount)``. blockcount)``.
Each step of a swap operation exchanges the largest file range mapping possible Each step of an exchange operation exchanges the largest file range mapping
from one file to the other. possible from one file to the other.
After each step in the swap operation, the two startoff fields are incremented After each step in the exchange operation, the two startoff fields are
and the blockcount field is decremented to reflect the progress made. incremented and the blockcount field is decremented to reflect the progress
The flags field captures behavioral parameters such as swapping the attr fork made.
instead of the data fork and other work to be done after the extent swap. The flags field captures behavioral parameters such as exchanging attr fork
The two isize fields are used to swap the file size at the end of the operation mappings instead of the data fork and other work to be done after the exchange.
if the file data fork is the target of the swap operation. The two isize fields are used to exchange the file sizes at the end of the
operation if the file data fork is the target of the operation.
When the extent swap is initiated, the sequence of operations is as follows:
When the exchange is initiated, the sequence of operations is as follows:
1. Create a deferred work item for the extent swap.
At the start, it should contain the entirety of the file ranges to be 1. Create a deferred work item for the file mapping exchange.
swapped. At the start, it should contain the entirety of the file block ranges to be
exchanged.
2. Call ``xfs_defer_finish`` to process the exchange. 2. Call ``xfs_defer_finish`` to process the exchange.
This is encapsulated in ``xrep_tempswap_contents`` for scrub operations. This is encapsulated in ``xrep_tempexch_contents`` for scrub operations.
This will log an extent swap intent item to the transaction for the deferred This will log an extent swap intent item to the transaction for the deferred
extent swap work item. mapping exchange work item.
3. Until ``sxi_blockcount`` of the deferred extent swap work item is zero, 3. Until ``xmi_blockcount`` of the deferred mapping exchange work item is zero,
a. Read the block maps of both file ranges starting at ``sxi_startoff1`` and a. Read the block maps of both file ranges starting at ``xmi_startoff1`` and
``sxi_startoff2``, respectively, and compute the longest extent that can ``xmi_startoff2``, respectively, and compute the longest extent that can
be swapped in a single step. be exchanged in a single step.
This is the minimum of the two ``br_blockcount`` s in the mappings. This is the minimum of the two ``br_blockcount`` s in the mappings.
Keep advancing through the file forks until at least one of the mappings Keep advancing through the file forks until at least one of the mappings
contains written blocks. contains written blocks.
...@@ -4148,20 +4155,20 @@ When the extent swap is initiated, the sequence of operations is as follows: ...@@ -4148,20 +4155,20 @@ When the extent swap is initiated, the sequence of operations is as follows:
g. Extend the ondisk size of either file if necessary. g. Extend the ondisk size of either file if necessary.
h. Log an extent swap done log item for the extent swap intent log item h. Log a mapping exchange done log item for th mapping exchange intent log
that was read at the start of step 3. item that was read at the start of step 3.
i. Compute the amount of file range that has just been covered. i. Compute the amount of file range that has just been covered.
This quantity is ``(map1.br_startoff + map1.br_blockcount - This quantity is ``(map1.br_startoff + map1.br_blockcount -
sxi_startoff1)``, because step 3a could have skipped holes. xmi_startoff1)``, because step 3a could have skipped holes.
j. Increase the starting offsets of ``sxi_startoff1`` and ``sxi_startoff2`` j. Increase the starting offsets of ``xmi_startoff1`` and ``xmi_startoff2``
by the number of blocks computed in the previous step, and decrease by the number of blocks computed in the previous step, and decrease
``sxi_blockcount`` by the same quantity. ``xmi_blockcount`` by the same quantity.
This advances the cursor. This advances the cursor.
k. Log a new extent swap intent log item reflecting the advanced state of k. Log a new mapping exchange intent log item reflecting the advanced state
the work item. of the work item.
l. Return the proper error code (EAGAIN) to the deferred operation manager l. Return the proper error code (EAGAIN) to the deferred operation manager
to inform it that there is more work to be done. to inform it that there is more work to be done.
...@@ -4172,22 +4179,23 @@ When the extent swap is initiated, the sequence of operations is as follows: ...@@ -4172,22 +4179,23 @@ When the extent swap is initiated, the sequence of operations is as follows:
This will be discussed in more detail in subsequent sections. This will be discussed in more detail in subsequent sections.
If the filesystem goes down in the middle of an operation, log recovery will If the filesystem goes down in the middle of an operation, log recovery will
find the most recent unfinished extent swap log intent item and restart from find the most recent unfinished maping exchange log intent item and restart
there. from there.
This is how extent swapping guarantees that an outside observer will either see This is how atomic file mapping exchanges guarantees that an outside observer
the old broken structure or the new one, and never a mismash of both. will either see the old broken structure or the new one, and never a mismash of
both.
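As a reading aid for steps 3i and 3j above, the cursor arithmetic can be sketched in standalone C. This is only an illustration under stated assumptions: ``exch_cursor``, ``exch_map`` and ``exch_cursor_advance`` are hypothetical names that mirror the ``xmi_startoff1``, ``xmi_startoff2`` and ``xmi_blockcount`` fields, not the kernel's actual definitions, and the mapping passed in is assumed to have already been trimmed to the remaining range as described in step 3a.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's offset and length types. */
typedef uint64_t fileoff_t;
typedef uint64_t filblks_t;

/* Simplified view of the range-tracking fields of the work item. */
struct exch_cursor {
	fileoff_t startoff1;	/* next offset to exchange in file 1 */
	fileoff_t startoff2;	/* next offset to exchange in file 2 */
	filblks_t blockcount;	/* blocks left to exchange */
};

/* Simplified file mapping: logical start offset and length in blocks. */
struct exch_map {
	fileoff_t br_startoff;
	filblks_t br_blockcount;
};

/*
 * Steps 3i and 3j: compute how much of the range was covered by the
 * mapping just exchanged out of file 1, counting any holes that were
 * skipped in front of it, then advance both offsets and shrink the
 * remaining block count.  Returns nonzero if more work remains, which
 * is the condition under which the real code would return EAGAIN.
 */
static int exch_cursor_advance(struct exch_cursor *cur,
			       const struct exch_map *map1)
{
	filblks_t done = map1->br_startoff + map1->br_blockcount -
			 cur->startoff1;

	/* Assumes the caller trimmed map1 to the remaining range. */
	assert(done <= cur->blockcount);

	cur->startoff1 += done;
	cur->startoff2 += done;
	cur->blockcount -= done;
	return cur->blockcount != 0;
}
```

Because the covered length is measured from the old ``startoff1`` rather than from the start of the mapping, holes skipped in step 3a are consumed from the remaining block count along with the exchanged blocks.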
Preparation for Extent Swapping Preparation for File Content Exchanges
``````````````````````````````` ``````````````````````````````````````
There are a few things that need to be taken care of before initiating an There are a few things that need to be taken care of before initiating an
atomic extent swap operation. atomic file mapping exchange operation.
First, regular files require the page cache to be flushed to disk before the First, regular files require the page cache to be flushed to disk before the
operation begins, and directio writes to be quiesced. operation begins, and directio writes to be quiesced.
Like any filesystem operation, extent swapping must determine the maximum Like any filesystem operation, file mapping exchanges must determine the
amount of disk space and quota that can be consumed on behalf of both files in maximum amount of disk space and quota that can be consumed on behalf of both
the operation, and reserve that quantity of resources to avoid an unrecoverable files in the operation, and reserve that quantity of resources to avoid an
out of space failure once it starts dirtying metadata. unrecoverable out of space failure once it starts dirtying metadata.
The preparation step scans the ranges of both files to estimate: The preparation step scans the ranges of both files to estimate:
- Data device blocks needed to handle the repeated updates to the fork - Data device blocks needed to handle the repeated updates to the fork
...@@ -4201,56 +4209,59 @@ The preparation step scans the ranges of both files to estimate: ...@@ -4201,56 +4209,59 @@ The preparation step scans the ranges of both files to estimate:
to different extents on the realtime volume, which could happen if the to different extents on the realtime volume, which could happen if the
operation fails to run to completion. operation fails to run to completion.
The need for precise estimation increases the run time of the swap operation, The need for precise estimation increases the run time of the exchange
but it is very important to maintain correct accounting. operation, but it is very important to maintain correct accounting.
The filesystem must not run completely out of free space, nor can the extent The filesystem must not run completely out of free space, nor can the mapping
swap ever add more extent mappings to a fork than it can support. exchange ever add more extent mappings to a fork than it can support.
Regular users are required to abide the quota limits, though metadata repairs Regular users are required to abide the quota limits, though metadata repairs
may exceed quota to resolve inconsistent metadata elsewhere. may exceed quota to resolve inconsistent metadata elsewhere.
Special Features for Swapping Metadata File Extents Special Features for Exchanging Metadata File Contents
``````````````````````````````````````````````````` ``````````````````````````````````````````````````````
Extended attributes, symbolic links, and directories can set the fork format to Extended attributes, symbolic links, and directories can set the fork format to
"local" and treat the fork as a literal area for data storage. "local" and treat the fork as a literal area for data storage.
Metadata repairs must take extra steps to support these cases: Metadata repairs must take extra steps to support these cases:
- If both forks are in local format and the fork areas are large enough, the - If both forks are in local format and the fork areas are large enough, the
swap is performed by copying the incore fork contents, logging both forks, exchange is performed by copying the incore fork contents, logging both
and committing. forks, and committing.
The atomic extent swap mechanism is not necessary, since this can be done The atomic file mapping exchange mechanism is not necessary, since this can
with a single transaction. be done with a single transaction.
- If both forks map blocks, then the regular atomic extent swap is used. - If both forks map blocks, then the regular atomic file mapping exchange is
used.
- Otherwise, only one fork is in local format. - Otherwise, only one fork is in local format.
The contents of the local format fork are converted to a block to perform the The contents of the local format fork are converted to a block to perform the
swap. exchange.
The conversion to block format must be done in the same transaction that The conversion to block format must be done in the same transaction that
logs the initial extent swap intent log item. logs the initial mapping exchange intent log item.
The regular atomic extent swap is used to exchange the mappings. The regular atomic mapping exchange is used to exchange the metadata file
Special flags are set on the swap operation so that the transaction can be mappings.
rolled one more time to convert the second file's fork back to local format Special flags are set on the exchange operation so that the transaction can
so that the second file will be ready to go as soon as the ILOCK is dropped. be rolled one more time to convert the second file's fork back to local
format so that the second file will be ready to go as soon as the ILOCK is
dropped.
Extended attributes and directories stamp the owning inode into every block, Extended attributes and directories stamp the owning inode into every block,
but the buffer verifiers do not actually check the inode number! but the buffer verifiers do not actually check the inode number!
Although there is no verification, it is still important to maintain Although there is no verification, it is still important to maintain
referential integrity, so prior to performing the extent swap, online repair referential integrity, so prior to performing the mapping exchange, online
builds every block in the new data structure with the owner field of the file repair builds every block in the new data structure with the owner field of the
being repaired. file being repaired.
After a successful swap operation, the repair operation must reap the old fork After a successful exchange operation, the repair operation must reap the old
blocks by processing each fork mapping through the standard :ref:`file extent fork blocks by processing each fork mapping through the standard :ref:`file
reaping <reaping>` mechanism that is done post-repair. extent reaping <reaping>` mechanism that is done post-repair.
If the filesystem should go down during the reap part of the repair, the If the filesystem should go down during the reap part of the repair, the
iunlink processing at the end of recovery will free both the temporary file and iunlink processing at the end of recovery will free both the temporary file and
whatever blocks were not reaped. whatever blocks were not reaped.
However, this iunlink processing omits the cross-link detection of online However, this iunlink processing omits the cross-link detection of online
repair, and is not completely foolproof. repair, and is not completely foolproof.
Swapping Temporary File Extents Exchanging Temporary File Contents
``````````````````````````````` ``````````````````````````````````
To repair a metadata file, online repair proceeds as follows: To repair a metadata file, online repair proceeds as follows:
...@@ -4260,14 +4271,14 @@ To repair a metadata file, online repair proceeds as follows: ...@@ -4260,14 +4271,14 @@ To repair a metadata file, online repair proceeds as follows:
file. file.
The same fork must be written to as is being repaired. The same fork must be written to as is being repaired.
3. Commit the scrub transaction, since the swap estimation step must be 3. Commit the scrub transaction, since the exchange resource estimation step
completed before transaction reservations are made. must be completed before transaction reservations are made.
4. Call ``xrep_tempswap_trans_alloc`` to allocate a new scrub transaction with 4. Call ``xrep_tempexch_trans_alloc`` to allocate a new scrub transaction with
the appropriate resource reservations, locks, and fill out a ``struct the appropriate resource reservations, locks, and fill out a ``struct
xfs_swapext_req`` with the details of the swap operation. xfs_exchmaps_req`` with the details of the exchange operation.
5. Call ``xrep_tempswap_contents`` to swap the contents. 5. Call ``xrep_tempexch_contents`` to exchange the contents.
6. Commit the transaction to complete the repair. 6. Commit the transaction to complete the repair.
...@@ -4309,7 +4320,7 @@ To check the summary file against the bitmap: ...@@ -4309,7 +4320,7 @@ To check the summary file against the bitmap:
3. Compare the contents of the xfile against the ondisk file. 3. Compare the contents of the xfile against the ondisk file.
To repair the summary file, write the xfile contents into the temporary file To repair the summary file, write the xfile contents into the temporary file
and use atomic extent swap to commit the new contents. and use atomic mapping exchange to commit the new contents.
The temporary file is then reaped. The temporary file is then reaped.
The proposed patchset is the The proposed patchset is the
...@@ -4352,8 +4363,8 @@ Salvaging extended attributes is done as follows: ...@@ -4352,8 +4363,8 @@ Salvaging extended attributes is done as follows:
memory or there are no more attr fork blocks to examine, unlock the file and memory or there are no more attr fork blocks to examine, unlock the file and
add the staged extended attributes to the temporary file. add the staged extended attributes to the temporary file.
3. Use atomic extent swapping to exchange the new and old extended attribute 3. Use atomic file mapping exchange to exchange the new and old extended
structures. attribute structures.
The old attribute blocks are now attached to the temporary file. The old attribute blocks are now attached to the temporary file.
4. Reap the temporary file. 4. Reap the temporary file.
...@@ -4410,7 +4421,8 @@ salvaging directories is straightforward: ...@@ -4410,7 +4421,8 @@ salvaging directories is straightforward:
directory and add the staged dirents into the temporary directory. directory and add the staged dirents into the temporary directory.
Truncate the staging files. Truncate the staging files.
4. Use atomic extent swapping to exchange the new and old directory structures. 4. Use atomic file mapping exchange to exchange the new and old directory
structures.
The old directory blocks are now attached to the temporary file. The old directory blocks are now attached to the temporary file.
5. Reap the temporary file. 5. Reap the temporary file.
...@@ -4542,7 +4554,7 @@ a :ref:`directory entry live update hook <liveupdate>` as follows: ...@@ -4542,7 +4554,7 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
Instead, we stash updates in the xfarray and rely on the scanner thread Instead, we stash updates in the xfarray and rely on the scanner thread
to apply the stashed updates to the temporary directory. to apply the stashed updates to the temporary directory.
5. When the scan is complete, atomically swap the contents of the temporary 5. When the scan is complete, atomically exchange the contents of the temporary
directory and the directory being repaired. directory and the directory being repaired.
The temporary directory now contains the damaged directory structure. The temporary directory now contains the damaged directory structure.
...@@ -4629,8 +4641,8 @@ directory reconstruction: ...@@ -4629,8 +4641,8 @@ directory reconstruction:
5. Copy all non-parent pointer extended attributes to the temporary file. 5. Copy all non-parent pointer extended attributes to the temporary file.
6. When the scan is complete, atomically swap the attribute fork of the 6. When the scan is complete, atomically exchange the mappings of the attribute
temporary file and the file being repaired. forks of the temporary file and the file being repaired.
The temporary file now contains the damaged extended attribute structure. The temporary file now contains the damaged extended attribute structure.
7. Reap the temporary file. 7. Reap the temporary file.
...@@ -5105,18 +5117,18 @@ make it easier for code readers to understand what has been built, for whom it ...@@ -5105,18 +5117,18 @@ make it easier for code readers to understand what has been built, for whom it
has been built, and why. has been built, and why.
Please feel free to contact the XFS mailing list with questions. Please feel free to contact the XFS mailing list with questions.
FIEXCHANGE_RANGE XFS_IOC_EXCHANGE_RANGE
---------------- ----------------------
As discussed earlier, a second frontend to the atomic extent swap mechanism is As discussed earlier, a second frontend to the atomic file mapping exchange
a new ioctl call that userspace programs can use to commit updates to files mechanism is a new ioctl call that userspace programs can use to commit updates
atomically. to files atomically.
This frontend has been out for review for several years now, though the This frontend has been out for review for several years now, though the
necessary refinements to online repair and lack of customer demand mean that necessary refinements to online repair and lack of customer demand mean that
the proposal has not been pushed very hard. the proposal has not been pushed very hard.
Extent Swapping with Regular User Files File Content Exchanges with Regular User Files
``````````````````````````````````````` ``````````````````````````````````````````````
As mentioned earlier, XFS has long had the ability to swap extents between As mentioned earlier, XFS has long had the ability to swap extents between
files, which is used almost exclusively by ``xfs_fsr`` to defragment files. files, which is used almost exclusively by ``xfs_fsr`` to defragment files.
...@@ -5131,12 +5143,12 @@ the consistency of the fork mappings with the reverse mapping index was to ...@@ -5131,12 +5143,12 @@ the consistency of the fork mappings with the reverse mapping index was to
develop an iterative mechanism that used deferred bmap and rmap operations to develop an iterative mechanism that used deferred bmap and rmap operations to
swap mappings one at a time. swap mappings one at a time.
This mechanism is identical to steps 2-3 from the procedure above except for This mechanism is identical to steps 2-3 from the procedure above except for
the new tracking items, because the atomic extent swap mechanism is an the new tracking items, because the atomic file mapping exchange mechanism is
iteration of an existing mechanism and not something totally novel. an iteration of an existing mechanism and not something totally novel.
For the narrow case of file defragmentation, the file contents must be For the narrow case of file defragmentation, the file contents must be
identical, so the recovery guarantees are not much of a gain. identical, so the recovery guarantees are not much of a gain.
Atomic extent swapping is much more flexible than the existing swapext Atomic file content exchanges are much more flexible than the existing swapext
implementations because it can guarantee that the caller never sees a mix of implementations because it can guarantee that the caller never sees a mix of
old and new contents even after a crash, and it can operate on two arbitrary old and new contents even after a crash, and it can operate on two arbitrary
file fork ranges. file fork ranges.
...@@ -5147,11 +5159,11 @@ The extra flexibility enables several new use cases: ...@@ -5147,11 +5159,11 @@ The extra flexibility enables several new use cases:
Next, it opens a temporary file and calls the file clone operation to reflink Next, it opens a temporary file and calls the file clone operation to reflink
the first file's contents into the temporary file. the first file's contents into the temporary file.
Writes to the original file should instead be written to the temporary file. Writes to the original file should instead be written to the temporary file.
Finally, the process calls the atomic extent swap system call Finally, the process calls the atomic file mapping exchange system call
(``FIEXCHANGE_RANGE``) to exchange the file contents, thereby committing all (``XFS_IOC_EXCHANGE_RANGE``) to exchange the file contents, thereby
of the updates to the original file, or none of them. committing all of the updates to the original file, or none of them.
.. _swapext_if_unchanged: .. _exchrange_if_unchanged:
- **Transactional file updates**: The same mechanism as above, but the caller - **Transactional file updates**: The same mechanism as above, but the caller
only wants the commit to occur if the original file's contents have not only wants the commit to occur if the original file's contents have not
...@@ -5160,16 +5172,17 @@ The extra flexibility enables several new use cases: ...@@ -5160,16 +5172,17 @@ The extra flexibility enables several new use cases:
change timestamps of the original file before reflinking its data to the change timestamps of the original file before reflinking its data to the
temporary file. temporary file.
When the program is ready to commit the changes, it passes the timestamps When the program is ready to commit the changes, it passes the timestamps
into the kernel as arguments to the atomic extent swap system call. into the kernel as arguments to the atomic file mapping exchange system call.
The kernel only commits the changes if the provided timestamps match the The kernel only commits the changes if the provided timestamps match the
original file. original file.
A new ioctl (``XFS_IOC_COMMIT_RANGE``) is provided to perform this.
- **Emulation of atomic block device writes**: Export a block device with a - **Emulation of atomic block device writes**: Export a block device with a
logical sector size matching the filesystem block size to force all writes logical sector size matching the filesystem block size to force all writes
to be aligned to the filesystem block size. to be aligned to the filesystem block size.
Stage all writes to a temporary file, and when that is complete, call the Stage all writes to a temporary file, and when that is complete, call the
atomic extent swap system call with a flag to indicate that holes in the atomic file mapping exchange system call with a flag to indicate that holes
temporary file should be ignored. in the temporary file should be ignored.
This emulates an atomic device write in software, and can support arbitrary This emulates an atomic device write in software, and can support arbitrary
scattered writes. scattered writes.
...@@ -5251,8 +5264,8 @@ of the file to try to share the physical space with a dummy file. ...@@ -5251,8 +5264,8 @@ of the file to try to share the physical space with a dummy file.
Cloning the extent means that the original owners cannot overwrite the Cloning the extent means that the original owners cannot overwrite the
contents; any changes will be written somewhere else via copy-on-write. contents; any changes will be written somewhere else via copy-on-write.
Clearspace makes its own copy of the frozen extent in an area that is not being Clearspace makes its own copy of the frozen extent in an area that is not being
cleared, and uses ``FIEDEUPRANGE`` (or the :ref:`atomic extent swap cleared, and uses ``FIEDEUPRANGE`` (or the :ref:`atomic file content exchanges
<swapext_if_unchanged>` feature) to change the target file's data extent <exchrange_if_unchanged>` feature) to change the target file's data extent
mapping away from the area being cleared. mapping away from the area being cleared.
When all other mappings have been moved, clearspace reflinks the space into the When all other mappings have been moved, clearspace reflinks the space into the
space collector file so that it becomes unavailable. space collector file so that it becomes unavailable.
......