mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-03 19:55:31 +00:00
docs: update swapext -> exchmaps language
Start reworking the atomic swapext design documentation to refer to its new
file contents/mapping exchange name.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
parent
14f1999102
commit
f783529bee
@ -2167,7 +2167,7 @@ The ``xfblob_free`` function frees a specific blob, and the ``xfblob_truncate``
|
|||||||
function frees them all because compaction is not needed.
|
function frees them all because compaction is not needed.
|
||||||
|
|
||||||
The details of repairing directories and extended attributes will be discussed
|
The details of repairing directories and extended attributes will be discussed
|
||||||
in a subsequent section about atomic extent swapping.
|
in a subsequent section about atomic file content exchanges.
|
||||||
However, it should be noted that these repair functions only use blob storage
|
However, it should be noted that these repair functions only use blob storage
|
||||||
to cache a small number of entries before adding them to a temporary ondisk
|
to cache a small number of entries before adding them to a temporary ondisk
|
||||||
file, which is why compaction is not required.
|
file, which is why compaction is not required.
|
||||||
@ -2802,7 +2802,8 @@ follows this format:
|
|||||||
|
|
||||||
Repairs for file-based metadata such as extended attributes, directories,
|
Repairs for file-based metadata such as extended attributes, directories,
|
||||||
symbolic links, quota files and realtime bitmaps are performed by building a
|
symbolic links, quota files and realtime bitmaps are performed by building a
|
||||||
new structure attached to a temporary file and swapping the forks.
|
new structure attached to a temporary file and exchanging all mappings in the
|
||||||
|
file forks.
|
||||||
Afterward, the mappings in the old file fork are the candidate blocks for
|
Afterward, the mappings in the old file fork are the candidate blocks for
|
||||||
disposal.
|
disposal.
|
||||||
|
|
||||||
@ -3851,8 +3852,8 @@ Because file forks can consume as much space as the entire filesystem, repairs
|
|||||||
cannot be staged in memory, even when a paging scheme is available.
|
cannot be staged in memory, even when a paging scheme is available.
|
||||||
Therefore, online repair of file-based metadata creates a temporary file in
|
Therefore, online repair of file-based metadata creates a temporary file in
|
||||||
the XFS filesystem, writes a new structure at the correct offsets into the
|
the XFS filesystem, writes a new structure at the correct offsets into the
|
||||||
temporary file, and atomically swaps the fork mappings (and hence the fork
|
temporary file, and atomically exchanges all file fork mappings (and hence the
|
||||||
contents) to commit the repair.
|
fork contents) to commit the repair.
|
||||||
Once the repair is complete, the old fork can be reaped as necessary; if the
|
Once the repair is complete, the old fork can be reaped as necessary; if the
|
||||||
system goes down during the reap, the iunlink code will delete the blocks
|
system goes down during the reap, the iunlink code will delete the blocks
|
||||||
during log recovery.
|
during log recovery.
|
||||||
@ -3862,10 +3863,11 @@ consistent to use a temporary file safely!
|
|||||||
This dependency is the reason why online repair can only use pageable kernel
|
This dependency is the reason why online repair can only use pageable kernel
|
||||||
memory to stage ondisk space usage information.
|
memory to stage ondisk space usage information.
|
||||||
|
|
||||||
Swapping metadata extents with a temporary file requires the owner field of the
|
Exchanging metadata file mappings with a temporary file requires the owner
|
||||||
block headers to match the file being repaired and not the temporary file. The
|
field of the block headers to match the file being repaired and not the
|
||||||
directory, extended attribute, and symbolic link functions were all modified to
|
temporary file.
|
||||||
allow callers to specify owner numbers explicitly.
|
The directory, extended attribute, and symbolic link functions were all
|
||||||
|
modified to allow callers to specify owner numbers explicitly.
|
||||||
|
|
||||||
There is a downside to the reaping process -- if the system crashes during the
|
There is a downside to the reaping process -- if the system crashes during the
|
||||||
reap phase and the fork extents are crosslinked, the iunlink processing will
|
reap phase and the fork extents are crosslinked, the iunlink processing will
|
||||||
@ -3974,8 +3976,8 @@ The proposed patches are in the
|
|||||||
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
|
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
|
||||||
series.
|
series.
|
||||||
|
|
||||||
Atomic Extent Swapping
|
Logged File Content Exchanges
|
||||||
----------------------
|
-----------------------------
|
||||||
|
|
||||||
Once repair builds a temporary file with a new data structure written into
|
Once repair builds a temporary file with a new data structure written into
|
||||||
it, it must commit the new changes into the existing file.
|
it, it must commit the new changes into the existing file.
|
||||||
@ -4010,17 +4012,21 @@ e. Old blocks in the file may be cross-linked with another structure and must
|
|||||||
These problems are overcome by creating a new deferred operation and a new type
|
These problems are overcome by creating a new deferred operation and a new type
|
||||||
of log intent item to track the progress of an operation to exchange two file
|
of log intent item to track the progress of an operation to exchange two file
|
||||||
ranges.
|
ranges.
|
||||||
The new deferred operation type chains together the same transactions used by
|
The new exchange operation type chains together the same transactions used by
|
||||||
the reverse-mapping extent swap code.
|
the reverse-mapping extent swap code, but records intermediate progress in the
|
||||||
|
log so that operations can be restarted after a crash.
|
||||||
|
This new functionality is called the file contents exchange (xfs_exchrange)
|
||||||
|
code.
|
||||||
|
The underlying implementation exchanges file fork mappings (xfs_exchmaps).
|
||||||
The new log item records the progress of the exchange to ensure that once an
|
The new log item records the progress of the exchange to ensure that once an
|
||||||
exchange begins, it will always run to completion, even if there are
|
exchange begins, it will always run to completion, even if there are
|
||||||
interruptions.
|
interruptions.
|
||||||
The new ``XFS_SB_FEAT_INCOMPAT_LOG_ATOMIC_SWAP`` log-incompatible feature flag
|
The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag
|
||||||
in the superblock protects these new log item records from being replayed on
|
in the superblock protects these new log item records from being replayed on
|
||||||
old kernels.
|
old kernels.
|
||||||
|
|
||||||
The proposed patchset is the
|
The proposed patchset is the
|
||||||
`atomic extent swap
|
`file contents exchange
|
||||||
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
|
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
|
||||||
series.
|
series.
|
||||||
|
|
||||||
@ -4061,72 +4067,73 @@ series.
|
|||||||
| The feature bit will not be cleared from the superblock until the log |
|
| The feature bit will not be cleared from the superblock until the log |
|
||||||
| becomes clean. |
|
| becomes clean. |
|
||||||
| |
|
| |
|
||||||
| Log-assisted extended attribute updates and atomic extent swaps both use |
|
| Log-assisted extended attribute updates and file content exchanges both |
|
||||||
| log incompat features and provide convenience wrappers around the |
|
| use log incompat features and provide convenience wrappers around the |
|
||||||
| functionality. |
|
| functionality. |
|
||||||
+--------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------+
|
||||||
|
|
||||||
Mechanics of an Atomic Extent Swap
|
Mechanics of a Logged File Content Exchange
|
||||||
``````````````````````````````````
|
```````````````````````````````````````````
|
||||||
|
|
||||||
Swapping entire file forks is a complex task.
|
Exchanging contents between file forks is a complex task.
|
||||||
The goal is to exchange all file fork mappings between two file fork offset
|
The goal is to exchange all file fork mappings between two file fork offset
|
||||||
ranges.
|
ranges.
|
||||||
There are likely to be many extent mappings in each fork, and the edges of
|
There are likely to be many extent mappings in each fork, and the edges of
|
||||||
the mappings aren't necessarily aligned.
|
the mappings aren't necessarily aligned.
|
||||||
Furthermore, there may be other updates that need to happen after the swap,
|
Furthermore, there may be other updates that need to happen after the exchange,
|
||||||
such as exchanging file sizes, inode flags, or conversion of fork data to local
|
such as exchanging file sizes, inode flags, or conversion of fork data to local
|
||||||
format.
|
format.
|
||||||
This is roughly the format of the new deferred extent swap work item:
|
This is roughly the format of the new deferred exchange-mapping work item:
|
||||||
|
|
||||||
.. code-block:: c
|
.. code-block:: c
|
||||||
|
|
||||||
struct xfs_swapext_intent {
|
struct xfs_exchmaps_intent {
|
||||||
/* Inodes participating in the operation. */
|
/* Inodes participating in the operation. */
|
||||||
struct xfs_inode *sxi_ip1;
|
struct xfs_inode *xmi_ip1;
|
||||||
struct xfs_inode *sxi_ip2;
|
struct xfs_inode *xmi_ip2;
|
||||||
|
|
||||||
/* File offset range information. */
|
/* File offset range information. */
|
||||||
xfs_fileoff_t sxi_startoff1;
|
xfs_fileoff_t xmi_startoff1;
|
||||||
xfs_fileoff_t sxi_startoff2;
|
xfs_fileoff_t xmi_startoff2;
|
||||||
xfs_filblks_t sxi_blockcount;
|
xfs_filblks_t xmi_blockcount;
|
||||||
|
|
||||||
/* Set these file sizes after the operation, unless negative. */
|
/* Set these file sizes after the operation, unless negative. */
|
||||||
xfs_fsize_t sxi_isize1;
|
xfs_fsize_t xmi_isize1;
|
||||||
xfs_fsize_t sxi_isize2;
|
xfs_fsize_t xmi_isize2;
|
||||||
|
|
||||||
/* XFS_SWAP_EXT_* log operation flags */
|
/* XFS_EXCHMAPS_* log operation flags */
|
||||||
uint64_t sxi_flags;
|
uint64_t xmi_flags;
|
||||||
};
|
};
|
||||||
|
|
||||||
The new log intent item contains enough information to track two logical fork
|
The new log intent item contains enough information to track two logical fork
|
||||||
offset ranges: ``(inode1, startoff1, blockcount)`` and ``(inode2, startoff2,
|
offset ranges: ``(inode1, startoff1, blockcount)`` and ``(inode2, startoff2,
|
||||||
blockcount)``.
|
blockcount)``.
|
||||||
Each step of a swap operation exchanges the largest file range mapping possible
|
Each step of an exchange operation exchanges the largest file range mapping
|
||||||
from one file to the other.
|
possible from one file to the other.
|
||||||
After each step in the swap operation, the two startoff fields are incremented
|
After each step in the exchange operation, the two startoff fields are
|
||||||
and the blockcount field is decremented to reflect the progress made.
|
incremented and the blockcount field is decremented to reflect the progress
|
||||||
The flags field captures behavioral parameters such as swapping the attr fork
|
made.
|
||||||
instead of the data fork and other work to be done after the extent swap.
|
The flags field captures behavioral parameters such as exchanging attr fork
|
||||||
The two isize fields are used to swap the file size at the end of the operation
|
mappings instead of the data fork and other work to be done after the exchange.
|
||||||
if the file data fork is the target of the swap operation.
|
The two isize fields are used to exchange the file sizes at the end of the
|
||||||
|
operation if the file data fork is the target of the operation.
|
||||||
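The handling of the two isize fields can be illustrated with a minimal sketch.
It assumes simplified stand-in types (``demo_inode``, ``demo_intent``, and
``demo_finish_sizes`` are hypothetical names, not kernel definitions) and only
shows the "unless negative" rule from the structure comment above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef int64_t demo_fsize_t;

struct demo_inode {
	demo_fsize_t		size;	/* current file size in bytes */
};

struct demo_intent {
	struct demo_inode	*ip1;
	struct demo_inode	*ip2;
	demo_fsize_t		isize1;	/* new size for ip1, negative to skip */
	demo_fsize_t		isize2;	/* new size for ip2, negative to skip */
};

/* Apply the recorded sizes once every mapping has been exchanged. */
static void demo_finish_sizes(struct demo_intent *xmi)
{
	assert(xmi->ip1 && xmi->ip2);

	if (xmi->isize1 >= 0)
		xmi->ip1->size = xmi->isize1;
	if (xmi->isize2 >= 0)
		xmi->ip2->size = xmi->isize2;
}
```

A negative value leaves the corresponding file size untouched, which is how
this sketch models an exchange that does not target the data fork.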
|
|
||||||
When the extent swap is initiated, the sequence of operations is as follows:
|
When the exchange is initiated, the sequence of operations is as follows:
|
||||||
|
|
||||||
1. Create a deferred work item for the extent swap.
|
1. Create a deferred work item for the file mapping exchange.
|
||||||
At the start, it should contain the entirety of the file ranges to be
|
At the start, it should contain the entirety of the file block ranges to be
|
||||||
swapped.
|
exchanged.
|
||||||
|
|
||||||
2. Call ``xfs_defer_finish`` to process the exchange.
|
2. Call ``xfs_defer_finish`` to process the exchange.
|
||||||
This is encapsulated in ``xrep_tempswap_contents`` for scrub operations.
|
This is encapsulated in ``xrep_tempexch_contents`` for scrub operations.
|
||||||
This will log an extent swap intent item to the transaction for the deferred
|
This will log a mapping exchange intent item to the transaction for the deferred
|
||||||
extent swap work item.
|
mapping exchange work item.
|
||||||
|
|
||||||
3. Until ``sxi_blockcount`` of the deferred extent swap work item is zero,
|
3. Until ``xmi_blockcount`` of the deferred mapping exchange work item is zero,
|
||||||
|
|
||||||
a. Read the block maps of both file ranges starting at ``sxi_startoff1`` and
|
a. Read the block maps of both file ranges starting at ``xmi_startoff1`` and
|
||||||
``sxi_startoff2``, respectively, and compute the longest extent that can
|
``xmi_startoff2``, respectively, and compute the longest extent that can
|
||||||
be swapped in a single step.
|
be exchanged in a single step.
|
||||||
This is the minimum of the two ``br_blockcount`` s in the mappings.
|
This is the minimum of the two ``br_blockcount`` s in the mappings.
|
||||||
Keep advancing through the file forks until at least one of the mappings
|
Keep advancing through the file forks until at least one of the mappings
|
||||||
contains written blocks.
|
contains written blocks.
|
||||||
@ -4148,20 +4155,20 @@ When the extent swap is initiated, the sequence of operations is as follows:
|
|||||||
|
|
||||||
g. Extend the ondisk size of either file if necessary.
|
g. Extend the ondisk size of either file if necessary.
|
||||||
|
|
||||||
h. Log an extent swap done log item for the extent swap intent log item
|
h. Log a mapping exchange done log item for the mapping exchange intent log
|
||||||
that was read at the start of step 3.
|
item that was read at the start of step 3.
|
||||||
|
|
||||||
i. Compute the amount of file range that has just been covered.
|
i. Compute the amount of file range that has just been covered.
|
||||||
This quantity is ``(map1.br_startoff + map1.br_blockcount -
|
This quantity is ``(map1.br_startoff + map1.br_blockcount -
|
||||||
sxi_startoff1)``, because step 3a could have skipped holes.
|
xmi_startoff1)``, because step 3a could have skipped holes.
|
||||||
|
|
||||||
j. Increase the starting offsets of ``sxi_startoff1`` and ``sxi_startoff2``
|
j. Increase the starting offsets of ``xmi_startoff1`` and ``xmi_startoff2``
|
||||||
by the number of blocks computed in the previous step, and decrease
|
by the number of blocks computed in the previous step, and decrease
|
||||||
``sxi_blockcount`` by the same quantity.
|
``xmi_blockcount`` by the same quantity.
|
||||||
This advances the cursor.
|
This advances the cursor.
|
||||||
|
|
||||||
k. Log a new extent swap intent log item reflecting the advanced state of
|
k. Log a new mapping exchange intent log item reflecting the advanced state
|
||||||
the work item.
|
of the work item.
|
||||||
|
|
||||||
l. Return the proper error code (EAGAIN) to the deferred operation manager
|
l. Return the proper error code (EAGAIN) to the deferred operation manager
|
||||||
to inform it that there is more work to be done.
|
to inform it that there is more work to be done.
|
||||||
@ -4172,22 +4179,23 @@ When the extent swap is initiated, the sequence of operations is as follows:
|
|||||||
This will be discussed in more detail in subsequent sections.
|
This will be discussed in more detail in subsequent sections.
|
||||||
|
|
||||||
If the filesystem goes down in the middle of an operation, log recovery will
|
If the filesystem goes down in the middle of an operation, log recovery will
|
||||||
find the most recent unfinished extent swap log intent item and restart from
|
find the most recent unfinished mapping exchange log intent item and restart
|
||||||
there.
|
from there.
|
||||||
This is how extent swapping guarantees that an outside observer will either see
|
This is how atomic file mapping exchanges guarantee that an outside observer
|
||||||
the old broken structure or the new one, and never a mishmash of both.
|
will either see the old broken structure or the new one, and never a mishmash of
|
||||||
|
both.
|
||||||
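The cursor arithmetic in steps 3i and 3j can be sketched in plain C.
This is only an illustration under stated assumptions: ``demo_map``,
``demo_cursor``, ``demo_step_len``, and ``demo_advance`` are hypothetical
names, the clamp to the remaining block count is a defensive choice of this
sketch, and the return value stands in for the EAGAIN that the real deferred
operation returns:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one fork mapping. */
struct demo_map {
	uint64_t	br_startoff;	/* file offset of the mapping */
	uint64_t	br_blockcount;	/* length of the mapping */
};

/* Hypothetical stand-in for the progress fields of the work item. */
struct demo_cursor {
	uint64_t	startoff1;
	uint64_t	startoff2;
	uint64_t	blockcount;
};

/*
 * Step 3i: amount of file range covered by this step.  The mapping may
 * start past startoff1 because holes were skipped, so the distance is
 * measured from the cursor, not from the start of the mapping.
 */
static uint64_t demo_step_len(const struct demo_cursor *c,
			      const struct demo_map *map1)
{
	uint64_t	covered;

	assert(map1->br_startoff >= c->startoff1);
	covered = map1->br_startoff + map1->br_blockcount - c->startoff1;

	/* Assumption of this sketch: never run past the requested range. */
	return covered < c->blockcount ? covered : c->blockcount;
}

/*
 * Step 3j: advance the cursor.  Returns 1 if more work remains (the
 * kernel would return EAGAIN here) and 0 once the range is exhausted.
 */
static int demo_advance(struct demo_cursor *c, uint64_t covered)
{
	c->startoff1 += covered;
	c->startoff2 += covered;
	c->blockcount -= covered;
	return c->blockcount > 0;
}
```

Because the advanced cursor is what gets logged in the next intent item, a
recovery pass that starts from the most recent intent resumes at exactly the
offsets this arithmetic produced.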
|
|
||||||
Preparation for Extent Swapping
|
Preparation for File Content Exchanges
|
||||||
```````````````````````````````
|
``````````````````````````````````````
|
||||||
|
|
||||||
There are a few things that need to be taken care of before initiating an
|
There are a few things that need to be taken care of before initiating an
|
||||||
atomic extent swap operation.
|
atomic file mapping exchange operation.
|
||||||
First, regular files require the page cache to be flushed to disk before the
|
First, regular files require the page cache to be flushed to disk before the
|
||||||
operation begins, and directio writes to be quiesced.
|
operation begins, and directio writes to be quiesced.
|
||||||
Like any filesystem operation, extent swapping must determine the maximum
|
Like any filesystem operation, file mapping exchanges must determine the
|
||||||
amount of disk space and quota that can be consumed on behalf of both files in
|
maximum amount of disk space and quota that can be consumed on behalf of both
|
||||||
the operation, and reserve that quantity of resources to avoid an unrecoverable
|
files in the operation, and reserve that quantity of resources to avoid an
|
||||||
out of space failure once it starts dirtying metadata.
|
unrecoverable out of space failure once it starts dirtying metadata.
|
||||||
The preparation step scans the ranges of both files to estimate:
|
The preparation step scans the ranges of both files to estimate:
|
||||||
|
|
||||||
- Data device blocks needed to handle the repeated updates to the fork
|
- Data device blocks needed to handle the repeated updates to the fork
|
||||||
@ -4201,56 +4209,59 @@ The preparation step scans the ranges of both files to estimate:
|
|||||||
to different extents on the realtime volume, which could happen if the
|
to different extents on the realtime volume, which could happen if the
|
||||||
operation fails to run to completion.
|
operation fails to run to completion.
|
||||||
|
|
||||||
The need for precise estimation increases the run time of the swap operation,
|
The need for precise estimation increases the run time of the exchange
|
||||||
but it is very important to maintain correct accounting.
|
operation, but it is very important to maintain correct accounting.
|
||||||
The filesystem must not run completely out of free space, nor can the extent
|
The filesystem must not run completely out of free space, nor can the mapping
|
||||||
swap ever add more extent mappings to a fork than it can support.
|
exchange ever add more extent mappings to a fork than it can support.
|
||||||
Regular users are required to abide by the quota limits, though metadata repairs
|
Regular users are required to abide by the quota limits, though metadata repairs
|
||||||
may exceed quota to resolve inconsistent metadata elsewhere.
|
may exceed quota to resolve inconsistent metadata elsewhere.
|
||||||
|
|
||||||
Special Features for Swapping Metadata File Extents
|
Special Features for Exchanging Metadata File Contents
|
||||||
```````````````````````````````````````````````````
|
``````````````````````````````````````````````````````
|
||||||
|
|
||||||
Extended attributes, symbolic links, and directories can set the fork format to
|
Extended attributes, symbolic links, and directories can set the fork format to
|
||||||
"local" and treat the fork as a literal area for data storage.
|
"local" and treat the fork as a literal area for data storage.
|
||||||
Metadata repairs must take extra steps to support these cases:
|
Metadata repairs must take extra steps to support these cases:
|
||||||
|
|
||||||
- If both forks are in local format and the fork areas are large enough, the
|
- If both forks are in local format and the fork areas are large enough, the
|
||||||
swap is performed by copying the incore fork contents, logging both forks,
|
exchange is performed by copying the incore fork contents, logging both
|
||||||
and committing.
|
forks, and committing.
|
||||||
The atomic extent swap mechanism is not necessary, since this can be done
|
The atomic file mapping exchange mechanism is not necessary, since this can
|
||||||
with a single transaction.
|
be done with a single transaction.
|
||||||
|
|
||||||
- If both forks map blocks, then the regular atomic extent swap is used.
|
- If both forks map blocks, then the regular atomic file mapping exchange is
|
||||||
|
used.
|
||||||
|
|
||||||
- Otherwise, only one fork is in local format.
|
- Otherwise, only one fork is in local format.
|
||||||
The contents of the local format fork are converted to a block to perform the
|
The contents of the local format fork are converted to a block to perform the
|
||||||
swap.
|
exchange.
|
||||||
The conversion to block format must be done in the same transaction that
|
The conversion to block format must be done in the same transaction that
|
||||||
logs the initial extent swap intent log item.
|
logs the initial mapping exchange intent log item.
|
||||||
The regular atomic extent swap is used to exchange the mappings.
|
The regular atomic mapping exchange is used to exchange the metadata file
|
||||||
Special flags are set on the swap operation so that the transaction can be
|
mappings.
|
||||||
rolled one more time to convert the second file's fork back to local format
|
Special flags are set on the exchange operation so that the transaction can
|
||||||
so that the second file will be ready to go as soon as the ILOCK is dropped.
|
be rolled one more time to convert the second file's fork back to local
|
||||||
|
format so that the second file will be ready to go as soon as the ILOCK is
|
||||||
|
dropped.
|
||||||
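The three cases above amount to a small decision table, sketched here in C.
All names (``demo_fork_format``, ``demo_strategy``, ``demo_pick_strategy``) are
hypothetical and exist only for illustration. The text does not say what
happens when both forks are local but the fork areas are too small; this
sketch assumes that case also falls back to conversion followed by a mapping
exchange:

```c
#include <assert.h>

/* Hypothetical fork formats; the kernel has more than two. */
enum demo_fork_format {
	DEMO_FMT_LOCAL,		/* data stored inline in the fork area */
	DEMO_FMT_BLOCKS,	/* fork maps blocks */
};

enum demo_strategy {
	DEMO_COPY_LOCAL,		/* copy incore forks, one transaction */
	DEMO_EXCHANGE_MAPPINGS,		/* regular logged mapping exchange */
	DEMO_CONVERT_THEN_EXCHANGE,	/* convert local fork to a block first */
};

/*
 * Pick how to exchange two metadata forks.  local_areas_fit is nonzero
 * when each fork area is large enough to hold the other fork's contents.
 */
static enum demo_strategy
demo_pick_strategy(enum demo_fork_format fmt1, enum demo_fork_format fmt2,
		   int local_areas_fit)
{
	if (fmt1 == DEMO_FMT_LOCAL && fmt2 == DEMO_FMT_LOCAL && local_areas_fit)
		return DEMO_COPY_LOCAL;

	if (fmt1 == DEMO_FMT_BLOCKS && fmt2 == DEMO_FMT_BLOCKS)
		return DEMO_EXCHANGE_MAPPINGS;

	/* At least one fork is local and a plain copy is not possible. */
	return DEMO_CONVERT_THEN_EXCHANGE;
}
```

Only the last strategy needs the extra transaction roll described above to
convert the second file's fork back to local format.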
|
|
||||||
Extended attributes and directories stamp the owning inode into every block,
|
Extended attributes and directories stamp the owning inode into every block,
|
||||||
but the buffer verifiers do not actually check the inode number!
|
but the buffer verifiers do not actually check the inode number!
|
||||||
Although there is no verification, it is still important to maintain
|
Although there is no verification, it is still important to maintain
|
||||||
referential integrity, so prior to performing the extent swap, online repair
|
referential integrity, so prior to performing the mapping exchange, online
|
||||||
builds every block in the new data structure with the owner field of the file
|
repair builds every block in the new data structure with the owner field of the
|
||||||
being repaired.
|
file being repaired.
|
||||||
|
|
||||||
After a successful swap operation, the repair operation must reap the old fork
|
After a successful exchange operation, the repair operation must reap the old
|
||||||
blocks by processing each fork mapping through the standard :ref:`file extent
|
fork blocks by processing each fork mapping through the standard :ref:`file
|
||||||
reaping <reaping>` mechanism that is done post-repair.
|
extent reaping <reaping>` mechanism that is done post-repair.
|
||||||
If the filesystem should go down during the reap part of the repair, the
|
If the filesystem should go down during the reap part of the repair, the
|
||||||
iunlink processing at the end of recovery will free both the temporary file and
|
iunlink processing at the end of recovery will free both the temporary file and
|
||||||
whatever blocks were not reaped.
|
whatever blocks were not reaped.
|
||||||
However, this iunlink processing omits the cross-link detection of online
|
However, this iunlink processing omits the cross-link detection of online
|
||||||
repair, and is not completely foolproof.
|
repair, and is not completely foolproof.
|
||||||
|
|
||||||
Swapping Temporary File Extents
|
Exchanging Temporary File Contents
|
||||||
```````````````````````````````
|
``````````````````````````````````
|
||||||
|
|
||||||
To repair a metadata file, online repair proceeds as follows:
|
To repair a metadata file, online repair proceeds as follows:
|
||||||
|
|
||||||
@ -4260,14 +4271,14 @@ To repair a metadata file, online repair proceeds as follows:
|
|||||||
file.
|
file.
|
||||||
The same fork must be written to as is being repaired.
|
The same fork must be written to as is being repaired.
|
||||||
|
|
||||||
3. Commit the scrub transaction, since the swap estimation step must be
|
3. Commit the scrub transaction, since the exchange resource estimation step
|
||||||
completed before transaction reservations are made.
|
must be completed before transaction reservations are made.
|
||||||
|
|
||||||
4. Call ``xrep_tempswap_trans_alloc`` to allocate a new scrub transaction with
|
4. Call ``xrep_tempexch_trans_alloc`` to allocate a new scrub transaction with
|
||||||
the appropriate resource reservations, locks, and fill out a ``struct
|
the appropriate resource reservations, locks, and fill out a ``struct
|
||||||
xfs_swapext_req`` with the details of the swap operation.
|
xfs_exchmaps_req`` with the details of the exchange operation.
|
||||||
|
|
||||||
5. Call ``xrep_tempswap_contents`` to swap the contents.
|
5. Call ``xrep_tempexch_contents`` to exchange the contents.
|
||||||
|
|
||||||
6. Commit the transaction to complete the repair.
|
6. Commit the transaction to complete the repair.
|
||||||
|
|
||||||
@ -4309,7 +4320,7 @@ To check the summary file against the bitmap:
|
|||||||
3. Compare the contents of the xfile against the ondisk file.
|
3. Compare the contents of the xfile against the ondisk file.
|
||||||
|
|
||||||
To repair the summary file, write the xfile contents into the temporary file
|
To repair the summary file, write the xfile contents into the temporary file
|
||||||
and use atomic extent swap to commit the new contents.
|
and use atomic mapping exchange to commit the new contents.
|
||||||
The temporary file is then reaped.
|
The temporary file is then reaped.
|
||||||
|
|
||||||
The proposed patchset is the
|
The proposed patchset is the
|
||||||
@ -4352,8 +4363,8 @@ Salvaging extended attributes is done as follows:
|
|||||||
memory or there are no more attr fork blocks to examine, unlock the file and
|
memory or there are no more attr fork blocks to examine, unlock the file and
|
||||||
add the staged extended attributes to the temporary file.
|
add the staged extended attributes to the temporary file.
|
||||||
|
|
||||||
3. Use atomic extent swapping to exchange the new and old extended attribute
|
3. Use atomic file mapping exchange to exchange the new and old extended
|
||||||
structures.
|
attribute structures.
|
||||||
The old attribute blocks are now attached to the temporary file.
|
The old attribute blocks are now attached to the temporary file.
|
||||||
|
|
||||||
4. Reap the temporary file.
|
4. Reap the temporary file.
|
||||||
@ -4410,7 +4421,8 @@ salvaging directories is straightforward:
|
|||||||
directory and add the staged dirents into the temporary directory.
|
directory and add the staged dirents into the temporary directory.
|
||||||
Truncate the staging files.
|
Truncate the staging files.
|
||||||
|
|
||||||
4. Use atomic extent swapping to exchange the new and old directory structures.
|
4. Use atomic file mapping exchange to exchange the new and old directory
|
||||||
|
structures.
|
||||||
The old directory blocks are now attached to the temporary file.
|
The old directory blocks are now attached to the temporary file.
|
||||||
|
|
||||||
5. Reap the temporary file.
|
5. Reap the temporary file.
|
||||||
@ -4542,7 +4554,7 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
|
|||||||
Instead, we stash updates in the xfarray and rely on the scanner thread
|
Instead, we stash updates in the xfarray and rely on the scanner thread
|
||||||
to apply the stashed updates to the temporary directory.
|
to apply the stashed updates to the temporary directory.
|
||||||
|
|
||||||
5. When the scan is complete, atomically swap the contents of the temporary
|
5. When the scan is complete, atomically exchange the contents of the temporary
|
||||||
directory and the directory being repaired.
|
directory and the directory being repaired.
|
||||||
The temporary directory now contains the damaged directory structure.
|
The temporary directory now contains the damaged directory structure.
|
||||||
|
|
||||||
@ -4629,8 +4641,8 @@ directory reconstruction:
|
|||||||
|
|
||||||
5. Copy all non-parent pointer extended attributes to the temporary file.
|
5. Copy all non-parent pointer extended attributes to the temporary file.
|
||||||
|
|
||||||
6. When the scan is complete, atomically swap the attribute fork of the
|
6. When the scan is complete, atomically exchange the mappings of the attribute
|
||||||
temporary file and the file being repaired.
|
forks of the temporary file and the file being repaired.
|
||||||
The temporary file now contains the damaged extended attribute structure.
|
The temporary file now contains the damaged extended attribute structure.
|
||||||
|
|
||||||
7. Reap the temporary file.
|
7. Reap the temporary file.
|
||||||
@ -5105,18 +5117,18 @@ make it easier for code readers to understand what has been built, for whom it
|
|||||||
has been built, and why.
|
has been built, and why.
|
||||||
Please feel free to contact the XFS mailing list with questions.
|
Please feel free to contact the XFS mailing list with questions.
|
||||||
|
|
||||||
FIEXCHANGE_RANGE
|
XFS_IOC_EXCHANGE_RANGE
|
||||||
----------------
|
----------------------
|
||||||
|
|
||||||
As discussed earlier, a second frontend to the atomic extent swap mechanism is
|
As discussed earlier, a second frontend to the atomic file mapping exchange
|
||||||
a new ioctl call that userspace programs can use to commit updates to files
|
mechanism is a new ioctl call that userspace programs can use to commit updates
|
||||||
atomically.
|
to files atomically.
|
||||||
This frontend has been out for review for several years now, though the
|
This frontend has been out for review for several years now, though the
|
||||||
necessary refinements to online repair and lack of customer demand mean that
|
necessary refinements to online repair and lack of customer demand mean that
|
||||||
the proposal has not been pushed very hard.
|
the proposal has not been pushed very hard.
|
||||||
|
|
||||||
Extent Swapping with Regular User Files
|
File Content Exchanges with Regular User Files
|
||||||
```````````````````````````````````````
|
``````````````````````````````````````````````
|
||||||
|
|
||||||
As mentioned earlier, XFS has long had the ability to swap extents between
|
As mentioned earlier, XFS has long had the ability to swap extents between
|
||||||
files, which is used almost exclusively by ``xfs_fsr`` to defragment files.
|
files, which is used almost exclusively by ``xfs_fsr`` to defragment files.
|
||||||
@@ -5131,12 +5143,12 @@ the consistency of the fork mappings with the reverse mapping index was to
 develop an iterative mechanism that used deferred bmap and rmap operations to
 swap mappings one at a time.
 This mechanism is identical to steps 2-3 from the procedure above except for
-the new tracking items, because the atomic extent swap mechanism is an
-iteration of an existing mechanism and not something totally novel.
+the new tracking items, because the atomic file mapping exchange mechanism is
+an iteration of an existing mechanism and not something totally novel.
 For the narrow case of file defragmentation, the file contents must be
 identical, so the recovery guarantees are not much of a gain.
 
-Atomic extent swapping is much more flexible than the existing swapext
+Atomic file content exchanges are much more flexible than the existing swapext
 implementations because it can guarantee that the caller never sees a mix of
 old and new contents even after a crash, and it can operate on two arbitrary
 file fork ranges.
@@ -5147,11 +5159,11 @@ The extra flexibility enables several new use cases:
 Next, it opens a temporary file and calls the file clone operation to reflink
 the first file's contents into the temporary file.
 Writes to the original file should instead be written to the temporary file.
-Finally, the process calls the atomic extent swap system call
-(``FIEXCHANGE_RANGE``) to exchange the file contents, thereby committing all
-of the updates to the original file, or none of them.
+Finally, the process calls the atomic file mapping exchange system call
+(``XFS_IOC_EXCHANGE_RANGE``) to exchange the file contents, thereby
+committing all of the updates to the original file, or none of them.
 
-.. _swapext_if_unchanged:
+.. _exchrange_if_unchanged:
 
 - **Transactional file updates**: The same mechanism as above, but the caller
 only wants the commit to occur if the original file's contents have not
@@ -5160,16 +5172,17 @@ The extra flexibility enables several new use cases:
 change timestamps of the original file before reflinking its data to the
 temporary file.
 When the program is ready to commit the changes, it passes the timestamps
-into the kernel as arguments to the atomic extent swap system call.
+into the kernel as arguments to the atomic file mapping exchange system call.
 The kernel only commits the changes if the provided timestamps match the
 original file.
+A new ioctl (``XFS_IOC_COMMIT_RANGE``) is provided to perform this.
 
 - **Emulation of atomic block device writes**: Export a block device with a
 logical sector size matching the filesystem block size to force all writes
 to be aligned to the filesystem block size.
 Stage all writes to a temporary file, and when that is complete, call the
-atomic extent swap system call with a flag to indicate that holes in the
-temporary file should be ignored.
+atomic file mapping exchange system call with a flag to indicate that holes
+in the temporary file should be ignored.
 This emulates an atomic device write in software, and can support arbitrary
 scattered writes.
 
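Reviewer's illustration (not part of the patch): the "transactional file updates" use case in the hunk above can be modelled in userspace as a rough sketch. The names ``snapshot`` and ``update_if_unchanged`` are hypothetical, and ``os.replace`` is only a stand-in for the exchange ioctl: it swaps in a new inode rather than exchanging mappings, and the check-then-commit here is not atomic, whereas the kernel performs the freshness check and the commit together.

```python
import os
import tempfile
from typing import Callable, Tuple

Snapshot = Tuple[int, int, int]  # (mtime_ns, ctime_ns, size)


def snapshot(path: str) -> Snapshot:
    """Sample the attributes used to detect a concurrent change."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_ctime_ns, st.st_size)


def update_if_unchanged(
    path: str,
    edit: Callable[[bytes], bytes],
    before_commit: Callable[[], None] = lambda: None,
) -> bool:
    """Stage an edited copy of `path`; commit only if `path` is unchanged.

    Returns True if the staged contents were committed, False otherwise.
    `before_commit` is a hypothetical hook that exists only so a concurrent
    writer can be simulated in tests.
    """
    seen = snapshot(path)  # sample before reading, as the text describes
    with open(path, "rb") as src:
        staged = edit(src.read())

    # Stage in the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(staged)
            out.flush()
            os.fsync(out.fileno())
        before_commit()
        if snapshot(path) != seen:
            return False  # original changed: commit none of the updates
        os.replace(tmp, path)  # stand-in for the atomic exchange
        tmp = None
        return True
    finally:
        if tmp is not None:
            os.unlink(tmp)  # discard the staging file on failure
```

A caller would retry the whole read-edit-commit cycle when the function returns False, which is the same pattern a program using the conditional commit would follow.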
@@ -5251,8 +5264,8 @@ of the file to try to share the physical space with a dummy file.
 Cloning the extent means that the original owners cannot overwrite the
 contents; any changes will be written somewhere else via copy-on-write.
 Clearspace makes its own copy of the frozen extent in an area that is not being
-cleared, and uses ``FIEDEUPRANGE`` (or the :ref:`atomic extent swap
-<swapext_if_unchanged>` feature) to change the target file's data extent
+cleared, and uses ``FIEDEUPRANGE`` (or the :ref:`atomic file content exchanges
+<exchrange_if_unchanged>` feature) to change the target file's data extent
 mapping away from the area being cleared.
 When all other mappings have been moved, clearspace reflinks the space into the
 space collector file so that it becomes unavailable.