xfs: document the user interface for online fsck
Start the fourth chapter of the online fsck design documentation, which
discusses the user interface and the background scrubbing service.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
parent 9a30b5b521
commit 4f7f646970
@@ -800,3 +800,116 @@ Proposed patchsets include `general stress testing
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
and the `evolution of existing per-function stress testing
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.

4. User Interface
=================

The primary user of online fsck is the system administrator, just like offline
repair.
Online fsck presents two modes of operation to administrators:
a foreground CLI process for online fsck on demand, and a background service
that performs autonomous checking and repair.

Checking on Demand
------------------

For administrators who want the absolute freshest information about the
metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
a command line.
The program checks every piece of metadata in the filesystem while the
administrator waits for the results to be reported, just like the existing
``xfs_repair`` tool.
Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
option to increase the verbosity of the information reported.

A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
correction capabilities of the hardware to check data file contents.
This media scan is not enabled by default because it may dramatically increase
program runtime and consume a lot of bandwidth on older storage hardware.
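
The options described above can be combined on a single command line.
The following is a minimal sketch, not part of the ``xfs_scrub`` package, that
assembles such a command line; the function name and its keyword arguments are
hypothetical, and only the ``-n``, ``-v``, and ``-x`` options come from the
text.

```python
def build_scrub_command(mountpoint, dry_run=False, verbose=False,
                        media_scan=False):
    """Assemble an argv list for a foreground xfs_scrub run.

    This helper is illustrative only; it does not run anything.
    """
    argv = ["xfs_scrub"]
    if dry_run:
        argv.append("-n")  # read-only scan
    if verbose:
        argv.append("-v")  # more verbose reporting
    if media_scan:
        argv.append("-x")  # also check file data contents (media scan)
    argv.append(mountpoint)
    return argv
```

The resulting list can be handed to ``subprocess.run()``; for example,
``build_scrub_command("/mnt", dry_run=True, verbose=True)`` describes a
verbose read-only check of the filesystem mounted at ``/mnt``.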

The output of a foreground invocation is captured in the system log.

The ``xfs_scrub_all`` program walks the list of mounted filesystems and
initiates ``xfs_scrub`` for each of them in parallel.
It serializes scans for any filesystems that resolve to the same top level
kernel block device to prevent resource overconsumption.
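
The scheduling policy of ``xfs_scrub_all`` can be pictured with a short
sketch.
This is not the program's actual code: the function names are hypothetical,
the mapping from mount point to top level block device is assumed to be
supplied by the caller, and the ``scrub`` callable stands in for launching
``xfs_scrub``.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_device(mounts):
    """Group mount points by the top level block device backing them.

    `mounts` maps mount point -> device name (hypothetical input; resolving
    the device is left out of this sketch).
    """
    groups = defaultdict(list)
    for mountpoint, device in mounts.items():
        groups[device].append(mountpoint)
    return dict(groups)


def scrub_all(mounts, scrub):
    """Call `scrub(mountpoint)` once for every filesystem.

    Filesystems sharing a device are scrubbed one after another; groups for
    different devices are handled by separate worker threads.
    """
    groups = group_by_device(mounts)

    def run_group(mountpoints):
        # Serial within one device to avoid competing for the same hardware.
        return [(m, scrub(m)) for m in mountpoints]

    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for group_result in pool.map(run_group, groups.values()):
            results.update(group_result)
    return results
```

With ``{"/": "sda", "/home": "sda", "/data": "sdb"}`` as input, ``/`` and
``/home`` are scanned back to back while ``/data`` may be scanned at the same
time.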

Background Service
------------------

To reduce the workload of system administrators, the ``xfs_scrub`` package
provides a suite of `systemd <https://systemd.io/>`_ timers and services that
run online fsck automatically on weekends by default.
The background service configures scrub to run with as little privilege as
possible, the lowest CPU and IO priority, and in a CPU-constrained single
threaded mode.
These settings can be tuned by the systemd administrator at any time to suit
the latency and throughput requirements of customer workloads.

The output of the background service is also captured in the system log.
If desired, reports of failures (either due to inconsistencies or mere runtime
errors) can be emailed automatically by setting the ``EMAIL_ADDR`` environment
variable in the following service files:

* ``xfs_scrub_fail@.service``
* ``xfs_scrub_media_fail@.service``
* ``xfs_scrub_all_fail.service``

The decision to enable the background scan is left to the system administrator.
This can be done by enabling one of the following:

* ``xfs_scrub_all.timer`` on systemd systems
* ``xfs_scrub_all.cron`` on non-systemd systems

This automatic weekly scan is configured out of the box to perform an
additional media scan of all file data once per month.
A periodic media scan is less foolproof than, say, storing file data block
checksums, but much more performant if application software provides its own
integrity checking, redundancy can be provided elsewhere above the filesystem,
or the storage device's integrity guarantees are deemed sufficient.
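
One way to fold a monthly media scan into a weekly job is to remember when the
last media scan finished and compare that against an interval.
The sketch below illustrates the idea only; it is not taken from the
``xfs_scrub`` package, the function name is hypothetical, and "once per month"
is approximated as 30 days.

```python
from datetime import datetime, timedelta


def wants_media_scan(last_media_scan, now, interval=timedelta(days=30)):
    """Decide whether this run should also scan file data.

    `last_media_scan` is the completion time of the previous media scan, or
    None if no media scan has been recorded yet.
    """
    if last_media_scan is None:
        return True
    return now - last_media_scan >= interval
```

A weekly job that calls this helper before each run would add the media scan
to roughly every fourth or fifth run.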

The systemd unit file definitions have been subjected to a security audit
(as of systemd 249) to ensure that the xfs_scrub processes have as little
access to the rest of the system as possible.
This was performed via ``systemd-analyze security``, after which privileges
were restricted to the minimum required, sandboxing and system call filtering
were applied to the maximal extent possible, and access to the filesystem
tree was restricted to the minimum needed to start the program and access the
filesystem being scanned.
The service definition files restrict CPU usage to 80% of one CPU core, and
apply the lowest possible priority to IO and CPU scheduling.
This measure was taken to minimize delays in the rest of the filesystem.
No such hardening has been performed for the cron job.
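
Administrators who want different resource limits can override them with a
systemd drop-in file rather than editing the shipped unit files.
The sketch below renders such a drop-in.
``CPUQuota=``, ``Nice=``, and ``IOSchedulingClass=`` are standard systemd
directives, but whether the shipped unit files use exactly these directives,
and the default values chosen here, are assumptions; the function name is
hypothetical.

```python
def render_limits_dropin(cpu_quota_percent=80, nice=19, io_class="idle"):
    """Render the text of a systemd drop-in overriding resource limits.

    The defaults mirror the limits described in the text (80% of one CPU,
    lowest scheduling priority); the directive selection is an assumption.
    """
    lines = [
        "[Service]",
        f"CPUQuota={cpu_quota_percent}%",
        f"Nice={nice}",
        f"IOSchedulingClass={io_class}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text would be saved in a drop-in directory for the scrub service
(for a unit assumed to be named ``xfs_scrub@.service``, that is
``/etc/systemd/system/xfs_scrub@.service.d/``), followed by
``systemctl daemon-reload``.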

Proposed patchset:
`Enabling the xfs_scrub background service
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.

Health Reporting
----------------

XFS caches a summary of each filesystem's health status in memory.
The information is updated whenever ``xfs_scrub`` is run, or whenever
inconsistencies are detected in the filesystem metadata during regular
operations.
System administrators should use the ``health`` command of ``xfs_spaceman`` to
display this information in a human-readable format.
If problems have been observed, the administrator can schedule a reduced
service window to run the online repair tool to correct the problem.
Failing that, the administrator can decide to schedule a maintenance window to
run the traditional offline repair tool to correct the problem.
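
The escalation path described above can be summarized in a few lines.
This sketch is illustrative only: the ``Action`` enum and both function names
are hypothetical, and it deliberately does not parse the output of
``xfs_spaceman``, whose format is not described here.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action needed"
    ONLINE_REPAIR = "run online repair in a reduced service window"
    OFFLINE_REPAIR = "run offline repair in a maintenance window"


def health_command(mountpoint):
    """Argv for displaying the cached health summary of one filesystem."""
    return ["xfs_spaceman", "-c", "health", mountpoint]


def next_action(problems_seen, online_repair_failed=False):
    """Pick the next step once the health report has been read."""
    if not problems_seen:
        return Action.NONE
    if online_repair_failed:
        # Online repair could not fix the problem; fall back to xfs_repair.
        return Action.OFFLINE_REPAIR
    return Action.ONLINE_REPAIR
```

Online repair is tried first because it needs only a reduced service window,
while offline repair requires taking the filesystem out of service.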

**Future Work Question**: Should the health reporting integrate with the new
inotify fs error notification system?
Would it be helpful for sysadmins to have a daemon to listen for corruption
notifications and initiate a repair?

*Answer*: These questions remain unanswered, but should be a part of the
conversation with early adopters and potential downstream users of XFS.

Proposed patchsets include
`wiring up health reports to corruption returns
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
and
`preservation of sickness info during memory reclaim
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.