Chapter 15. Checking and repairing a file system

RHEL provides file system administration utilities that can check and repair file systems. These tools are often referred to as fsck tools, where fsck is short for file system check. In most cases, these utilities run automatically during system boot when needed, but you can also invoke them manually.

Important

File system checkers guarantee only metadata consistency across the file system. They have no awareness of the actual data contained within the file system and are not data recovery tools.

15.1. Scenarios that require a file system check

The relevant fsck tools can be used to check your system if any of the following occurs:

  • System fails to boot
  • Files on a specific disk become corrupt
  • The file system shuts down or changes to read-only due to inconsistencies
  • A file on the file system is inaccessible

File system inconsistencies can occur for various reasons, including but not limited to hardware errors, storage administration errors, and software bugs.

Important

File system check tools cannot repair hardware problems. A file system must be fully readable and writable if repair is to operate successfully. If a file system was corrupted due to a hardware error, the file system must first be moved to a good disk, for example with the dd(8) utility.
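As a rough illustration of moving a file system to a good disk, the following sketch wraps dd in a small helper. The function name copy_device, the block size, and the device names in the usage comment are assumptions made for this example, not part of any RHEL tool. The conv=noerror,sync operands tell dd to continue past read errors and to pad short or unreadable blocks with zeros. Run a copy like this only against unmounted devices.

```shell
# Hypothetical helper: copy a block device (or image file) to a new location.
# Usage: copy_device SOURCE DESTINATION
copy_device() {
    src=$1
    dst=$2
    # noerror: keep going after read errors.
    # sync: pad short or unreadable blocks with zeros so that offsets in the
    # copy stay aligned with the source.
    dd if="$src" of="$dst" bs=65536 conv=noerror,sync
}

# Example (device names are placeholders):
# copy_device /dev/failing-disk /dev/replacement-disk
```

Because sync pads the final block, the copy can be slightly larger than a source whose size is not a multiple of the block size.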

For journaling file systems, all that is normally required at boot time is to replay the journal, which is usually a very short operation.

However, if a file system inconsistency or corruption occurs, even for journaling file systems, then the file system checker must be used to repair the file system.

Important

It is possible to disable file system check at boot by setting the sixth field in /etc/fstab to 0. However, Red Hat does not recommend doing so unless you are having issues with fsck at boot time, for example with extremely large or remote file systems.
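To see which entries currently skip the boot-time check, you can inspect the sixth field directly. The following is a minimal sketch; the helper name list_unchecked_mounts is an assumption for this example. It relies on the behavior described in fstab(5), where a missing sixth field is treated as 0.

```shell
# Hypothetical helper: print mount points whose sixth fstab field (fs_passno)
# is 0 or absent, meaning fsck does not check them at boot.
# Usage: list_unchecked_mounts [FSTAB_PATH]
list_unchecked_mounts() {
    fstab=${1:-/etc/fstab}
    # Skip comment lines and blank lines; a missing sixth field counts as 0.
    awk '!/^[[:space:]]*#/ && NF >= 2 {
        passno = (NF >= 6) ? $6 + 0 : 0
        if (passno == 0) print $2
    }' "$fstab"
}

# Example:
# list_unchecked_mounts /etc/fstab
```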

Additional resources

  • fstab(5) man page.
  • fsck(8) man page.
  • dd(8) man page.

15.2. Potential side effects of running fsck

Generally, running the file system check and repair tool can be expected to automatically repair at least some of the inconsistencies it finds. In some cases, the following issues can arise:

  • Severely damaged inodes or directories may be discarded if they cannot be repaired.
  • Significant changes to the file system may occur.

To ensure that unexpected or undesirable changes are not permanently made, follow any precautionary steps outlined in the procedure.

15.3. Error-handling mechanisms in XFS

This section describes how XFS handles various kinds of errors in the file system.

Unclean unmounts

Journaling maintains a transactional record of metadata changes that happen on the file system.

In the event of a system crash, power failure, or other unclean unmount, XFS uses the journal (also called log) to recover the file system. The kernel performs journal recovery when mounting the XFS file system.

Corruption

In this context, corruption means errors on the file system caused by, for example:

  • Hardware faults
  • Bugs in storage firmware, device drivers, the software stack, or the file system itself
  • Problems that cause parts of the file system to be overwritten by something outside of the file system

When XFS detects corruption in the file system or the file-system metadata, it may shut down the file system and report the incident in the system log. Note that if the corruption occurred on the file system hosting the /var directory, these logs will not be available after a reboot.

Example 15.1. System log entry reporting an XFS corruption

# dmesg --notime | tail -15

XFS (loop0): Mounting V5 Filesystem
XFS (loop0): Metadata CRC error detected at xfs_agi_read_verify+0xcb/0xf0 [xfs], xfs_agi block 0x2
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
00000000027b3b56: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000005f9abc7a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000005b0aef35: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000da9d2ded: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000001e265b07: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000006a40df69: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000000b272907: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000e484aac5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (loop0): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x2 len 1 error 74
XFS (loop0): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -117, agno 0
XFS (loop0): Failed to read root inode 0x80, error 11

User-space utilities usually report the Input/output error message when trying to access a corrupted XFS file system. Mounting an XFS file system with a corrupted log results in a failed mount and the following error message:

mount: /mount-point: mount(2) system call failed: Structure needs cleaning.

You must manually use the xfs_repair utility to repair the corruption.
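One way to spot affected devices is to search the kernel log for the repair request shown in Example 15.1. The following sketch assumes the message format from that example; the helper name xfs_devices_needing_repair is hypothetical.

```shell
# Hypothetical helper: read kernel log text on standard input and print each
# XFS device for which the kernel asked for xfs_repair to be run.
xfs_devices_needing_repair() {
    # Capture the device name between "XFS (" and "):" on matching lines,
    # then remove duplicates.
    sed -n 's/.*XFS (\([^)]*\)): Unmount and run xfs_repair.*/\1/p' | sort -u
}

# Example:
# dmesg --notime | xfs_devices_needing_repair
```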

Additional resources

  • xfs_repair(8) man page.

15.4. Checking an XFS file system with xfs_repair

This procedure performs a read-only check of an XFS file system using the xfs_repair utility. You must manually use the xfs_repair utility to repair any corruption. Unlike other file system repair utilities, xfs_repair does not run at boot time, even when an XFS file system was not cleanly unmounted. In the event of an unclean unmount, XFS simply replays the log at mount time, ensuring a consistent file system; xfs_repair cannot repair an XFS file system with a dirty log without remounting it first.

Note

Although an fsck.xfs binary is present in the xfsprogs package, it exists only to satisfy initscripts that look for an fsck.file-system binary at boot time. fsck.xfs immediately exits with an exit code of 0.

Procedure

  1. Replay the log by mounting and unmounting the file system:

    # mount file-system
    # umount file-system
    Note

    If the mount fails with the Structure needs cleaning error, the log is corrupted and cannot be replayed. The dry run should discover and report more on-disk corruption as a result.

  2. Use the xfs_repair utility to perform a dry run to check the file system. The utility prints any errors it finds and the actions it would take, without modifying the file system.

    # xfs_repair -n block-device
  3. Mount the file system:

    # mount file-system
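The log replay and dry run can be combined into a small helper, sketched here under some assumptions: the function name check_xfs is hypothetical, and it takes an explicit block device and mount point rather than relying on an /etc/fstab entry. According to xfs_repair(8), xfs_repair -n returns 1 if corruption was detected and 0 otherwise. The sketch does not remount the file system afterward.

```shell
# Hypothetical helper: replay the log, then run a read-only check.
# Usage: check_xfs BLOCK_DEVICE MOUNT_POINT
# Returns the exit status of "xfs_repair -n" (0: no corruption detected,
# 1: corruption detected), or the status of umount if unmounting fails.
check_xfs() {
    dev=$1
    mnt=$2
    # Mounting and unmounting replays a dirty log.
    if mount "$dev" "$mnt"; then
        umount "$mnt" || return
    else
        echo "warning: mount failed, log not replayed; the dry run may report extra corruption" >&2
    fi
    # -n: no-modify mode, nothing is written to the device.
    xfs_repair -n "$dev"
}

# Example (names are placeholders):
# check_xfs /dev/block-device /mount-point
```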

Additional resources

  • xfs_repair(8) man page.
  • xfs_metadump(8) man page.

15.5. Repairing an XFS file system with xfs_repair

This procedure repairs a corrupted XFS file system using the xfs_repair utility.

Procedure

  1. Create a metadata image prior to repair for diagnostic or testing purposes using the xfs_metadump utility. A pre-repair file system metadata image can be useful for support investigations if the corruption is due to a software bug. Patterns of corruption present in the pre-repair image can aid in root-cause analysis.

    • Use the xfs_metadump debugging tool to copy the metadata from an XFS file system to a file. The resulting metadump file can be compressed using standard compression utilities to reduce the file size if large metadump files need to be sent to support.

      # xfs_metadump block-device metadump-file
  2. Replay the log by remounting the file system:

    # mount file-system
    # umount file-system
  3. Use the xfs_repair utility to repair the unmounted file system:

    • If the mount succeeded, no additional options are required:

      # xfs_repair block-device
    • If the mount failed with the Structure needs cleaning error, the log is corrupted and cannot be replayed. Use the -L option (force log zeroing) to clear the log:

      Warning

      This command causes all metadata updates in progress at the time of the crash to be lost, which might cause significant file system damage and data loss. This should be used only as a last resort if the log cannot be replayed.

      # xfs_repair -L block-device
  4. Mount the file system:

    # mount file-system
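The choice in step 3 depends on how the mount attempt ended. The following sketch makes that decision explicit by printing the suggested command instead of running it, so that -L is never applied automatically. The helper name suggest_xfs_repair and the idea of passing the captured mount error text as an argument are assumptions for this example.

```shell
# Hypothetical helper: print the xfs_repair command line suggested by the
# outcome of the mount attempt, without running it.
# Usage: suggest_xfs_repair BLOCK_DEVICE "MOUNT_ERROR_TEXT"
suggest_xfs_repair() {
    dev=$1
    mount_err=$2
    case $mount_err in
        "")
            # Mount succeeded, so the log was replayed.
            echo "xfs_repair $dev"
            ;;
        *"Structure needs cleaning"*)
            # The log is corrupted. -L zeroes it, which discards metadata
            # updates that were in progress. Use only as a last resort.
            echo "xfs_repair -L $dev"
            ;;
        *)
            echo "unrecognized mount error; investigate before repairing" >&2
            return 1
            ;;
    esac
}

# Example (names are placeholders):
# err=$(mount /dev/block-device /mount-point 2>&1) && umount /mount-point
# suggest_xfs_repair /dev/block-device "$err"
```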

Additional resources

  • xfs_repair(8) man page.

15.6. Error handling mechanisms in ext2, ext3, and ext4

The ext2, ext3, and ext4 file systems use the e2fsck utility to perform file system checks and repairs. The file names fsck.ext2, fsck.ext3, and fsck.ext4 are hard links to the e2fsck utility. These binaries are run automatically at boot time and their behavior differs based on the file system being checked and the state of the file system.

A full file system check and repair is invoked for ext2, which is not a metadata journaling file system, and for ext4 file systems without a journal.

For ext3 and ext4 file systems with metadata journaling, the journal is replayed in userspace and the utility exits. This is the default action because journal replay ensures a consistent file system after a crash.

If these file systems encounter metadata inconsistencies while mounted, they record this fact in the file system superblock. If e2fsck finds that a file system is marked with such an error, e2fsck performs a full check after replaying the journal (if present).
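You can inspect the state recorded in the superblock with the tune2fs utility. The following sketch assumes that the output of tune2fs -l contains a Filesystem state: line and that the line includes the words with errors when the error flag is set. The helper name ext_state_has_errors is hypothetical.

```shell
# Hypothetical helper: read "tune2fs -l" output on standard input and
# succeed if the recorded file system state includes the error flag.
ext_state_has_errors() {
    grep -q '^Filesystem state:.*with errors'
}

# Example (device name is a placeholder):
# tune2fs -l /dev/block-device | ext_state_has_errors && echo "full check expected"
```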

Additional resources

  • fsck(8) man page.
  • e2fsck(8) man page.

15.7. Checking an ext2, ext3, or ext4 file system with e2fsck

This procedure checks an ext2, ext3, or ext4 file system using the e2fsck utility.

Procedure

  1. Replay the log by remounting the file system:

    # mount file-system
    # umount file-system
  2. Perform a dry run to check the file system.

    # e2fsck -n block-device
    Note

    e2fsck prints any errors it finds and the actions it would take, without modifying the file system. Later phases of consistency checking may print extra errors as they discover inconsistencies that would have been fixed in earlier phases if e2fsck were running in repair mode.

Additional resources

  • e2image(8) man page.
  • e2fsck(8) man page.

15.8. Repairing an ext2, ext3, or ext4 file system with e2fsck

This procedure repairs a corrupted ext2, ext3, or ext4 file system using the e2fsck utility.

Procedure

  1. Save a file system image for support investigations. A pre-repair file system metadata image can be useful for support investigations if the corruption is due to a software bug. Patterns of corruption present in the pre-repair image can aid in root-cause analysis.

    Note

    Severely damaged file systems may cause problems with metadata image creation.

    • If you are creating the image for testing purposes, use the -r option to create a sparse file of the same size as the file system itself. e2fsck can then operate directly on the resulting file.

      # e2image -r block-device image-file
    • If you are creating the image to be archived or provided for diagnostics, use the -Q option, which creates a more compact file format suitable for transfer.

      # e2image -Q block-device image-file
  2. Replay the log by remounting the file system:

    # mount file-system
    # umount file-system
  3. Automatically repair the file system. If user intervention is required, e2fsck indicates the unfixed problem in its output and reflects this status in the exit code.

    # e2fsck -p block-device
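The exit code of e2fsck is a bitwise OR of several conditions listed in e2fsck(8). The following sketch decodes it into readable messages. The helper name describe_e2fsck_status and the message wording are assumptions for this example.

```shell
# Hypothetical helper: describe an e2fsck exit status, which is a bitwise OR
# of the conditions documented in e2fsck(8).
# Usage: describe_e2fsck_status STATUS
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ] && echo "errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "errors corrected, reboot recommended"
    [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Example (device name is a placeholder):
# e2fsck -p /dev/block-device; describe_e2fsck_status $?
```

A status that includes the value 4 means that errors remain and user intervention is required.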

Additional resources

  • e2image(8) man page.
  • e2fsck(8) man page.