Chapter 12. Configuring XFS error behavior

You can configure how an XFS file system behaves when it encounters different I/O errors.

12.1. Configurable error handling in XFS

The XFS file system responds in one of the following ways when an error occurs during an I/O operation:

  • XFS repeatedly retries the I/O operation until the operation succeeds or XFS reaches a set limit.

    The limit is based either on a maximum number of retries or a maximum time for retries.

  • XFS considers the error permanent and stops the operation on the file system.

You can configure how XFS reacts to the following error conditions:

EIO
Error when reading or writing
ENOSPC
No space left on the device
ENODEV
Device cannot be found

You can set the maximum number of retries and the maximum time in seconds until XFS considers an error permanent. XFS stops retrying the operation when it reaches either of the limits.

You can also configure XFS so that when unmounting a file system, XFS immediately cancels the retries regardless of any other configuration. This configuration enables the unmount operation to succeed despite persistent errors.

Default behavior

The default behavior for each XFS error condition depends on the error context. Some XFS errors such as ENODEV are considered to be fatal and unrecoverable, regardless of the retry count. Their default retry limit is 0.
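
The interaction of the two limits can be pictured with a small shell model. This is only an illustrative sketch, not kernel code: the function name should_retry and its arguments are hypothetical, and it assumes the conventions used in the procedures of this chapter, where -1 means no limit and 0 means stop immediately.

```shell
# Toy model, not kernel code: decide whether another retry is allowed.
# Usage: should_retry RETRIES_DONE SECONDS_ELAPSED MAX_RETRIES TIMEOUT_SECONDS
# Returns 0 (success) if a retry is allowed, 1 if the error is treated as permanent.
should_retry() {
    done_retries=$1 elapsed=$2 max_retries=$3 timeout=$4
    # A limit of -1 means "no limit"; any other limit ends the retries once reached.
    if [ "$max_retries" -ne -1 ] && [ "$done_retries" -ge "$max_retries" ]; then
        return 1
    fi
    if [ "$timeout" -ne -1 ] && [ "$elapsed" -ge "$timeout" ]; then
        return 1
    fi
    return 0
}
```

In this model, reaching either limit is enough to stop retrying, and a limit of 0 stops before the first retry, which matches the default described for errors such as ENODEV.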

12.2. Configuration files for specific and undefined XFS error conditions

The following directories store configuration files that control XFS error behavior for different error conditions:

/sys/fs/xfs/device/error/metadata/EIO/
For the EIO error condition
/sys/fs/xfs/device/error/metadata/ENODEV/
For the ENODEV error condition
/sys/fs/xfs/device/error/metadata/ENOSPC/
For the ENOSPC error condition
/sys/fs/xfs/device/error/metadata/default/
Common configuration for all other, undefined error conditions

Each directory contains the following configuration files for configuring retry limits:

max_retries
Controls the maximum number of times that XFS retries the operation.
retry_timeout_seconds
Specifies the time limit in seconds after which XFS stops retrying the operation.
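
To review the current limits before changing them, you can read the files in each directory. The following is a minimal sketch: the function name show_xfs_error_config and the XFS_SYSFS_ROOT variable are hypothetical helpers introduced only for this example (the variable lets you point the function at a test directory), and the sketch assumes that every directory under error/metadata/ contains both files described above.

```shell
# Print the retry limits for every error-condition directory of one device.
# XFS_SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys/fs/xfs.
show_xfs_error_config() {
    dev=$1
    base="${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$dev/error/metadata"
    for dir in "$base"/*/; do
        # Skip the unexpanded pattern if no directory matches.
        [ -d "$dir" ] || continue
        cond=$(basename "$dir")
        printf '%s max_retries=%s retry_timeout_seconds=%s\n' \
            "$cond" \
            "$(cat "$dir/max_retries")" \
            "$(cat "$dir/retry_timeout_seconds")"
    done
}
```

For example, calling show_xfs_error_config sda prints one line per condition directory for the file system on the sda device.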

12.3. Setting XFS behavior for specific conditions

This procedure configures how XFS reacts to specific error conditions.

Procedure

  • Set the maximum number of retries, the retry time limit, or both:

    • To set the maximum number of retries, write the desired number to the max_retries file:

      # echo value > /sys/fs/xfs/device/error/metadata/condition/max_retries
    • To set the time limit, write the desired number of seconds to the retry_timeout_seconds file:

      # echo value > /sys/fs/xfs/device/error/metadata/condition/retry_timeout_seconds

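    value is a number between -1 and the maximum possible value of the C signed integer type, which is 2147483647 on 64-bit Linux. The upstream kernel documentation lists a lower maximum of one day (86400 seconds) for retry_timeout_seconds.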

    For both limits, the value -1 makes XFS retry indefinitely and the value 0 makes XFS stop immediately without retrying.

    device is the name of the device, as found in the /dev/ directory; for example, sda.

    condition is the name of the error condition directory: EIO, ENOSPC, or ENODEV.
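
The steps above can be wrapped in a small helper that rejects obviously invalid input before writing to sysfs. This is a sketch under stated assumptions: the function name set_xfs_retry_limit and the XFS_SYSFS_ROOT variable are hypothetical and exist only in this example, and the check covers only the lower bound of the range, so the kernel still has the final say on which values it accepts.

```shell
# Write a retry limit for one error condition after a basic sanity check.
# Usage: set_xfs_retry_limit DEVICE CONDITION FILE VALUE
# XFS_SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys/fs/xfs.
set_xfs_retry_limit() {
    dev=$1 cond=$2 file=$3 value=$4
    case $file in
        max_retries|retry_timeout_seconds) ;;
        *) echo "unknown file: $file" >&2; return 1 ;;
    esac
    # Accept -1 (retry forever), 0 (stop immediately), or a positive integer.
    case $value in
        -1) ;;
        ''|*[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    esac
    target="${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$dev/error/metadata/$cond/$file"
    [ -w "$target" ] || { echo "cannot write $target" >&2; return 1; }
    echo "$value" > "$target"
}
```

For example, running set_xfs_retry_limit sda EIO max_retries 10 as root limits EIO retries on the sda file system to 10.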

12.4. Setting XFS behavior for undefined conditions

This procedure configures how XFS reacts to all undefined error conditions, which share a common configuration.

Procedure

  • Set the maximum number of retries, the retry time limit, or both:

    • To set the maximum number of retries, write the desired number to the max_retries file:

      # echo value > /sys/fs/xfs/device/error/metadata/default/max_retries
    • To set the time limit, write the desired number of seconds to the retry_timeout_seconds file:

      # echo value > /sys/fs/xfs/device/error/metadata/default/retry_timeout_seconds

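    value is a number between -1 and the maximum possible value of the C signed integer type, which is 2147483647 on 64-bit Linux. The upstream kernel documentation lists a lower maximum of one day (86400 seconds) for retry_timeout_seconds.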

    For both limits, the value -1 makes XFS retry indefinitely and the value 0 makes XFS stop immediately without retrying.

    device is the name of the device, as found in the /dev/ directory; for example, sda.
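
As an illustration, the two writes can be combined and followed by a read-back to confirm what is now in effect. This is a minimal sketch: the function name set_xfs_default_limits and the XFS_SYSFS_ROOT variable are hypothetical names used only for this example.

```shell
# Set both default retry limits for a device, then print the stored values.
# Usage: set_xfs_default_limits DEVICE MAX_RETRIES TIMEOUT_SECONDS
# XFS_SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys/fs/xfs.
set_xfs_default_limits() {
    dir="${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$1/error/metadata/default"
    echo "$2" > "$dir/max_retries" || return 1
    echo "$3" > "$dir/retry_timeout_seconds" || return 1
    # Read the values back, because a rejected write leaves the old value in place.
    printf 'max_retries=%s retry_timeout_seconds=%s\n' \
        "$(cat "$dir/max_retries")" "$(cat "$dir/retry_timeout_seconds")"
}
```

For example, running set_xfs_default_limits sda 5 60 as root allows up to 5 retries within 60 seconds for all undefined error conditions on the sda file system.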

12.5. Setting the XFS unmount behavior

This procedure configures how XFS reacts to error conditions when unmounting the file system.

If you set the fail_at_unmount option in the file system, it overrides all other error configurations during unmount, and immediately unmounts the file system without retrying the I/O operation. This allows the unmount operation to succeed even in case of persistent errors.

Warning

You cannot change the fail_at_unmount value after the unmount process starts, because the unmount process removes the configuration files from the sysfs interface for the respective file system. You must configure the unmount behavior before the file system starts unmounting.

Procedure

  • Enable or disable the fail_at_unmount option:

    • To cancel retrying all operations when the file system unmounts, enable the option:

      # echo 1 > /sys/fs/xfs/device/error/fail_at_unmount
    • To respect the max_retries and retry_timeout_seconds retry limits when the file system unmounts, disable the option:

      # echo 0 > /sys/fs/xfs/device/error/fail_at_unmount

    device is the name of the device, as found in the /dev/ directory; for example, sda.
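
Because the file is removed once unmounting starts, a script might check that it still exists before writing to it. The following sketch shows one way to do that; the function name set_fail_at_unmount and the XFS_SYSFS_ROOT variable are hypothetical and serve only this example.

```shell
# Enable (1) or disable (0) fail_at_unmount for a mounted XFS file system.
# Usage: set_fail_at_unmount DEVICE STATE
# XFS_SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys/fs/xfs.
set_fail_at_unmount() {
    dev=$1 state=$2
    case $state in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 1 ;;
    esac
    file="${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$dev/error/fail_at_unmount"
    # The file disappears once unmounting starts, so confirm that it is still there.
    [ -e "$file" ] || { echo "$file not found; is $dev still mounted?" >&2; return 1; }
    echo "$state" > "$file"
}
```

For example, running set_fail_at_unmount sda 1 as root before unmounting makes XFS cancel pending retries on the sda file system when the unmount begins.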