3.8. Configuring Error Behavior
When an error occurs during an I/O operation, the XFS driver responds in one of two ways:
- Continue retrying until either:
  - the I/O operation succeeds, or
  - an I/O operation retry count or time limit is exceeded.
- Consider the error permanent and halt the system.
XFS currently recognizes the following error conditions for which you can configure the desired behavior specifically:
EIO
: Error while trying to write to the device
ENOSPC
: No space left on the device
ENODEV
: Device cannot be found
All other possible error conditions, which do not have specific handlers defined, share a single, global configuration.
You can limit both the maximum number of retries and the maximum retry time in seconds. XFS deems an error permanent and stops retrying as soon as either limit is reached.
There is also an option to immediately cancel the retries when unmounting the file system, regardless of any other configuration. This allows the unmount operation to succeed even in case of persistent errors.
3.8.1. Configuration Files for Specific and Undefined Conditions
Configuration files controlling error behavior are located in the /sys/fs/xfs/device/error/ directory.
The /sys/fs/xfs/device/error/metadata/ directory contains subdirectories for each specific error condition:
- /sys/fs/xfs/device/error/metadata/EIO/ for the EIO error condition
- /sys/fs/xfs/device/error/metadata/ENODEV/ for the ENODEV error condition
- /sys/fs/xfs/device/error/metadata/ENOSPC/ for the ENOSPC error condition
Each one then contains the following configuration files:
/sys/fs/xfs/device/error/metadata/condition/max_retries
: controls the maximum number of times that XFS retries the operation
/sys/fs/xfs/device/error/metadata/condition/retry_timeout_seconds
: controls the time limit in seconds after which XFS stops retrying the operation
All other possible error conditions, apart from those described in the previous section, share a common configuration in these files:
/sys/fs/xfs/device/error/metadata/default/max_retries
: controls the maximum number of retries
/sys/fs/xfs/device/error/metadata/default/retry_timeout_seconds
: controls the time limit for retrying
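To see how these files fit together, the following sketch reads both settings for each condition directory of one device. It is an illustration only: the function name xfs_show_error_config and the XFS_SYSFS_ROOT variable are hypothetical and not part of XFS. The variable exists solely so that the function can be pointed at a test directory instead of the real /sys/fs/xfs tree.

```shell
#!/bin/sh
# Print the retry settings for every error condition of one XFS device.
# XFS_SYSFS_ROOT is a hypothetical override for testing; when unset,
# the function reads the real /sys/fs/xfs tree.
xfs_show_error_config() {
    device=$1
    base="${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$device/error/metadata"
    for condition in EIO ENODEV ENOSPC default; do
        dir="$base/$condition"
        # Skip condition directories that are not present.
        [ -d "$dir" ] || continue
        printf '%s max_retries=%s retry_timeout_seconds=%s\n' \
            "$condition" \
            "$(cat "$dir/max_retries")" \
            "$(cat "$dir/retry_timeout_seconds")"
    done
}
```

For example, calling xfs_show_error_config sda prints one line per condition for the sda device, which makes it easy to compare the specific handlers with the default one.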
3.8.2. Setting File System Behavior for Specific and Undefined Conditions
To set the maximum number of retries, write the desired number to the max_retries file.
- For specific conditions:
  # echo value > /sys/fs/xfs/device/error/metadata/condition/max_retries
- For undefined conditions:
  # echo value > /sys/fs/xfs/device/error/metadata/default/max_retries
value is a number between -1 and the maximum possible value of int, the C signed integer type. This is 2147483647 on 64-bit Linux.
To set the time limit, write the desired number of seconds to the retry_timeout_seconds file.
- For specific conditions:
  # echo value > /sys/fs/xfs/device/error/metadata/condition/retry_timeout_seconds
- For undefined conditions:
  # echo value > /sys/fs/xfs/device/error/metadata/default/retry_timeout_seconds
value is a number between -1 and 86400, which is the number of seconds in a day.
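Because the two files accept different ranges, it can help to check a value before writing it. The following is a minimal sketch of such a wrapper, assuming the ranges stated above. The function name xfs_set_retry_limit and the XFS_SYSFS_ROOT test override are hypothetical and not part of XFS.

```shell
#!/bin/sh
# Write a retry limit after checking it against the documented range.
# Usage: xfs_set_retry_limit DEVICE CONDITION FILE VALUE
#   CONDITION: EIO, ENODEV, ENOSPC, or default
#   FILE:      max_retries or retry_timeout_seconds
# XFS_SYSFS_ROOT is a hypothetical override for testing.
xfs_set_retry_limit() {
    device=$1 condition=$2 file=$3 value=$4
    # Pick the documented upper bound for the chosen file.
    case $file in
        max_retries)           max=2147483647 ;;
        retry_timeout_seconds) max=86400 ;;
        *) echo "unknown file: $file" >&2; return 2 ;;
    esac
    # Accept only an integer with an optional leading minus sign.
    case $value in
        ''|-|*[!0-9-]*|?*-*) echo "not an integer: $value" >&2; return 2 ;;
    esac
    if [ "$value" -lt -1 ] || [ "$value" -gt "$max" ]; then
        echo "$value is outside -1..$max" >&2
        return 1
    fi
    echo "$value" > "${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$device/error/metadata/$condition/$file"
}
```

For example, xfs_set_retry_limit sda EIO retry_timeout_seconds 300 writes 300 to the EIO time limit for sda, while a value such as 86401 is rejected before anything is written. Writing to the real files requires root privileges.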
In both the max_retries and retry_timeout_seconds options, -1 means to retry forever and 0 means to stop immediately.
device is the name of the device, as found in the /dev/ directory; for example, sda.
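The way the two limits combine can be expressed as a small decision function. The following sketch is a simplified model of the rule described in this section, not the kernel implementation. The function name xfs_error_is_permanent is hypothetical, and the exact boundary comparisons are an assumption chosen so that 0 stops immediately and -1 never stops.

```shell
#!/bin/sh
# Simplified model of the documented retry rule; this is not kernel code.
# Usage: xfs_error_is_permanent RETRIES_DONE SECONDS_ELAPSED MAX_RETRIES TIMEOUT
# Returns 0 (true) when the error would be treated as permanent.
xfs_error_is_permanent() {
    retries_done=$1 elapsed=$2 max_retries=$3 timeout=$4
    # A limit of -1 disables that check (retry forever).
    if [ "$max_retries" -ne -1 ] && [ "$retries_done" -ge "$max_retries" ]; then
        return 0
    fi
    if [ "$timeout" -ne -1 ] && [ "$elapsed" -ge "$timeout" ]; then
        return 0
    fi
    # Neither limit has been reached, so XFS would keep retrying.
    return 1
}
```

In this model, either limit alone is enough to end the retries: with max_retries set to 5 and retry_timeout_seconds set to 300, the error becomes permanent after five retries or after 300 seconds, whichever comes first.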
Note
The default behavior for each error condition depends on the error context. Some errors, like ENODEV, are considered to be fatal and unrecoverable, regardless of the retry count, so their default value is 0.
3.8.3. Setting Unmount Behavior
If the fail_at_unmount option is set, the file system overrides all other error configurations during unmount, and immediately unmounts the file system without retrying the I/O operation. This allows the unmount operation to succeed even in case of persistent errors.
To set the unmount behavior:
# echo value > /sys/fs/xfs/device/error/fail_at_unmount
value is either 1 or 0:
- 1 means to cancel retrying immediately if an error is found.
- 0 means to respect the max_retries and retry_timeout_seconds options.
device is the name of the device, as found in the /dev/ directory; for example, sda.
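Because only 1 and 0 are meaningful here, a small wrapper can reject anything else before writing. The following is a minimal sketch; the function name xfs_set_fail_at_unmount and the XFS_SYSFS_ROOT test override are hypothetical and not part of XFS.

```shell
#!/bin/sh
# Set fail_at_unmount for a device; VALUE must be 0 or 1.
# XFS_SYSFS_ROOT is a hypothetical override for testing; when unset,
# the function writes to the real /sys/fs/xfs tree.
xfs_set_fail_at_unmount() {
    device=$1 value=$2
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1, got: $value" >&2; return 2 ;;
    esac
    echo "$value" > "${XFS_SYSFS_ROOT:-/sys/fs/xfs}/$device/error/fail_at_unmount"
}
```

For example, running xfs_set_fail_at_unmount sda 1 as root and then unmounting the file system on sda lets the unmount proceed without waiting for retries.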
Important
The fail_at_unmount option must be set as desired before you attempt to unmount the file system. After an unmount operation has started, the configuration files and directories may be unavailable.