What is the meaning of "lost async page write" or "lost page write due to I/O error"?
Environment
- Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8
Issue
- Messages similar to the following are seen in the logs:
    kernel: lost page write due to I/O error on dm-29
    kernel: Buffer I/O error on dev sdc, logical block 4, lost async page write
    kernel: Buffer I/O error on dev sdc, logical block 5, lost async page write
- Should I be concerned about messages indicating "lost page write due to I/O error"?
Resolution
- Check the switch and storage array controllers for errors or link failures
- Review the messages prior to these in /var/log/messages for clues as to what may have caused the lost page write (these are usually accompanied by more descriptive errors).
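The log review described above can be partly automated. The following is a minimal sketch, not a supported tool: it assumes the messages match the two samples shown in this article, and the function name `find_lost_writes` and the `context` parameter are hypothetical names chosen for illustration. It returns each lost page write message together with the few lines that preceded it.

```python
import re
from collections import deque

# Pattern derived from the sample messages in this article; other kernel
# versions may word the message differently (assumption).
LOST_WRITE = re.compile(r"lost (async )?page write")


def find_lost_writes(lines, context=5):
    """Yield (preceding_lines, matching_line) for each lost page write message.

    `context` is the number of earlier, non-matching lines to keep, since the
    more descriptive errors usually appear just before the lost page write.
    """
    recent = deque(maxlen=context)
    for line in lines:
        line = line.rstrip("\n")
        if LOST_WRITE.search(line):
            yield list(recent), line
        else:
            recent.append(line)


# Small in-memory demonstration. The first entry is a placeholder standing in
# for whatever earlier error the real log contains.
demo = [
    "kernel: <earlier, more descriptive error>",
    "kernel: Buffer I/O error on dev sdc, logical block 4, lost async page write",
]
for before, hit in find_lost_writes(demo):
    for earlier in before:
        print("    " + earlier)
    print(">>> " + hit)
```

To run it against a real system, pass an open file instead of the list, for example `find_lost_writes(open("/var/log/messages", errors="replace"))`; reading that file normally requires root privileges.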
Root Cause
This is a serious error and potentially indicates data loss has occurred. There can be many root causes, depending on the specific storage configuration:
- If SAN storage was involved, paths to LUNs may have been lost from a cable pull, storage reconfiguration, switches, link failure, or disk overprovisioning.
- If device-mapper-multipath was in use, all paths may have been lost, and queue_if_no_path was not explicitly set on the multipath map, or no_path_retry exhausted all retries. To determine if one of these was the case:
  - queue_if_no_path will be displayed in the "features" in the output of multipath -ll if it is configured for that device:
      mpatha (360014380125989a10000400001300000) dm-6 HP,HSV300
      size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
      `-+- policy='round-robin 0' prio=50 status=active
        |- 0:0:0:7 sdg 8:96 active ready running
        `- 0:0:3:7 sdah 66:16 active ready running
  - no_path_retry may be specified in /etc/multipath.conf in a device section. It may also be a default setting for that device type, and if so, should be listed in /usr/share/doc/device-mapper-multipath-<version>/multipath.conf.defaults. Search for your device vendor/product and see if that device block contains a value for no_path_retry. If it is not listed in either location, then no_path_retry is not enabled on the device.
- If multipath is not in use, a SCSI device WRITE command timed out.
- An asynchronous read operation is serviced from the page cache. If the page to be read from the cache is not yet marked as PG_uptodate, the read operation fails, and the filesystem's journaling capabilities are needed to resynchronize the cached pages so that they can be marked valid again.
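The queue_if_no_path check described in the multipath item above can be sketched as a small parser over `multipath -ll` output. This is an illustration under stated assumptions: the function name `queueing_by_map` is hypothetical, the sample text is the output shown in this article, and the header pattern assumes the `alias (WWID) dm-N` form, which applies when maps have aliases or user-friendly names.

```python
import re

# Sample output copied from this article.
SAMPLE = """\
mpatha (360014380125989a10000400001300000) dm-6 HP,HSV300
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 0:0:0:7 sdg 8:96 active ready running
  `- 0:0:3:7 sdah 66:16 active ready running
"""

# Assumed header form: "<alias> (<wwid>) dm-<n> ...". Maps listed by WWID
# only (no alias) would need a different pattern.
MAP_HEADER = re.compile(r"^(\S+) \((\S+)\) (dm-\d+)")
FEATURES = re.compile(r"features='([^']*)'")


def queueing_by_map(text):
    """Return {map_name: bool}; True when queue_if_no_path is in the features."""
    result = {}
    current = None
    for line in text.splitlines():
        header = MAP_HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = False
            continue
        features = FEATURES.search(line)
        if features and current is not None:
            result[current] = "queue_if_no_path" in features.group(1).split()
    return result


print(queueing_by_map(SAMPLE))
```

On a live system the text could come from running `multipath -ll` (as root) and capturing its output. A map reported as `False` here is not queueing through queue_if_no_path, so the no_path_retry setting should be checked next as described above.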
Diagnostic Steps
Follow basic steps for data recovery, such as unmounting any filesystem involved with the device and running fsck.