What is the meaning of "lost async page write" or "lost page write due to I/O error"?

Environment

  • Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8

Issue

  • Messages similar to the following are seen in the logs:

    kernel: lost page write due to I/O error on dm-29
    ---
    kernel: Buffer I/O error on dev sdc, logical block 4, lost async page write
    kernel: Buffer I/O error on dev sdc, logical block 5, lost async page write
    
  • Should I be concerned about messages indicating lost page write due to I/O error?

Resolution

  • Check the switch and storage array controllers for errors or link failures.
  • Review the messages logged just before these in /var/log/messages for clues about what caused the lost page write. More descriptive errors usually precede it.
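
One way to carry out the log review above is to print each lost-write message together with the lines logged just before it. This is a minimal sketch, not a Red Hat tool: the function name lost_write_context and the default of 20 context lines are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHEL): show every "lost page write" or
# "lost async page write" message with the lines that precede it.
# Usage: lost_write_context [logfile] [context_lines]
lost_write_context() {
    logfile="${1:-/var/log/messages}"   # default log location named in this article
    context="${2:-20}"                  # assumed default; widen it if the cause is logged earlier
    # -B prints the given number of lines before each match.
    grep -B "$context" -E 'lost (async )?page write' "$logfile"
}
```

For example, lost_write_context /var/log/messages 50 prints up to 50 preceding lines for each match. Reading the log usually requires root privileges.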

Root Cause

This is a serious error: the kernel failed to write buffered (page cache) data to the device, so data loss may have occurred. There can be many root causes, depending on the specific storage configuration:

  • If SAN storage was involved, paths to LUNs may have been lost because of a cable pull, a storage reconfiguration, a switch problem, a link failure, or disk overprovisioning.

  • If device-mapper-multipath was in use, all paths may have been lost, and either queue_if_no_path was not explicitly set on the multipath map or no_path_retry exhausted all of its retries. To determine whether one of these was the case:
    • If queue_if_no_path is configured for a device, it is displayed in the "features" field of the multipath -ll output for that device:

    mpatha (360014380125989a10000400001300000) dm-6 HP,HSV300
    size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=50 status=active
    |- 0:0:0:7 sdg  8:96   active ready running
    `- 0:0:3:7 sdah 66:16  active ready running
    
    • no_path_retry may be specified in /etc/multipath.conf in a device section. It may also be a default setting for that device type, and if so, should be listed in /usr/share/doc/device-mapper-multipath-<version>/multipath.conf.defaults. Search for your device vendor/product and see if that device block contains a value for no_path_retry. If it is not listed in either location, then no_path_retry is not enabled on the device.
  • If multipath is not in use, a SCSI WRITE command to the device most likely timed out.

  • An asynchronous read is serviced from the page cache. If the page to be read is not yet marked PG_uptodate, the read fails, and the filesystem's journaling capabilities are needed to resynchronize the cache and mark those pages as valid again.
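
The two multipath checks described in the list above can be sketched as small shell helpers. This is an illustration under stated assumptions, not an official procedure: the function names report_queueing and show_no_path_retry are hypothetical, and the parsing assumes multipath -ll output laid out like the sample shown earlier (a map header line containing a dm-N name, followed by a line containing features=).

```shell
#!/bin/sh
# Hypothetical helper: read "multipath -ll" output on stdin and report, for
# each map, whether queue_if_no_path appears in its features field.
report_queueing() {
    awk '
        / dm-[0-9]+ / { map = $1 }   # map header line: remember the map name (or WWID)
        /features=/ {
            state = ($0 ~ /queue_if_no_path/) ? "queue_if_no_path set" : "queue_if_no_path not set"
            print map ": " state
        }
    '
}

# Hypothetical helper: list every no_path_retry setting in a multipath
# configuration file, with line numbers.
show_no_path_retry() {
    conf="${1:-/etc/multipath.conf}"
    grep -n 'no_path_retry' "$conf" || echo "no_path_retry not found in $conf"
}
```

Typical use would be multipath -ll | report_queueing, followed by show_no_path_retry for /etc/multipath.conf and again for the multipath.conf.defaults file mentioned above. The match in show_no_path_retry is plain text, so commented-out settings are listed too and need to be read in context.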

Diagnostic Steps

Follow the basic steps for data recovery, such as unmounting any filesystem on the affected device and running fsck on it.
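
To decide which devices need this treatment, the device names can be pulled out of the logged messages. The sketch below assumes the two message formats shown in the Issue section. The function names affected_devices and print_recovery_plan are hypothetical, and the /dev/<name> path is an assumption: multipath and LVM devices are often mounted through a /dev/mapper/ name instead. The second function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: list the unique device names that appear in
# "lost page write" / "lost async page write" messages.
affected_devices() {
    logfile="${1:-/var/log/messages}"
    grep -E 'lost (async )?page write' "$logfile" |
        sed -n -E \
            -e 's/.*error on dev ([^, ]+),.*/\1/p' \
            -e 's/.*error on ([^, ]+)[[:space:]]*$/\1/p' |
        sort -u
}

# Hypothetical helper: print (do NOT run) the manual recovery commands for one
# device. The /dev/<name> path is an assumption; check the real mount source.
print_recovery_plan() {
    dev="$1"
    echo "umount /dev/$dev"
    echo "fsck /dev/$dev"
}
```

For example, affected_devices /var/log/messages lists the devices, and print_recovery_plan sdc prints the commands for one of them. Run fsck only on an unmounted filesystem.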

