What do cciss CHECK CONDITION sense key = 0x3 errors in /var/log/messages mean?


Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 4
  • HP Smart Array controller with cciss driver

Issue

  • What do the following cciss CHECK CONDITION errors in the system logs mean:
cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
  • File system goes read-only after CHECK CONDITION sense key 0x3 and Buffer I/O error
kernel: cciss 0000:05:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
kernel: Buffer I/O error on device dm-9, logical block 11064
kernel: lost page write due to I/O error on dm-9
kernel: Aborting journal on device dm-9.
kernel: ext3_abort called.
kernel: EXT3-fs error (device dm-9): ext3_journal_start_sb: Detected aborted journal
kernel: Remounting filesystem read-only

Resolution

  • Verify the hardware (controller, cables, disks, etc.). Ensure there is a good backup of any data on this device, then run hardware diagnostics.
  • Engage the hardware vendor. Typically, one or more disks are faulty and need to be replaced.

Note:

Some older Smart Array firmware versions may report only the first faulty drive found in a set of drives, even when additional drives have also failed. Updating the controller firmware will not "fix" a faulty drive that reports a medium error (sense key = 0x3). However, to ensure that all faulty drives are reported properly, check the controller firmware revision and update it if it is below the revision level currently recommended by HP. This reporting problem was observed on a P410i controller running firmware 5.14, at a time when firmware 6.00B was the latest available revision. The hardware vendor can advise on the appropriate firmware revision level.

Root Cause

  • A sense key of 0x3 is defined by the SCSI standard as "Medium error" and relates to a hardware defect in the block device:
Sense Key
3h           MEDIUM ERROR.  Indicates that the command terminated with a non-recovered
             error condition that was probably caused by a flaw in the medium or an error
             in the recorded data.  This sense key may also be returned if the target is
             unable to distinguish between a flaw in the medium and a specific hardware 
             failure (sense key 4h).
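
As an illustration of how the sense key value in the log message maps to a SCSI definition, the following is a minimal Python sketch of a lookup table. The names SENSE_KEYS and describe_sense_key are hypothetical and are not part of the cciss driver or of any Red Hat tool. Only commonly documented sense keys are listed, and anything else is reported as unknown.

```python
# Sense key names as defined by the SCSI standard.
# SENSE_KEYS and describe_sense_key are illustrative names only.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense_key(value):
    """Return the name of a sense key given as an int or a hex string such as '0x3'."""
    key = int(value, 16) if isinstance(value, str) else value
    return SENSE_KEYS.get(key, "RESERVED/UNKNOWN")
```

For the messages in this article, describe_sense_key("0x3") returns "MEDIUM ERROR", which points to the disk media rather than to the driver or file system.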

Diagnostic Steps

  • Check whether errors similar to the following are present in /var/log/messages:
cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
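
The check above can also be scripted. The following is a rough Python 3 sketch that scans log lines for the two message formats quoted in this article and counts occurrences per sense key. The function names are hypothetical, and the regular expression is an assumption based only on the sample messages shown here, so other driver versions may log a different format. Because RHEL 4 and RHEL 5 ship older Python versions, the sketch assumes it is run against a copy of the log on a system with Python 3.

```python
import re
from collections import Counter

# Assumed pattern, derived from the two sample messages in this article:
#   cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
#   cciss 0000:05:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
PATTERN = re.compile(
    r"cciss: c(?:md)? (?P<cmd>[0-9a-fA-F]+) has CHECK CONDITION "
    r"sense key = (?P<key>0x[0-9a-fA-F]+)"
)


def find_check_conditions(lines):
    """Return a list of (command address, sense key as int) for matching lines."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("cmd"), int(match.group("key"), 16)))
    return hits


def count_by_sense_key(lines):
    """Count CHECK CONDITION messages per sense key value."""
    return Counter(key for _cmd, key in find_check_conditions(lines))


def scan_file(path):
    """Print a per-sense-key summary for the log file at the given path."""
    with open(path, errors="replace") as log:
        counts = count_by_sense_key(log)
    for key, total in sorted(counts.items()):
        print(f"sense key 0x{key:x}: {total} occurrence(s)")
```

Calling scan_file("/var/log/messages") as root, or on a copy of the log, prints one summary line per sense key found. Any count for sense key 0x3 is a reason to follow the steps in the Resolution section.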

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
