What do cciss CHECK CONDITION sense key = 0x3 errors in /var/log/messages mean?
Environment
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 4
- HP Smart Array controller with cciss driver
Issue
- What do the following cciss CHECK CONDITION errors in the system logs mean:
cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
- File system goes read-only after CHECK CONDITION sense key 0x3 and Buffer I/O error
kernel: cciss 0000:05:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
kernel: Buffer I/O error on device dm-9, logical block 11064
kernel: lost page write due to I/O error on dm-9
kernel: Aborting journal on device dm-9.
kernel: ext3_abort called.
kernel: EXT3-fs error (device dm-9): ext3_journal_start_sb: Detected aborted journal
kernel: Remounting filesystem read-only
Resolution
- Verify the hardware (controller, cable, disk, etc.). Ensure there is a good backup of any data on this device, then run hardware diagnostics.
- Engage the hardware vendor. Typically, one or more disks are faulty and need to be replaced.
Note:
Some older Smart Array firmware versions may report only the first faulty drive found in a set of drives, even when additional drives have also failed. Updating the controller firmware will not "fix" a faulty drive that reports a medium error (sense key = 0x3). However, to ensure that all faulty drives are reported, check the controller firmware revision and update it if it is below the revision level currently recommended by HP. This reporting problem was observed on a P410i controller running firmware 5.14, at a time when 6.00B was the latest available revision. The hardware vendor can advise on the appropriate firmware revision level.
Root Cause
- A sense key of 0x3 is defined by the SCSI standard as "Medium error" and relates to a hardware defect in the block device:
Sense Key 3h MEDIUM ERROR. Indicates that the command terminated with a non-recovered error condition that was probably caused by a flaw in the medium or an error in the recorded data. This sense key may also be returned if the target is unable to distinguish between a flaw in the medium and a specific hardware failure (sense key 4h).
Diagnostic Steps
- Check that errors similar to the following are present in /var/log/messages:
cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
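The check above can be sketched in Python. This is a minimal illustration, not a supported tool: the function name count_sense_keys and the regular expression are assumptions, written to match only the two message forms quoted in this article ("cciss: cmd ..." and "cciss: c ..."). Other driver versions may word the message differently.

```python
import re
from collections import Counter

# Assumed pattern: covers only the two message forms quoted in this article:
#   cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
#   cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
CHECK_CONDITION = re.compile(
    r"cciss: (?:cmd|c) ([0-9a-fA-F]+) has CHECK CONDITION "
    r"sense key = 0x([0-9a-fA-F]+)"
)


def count_sense_keys(lines):
    """Count cciss CHECK CONDITION messages per sense key.

    `lines` is any iterable of log lines (a list, or an open file).
    Returns a Counter mapping the sense key (as an int) to its count.
    """
    counts = Counter()
    for line in lines:
        match = CHECK_CONDITION.search(line)
        if match:
            # Group 2 holds the hexadecimal digits after "0x".
            counts[int(match.group(2), 16)] += 1
    return counts
```

To scan the live log, pass an open file object, for example count_sense_keys(open("/var/log/messages", errors="replace")). Reading that file normally requires root privileges.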
- Generally, a CHECK CONDITION involves some sort of hardware error. Look up the "sense key" value in the sense key table at http://www.t10.org/lists/2sensekey.htm (or http://tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-21.html).
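The sense key lookup can also be sketched as a small Python table. The names below follow the standard SCSI sense key list. Values 0xC and 0xF are deliberately left out because their meaning differs between revisions of the standard, and the helper name describe_sense_key is hypothetical. Treat the t10.org table as the authoritative source.

```python
# SCSI sense key names. 0xC and 0xF are omitted on purpose because their
# definitions changed between revisions of the standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense_key(value):
    """Return the name for a sense key given as an int or a hex string.

    Accepts values such as 3 or "0x3", the form printed by the cciss driver.
    """
    key = int(value, 16) if isinstance(value, str) else value
    return SENSE_KEYS.get(key, "reserved, obsolete, or unknown")
```

For the messages in this article, describe_sense_key("0x3") gives "MEDIUM ERROR", which matches the definition quoted under Root Cause.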
- Additional information can be collected for submission when engaging vendor hardware support. For example, see "How do I check my Smart Array for logged hardware problems?".
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.