What do cciss CHECK CONDITION sense key = 0x3 errors in /var/log/messages mean?

Solution Verified - Updated August 7 2024 at 7:26 AM

Environment

Red Hat Enterprise Linux 5
Red Hat Enterprise Linux 4
HP Smart Array controller with cciss driver

Issue

What do the following cciss CHECK CONDITION errors in the system logs mean:

cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3

The file system goes read-only after a CHECK CONDITION sense key = 0x3 message followed by Buffer I/O errors:

kernel: cciss 0000:05:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
kernel: Buffer I/O error on device dm-9, logical block 11064
kernel: lost page write due to I/O error on dm-9
kernel: Aborting journal on device dm-9.
kernel: ext3_abort called.
kernel: EXT3-fs error (device dm-9): ext3_journal_start_sb: Detected aborted journal
kernel: Remounting filesystem read-only
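To see which block devices and logical blocks were affected, the "Buffer I/O error" lines can be pulled out of the log. The following is a minimal sketch, assuming Python 3 is available on the machine where the log is examined (RHEL 4 and 5 do not ship it by default, so a copy of the log may need to be analysed elsewhere). The function name `affected_blocks` is hypothetical and used only for illustration.

```python
import re

# Matches kernel lines such as:
#   kernel: Buffer I/O error on device dm-9, logical block 11064
BUFFER_IO_ERROR = re.compile(
    r"Buffer I/O error on device (?P<dev>[^,\s]+), logical block (?P<block>\d+)"
)


def affected_blocks(lines):
    """Return {device: sorted list of logical blocks} for Buffer I/O errors."""
    found = {}
    for line in lines:
        match = BUFFER_IO_ERROR.search(line)
        if match:
            found.setdefault(match.group("dev"), set()).add(int(match.group("block")))
    # Convert the sets to sorted lists so the result is easy to read and compare.
    return {dev: sorted(blocks) for dev, blocks in found.items()}
```

Passing an open file object for a copy of /var/log/messages as `lines` yields a per-device summary that can be included when engaging the hardware vendor.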

Resolution

Verify the hardware (controller, cables, disks, etc.). Ensure there is a good backup of any data on this device, then run hardware diagnostics.
Engage the hardware vendor. Typically, one or more disks are faulty and need to be replaced.

Note:

Some older Smart Array firmware versions may report only the first faulty drive found within a set of drives, even though other drives have also failed. Updating the controller firmware will not "fix" a faulty drive that reports a medium error (sense key = 0x3). However, to ensure that all faulty drives are reported properly, check the controller firmware revision and update it if it is not at the current level recommended by HP. This under-reporting was observed on a model P410i controller running firmware 5.14, at a time when firmware 6.00B was the latest available revision. The hardware vendor can advise on the appropriate firmware revision level.

Root Cause

A sense key of 0x3 is defined by the SCSI standard as "Medium error" and relates to a hardware defect in the block device:

Sense Key
3h           MEDIUM ERROR.  Indicates that the command terminated with a non-recovered
             error condition that was probably caused by a flaw in the medium or an error
             in the recorded data.  This sense key may also be returned if the target is
             unable to distinguish between a flaw in the medium and a specific hardware 
             failure (sense key 4h).
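The sense key is a 4-bit value, so the whole table is small enough to keep as a lookup. The sketch below lists the names from the SCSI sense key table; the function name `describe_sense_key` is hypothetical, and Python 3 is assumed.

```python
# Sense key names as listed in the SCSI sense key table.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense_key(value):
    """Return the name for a sense key given as an int or a hex string such as '0x3'."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts both '0x3' and '3'
    # Keys not listed above are reported as unknown rather than guessed.
    return SENSE_KEYS.get(value, "UNKNOWN")
```

For the message in this article, `describe_sense_key("0x3")` returns `"MEDIUM ERROR"`.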

Diagnostic Steps

Check that errors similar to the following are present in /var/log/messages:

cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3

Generally, a CHECK CONDITION indicates some sort of hardware error. Look up the "sense key" value at http://www.t10.org/lists/2sensekey.htm (or http://tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-21.html).
Additional information can be collected for submission when engaging vendor hardware support. For example, see "How do I check my Smart Array for logged hardware problems?".
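The log check above can be sketched as a short script that counts cciss CHECK CONDITION messages per sense key. This is an illustration only: it assumes Python 3 (not shipped by default on RHEL 4 or 5, so it may need to run against a copy of the log on another system), it assumes the two message formats shown in this article, and the names `count_sense_keys` and `scan_file` are hypothetical.

```python
import re
from collections import Counter

# Matches both message formats shown in this article:
#   cciss: cmd 0000010000540000 has CHECK CONDITION sense key = 0x3
#   cciss 0000:05:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
CHECK_CONDITION = re.compile(
    r"cciss: (?:cmd|c) (?P<cmd>[0-9a-fA-F]+) has CHECK CONDITION "
    r"sense key = (?P<key>0x[0-9a-fA-F]+)"
)


def count_sense_keys(lines):
    """Count cciss CHECK CONDITION messages, grouped by numeric sense key."""
    counts = Counter()
    for line in lines:
        match = CHECK_CONDITION.search(line)
        if match:
            counts[int(match.group("key"), 16)] += 1
    return counts


def scan_file(path="/var/log/messages"):
    """Print one summary line per sense key found in the given log file."""
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, errors="replace") as handle:
        for key, count in sorted(count_sense_keys(handle).items()):
            print("sense key 0x%x: %d occurrence(s)" % (key, count))
```

Calling `scan_file()` as a user who can read the log prints the totals. Any count for sense key 0x3 points at the medium error described under Root Cause.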

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
