System hangs with I/O stuck in Smart Array adapter (cciss driver) on Red Hat Enterprise Linux
Issue
- "INFO: task <taskname>:<pid> blocked for more than 120 seconds." messages
- The Sybase administrator reported that Sybase could not be stopped cleanly, and kill has no effect when run as the sybase user. Killing the processes as root also failed. Even sync does not complete, so the server cannot be shut down cleanly.
- Many processes in D state, waiting for many hours on I/O against cciss-backed disks.
- Server is unresponsive, with many processes in D state.
- Oracle host hangs and is fenced frequently.
- Server became unresponsive multiple times
- Server provides disk via NFS to a number of servers and hung while NetBackup was backing up modified files.
- Server becomes unresponsive. No ssh access; NFS access continues to function for a short period of time before also becoming unresponsive.
- The hangs were initially infrequent and the frequency has been increasing as the server has been put under greater load.
- The system does respond to pings when it hangs.
- We have a three-node Oracle RAC running 64-bit Red Hat Enterprise Linux 5.5. Nodes hang about every 1-2 hours.
- When the system hangs, it is no longer possible to log in, not even on the console.
- System hangs on access to a cciss drive.
- When the box runs as an NFS server and three NFS clients read and write to the NFS file system at the same time, the NFS server hangs.
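Several of the reports above mention processes stuck in D state (uninterruptible sleep). As a rough aid for confirming that symptom, the following sketch filters `ps` output for such processes. The function name `filter_d_state` is hypothetical and not part of any Red Hat or HP tool, and it assumes the process state is in the second column of its input.

```shell
# Hypothetical helper: print only processes whose state starts with "D".
# Assumes input in the column order produced by: ps -eo pid,stat,wchan,comm
filter_d_state() {
    # NR > 1 skips the header line; $2 is the STAT column (D, D+, Ds, ...)
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}
```

On an affected system it could be run as `ps -eo pid,stat,wchan,comm | filter_d_state`. Processes that stay in this list across repeated runs are the likely candidates for I/O stuck behind the controller.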
Environment
- Red Hat Enterprise Linux 4 and 5 (reported on at least 4.3, 4.4, 4.6, 4.8, 5.2, 5.3, 5.5)
- HP Smart Array Controllers including:
  - P400, P410, P410i
      $ cat /proc/driver/cciss/cciss0 | grep -e Smart -e Version
      cciss0: HP Smart Array P400 Controller
      Firmware Version: 5.26
  - P800
- CCISS driver provided by Red Hat Enterprise Linux kernel and/or cciss driver provided by HP
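To check whether a system matches this environment, the controller model and firmware version can be pulled out of the `/proc/driver/cciss/cciss0` output shown above. The following is a minimal sketch: the function name `cciss_summary` is hypothetical, and it assumes the two lines have the form shown in the example ("cciss0: HP Smart Array ..." and "Firmware Version: ..."), which may differ between driver versions.

```shell
# Hypothetical helper: summarize controller model and firmware on one line.
# Reads the contents of a /proc/driver/cciss/cciss<N> file on stdin.
cciss_summary() {
    awk '
        # e.g. "cciss0: HP Smart Array P400 Controller" -> strip the "cciss0: " prefix
        /Smart Array/      { sub(/^[^:]*:[ \t]*/, ""); model = $0 }
        # e.g. "Firmware Version: 5.26" -> keep only the version string
        /Firmware Version/ { sub(/^.*Firmware Version:[ \t]*/, ""); fw = $0 }
        END { if (model != "") printf "%s, firmware %s\n", model, fw }
    '
}
```

For example, `cciss_summary < /proc/driver/cciss/cciss0` would report the first controller; repeating it for each `cciss<N>` file covers systems with more than one controller.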