Cluster node becomes unresponsive and services fail after executing SysRq-T in RHEL 5
Issue
- A node temporarily stopped responding for minutes, and services in our cluster went down, after we ran SysRq-T to dump process states while troubleshooting another problem.
- A status check on an ip resource failed after triggering a sysrq key:
Jun 13 16:49:52 node1 clurgmgrd[16908]: <notice> status on ip "192.168.10.10" returned 1 (generic error)
- qdiskd started reporting that the other node was missing updates when I executed a sysrq key on that other node:
Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
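The qdiskd messages above can be read mechanically. What follows is a minimal Python sketch, not part of any Red Hat tooling, that parses lines in this format and estimates how close the silent node is to the limit shown in the log. It assumes that the pair in parentheses is the number of missed updates over the configured limit (the tko value, as far as we understand qdiskd), that all parsed lines refer to the same node, and that the spacing between messages approximates the update cycle. The names parse_missed_updates and summarize and the placeholder year are hypothetical.

```python
import re
from datetime import datetime

# Log lines copied from the qdiskd output above.
LOG_LINES = [
    'Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)',
    'Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)',
    'Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)',
    'Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)',
    'Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)',
    'Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)',
]

# Matches the syslog timestamp, the node number, and the (misses/limit) pair.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+qdiskd\[\d+\]:"
    r".*Node (?P<node>\d+) missed an update \((?P<misses>\d+)/(?P<limit>\d+)\)"
)


def parse_missed_updates(lines, year=2000):
    """Return (timestamp, node, misses, limit) tuples for matching lines.

    Syslog timestamps carry no year, so a placeholder year is supplied;
    it only matters for computing differences between timestamps.
    """
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a "missed an update" message
        stamp = datetime.strptime(
            f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
        )
        events.append(
            (
                stamp,
                int(match.group("node")),
                int(match.group("misses")),
                int(match.group("limit")),
            )
        )
    return events


def summarize(events):
    """Estimate the message spacing and the misses left before the limit.

    Assumes every event belongs to the same node. Returns None when there
    are fewer than two events, because no spacing can be computed.
    """
    if len(events) < 2:
        return None
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]
    average_gap = sum(gaps) / len(gaps)
    _, node, misses, limit = events[-1]
    remaining = limit - misses
    return {
        "node": node,
        "gaps": gaps,
        "remaining": remaining,
        "seconds_left": remaining * average_gap,
    }


if __name__ == "__main__":
    print(summarize(parse_missed_updates(LOG_LINES)))
```

For the six lines shown, the messages are 8 seconds apart and the last one reports 7 of 20, so under the assumptions above another 13 missed updates (roughly 104 seconds) would be needed before the limit is reached.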
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
- Executing SysRq keys such as T or P