Cluster node becomes unresponsive and services fail after executing SysRq-T in RHEL 5
Issue
- A node stopped responding for several minutes, and services in our cluster went down, after we ran SysRq-T to dump process states while troubleshooting another problem.
- A status check on an ip resource failed after triggering a sysrq key:
Jun 13 16:49:52 node1 clurgmgrd[16908]: <notice> status on ip "192.168.10.10" returned 1 (generic error)
- qdiskd started reporting that the other node was missing updates when I executed a sysrq key on that node:
Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
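When reviewing a longer log, it can help to extract the qdiskd "missed an update" messages and check how far apart they are and how high the counter climbed. The following is a minimal sketch in Python, not a Red Hat tool. The names `parse_missed_updates` and `update_intervals` are hypothetical, the regular expression only assumes the message format shown in the excerpt above, and the year is a placeholder because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
MISSED = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ qdiskd\[\d+\]: <debug> "
    r"Node (\d+) missed an update \((\d+)/(\d+)\)"
)

def parse_missed_updates(lines, year=2000):
    """Return (timestamp, node_id, missed, limit) for each matching line.

    The year argument is an assumption: syslog lines omit the year.
    """
    events = []
    for line in lines:
        m = MISSED.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, int(m.group(2)), int(m.group(3)), int(m.group(4))))
    return events

def update_intervals(events):
    """Seconds elapsed between consecutive missed-update messages."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

# Sample input copied from the log excerpt above.
SAMPLE = [
    'Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)',
    'Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)',
    'Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)',
]

events = parse_missed_updates(SAMPLE)
intervals = update_intervals(events)
print(intervals)
```

For the three sample lines, the messages are 8 seconds apart, which matches the spacing visible in the excerpt. The last value in each tuple is the second number in the parenthesized counter, which appears to be the limit the missed count is measured against.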
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
- Executing SysRq keys such as T or P