Cluster node becomes unresponsive and services fail after executing SysRq-T in RHEL 5

Solution In Progress - Updated -

Issue

  • A node temporarily stopped responding for several minutes, and services in our cluster went down, after we ran SysRq-T to dump process states while troubleshooting another problem.
  • A status check on an ip resource failed after triggering a SysRq key:
  Jun 13 16:49:52 node1 clurgmgrd[16908]: <notice> status on ip "192.168.10.10" returned 1 (generic error)
  • qdiskd started reporting that the other node was missing updates when I executed a SysRq key on that node:
  Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
  Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
  Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
  Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
  Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
  Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
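
The qdiskd messages above can be read as a countdown: each line carries a timestamp and a "missed/limit" counter. The following is a minimal sketch, not part of the original solution, that parses those lines to estimate the update interval and how much time would remain before the counter reaches its limit. It assumes that the second number in "(n/20)" is the configured miss limit (likely the quorum disk tko setting) and that updates are evenly spaced; the names LOG, PATTERN, parse and summarize are hypothetical.

```python
import re
from datetime import datetime

# Sample lines copied from the qdiskd output above.
LOG = """\
Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
"""

# Timestamp, node number, missed count, and limit from each qdiskd line.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ qdiskd\[\d+\]: <\w+> "
    r"Node (\d+) missed an update \((\d+)/(\d+)\)"
)


def parse(log):
    """Return a list of (timestamp, node, missed, limit) tuples."""
    events = []
    for line in log.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue
        # syslog timestamps carry no year; a fixed placeholder year is
        # prepended because only time differences matter here.
        ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
        events.append((ts, int(m.group(2)), int(m.group(3)), int(m.group(4))))
    return events


def summarize(events):
    """Return (seconds per missed update, seconds left until the limit).

    Assumes evenly spaced updates and that the limit is the point at
    which the node would be treated as failed. Returns None when there
    is not enough data to estimate an interval.
    """
    if len(events) < 2:
        return None
    first, last = events[0], events[-1]
    steps = last[2] - first[2]
    if steps <= 0:
        return None
    interval = (last[0] - first[0]).total_seconds() / steps
    remaining = (last[3] - last[2]) * interval
    return interval, remaining


if __name__ == "__main__":
    interval, remaining = summarize(parse(LOG))
    print(f"interval: {interval:.0f}s, time left before limit: {remaining:.0f}s")
```

For the sample above this suggests one missed update every 8 seconds, so under the stated assumptions a node that stays unresponsive for a few minutes could plausibly reach the limit.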

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
  • Executing SysRq keys such as T or P
