Cluster node becomes unresponsive and services fail after executing SysRq-T in RHEL 5

Solution In Progress

Issue

  • A node stopped responding for several minutes, and services in our cluster went down, after we ran SysRq-T to dump process states while troubleshooting another problem.
  • A status check on an ip resource failed after triggering a SysRq key:
  Jun 13 16:49:52 node1 clurgmgrd[16908]: <notice> status on ip "192.168.10.10" returned 1 (generic error)
  • qdiskd started reporting that the other node was missing updates when I triggered a SysRq key on that other node:
  Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
  Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
  Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
  Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
  Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
  Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
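
To gauge how long the node was unresponsive, the qdiskd messages above can be summarized with a short script. The following is a minimal sketch, not part of any Red Hat tooling: the function names `parse_missed_updates` and `summarize` are hypothetical, the year is assumed because syslog timestamps omit it, and the second number in `(7/20)` is assumed to be the limit of consecutive missed updates before the node is considered failed.

```python
import re
from datetime import datetime

# Log lines copied verbatim from the report above.
LOG = """\
Jun 13 16:49:05 node2 qdiskd[8138]: <debug> Node 1 missed an update (2/20)
Jun 13 16:49:13 node2 qdiskd[8138]: <debug> Node 1 missed an update (3/20)
Jun 13 16:49:21 node2 qdiskd[8138]: <debug> Node 1 missed an update (4/20)
Jun 13 16:49:29 node2 qdiskd[8138]: <debug> Node 1 missed an update (5/20)
Jun 13 16:49:37 node2 qdiskd[8138]: <debug> Node 1 missed an update (6/20)
Jun 13 16:49:45 node2 qdiskd[8138]: <debug> Node 1 missed an update (7/20)
"""

# Matches the qdiskd "missed an update" message format shown in the log.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+qdiskd\[\d+\]:\s+"
    r"<\w+>\s+Node\s+(?P<node>\d+)\s+missed an update\s+"
    r"\((?P<count>\d+)/(?P<limit>\d+)\)"
)


def parse_missed_updates(text):
    """Return (timestamp, node, count, limit) tuples for each matching line."""
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if not match:
            continue
        # Assumption: syslog omits the year, so a fixed year is supplied.
        ts = datetime.strptime("2000 " + match.group("ts"), "%Y %b %d %H:%M:%S")
        events.append(
            (ts, int(match.group("node")), int(match.group("count")),
             int(match.group("limit")))
        )
    return events


def summarize(events):
    """Compute the gaps between messages and the misses left before the limit."""
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
    _, node, count, limit = events[-1]
    return {"node": node, "last_count": count, "limit": limit,
            "gaps": gaps, "remaining": limit - count}


if __name__ == "__main__":
    print(summarize(parse_missed_updates(LOG)))
```

For the excerpt above, the messages arrive 8 seconds apart, and the last one leaves 13 further missed updates before the assumed limit of 20 is reached.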

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
  • Executing SysRq keys such as T or P
