How can I prevent segfaults, general protection faults, 'Watchdog: Daemon died, rebooting...', and unexpected reboots from clurgmgrd or rgmanager in a RHEL 4, 5, or 6 cluster?
Issue
- Why do I see "Watchdog: Daemon died, rebooting..." on a cluster node?
- A cluster node is unexpectedly rebooted by the clurgmgrd watchdog.
- A general protection fault occurs and a watchdog message is printed, which results in the cluster node rebooting:
Oct 30 18:30:04 node1 kernel: clurgmgrd[9805] general protection rip:3e9e2729ed rsp:43bc4b50 error:0
Oct 30 18:30:04 node1 clurgmgrd[10484]: <crit> Watchdog: Daemon died, rebooting...
- An update to our /etc/cluster/cluster.conf file results in a segfault and reboot of a server.
Jun 20 10:46:21 node1 clurgmgrd[12354]: <notice> Reconfiguring
Jun 20 10:46:21 node1 kernel: clurgmgrd[11153]: segfault at 00000000000000c0 rip 0000003d5f842b59 rsp 00000000405f5a20 error 4
Jun 20 10:46:21 node1 clurgmgrd[12353]: <crit> Watchdog: Daemon died, rebooting...
- rgmanager segfaults
- clurgmgrd crashes
- rgmanager has a general protection fault
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
- Red Hat Cluster Suite (RHCS) 4
- rgmanager