HP Serviceguard daemon cmcld caused both cluster nodes to reboot
Issue
- The HP Serviceguard cluster daemon, cmcld, caused both cluster nodes to reboot after the heartbeat connection between them was lost.
# grep -i cmcld /var/log/messages
17:52:41 host02 cmcld[9928]: Timed out node host01. It may have failed.
17:52:41 host02 cmcld[9928]: Attempting to form a new cluster
17:52:41 host02 cmcld[9928]: Beginning standard election
17:52:42 host02 cmcld[9928]: Sending file $SGRUN/frdump.cmcld.1 (512104 bytes) to file assistant daemon.
17:52:42 host02 cmfileassistd[18983]: Updated file /usr/local/cmcluster/run/frdump.cmcld.1 (length = 512104).
17:52:42 host02 cmcld[9928]: Subnet 10.x.x.x in package pkg_drgmgr is down. <-----------------"Subnet down"
17:53:36 host02 cmcld[9928]: Attempting to form a new cluster
17:53:36 host02 cmcld[9928]: Beginning standard election
17:53:42 host02 cmcld[9928]: Service cmfileassistd terminated due to an exit(0).
17:53:47 host02 cmcld[9928]: Attempting to form a new cluster
17:53:47 host02 cmcld[9928]: Beginning standard election
21:42:51 host02 cmcld[9361]: Logging level changed to level 0.
21:42:51 host02 cmcld[9361]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150.
21:42:51 host02 cmcld[9361]: Global Cluster Information:
21:42:51 host02 cmcld[9361]: Heartbeat Interval is 2.00 seconds.
21:42:51 host02 cmcld[9361]: Node Timeout is 8.00 seconds.
21:42:51 host02 cmcld[9361]: Max reformation duration is 100.00 seconds.
21:42:51 host02 cmcld[9361]: Network Polling Interval is 2.00 seconds.
21:42:51 host02 cmcld[9361]: IO Timeout Extension is 0.00 seconds.
21:42:51 host02 cmcld[9361]: Auto Start Timeout is 600.00 seconds.
21:42:51 host02 cmcld[9361]: Failover Optimization is disabled.
21:42:51 host02 cmcld[9361]: Information Specific to node host02:
21:42:51 host02 cmcld[9361]: bond0 0x00:xx:xx:xx:xx:xx 10.x.x.x bridged net:1
21:42:52 host02 cmcld[9361]: Heartbeat Subnet: 10.168.x.x
21:42:52 host02 cmcld[9361]: The maximum # of concurrent local connections to the daemon that will be supported is 962.
21:42:52 host02 cmcld[9361]: rcomm health: Initializing timeout to 142500000 microseconds
21:42:52 host02 cmcld[9361]: Warning. No cluster lock is configured.
21:42:52 host02 cmcld[9361]: Total allocated: 24625152 bytes, used: 121920 bytes, unused 24503232 bytes
21:42:52 host02 cmcld[9361]: wait event from serv_assistant_port
21:42:52 host02 cmcld[9361]: Starting cluster management protocols.
21:42:52 host02 cmcld[9361]: Attempting to form a new cluster
21:42:52 host02 cmcld[9361]: Beginning standard election
21:42:58 host02 cmcld[9361]: Turning on safety time protection
21:42:58 host02 cmcld[9361]: 2 nodes have formed a new cluster, sequence #2
21:42:58 host02 cmcld[9361]: The new active cluster membership is: host01(id=2), host02(id=3)
21:42:59 host02 cmcld[9361]: Request from node host01 to start package pkg_drgmgr on node host02.
21:42:59 host02 cmcld[9361]: Executing '/usr/local/cmcluster/conf/pkg_drgmgr/drgmgr.cntl start' for package pkg_drgmgr, as service PKG*11010.
21:43:19 host02 cmcld[9361]: Service PKG*11010 terminated due to an exit(0).
21:43:19 host02 cmcld[9361]: Started package pkg_drgmgr on node host02.
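In a longer log, it can help to pull out only the cmcld messages that mark the failure sequence: the node timeout, the subnet going down, and the repeated attempts to re-form the cluster. The following is a minimal sketch that assumes a POSIX shell and the message wording quoted in this log. The function name cmcld_events and its pattern list are hypothetical, chosen from the messages above, and may need adjusting for other Serviceguard versions.

```shell
# Hypothetical helper (not part of Serviceguard): print only the cmcld lines
# that mark node timeouts, subnet failures and cluster re-formation.
# The patterns are an assumption based on the messages quoted in this article.
cmcld_events() {
    grep -i 'cmcld' "${1:-/var/log/messages}" |
        grep -E 'Timed out node|is down|Attempting to form a new cluster|have formed a new cluster|active cluster membership'
}
```

For example, `cmcld_events /var/log/messages` prints the node timeout, the subnet-down message and each re-formation attempt in log order. It drops the start-up configuration lines, such as the heartbeat interval and node timeout settings.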
Environment
- Red Hat Enterprise Linux
- HP Serviceguard software