HP Serviceguard daemon (cmcld) caused both cluster nodes to reboot

Solution Unverified

Issue

  • The HP Serviceguard daemon, cmcld, caused both cluster nodes to reboot after the heartbeat connection between them was lost.
# grep -i cmcld /var/log/messages
 17:52:41 host02 cmcld[9928]: Timed out node host01. It may have failed. 
 17:52:41 host02 cmcld[9928]: Attempting to form a new cluster 
 17:52:41 host02 cmcld[9928]: Beginning standard election 
 17:52:42 host02 cmcld[9928]: Sending file $SGRUN/frdump.cmcld.1 (512104 bytes) to file assistant daemon. 
 17:52:42 host02 cmfileassistd[18983]: Updated file /usr/local/cmcluster/run/frdump.cmcld.1 (length = 512104). 
 17:52:42 host02 cmcld[9928]: Subnet 10.x.x.x in package pkg_drgmgr is down.     <-----------------"Subnet down"
 17:53:36 host02 cmcld[9928]: Attempting to form a new cluster 
 17:53:36 host02 cmcld[9928]: Beginning standard election 
 17:53:42 host02 cmcld[9928]: Service cmfileassistd terminated due to an exit(0). 
 17:53:47 host02 cmcld[9928]: Attempting to form a new cluster 
 17:53:47 host02 cmcld[9928]: Beginning standard election 
 21:42:51 host02 cmcld[9361]: Logging level changed to level 0. 
 21:42:51 host02 cmcld[9361]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150. 
 21:42:51 host02 cmcld[9361]: Global Cluster Information: 
 21:42:51 host02 cmcld[9361]: Heartbeat Interval is 2.00 seconds. 
 21:42:51 host02 cmcld[9361]: Node Timeout is 8.00 seconds. 
 21:42:51 host02 cmcld[9361]: Max reformation duration is 100.00 seconds. 
 21:42:51 host02 cmcld[9361]: Network Polling Interval is 2.00 seconds. 
 21:42:51 host02 cmcld[9361]: IO Timeout Extension is 0.00 seconds. 
 21:42:51 host02 cmcld[9361]: Auto Start Timeout is 600.00 seconds. 
 21:42:51 host02 cmcld[9361]: Failover Optimization is disabled. 
 21:42:51 host02 cmcld[9361]: Information Specific to node host02: 
 21:42:51 host02 cmcld[9361]: bond0  0x00:xx:xx:xx:xx:xx   10.x.x.x bridged net:1 
 21:42:52 host02 cmcld[9361]: Heartbeat Subnet: 10.168.x.x
 21:42:52 host02 cmcld[9361]: The maximum # of concurrent local connections to the daemon that will be supported is 962. 
 21:42:52 host02 cmcld[9361]: rcomm health:  Initializing timeout to 142500000 microseconds 
 21:42:52 host02 cmcld[9361]: Warning. No cluster lock is configured. 
 21:42:52 host02 cmcld[9361]: Total allocated: 24625152 bytes, used: 121920 bytes, unused 24503232 bytes 
 21:42:52 host02 cmcld[9361]: wait event from serv_assistant_port 
 21:42:52 host02 cmcld[9361]: Starting cluster management protocols. 
 21:42:52 host02 cmcld[9361]: Attempting to form a new cluster 
 21:42:52 host02 cmcld[9361]: Beginning standard election 
 21:42:58 host02 cmcld[9361]: Turning on safety time protection 
 21:42:58 host02 cmcld[9361]: 2 nodes have formed a new cluster, sequence #2 
 21:42:58 host02 cmcld[9361]: The new active cluster membership is: host01(id=2), host02(id=3) 
 21:42:59 host02 cmcld[9361]: Request from node host01 to start package pkg_drgmgr on node host02. 
 21:42:59 host02 cmcld[9361]: Executing '/usr/local/cmcluster/conf/pkg_drgmgr/drgmgr.cntl  start' for package pkg_drgmgr, as service PKG*11010. 
 21:43:19 host02 cmcld[9361]: Service PKG*11010 terminated due to an exit(0). 
 21:43:19 host02 cmcld[9361]: Started package pkg_drgmgr on node host02. 
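
To make the sequence above easier to follow, the log can be narrowed to the lines that record a membership or network state change. The following is a minimal sketch, not a Serviceguard tool: the function name cmcld_events is hypothetical, and the patterns assume the message wording matches the excerpt above and that the log is at /var/log/messages unless another path is given.

```shell
# Hypothetical helper (not part of Serviceguard): print only the cmcld
# messages that mark a node timeout, a subnet going down, or a newly
# formed cluster. Patterns are taken from the log excerpt in this article.
cmcld_events() {
    # Assumption: syslog writes to /var/log/messages unless a path is passed.
    log_file="${1:-/var/log/messages}"
    grep -E 'cmcld\[[0-9]+\]: (Timed out node|Subnet .* is down|[0-9]+ nodes have formed a new cluster|The new active cluster membership)' "$log_file"
}
```

Running cmcld_events with no argument reads /var/log/messages; passing a path, such as a saved copy of the log, reads that file instead. Repeated "Attempting to form a new cluster" lines are deliberately left out so that the timeout and subnet events stand out.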

Environment

  • Red Hat Enterprise Linux
  • HP Serviceguard software
