HP Serviceguard daemon cmcld caused both cluster nodes to reboot

Solution Unverified - Updated 2025-05-09T01:38:38+00:00

Issue

The HP Serviceguard daemon, cmcld, caused both cluster nodes to reboot after the heartbeat connection was lost.

# grep -i cmcld /var/log/messages
 17:52:41 host02 cmcld[9928]: Timed out node host01. It may have failed. 
 17:52:41 host02 cmcld[9928]: Attempting to form a new cluster 
 17:52:41 host02 cmcld[9928]: Beginning standard election 
 17:52:42 host02 cmcld[9928]: Sending file $SGRUN/frdump.cmcld.1 (512104 bytes) to file assistant daemon. 
 17:52:42 host02 cmfileassistd[18983]: Updated file /usr/local/cmcluster/run/frdump.cmcld.1 (length = 512104). 
 17:52:42 host02 cmcld[9928]: Subnet 10.x.x.x in package pkg_drgmgr is down.     <-----------------"Subnet down"
 17:53:36 host02 cmcld[9928]: Attempting to form a new cluster 
 17:53:36 host02 cmcld[9928]: Beginning standard election 
 17:53:42 host02 cmcld[9928]: Service cmfileassistd terminated due to an exit(0). 
 17:53:47 host02 cmcld[9928]: Attempting to form a new cluster 
 17:53:47 host02 cmcld[9928]: Beginning standard election 
 21:42:51 host02 cmcld[9361]: Logging level changed to level 0. 
 21:42:51 host02 cmcld[9361]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150. 
 21:42:51 host02 cmcld[9361]: Global Cluster Information: 
 21:42:51 host02 cmcld[9361]: Heartbeat Interval is 2.00 seconds. 
 21:42:51 host02 cmcld[9361]: Node Timeout is 8.00 seconds. 
 21:42:51 host02 cmcld[9361]: Max reformation duration is 100.00 seconds. 
 21:42:51 host02 cmcld[9361]: Network Polling Interval is 2.00 seconds. 
 21:42:51 host02 cmcld[9361]: IO Timeout Extension is 0.00 seconds. 
 21:42:51 host02 cmcld[9361]: Auto Start Timeout is 600.00 seconds. 
 21:42:51 host02 cmcld[9361]: Failover Optimization is disabled. 
 21:42:51 host02 cmcld[9361]: Information Specific to node host02: 
 21:42:51 host02 cmcld[9361]: bond0  0x00:xx:xx:xx:xx:xx   10.x.x.x bridged net:1 
 21:42:52 host02 cmcld[9361]: Heartbeat Subnet: 10.168.x.x
 21:42:52 host02 cmcld[9361]: The maximum # of concurrent local connections to the daemon that will be supported is 962. 
 21:42:52 host02 cmcld[9361]: rcomm health:  Initializing timeout to 142500000 microseconds 
 21:42:52 host02 cmcld[9361]: Warning. No cluster lock is configured. 
 21:42:52 host02 cmcld[9361]: Total allocated: 24625152 bytes, used: 121920 bytes, unused 24503232 bytes 
 21:42:52 host02 cmcld[9361]: wait event from serv_assistant_port 
 21:42:52 host02 cmcld[9361]: Starting cluster management protocols. 
 21:42:52 host02 cmcld[9361]: Attempting to form a new cluster 
 21:42:52 host02 cmcld[9361]: Beginning standard election 
 21:42:58 host02 cmcld[9361]: Turning on safety time protection 
 21:42:58 host02 cmcld[9361]: 2 nodes have formed a new cluster, sequence #2 
 21:42:58 host02 cmcld[9361]: The new active cluster membership is: host01(id=2), host02(id=3) 
 21:42:59 host02 cmcld[9361]: Request from node host01 to start package pkg_drgmgr on node host02. 
 21:42:59 host02 cmcld[9361]: Executing '/usr/local/cmcluster/conf/pkg_drgmgr/drgmgr.cntl  start' for package pkg_drgmgr, as service PKG*11010. 
 21:43:19 host02 cmcld[9361]: Service PKG*11010 terminated due to an exit(0). 
 21:43:19 host02 cmcld[9361]: Started package pkg_drgmgr on node host02.

Another log from a different node:

May  2 10:11:49  firewalld[691560]: WARNING: AllowZoneDrifting is enabled. This is considered an insecure configuration option. It will be removed in a future release. Please consider disabling it now.
May  2 10:11:51  cmcld[17163]: Member host0 seems unhealthy, not receiving heartbeats from it.  <<<----
May  2 10:11:51  cmcld[17163]: Member host1 seems unhealthy, not receiving heartbeats from it.
May  2 10:12:03  cmcld[17163]: Timed out unhealthy member(s).
May  2 10:12:03  cmcld[17163]: Lost heartbeat to host0
May  2 10:12:03  cmcld[17163]: Lost heartbeat to host1
May  2 10:12:03  cmcld[17163]: Resolving quorum with members 
May  2 10:12:03  cmcld[17163]: Quorum denied
May  2 10:12:03  cmcld[17163]: Deamon suspended as it lost the quorum.
May  2 10:12:03  cmcld[17140]: force: fd=13, safety_active=1, safety_enabled=1, toc_forced=1

--Rebooted--

May  2 10:12:19  kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-553.44.1.el8_10.x86_64 root=/dev/mapper/vg00-root ro crashkernel=auto resume=/dev/mapper/vg00-swap rd.lvm.lv=vg00/root rd.lvm.lv=vg00/swap rd.lvm.lv=vg00/usr biosdevname=0 ipv6.disable=1 net.ifnames=0 rhgb quiet

Another Reboot:

May  2 11:28:02  cmcld[9766]: Member host0 seems unhealthy, not receiving heartbeats from it.
May  2 11:28:02  cmcld[9766]: Member host1 seems unhealthy, not receiving heartbeats from it. <<<----
May  2 11:28:12  cmcld[9766]: Timed out unhealthy member(s).
May  2 11:28:12  cmcld[9766]: Lost heartbeat to host0  <<<----
May  2 11:28:12  cmcld[9766]: Lost heartbeat to host1  <<<----
May  2 11:28:12  cmcld[9750]: force: fd=13, safety_active=1, safety_enabled=1, toc_forced=1
May  2 11:28:12  cmcld[9766]: Resolving quorum with members 
May  2 11:28:12  cmcld[9766]: Quorum denied    <<<----
May  2 11:28:12  cmcld[9766]: Deamon suspended as it lost the quorum.  <<<----

--Rebooted--

May  2 11:28:28  kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-553.44.1.el8_10.x86_64 root=/dev/mapper/vg00-root ro crashkernel=auto resume=/dev/mapper/vg00-swap rd.lvm.lv=vg00/root rd.lvm.lv=vg00/swap rd.lvm.lv=vg00/usr biosdevname=0 ipv6.disable=1 net.ifnames=0 rhgb quiet
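
To pull only the heartbeat and quorum messages out of a busy log, a filter along the following lines may help. This is a minimal sketch: the function name cmcld_heartbeat_events and the default log path are assumptions for illustration, and the patterns are taken only from the excerpts above, so other Serviceguard versions may word these messages differently.

```shell
# Hypothetical helper: print cmcld lines that point to heartbeat or quorum loss.
# Patterns come from the log excerpts in this article, not from Serviceguard docs.
cmcld_heartbeat_events() {
    # Assumed default path; pass another file as the first argument.
    logfile="${1:-/var/log/messages}"
    # "D[ae]+mon" tolerates the "Deamon" spelling seen in the logs.
    grep -E 'cmcld\[[0-9]+\]: (Timed out|Member .* seems unhealthy|Lost heartbeat to|Quorum denied|D[ae]+mon suspended|force: .*toc_forced=1)' "$logfile"
}
```

Calling `cmcld_heartbeat_events /var/log/messages` on each node and comparing the timestamps side by side can show which node stopped receiving heartbeats first.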

Environment

Red Hat Enterprise Linux
HP ServiceGuard software
