Server rebooted abruptly on IBM's Reliable Scalable Cluster Technology (RSCT)
Issue
- Why did the server reboot on IBM's Reliable Scalable Cluster Technology (RSCT)? The system log shows the following messages immediately before the reboot:
02:55:37 ConfigRM[4334]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:
:::Template ID: 0:::Details File: :::Location: RSCT,ConfigRMGroup.C,1.305,770
:::CONFIGRM_MERGE_ST The sub-domain containing the local node is being dissolved because another
sub-domain has been detected that takes precedence over it. Group services will be ended on each
node of the local sub-domain which will cause the configuration manager daemon (IBM.ConfigRMd) to
force the node offline and then bring it back online in the surviving domain.
02:55:37 cthags[4660]: (Recorded using libct_ffdc.a cv 2):::Error ID:825....7CyjG/jjX.DuhHu.
:::Reference ID: :::Template ID: 0:::Details File: :::Location: RSCT,NS.C,1.XX.1.VV,4755
:::GS_DOM_MERGE_ER Group Services daemon exit to merge domains DIAGNOSTIC EXPLANATION NS::Ack():
The master requests to dissolve my domain because of the merge with other domain 1.49
02:55:37 RMCdaemon[4286]: (Recorded using libct_ffdc.a cv 2):::Error ID: 822....7CyjG/QeK
/DuhHu....................:::Reference ID: :::Template ID: 0:::Details File: :::Location:
RSCT,rmcd_gsi.c,1.50,1048 :::RMCD_2610_101_ER Internal error.
Error data 1 00000001 Error data 2 00000000 Error data 3 dispatch_gs
02:55:37 ConfigRM[4334]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:
:::Template ID: 0:::Details File: :::Location: RSCT,ConfigRMGroup.C,1.305,5264
:::CONFIGRM_EXIT_GS_ER The peer domain configuration manager daemon (IBM.ConfigRMd) is exiting
due to the Group Services subsystem terminating. The configuration manager daemon will restart
automatically, synchronize the nodes configuration with the domain and rejoin the domain if possible.
02:55:37 ConfigRM[4334]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:
:::Template ID: 0:::Details File: :::Location: RSCT,PeerDomain.C,1.RR.22.XX,20865 :::CONFIGRM_REBOOTOS_ER
The operating system is being rebooted to ensure that critical resources are stopped so that another
sub-domain that has operational quorum may recover these resources without causing corruption or conflict.
03:01:00 syslogd 1.4.1: restart.
03:01:01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
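The order of the RSCT message labels in the log above (CONFIGRM_MERGE_ST, GS_DOM_MERGE_ER, RMCD_2610_101_ER, CONFIGRM_EXIT_GS_ER, CONFIGRM_REBOOTOS_ER) is what shows the reboot was initiated by RSCT after a domain merge. The following is a minimal sketch for pulling that sequence out of a saved log excerpt. It assumes entries look like the ones shown here ("HH:MM:SS daemon[pid]: ... :::LABEL text"); the function name `rsct_events` and the `SAMPLE` string (two entries abbreviated from the excerpt above) are illustrative and not part of RSCT.

```python
import re

# Assumed layout, based only on the excerpt in this article:
# an entry starts with "HH:MM:SS daemon[pid]:" and carries one ":::LABEL" token.
ENTRY_START = re.compile(r"^(\d{2}:\d{2}:\d{2}) (\w+)\[(\d+)\]:", re.MULTILINE)
# Labels are uppercase words joined by underscores, e.g. CONFIGRM_REBOOTOS_ER.
# Field names such as ":::Error ID:" are mixed case and do not match.
LABEL = re.compile(r":::([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+)")


def rsct_events(log_text):
    """Return (time, daemon, label) tuples for each labelled entry, in log order."""
    starts = list(ENTRY_START.finditer(log_text))
    events = []
    for i, match in enumerate(starts):
        # An entry runs until the next entry header, or to the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        label = LABEL.search(log_text, match.end(), end)
        if label:
            events.append((match.group(1), match.group(2), label.group(1)))
    return events


# Two entries abbreviated from the log excerpt above (message text shortened).
SAMPLE = """02:55:37 ConfigRM[4334]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:
:::Template ID: 0:::Details File: :::Location: RSCT,ConfigRMGroup.C,1.305,770
:::CONFIGRM_MERGE_ST The sub-domain containing the local node is being dissolved
02:55:37 ConfigRM[4334]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:
:::Template ID: 0:::Details File: :::Location: RSCT,PeerDomain.C,1.RR.22.XX,20865 :::CONFIGRM_REBOOTOS_ER
The operating system is being rebooted
03:01:00 syslogd 1.4.1: restart.
"""

if __name__ == "__main__":
    for time, daemon, label in rsct_events(SAMPLE):
        print(time, daemon, label)
```

A CONFIGRM_REBOOTOS_ER entry followed within minutes by syslogd and klogd restart messages, as in the excerpt, indicates that the configuration manager itself requested the operating system reboot.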
Environment
- Red Hat Enterprise Linux
- IBM's Reliable Scalable Cluster Technology