Following Failover of SAPHana resource, the new secondary takes a long time to start
Issue
After the primary node fails and takes over, the secondary node takes some time to start and continuously reports monitor failures. These monitor failures trigger repeated restarts of SAPHana on the new secondary node, until we finally report "Register successful":
$ grep "Mar 9 07.*SAPHana_RH1_00" /var/log/messages
-----------------------------------------8<-----------------------------------------
1st loop, resource start
Mar 9 07:42:11 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: RA ==== begin action start_clone (0.162.1) ====
Mar 9 07:42:13 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: WARNING: RA: HANA_CALL stderr from command 'su - rh1adm -c' is 'logname: no login name', stderr from command 'python landscapeHostConfiguration.py' is ''
Mar 9 07:42:13 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=PRIM
Mar 9 07:42:13 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: DEC: Finally get_SRHOOK()=PRIM
Mar 9 07:42:13 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: LPA: landscape: DOWN, LPA: register ==> try to register
Mar 9 07:42:13 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: DEC: ANOTHER HANA IS AVAILABLE ==> LETS REGISTER
Mar 9 07:42:13 rhel8-node1 pacemaker-attrd[9854]: notice: Setting master-SAPHana_RH1_00[rhel8-node1]: 5 -> 0
Mar 9 07:42:33 rhel8-node1 SAPHana(SAPHana_RH1_00)[11118]: INFO: RA ==== end action start_clone with rc=0 (0.162.1) (22s)====
Mar 9 07:42:33 rhel8-node1 pacemaker-controld[9856]: notice: Result of start operation for SAPHana_RH1_00 on rhel8-node1: ok
Mar 9 07:42:33 rhel8-node1 pacemaker-controld[9856]: notice: Requesting local execution of monitor operation for SAPHana_RH1_00 on rhel8-node1
1st loop, resource monitor fails and we restart:
Mar 9 07:42:33 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: RA ==== begin action monitor_clone (0.162.1) ====
Mar 9 07:42:33 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=PRIM
Mar 9 07:42:33 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: DEC: Finally get_SRHOOK()=PRIM
Mar 9 07:42:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: ACT: systemd service SAPRH1_00.service is active
Mar 9 07:42:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: ERROR: SAP instance service hdbdaemon status color is GRAY !
Mar 9 07:42:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: WARNING: RA: HANA_CALL stderr from command 'su - rh1adm -c' is 'logname: no login name', stderr from command 'python landscapeHostConfiguration.py' is ''
Mar 9 07:42:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: DEC: PRIMDEFECT (in DEMOTED status)
Mar 9 07:42:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[12139]: INFO: RA ==== end action monitor_clone with rc=7 (0.162.1) (2s)====
Mar 9 07:42:35 rhel8-node1 pacemaker-controld[9856]: notice: Result of monitor operation for SAPHana_RH1_00 on rhel8-node1: not running <--- reported "not running"
-----------------------------------------8<-----------------------------------------
Above pattern repeats until,
Last loop and reported successful register:
Mar 9 07:51:30 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: RA ==== begin action start_clone (0.162.1) ====
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: WARNING: RA: HANA_CALL stderr from command 'su - rh1adm -c' is 'logname: no login name'
, stderr from command 'python landscapeHostConfiguration.py' is ''
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=PRIM
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: DEC: Finally get_SRHOOK()=PRIM
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: LPA: landscape: DOWN, LPA: register ==> try to register
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: DEC: ANOTHER HANA IS AVAILABLE ==> LETS REGISTER
Mar 9 07:51:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: ACT: REGISTER: hdbnsutil -sr_register --remoteHost=rhel8-node2 --remoteInstance=00 --replicationMode=syncmem --operationMode=logreplay --name=RH1HA00A
Mar 9 07:51:35 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: ACT: Register successful <---
Mar 9 07:51:36 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: ACT: systemd service SAPRH1_00.service is active
Mar 9 07:51:36 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=PRIM
Mar 9 07:51:36 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: ACT: Starting SAPHANA Instance RH1-HDB00: #01209.09.2024 07:51:36#012Start#012OK
Mar 9 07:52:28 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: ACT: SAPHANA Instance RH1-HDB00 started: #01209.09.2024 07:52:28#012WaitforStarted#012OK
Mar 9 07:52:28 rhel8-node1 SAPHana(SAPHana_RH1_00)[46280]: INFO: RA ==== end action start_clone with rc=0 (0.162.1) (58s)====
Mar 9 07:52:28 rhel8-node1 pacemaker-controld[9856]: notice: Result of start operation for SAPHana_RH1_00 on rhel8-node1: ok
Mar 9 07:52:28 rhel8-node1 pacemaker-controld[9856]: notice: Requesting local execution of monitor operation for SAPHana_RH1_00 on rhel8-node1
Mar 9 07:52:28 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: RA ==== begin action monitor_clone (0.162.1) ====
Mar 9 07:52:28 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=SOK
Mar 9 07:52:28 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: Finally get_SRHOOK()=SOK
Mar 9 07:52:29 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: ACT: systemd service SAPRH1_00.service is active
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=SOK
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: Finally get_SRHOOK()=SOK
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: secondary with sync status SOK ==> possible takeover node
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: hana_rh1_site_srHook_RH1HA00A=SOK
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: Finally get_SRHOOK()=SOK
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: saphana_monitor_secondary: scoring_crm_master(4:S:master1:master:worker:master,SOK)
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: scoring_crm_master: roles(4:S:master1:master:worker:master) are matching pattern ([234]*:S:[^:]*:master)
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: scoring_crm_master: sync(SOK) is matching syncPattern (SOK)
Mar 9 07:52:31 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: DEC: scoring_crm_master: set score 100
Mar 9 07:52:32 rhel8-node1 pacemaker-attrd[9854]: notice: Setting master-SAPHana_RH1_00[rhel8-node1]: 0 -> 100
Mar 9 07:52:32 rhel8-node1 SAPHana(SAPHana_RH1_00)[51687]: INFO: RA ==== end action monitor_clone with rc=0 (0.162.1) (4s)====
Environment
- Red Hat Enterprise Linux 7, 8, 9 with High Availability Add-On
SAPHanaresource agent
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.