Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On
Note: For guidelines on how to set up a RHEL HA Add-On based cluster for managing SAP HANA Scale-Up System Replication on RHEL 8, please use the version of the documentation available in the RHEL 8 for SAP Solutions product documentation: Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On.
Note: For guidelines on how to set up a RHEL HA Add-On based cluster for managing SAP HANA Scale-Up System Replication on RHEL 9, please use the version of the documentation available in the RHEL 9 for SAP Solutions product documentation: Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On.
Contents
- 1. Overview
- 2. SAP HANA System Replication
- 3. Configuring monitoring account in SAP HANA for cluster resource agents (SAP HANA 1.0 SPS12 and earlier)
- 4. Configuring SAP HANA in a pacemaker cluster
- 4.1. Install resource agents and other components required for managing SAP HANA Scale-Up System Replication using the RHEL HA Add-On
- 4.2. Enable the SAP HANA srConnectionChanged() hook
- 4.3. Configure general cluster properties
- 4.4. Create cloned SAPHanaTopology resource
- 4.5. Create Master/Slave SAPHana resource
- 4.6. Create Virtual IP address resource
- 4.7. Create constraints
- 4.8. Adding a secondary virtual IP address for an Active/Active (Read-Enabled) HANA System Replication setup
- 4.9. Testing the manual move of SAPHana resource to another node (SAP HANA takeover by cluster)
1. Overview
This article describes how to configure Automated HANA System Replication in Scale-Up in a Pacemaker cluster on supported RHEL releases.
This article does NOT cover the preparation of a RHEL system for SAP HANA installation or the SAP HANA installation procedure itself. For more details on these topics, refer to SAP Note 2009879 - SAP HANA Guidelines for RedHat Enterprise Linux (RHEL).
1.1. Supported scenarios
See: Support Policies for RHEL High Availability Clusters - Management of SAP HANA in a Cluster
1.2. Subscription and Repos
The following repos are required:
RHEL 7.x
- RHEL Server: provides the RHEL kernel packages
- RHEL HA Add-On: provides the Pacemaker framework
- RHEL for SAP HANA: provides the resource agents for the automation of HANA System Replication in Scale-Up
1.2.1. On-Premise or Bring Your Own Subscription through Cloud Access
For on-premise or Bring Your Own Subscription through Red Hat Cloud Access, the subscription to use is RHEL for SAP Solutions.
RHEL 7.x: below is an example of the repos enabled with RHEL for SAP Solutions 7.6, on-premise or through Cloud Access:
# yum repolist
repo id repo name status
rhel-7-server-e4s-rpms/7Server/x86_64 Red Hat Enterprise Linux 7 Server - Update Services for SAP Solutions (RPMs) 18,929
rhel-ha-for-rhel-7-server-e4s-rpms/7Server/x86_64 Red Hat Enterprise Linux High Availability (for RHEL 7 Server) Update Services for SAP Solutions (RPMs) 437
rhel-sap-hana-for-rhel-7-server-e4s-rpms/7Server/x86_64 RHEL for SAP HANA (for RHEL 7 Server) Update Services for SAP Solutions (RPMs) 38
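If these repos are not yet enabled on the system, they can usually be enabled with subscription-manager. Below is a minimal sketch using the repo IDs from the listing above (the exact repo IDs depend on the attached subscription, RHEL minor release, and architecture):
[root]# subscription-manager repos --enable="rhel-7-server-e4s-rpms" --enable="rhel-ha-for-rhel-7-server-e4s-rpms" --enable="rhel-sap-hana-for-rhel-7-server-e4s-rpms"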
1.2.2. On-Demand on Public Clouds through RHUI
For deployment in on-demand images on public clouds, the software packages are delivered in Red Hat Enterprise Linux for SAP with High Availability and Update Services, a variant of RHEL for SAP Solutions customized for public clouds and available through RHUI.
Below is an example of the repos enabled on a RHUI system with RHEL for SAP with High Availability and Update Services 7.5. For the configuration of Automated HANA System Replication in Scale-Up, the following repos must be present:
# yum repolist
repo id repo name status
rhui-rhel-7-server-rhui-eus-rpms/7.5/x86_64 Red Hat Enterprise Linux 7 Server - Extended Update Support (RPMs) from RH 21,199
rhui-rhel-ha-for-rhel-7-server-eus-rhui-rpms/7.5/x86_64 Red Hat Enterprise Linux High Availability from RHUI (for RHEL 7 Server) - 501
rhui-rhel-sap-hana-for-rhel-7-server-eus-rhui-rpms/7.5/x86_64 RHEL for SAP HANA (for RHEL 7 Server) Extended Update Support (RPMs) from 43
2. SAP HANA System Replication
The following example shows how to set up system replication between 2 nodes running SAP HANA.
Configuration used in the example:
SID: RH2
Instance Number: 02
node1 FQDN: node1.example.com
node2 FQDN: node2.example.com
node1 HANA site name: DC1
node2 HANA site name: DC2
SAP HANA 'SYSTEM' user password: <HANA_SYSTEM_PASSWORD>
SAP HANA administrative user: rh2adm
Ensure that both systems can resolve the FQDN of both systems without issues. To ensure that the FQDNs can be resolved even without DNS, you can place them into /etc/hosts as in the example below.
# /etc/hosts
192.168.0.11 node1.example.com node1
192.168.0.12 node2.example.com node2
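To quickly check name resolution on each node, you can for example use getent (the output shown here corresponds to the example /etc/hosts entries above):
[root]# getent hosts node1.example.com
192.168.0.11    node1.example.com node1
[root]# getent hosts node2.example.com
192.168.0.12    node2.example.com node2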
For the system replication to work, the SAP HANA log_mode variable must be set to normal. This can be verified on both nodes using the command below (run as the SAP HANA administrative user, connecting as the SYSTEM database user).
[rh2adm]# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 "select value from \"SYS\".\"M_INIFILE_CONTENTS\" where key='log_mode'"
VALUE
"normal"
1 row selected
Note that the following designation of primary and secondary nodes applies only during the initial setup. The roles (primary/secondary) may change during cluster operation based on the cluster configuration.
Many of the configuration steps are performed as the SAP HANA administrative user, whose name was chosen during the installation. In the examples we will use rh2adm, since the SID is RH2. To become the SAP HANA administrative user, you can use the command below.
[root]# sudo -i -u rh2adm
[rh2adm]#
2.1. Configure HANA primary node
SAP HANA system replication will only work after an initial backup has been performed. The following command will create an initial backup in the /tmp/foo directory. Please note that the size of the backup depends on the database size and that it may take some time to complete. The directory in which the backup will be placed must be writable by the SAP HANA administrative user.
a) On single container systems, the following command can be used for the backup:
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
b) On multiple container systems (MDC), the SYSTEMDB and all tenant databases need to be backed up.
The example below shows the backup of SYSTEMDB and the RH2 tenant database. Please check the SAP documentation for details on how to back up tenant databases.
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA FOR RH2 USING FILE ('/tmp/foo-RH2')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
After the initial backup, initialize the replication using the command below.
[rh2adm]# hdbnsutil -sr_enable --name=DC1
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
Verify that the initialization shows the current node as 'primary' and that SAP HANA is running on it.
[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
Host Mappings:
2.2. Configure HANA secondary node
The secondary node needs to be registered with the now running primary node. SAP HANA on the secondary node must be shut down before running the command below.
[rh2adm]# HDB stop
(SAP HANA 2.0 only) Copy the SAP HANA system PKI files SSFS_RH2.KEY and SSFS_RH2.DAT from the primary node to the secondary node.
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY /usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT /usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT
To register the secondary node, use the command below.
[rh2adm]# hdbnsutil -sr_register --remoteHost=node1 --remoteInstance=02 --replicationMode=syncmem --name=DC2
adding site ...
checking for inactive nameserver ...
nameserver node2:30201 not responding.
collecting information ...
updating local ini files ...
done.
Start SAP HANA on the secondary node.
[rh2adm]# HDB start
Verify that the secondary node is running and that 'mode' is syncmem. The output should look similar to the output below.
[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 2
site name: DC2
active primary site: 1
Host Mappings:
~~~~~~~~~~~~~~
node2 -> [DC1] node1
node2 -> [DC2] node2
2.3. Testing SAP HANA System Replication
To manually test the SAP HANA System Replication setup, you can follow the procedures described in the following SAP documents (a minimal manual takeover example follows the list):
- SAP HANA 1.0: chapter "8. Testing" - How to Perform System Replication for SAP HANA 1.0 guide
- SAP HANA 2.0: chapter "9. Testing" - How to Perform System Replication for SAP HANA 2.0 guide
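As a minimal illustration of one such test (a sketch only, based on the example configuration in this article; refer to the SAP guides above for the complete test procedures), a takeover can be triggered manually on the current secondary node as the SAP HANA administrative user:
[rh2adm]# hdbnsutil -sr_takeover
Afterwards, the replication state can be checked again with hdbnsutil -sr_state, as shown in the previous sections.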
2.4. Checking SAP HANA System Replication state
To check the current state of SAP HANA System Replication, you can execute the following command as the SAP HANA administrative user on the current primary SAP HANA node.
On a single_container system:
[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py
| Host | Port | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary | Replication | Replication | Replication |
| | | | | | | Host | Port | Site ID | Site Name | Active Status | Mode | Status | Status Details |
| ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| node1 | 30201 | nameserver | 1 | 1 | DC1 | node2 | 30201 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
| node1 | 30207 | xsengine | 2 | 1 | DC1 | node2 | 30207 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
| node1 | 30203 | indexserver | 3 | 1 | DC1 | node2 | 30203 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC1
On a multiple_containers system (MDC):
[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py
| Database | Host | Port | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary | Replication | Replication | Replication |
| | | | | | | | Host | Port | Site ID | Site Name | Active Status | Mode | Status | Status Details |
| -------- | ----- | ----- | ------------ | --------- | ------- | --------- | ----------| --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| SYSTEMDB | node1 | 30201 | nameserver | 1 | 1 | DC1 | node2 | 30201 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
| RH2 | node1 | 30207 | xsengine | 2 | 1 | DC1 | node2 | 30207 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
| RH2 | node1 | 30203 | indexserver | 3 | 1 | DC1 | node2 | 30203 | 2 | DC2 | YES | SYNCMEM | ACTIVE | |
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC1
3. Configuring monitoring account in SAP HANA for cluster resource agents (SAP HANA 1.0 SPS12 and earlier)
Starting with SAP HANA 2.0 SPS0, a monitoring account is no longer needed.
A technical user with CATALOG READ and MONITOR ADMIN privileges must exist in SAP HANA for the resource agents to be able to run queries on the system replication status. The example below shows how to create such a user, assign it the correct permissions, and disable password expiration for this user.
monitoring user username: rhelhasync
monitoring user password: <MONITORING_USER_PASSWORD>
3.1. Creating monitoring user
When SAP HANA System Replication is active, only the primary system is able to access the database; accessing the secondary system will fail.
On the primary system run the following commands to create the monitoring user.
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "create user rhelhasync password \"<MONITORING_USER_PASSWORD>\""
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant CATALOG READ to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant MONITOR ADMIN to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "ALTER USER rhelhasync DISABLE PASSWORD LIFETIME"
3.2. Store monitoring user credentials on all nodes
The SAP HANA userkey allows the "root" user on the OS level to access SAP HANA via the monitoring user without being asked for a password. This is needed by the resource agents so that they can run queries on the HANA System Replication status.
[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore SET SAPHANARH2SR localhost:30215 rhelhasync "<MONITORING_USER_PASSWORD>"
To verify that the userkey has been created correctly in root's userstore, you can run the hdbuserstore list command on each node and check that the monitoring account is present in the output, as shown below:
[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore list
DATA FILE : /root/.hdb/node1/SSFS_HDB.DAT
KEY FILE : /root/.hdb/node1/SSFS_HDB.KEY
KEY SAPHANARH2SR
ENV : localhost:30215
USER: rhelhasync
Please also verify that it is possible to run hdbsql commands as root using the SAPHANARH2SR userkey:
[root]# /usr/sap/RH2/HDB02/exe/hdbsql -U SAPHANARH2SR -i 02 "select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION"
REPLICATION_STATUS
"ACTIVE"
1 row selected
If you get an error message about issues with the password, or if you are prompted for a password, please verify with the hdbsql command or HANA Studio that the password for the user created with the hdbsql commands above is not configured to be changed on first login and that the password has not expired. You can use the command below.
(Note: be sure to use the name of the monitoring user in capital letters.)
[root]# /usr/sap/RH2/HDB02/exe/hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "select * from sys.users where USER_NAME='RHELHASYNC'"
USER_NAME,USER_ID,USER_MODE,EXTERNAL_IDENTITY,CREATOR,CREATE_TIME,VALID_FROM,VALID_UNTIL,LAST_SUCCESSFUL_CONNECT,LAST_INVALID_CONNECT_ATTEMPT,INVALID_CONNECT_A
TTEMPTS,ADMIN_GIVEN_PASSWORD,LAST_PASSWORD_CHANGE_TIME,PASSWORD_CHANGE_NEEDED,IS_PASSWORD_LIFETIME_CHECK_ENABLED,USER_DEACTIVATED,DEACTIVATION_TIME,IS_PASSWORD
_ENABLED,IS_KERBEROS_ENABLED,IS_SAML_ENABLED,IS_X509_ENABLED,IS_SAP_LOGON_TICKET_ENABLED,IS_SAP_ASSERTION_TICKET_ENABLED,IS_RESTRICTED,IS_CLIENT_CONNECT_ENABLE
D,HAS_REMOTE_USERS,PASSWORD_CHANGE_TIME
"RHELHASYNC",156529,"LOCAL",?,"SYSTEM","2017-05-12 15:10:49.971000000","2017-05-12 15:10:49.971000000",?,"2017-05-12 15:21:12.117000000",?,0,"TRUE","2017-05-12
15:10:49.971000000","FALSE","FALSE","FALSE",?,"TRUE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE","FALSE",?
1 row selected
4. Configuring SAP HANA in a pacemaker cluster
Please refer to the following documentation to first set up a pacemaker cluster. Note that the cluster must conform to the article Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH.
- Reference Document for the High Availability Add-On for Red Hat Enterprise Linux 7
- How can I configure power fencing for the IBM POWER platform using an HMC in a RHEL High Availability cluster?
This guide assumes that the following things are working properly:
- The pacemaker cluster is configured according to the documentation and has proper, working fencing
- SAP HANA startup on boot is disabled on all cluster nodes, as the start and stop will be managed by the cluster (a sketch for verifying this follows this list)
- SAP HANA system replication and takeover using tools from SAP are working properly between the cluster nodes
- Both nodes are subscribed to the required channels:
- RHEL 7: 'High-availability' and 'RHEL for SAP HANA' (https://access.redhat.com/solutions/2334521) channels
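Below is a sketch for verifying that SAP HANA does not start automatically at boot, assuming the default SAP instance profile location and the example SID, instance number, and hostname used in this article (if the Autostart parameter is present in the instance profile, it should be set to 0):
[root]# grep -i '^Autostart' /usr/sap/RH2/SYS/profile/RH2_HDB02_node1
Autostart = 0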
4.1. Install resource agents and other components required for managing SAP HANA Scale-Up System Replication using the RHEL HA Add-On
[root]# yum install resource-agents-sap-hana
Note: this will only install the resource agents and additional components required to set up this HA solution. The configuration steps documented in the following sections must still be carried out for a fully operable setup that is supported by Red Hat.
4.2. Enable the SAP HANA srConnectionChanged() hook
As documented in SAP's Implementing a HA/DR Provider, recent versions of SAP HANA provide so-called "hooks" that allow SAP HANA to send out notifications for certain events. The srConnectionChanged() hook can be used to improve the ability of the cluster to detect when a change in the status of the HANA System Replication occurs that requires the cluster to take action, and to avoid data loss or data corruption by preventing accidental takeovers from being triggered in situations where this should be avoided. When using SAP HANA 2.0 SPS0 or later and a version of the resource-agents-sap-hana package that provides the components for supporting the srConnectionChanged() hook, it is required to enable the hook before proceeding with the cluster setup.
4.2.1. Verify that a version of the resource-agents-sap-hana package is installed that provides the components to enable the srConnectionChanged() hook
Please verify that the correct version of the resource-agents-sap-hana package, providing the components required to enable the srConnectionChanged() hook for your version of RHEL, is installed as documented in the following article: Is the srConnectionChanged() hook supported with the Red Hat High Availability solution for SAP HANA Scale-up System Replication?
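For example, the currently installed package version can be checked with rpm and compared against the versions listed in the article above:
[root]# rpm -q resource-agents-sap-hana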
4.2.2. Activate the srConnectionChanged() hook on all SAP HANA instances
Note: the steps to activate the srConnectionChanged() hook need to be performed for each SAP HANA instance.
- Stop the cluster on both nodes and verify that the HANA instances are stopped completely.
[root]# pcs cluster stop --all
- Install the hook script into the /hana/shared/myHooks directory for each HANA instance and make sure it has the correct ownership on all nodes (replace rh2adm with the username of the admin user of the HANA instances).
[root]# mkdir -p /hana/shared/myHooks
[root]# cp /usr/share/SAPHanaSR/srHook/SAPHanaSR.py /hana/shared/myHooks
[root]# chown -R rh2adm:sapsys /hana/shared/myHooks
- Update the global.ini file on each node to enable use of the hook script by both HANA instances (e.g., in file /hana/shared/RH2/global/hdb/custom/config/global.ini):
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /hana/shared/myHooks
execution_order = 1

[trace]
ha_dr_saphanasr = info
- On each cluster node, create the file /etc/sudoers.d/20-saphana by running sudo visudo -f /etc/sudoers.d/20-saphana and add the contents below to allow the hook script to update the node attributes when the srConnectionChanged() hook is called.
Replace rh2 with the lowercase SID of your HANA installation and replace DC1 and DC2 with your HANA site names.
Cmnd_Alias DC1_SOK = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SOK = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR
rh2adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty
For further information on why the Defaults setting is needed, see The srHook attribute is set to SFAIL in a Pacemaker cluster managing SAP HANA system replication, even though replication is in a healthy state.
- Start both HANA instances manually without starting the cluster.
- Verify that the hook script is working as expected. Perform some action to trigger the hook, such as stopping a HANA instance. Then check whether the hook logged anything using a method such as the one below.
[rh2adm]# cdtrace
[rh2adm]# awk '/ha_dr_SAPHanaSR.*crm_attribute/ { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
2018-05-04 12:34:04.476445 ha_dr_SAPHanaSR SFAIL
2018-05-04 12:53:06.316973 ha_dr_SAPHanaSR SOK
[rh2adm]# grep ha_dr_ *
Note: For more information please check SAP doc Install and Configure a HA/DR Provider Script.
- When the functionality of the hook has been verified, the cluster can be started again.
[root]# pcs cluster start --all
4.3. Configure general cluster properties
To avoid unnecessary failovers of the resources during initial testing and in production, set the following default values for the resource-stickiness and migration-threshold parameters. Note that defaults do not apply to resources which override them with their own defined values.
[root]# pcs resource defaults resource-stickiness=1000
[root]# pcs resource defaults migration-threshold=5000
Notes:
1. It is sufficient to run the commands above on one node of the cluster.
2. Previous versions of this document recommended setting these defaults for the initial testing of the cluster setup, but removing them after production. Due to customer feedback and additional testing, it has been determined that it is beneficial to use these defaults for production cluster setups as well.
3. Setting resource-stickiness=1000 encourages the resource to stay running where it is, while migration-threshold=5000 causes the resource to move to a new node only after 5000 failures. 5000 is generally sufficient to prevent the resource from prematurely failing over to another node. This also ensures that the resource failover time stays within a controllable limit.
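To confirm the values have been applied, the currently configured defaults can be listed, for example:
[root]# pcs resource defaults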
Previous versions of this guide recommended setting the no-quorum-policy to ignore, which is currently NOT supported. In the default configuration, the no-quorum-policy property of the cluster does not need to be modified. To achieve the behavior provided by this option, see Can I configure pacemaker to continue to manage resources after a loss of quorum in RHEL 6 or 7?
4.4. Create cloned SAPHanaTopology resource
The SAPHanaTopology resource gathers the status and configuration of SAP HANA System Replication on each node. In addition, it starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring the SAP HANA instances. It has the following attributes:
| Attribute Name | Required? | Default value | Description |
|---|---|---|---|
| SID | yes | null | The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2 |
| InstanceNumber | yes | null | The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02 |
Below is an example command to create the SAPHanaTopology cloned resource.
Note: the timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).
[root]# pcs resource create SAPHanaTopology_RH2_02 SAPHanaTopology SID=RH2 InstanceNumber=02 \
op start timeout=600 \
op stop timeout=300 \
op monitor interval=10 timeout=600 \
clone clone-max=2 clone-node-max=1 interleave=true
The resulting resource should look like the following.
[root]# pcs resource show SAPHanaTopology_RH2_02-clone
Clone: SAPHanaTopology_RH2_02-clone
Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
Attributes: SID=RH2 InstanceNumber=02
Operations: start interval=0s timeout=600 (SAPHanaTopology_RH2_02-start-interval-0s)
stop interval=0s timeout=300 (SAPHanaTopology_RH2_02-stop-interval-0s)
monitor interval=10 timeout=600 (SAPHanaTopology_RH2_02-monitor-interval-10s)
Once the resource is started, you will see the collected information stored in the form of node attributes that can be viewed with the command crm_mon -A1. Below is an example of what the attributes can look like when only SAPHanaTopology is started.
[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
+ hana_rh2_remoteHost : node2
+ hana_rh2_roles : 1:P:master1::worker:
+ hana_rh2_site : DC1
+ hana_rh2_srmode : syncmem
+ hana_rh2_vhost : node1
* Node node2:
+ hana_rh2_remoteHost : node1
+ hana_rh2_roles : 1:S:master1::worker:
+ hana_rh2_site : DC2
+ hana_rh2_srmode : syncmem
+ hana_rh2_vhost : node2
...
4.5. Create Master/Slave SAPHana resource
The SAPHana resource agent manages two SAP HANA instances (databases) that are configured in HANA System Replication.
| Attribute Name | Required? | Default value | Description |
|---|---|---|---|
| SID | yes | null | The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2 |
| InstanceNumber | yes | null | The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02 |
| PREFER_SITE_TAKEOVER | no | null | Should the resource agent prefer to switch over to the secondary instance instead of restarting the primary locally? true: do prefer takeover to the secondary site; false: do prefer restart locally; never: under no circumstances do a takeover to the other node |
| AUTOMATED_REGISTER | no | false | If a takeover event has occurred, and the DUPLICATE_PRIMARY_TIMEOUT has expired, should the former primary instance be registered as secondary? ("false": no, manual intervention will be needed; "true": yes, the former primary will be registered by the resource agent as secondary) [1] |
| DUPLICATE_PRIMARY_TIMEOUT | no | 7200 | The time difference (in seconds) needed between two primary time stamps if a dual-primary situation occurs. If the time difference is less than the time gap, the cluster will hold one or both instances in a "WAITING" status. This is to give the system admin a chance to react to a takeover. After the time difference has passed, if AUTOMATED_REGISTER is set to true, the failed former primary will be registered as secondary. After the registration to the new primary, all data on the former primary will be overwritten by the system replication. |
[1] - As a good practice for testing and PoC, we recommend leaving AUTOMATED_REGISTER at its default value (AUTOMATED_REGISTER="false") to prevent a failed primary instance from automatically registering as a secondary instance. After testing, if the failover scenarios work as expected, especially for a production environment, we recommend setting AUTOMATED_REGISTER="true", so that after a takeover the system replication resumes in a timely manner, avoiding disruption. When AUTOMATED_REGISTER="false", in case of a failure on the primary node, after investigation you will need to manually register it as the secondary HANA System Replication node (see the sketch below).
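As an illustration of this manual step (a sketch only, assuming a takeover has made node2/DC2 the new primary and the failure on the former primary node1 has been investigated), the re-registration would look similar to the registration step from section 2.2, executed on the former primary node:
[rh2adm]# HDB stop
[rh2adm]# hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=02 --replicationMode=syncmem --name=DC1
Afterwards, the cluster can be allowed to start the instance as the new secondary again (for example after cleaning up any failed resource actions with pcs resource cleanup).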
Note: the timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).
4.5.1. RHEL 7.x
Below is an example command to create the SAPHana Master/Slave resource.
[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
master meta notify=true clone-max=2 clone-node-max=1 interleave=true
On RHEL 7.x, when running pcs-0.9.158-6.el7 or newer, use the command below to avoid a deprecation warning. More information about the change is explained in What are differences between master and --master option in pcs resource create command?
[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
master notify=true clone-max=2 clone-node-max=1 interleave=true
The resulting resource should look like the following.
[root]# pcs resource show SAPHana_RH2_02-master
Master: SAPHana_RH2_02-master
Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true
Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
Attributes: AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH2
Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)
Once the resource is started, it will add additional node attributes describing the current state of the SAP HANA databases on the nodes, as seen below.
[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
+ hana_rh2_clone_state : PROMOTED
+ hana_rh2_op_mode : delta_datashipping
+ hana_rh2_remoteHost : node2
+ hana_rh2_roles : 4:P:master1:master:worker:master
+ hana_rh2_site : DC1
+ hana_rh2_sync_state : PRIM
+ hana_rh2_srmode : syncmem
+ hana_rh2_vhost : node1
+ lpa_rh2_lpt : 1495204085
+ master-hana : 150
* Node node2:
+ hana_rh2_clone_state : DEMOTED
+ hana_rh2_remoteHost : node1
+ hana_rh2_roles : 4:S:master1:master:worker:master
+ hana_rh2_site : DC2
+ hana_rh2_srmode : syncmem
+ hana_rh2_sync_state : SOK
+ hana_rh2_vhost : node2
+ lpa_rh2_lpt : 30
+ master-hana : 100
...
4.6. Create Virtual IP address resource
The cluster will contain a virtual IP address in order to reach the Master instance of SAP HANA. Below is an example command to create an IPaddr2 resource with the IP 192.168.0.15.
[root]# pcs resource create vip_RH2_02 IPaddr2 ip="192.168.0.15"
The resulting resource should look like the one below.
[root]# pcs resource show vip_RH2_02
Resource: vip_RH2_02 (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.15
Operations: start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)
4.7. Create constraints
For correct operation we need to ensure that the SAPHanaTopology resources are started before starting the SAPHana resources, and also that the virtual IP address is present on the node where the Master resource of SAPHana is running. To achieve this, the following 2 constraints need to be created.
4.7.1 RHEL 7.x
4.7.1.1 constraint - start SAPHanaTopology before SAPHana
The example command below will create the constraint that mandates the start order of these resources. There are 2 things worth mentioning here:
- The symmetrical=false attribute defines that we care only about the start of the resources; they do not need to be stopped in reverse order.
- Both resources (SAPHana and SAPHanaTopology) have the attribute interleave=true, which allows parallel starts of these resources on the nodes. This means that, despite the ordering constraint, we do not have to wait for all nodes to start SAPHanaTopology; the SAPHana resource can start on any node as soon as SAPHanaTopology is running there.
Command for creating the constraint:
[root]# pcs constraint order SAPHanaTopology_RH2_02-clone then SAPHana_RH2_02-master symmetrical=false
The resulting constraint should look like the one in the example below.
[root]# pcs constraint
...
Ordering Constraints:
start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-master (kind:Mandatory) (non-symmetrical)
...
4.7.1.2 constraint - colocate the IPaddr2 resource with the Master of the SAPHana resource
Below is an example command that will colocate the IPaddr2 resource with the SAPHana resource that was promoted as Master.
[root]# pcs constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-master 2000
Note that the constraint uses a score of 2000 instead of the default INFINITY. This allows the IPaddr2 resource to be taken down by the cluster in case there is no Master promoted in the SAPHana resource, so it is still possible to use this address with tools like SAP Management Console or SAP LVM that can use this address to query the status information about the SAP instance.
The resulting constraint should look like one in the example below.
[root]# pcs constraint
...
Colocation Constraints:
vip_RH2_02 with SAPHana_RH2_02-master (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...
4.8. Adding a secondary virtual IP address for an Active/Active (Read-Enabled) HANA System Replication setup
Starting with SAP HANA 2.0 SPS1, SAP enables 'Active/Active (Read Enabled)' setups for SAP HANA System Replication, where the secondary systems of SAP HANA system replication can be used actively for read-intensive workloads. To be able to support such setups, a second virtual IP address is required, which enables clients to access the secondary SAP HANA database. To ensure that the secondary replication site can still be accessed after a takeover has occurred, the cluster needs to move the virtual IP address around with the slave of the master/slave SAPHana resource.
Note that when establishing HSR for the read-enabled secondary configuration, the operationMode should be set to logreplay_readaccess (see the registration sketch below).
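For example, registering the secondary for a read-enabled setup could look like the following variation of the registration command from section 2.2 (a sketch only; adjust the remote host, instance number, replication mode and site name to your environment):
[rh2adm]# hdbnsutil -sr_register --remoteHost=node1 --remoteInstance=02 --replicationMode=syncmem --operationMode=logreplay_readaccess --name=DC2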
4.8.1. Creating the resource for managing the secondary virtual IP address
[root]# pcs resource create vip2_RH2_02 IPaddr2 ip="192.168.1.11"
Please use the appropriate resource agent for managing the IP address based on the platform on which the cluster is running.
4.8.2. Creating location constraints to ensure that the secondary virtual IP address is placed on the right cluster node
[root]# pcs constraint location vip2_RH2_02 rule score=INFINITY hana_rh2_sync_state eq SOK and hana_rh2_roles eq 4:S:master1:master:worker:master
[root]# pcs constraint location vip2_RH2_02 rule score=2000 hana_rh2_sync_state eq PRIM and hana_rh2_roles eq 4:P:master1:master:worker:master
These location constraints ensure that the second virtual IP resource will have the following behavior:
- If there is a Master/PRIMARY node and a Slave/SECONDARY node, both available, with HANA System Replication in "SOK" state, the second virtual IP will run on the Slave/SECONDARY node.
- If the Slave/SECONDARY node is not available or the HANA System Replication is not "SOK", the second virtual IP will run on the Master/PRIMARY node. When the Slave/SECONDARY node becomes available and the HANA System Replication is "SOK" again, the second virtual IP will move back to the Slave/SECONDARY node.
- If the Master/PRIMARY node is not available or the HANA instance running there has a problem, then when the Slave/SECONDARY takes over the Master/PRIMARY role, the second virtual IP will continue running on the same node until the other node takes the Slave/SECONDARY role and the HANA System Replication is "SOK" again.
This maximizes the time that the second virtual IP resource will be assigned to a node where a healthy SAP HANA instance is running.
4.9. Testing the manual move of SAPHana resource to another node (SAP HANA takeover by cluster)
Test moving the SAPHana resource from one node to another.
4.9.1. Moving SAPHana resource on RHEL 7
Use the command below on RHEL 7. Note that the option --master should not be used when running the command below, due to the way the SAPHana resource works internally.
[root]# pcs resource move SAPHana_RH2_02-master
With each pcs resource move command invocation, the cluster creates location constraints to cause the resource to move. These constraints must be removed after it has been verified that the HANA System Replication takeover has completed, in order to allow the cluster to manage the former primary HANA instance again. To remove the constraints created by the move, run the command below.
[root]# pcs resource clear SAPHana_RH2_02-master
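Before and after clearing, the constraints created by the move and the current resource state can be reviewed, for example, with the commands already used earlier in this article:
[root]# pcs constraint --full
[root]# crm_mon -A1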