Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On


1. Overview

This article describes how to configure Automated HANA System Replication in Scale-Up in a Pacemaker cluster on supported RHEL releases.

This article does NOT cover the preparation of a RHEL system for SAP HANA installation, nor the SAP HANA installation procedure itself. For more details on these topics refer to SAP Note 2009879 - SAP HANA Guidelines for Red Hat Enterprise Linux (RHEL).

1.1. Supported scenarios

See: Support Policies for RHEL High Availability Clusters - Management of SAP HANA in a Cluster

1.2. Subscription and Repos

The following repos are required:

RHEL 7.x
- RHEL Server: provides the RHEL kernel packages
- RHEL HA Add-On: provides the Pacemaker framework
- RHEL for SAP HANA: provides the resource agents for the automation of HANA System Replication in Scale-Up

RHEL 8.x
- RHEL BaseOS: provides the RHEL kernel packages
- RHEL AppStream: provides all the applications you might want to run in a given userspace
- RHEL High Availability: provides the Pacemaker framework
- RHEL for SAP Solutions: provides the resource agents for the automation of HANA System Replication in Scale-Up

1.2.1. On-Premise or Bring Your Own Subscription through Cloud Access

For on-premise or Bring Your Own Subscription through Red Hat Cloud Access, the subscription to use is RHEL for SAP Solutions.

RHEL 7.x: below is an example of the repos enabled with RHEL for SAP Solutions 7.6, on-premise or through Cloud Access:

# yum repolist
repo id                                                  repo name                                                                                                status
rhel-7-server-e4s-rpms/7Server/x86_64                    Red Hat Enterprise Linux 7 Server - Update Services for SAP Solutions (RPMs)                             18,929
rhel-ha-for-rhel-7-server-e4s-rpms/7Server/x86_64        Red Hat Enterprise Linux High Availability (for RHEL 7 Server) Update Services for SAP Solutions (RPMs)     437
rhel-sap-hana-for-rhel-7-server-e4s-rpms/7Server/x86_64  RHEL for SAP HANA (for RHEL 7 Server) Update Services for SAP Solutions (RPMs)                               38

RHEL 8.x x86_64: below is an example of the repos enabled with RHEL for SAP Solutions 8.0, on-premise or through Cloud Access:

# yum repolist
repo id                                                  repo name                                    status
rhel-8-for-x86_64-appstream-rpms        Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)       8,603
rhel-8-for-x86_64-baseos-rpms           Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)          3,690
rhel-8-for-x86_64-highavailability-rpms Red Hat Enterprise Linux 8 for x86_64 - High Availability (RPM   156
rhel-8-for-x86_64-sap-solutions-rpms    Red Hat Enterprise Linux 8 for x86_64 - SAP Solutions (RPMs)      10

RHEL 8.x power9: below is an example of the repos enabled with RHEL for SAP Solutions 8.0 on power9:

# yum repolist
repo id                                       repo name                                                                                                         status
rhel-8-for-ppc64le-appstream-e4s-rpms         Red Hat Enterprise Linux 8 for Power, little endian - AppStream - Update Services for SAP Solutions (RPMs)         4,949
rhel-8-for-ppc64le-baseos-e4s-rpms            Red Hat Enterprise Linux 8 for Power, little endian - BaseOS - Update Services for SAP Solutions (RPMs)            1,766
rhel-8-for-ppc64le-highavailability-e4s-rpms  Red Hat Enterprise Linux 8 for Power, little endian - High Availability - Update Services for SAP Solutions (RPMs)    71
rhel-8-for-ppc64le-sap-solutions-e4s-rpms     Red Hat Enterprise Linux 8 for Power, little endian - SAP Solutions - Update Services for SAP Solutions (RPMs)         4

1.2.2. On-Demand on Public Clouds through RHUI

For deployments using on-demand images on public clouds, the software packages are delivered in Red Hat Enterprise Linux for SAP with High Availability and Update Services, a variant of RHEL for SAP Solutions customized for public clouds and available through RHUI.

Below is an example of the repos enabled on a RHUI system with RHEL for SAP with High Availability and Update Services 7.5. For the configuration of Automated HANA System Replication in Scale-Up, the following repos must be present:

# yum repolist
repo id                                                        repo name                                                                  status
rhui-rhel-7-server-rhui-eus-rpms/7.5/x86_64                   Red Hat Enterprise Linux 7 Server - Extended Update Support (RPMs) from RH 21,199
rhui-rhel-ha-for-rhel-7-server-eus-rhui-rpms/7.5/x86_64       Red Hat Enterprise Linux High Availability from RHUI (for RHEL 7 Server) -    501
rhui-rhel-sap-hana-for-rhel-7-server-eus-rhui-rpms/7.5/x86_64 RHEL for SAP HANA (for RHEL 7 Server) Extended Update Support (RPMs) from      43

2. SAP HANA System Replication

The following example shows how to set up system replication between 2 nodes running SAP HANA.

Configuration used in the example:

SID:                   RH2
Instance Number:       02
node1 FQDN:            node1.example.com
node2 FQDN:            node2.example.com
node1 HANA site name:  DC1
node2 HANA site name:  DC2
SAP HANA 'SYSTEM' user password: <HANA_SYSTEM_PASSWORD>
SAP HANA administrative user:    rh2adm

Ensure that both systems can resolve the FQDN of both systems without issues. To ensure that FQDNs can be resolved even without DNS, you can place them in /etc/hosts as in the example below.

# /etc/hosts
192.168.0.11 node1.example.com node1
192.168.0.12 node2.example.com node2
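A quick way to check resolution on each node is sketched below; getent consults /etc/hosts as well as DNS, so it reflects what the system will actually resolve:

```shell
# Check that both cluster FQDNs resolve on this node. getent honors
# /etc/hosts entries as well as DNS lookups.
resolves() { getent hosts "$1" > /dev/null; }

for host in node1.example.com node2.example.com; do
  if resolves "$host"; then
    echo "$host: OK"
  else
    echo "$host: NOT RESOLVABLE"
  fi
done
```

Run this on both nodes; both names must report OK before continuing.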

For system replication to work, the SAP HANA log_mode variable must be set to normal. This can be verified as the SAP HANA administrative user using the command below on both nodes.

[rh2adm]# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 "select value from "SYS"."M_INIFILE_CONTENTS" where key='log_mode'"
VALUE "normal"
1 row selected

Note that the designation of primary and secondary node applies only during the initial setup. The roles (primary/secondary) may change during cluster operation, based on the cluster configuration.

Many of the configuration steps are performed as the SAP HANA administrative user, whose name was selected during installation. Since the examples use SID RH2, the administrative user is rh2adm. To become the SAP HANA administrative user you can use the command below.

[root]# sudo -i -u rh2adm
[rh2adm]#

2.1. Configure HANA primary node

SAP HANA system replication will only work after an initial backup has been performed. The following command creates an initial backup in the /tmp/foo directory. Note that the size of the backup depends on the database size, so the backup may take some time to complete. The directory in which the backup will be placed must be writable by the SAP HANA administrative user.

a) On single-container systems, the following command can be used for the backup:

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

b) On multiple-container systems (MDC), SYSTEMDB and all tenant databases need to be backed up:

The example below shows the backup of SYSTEMDB and of the RH2 tenant database. Please check the SAP documentation for details on how to back up tenant databases.

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA FOR RH2 USING FILE ('/tmp/foo-RH2')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

After the initial backup, initialize the replication using the command below.

[rh2adm]# hdbnsutil -sr_enable --name=DC1
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.

Verify that the initialization shows the current node as 'primary' and that SAP HANA is running on it.

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
Host Mappings:

2.2. Configure HANA secondary node

The secondary node needs to be registered with the now-running primary node. SAP HANA on the secondary node must be shut down before running the command below.

[rh2adm]# HDB stop

(SAP HANA2.0 only) Copy the SAP HANA system PKI SSFS_RH2.KEY and SSFS_RH2.DAT files from primary node to secondary node.

[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY /usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT /usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT
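To confirm the copy succeeded, the checksums of the PKI files can be compared between the nodes. This is a sketch assuming the paths from the example above and passwordless SSH access to node1 for root:

```shell
# Compare checksums of the system PKI files on this (secondary) node
# against the primary node; each pair of sums must match.
SECDIR=/usr/sap/RH2/SYS/global/security/rsecssfs
for f in key/SSFS_RH2.KEY data/SSFS_RH2.DAT; do
  echo "local:  $(md5sum "$SECDIR/$f" 2>/dev/null)"
  echo "node1:  $(ssh -o BatchMode=yes -o ConnectTimeout=5 root@node1 md5sum "$SECDIR/$f" 2>/dev/null)"
done
```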

To register the secondary node, use the command below.

[rh2adm]# hdbnsutil -sr_register --remoteHost=node1 --remoteInstance=02 --replicationMode=syncmem --name=DC2
adding site ...
checking for inactive nameserver ...
nameserver node2:30201 not responding.
collecting information ...
updating local ini files ...
done.

Start SAP HANA on the secondary node.

[rh2adm]# HDB start

Verify that the secondary node is running and that 'mode' is syncmem. The output should look similar to the example below.

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 2
site name: DC2
active primary site: 1

Host Mappings:
~~~~~~~~~~~~~~
node2 -> [DC1] node1
node2 -> [DC2] node2

2.3. Testing SAP HANA System Replication

To manually test the SAP HANA System Replication setup, you can follow the procedures described in the following SAP documents:
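For illustration only (the SAP test procedures remain authoritative), one common manual test is a takeover, sketched below under the assumptions of the example setup from section 2. It must be run as the SAP HANA administrative user on the secondary node, and only while the cluster is not yet managing HANA:

```shell
# Sketch of a manual takeover test; wrapped in a function so nothing
# runs by merely pasting this. Invoke it only on the SECONDARY node.
manual_takeover_test() {
  hdbnsutil -sr_state     # confirm this node is currently the secondary
  hdbnsutil -sr_takeover  # promote this node to primary
  hdbnsutil -sr_state     # "mode:" should now report primary
}
```

Afterwards the former primary can be re-registered as the new secondary and the takeover repeated in the other direction.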

2.4. Checking SAP HANA System Replication state

To check the current state of SAP HANA System Replication, execute the following command as the SAP HANA administrative user on the current primary SAP HANA node.

On single_container system:

[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py

| Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1

On multiple_containers system (MDC):

[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py
| Database | Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|          |       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| -------- | ----- | ----- | ------------ | --------- | ------- | --------- | ----------| --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| SYSTEMDB | node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1

3. Configuring monitoring account in SAP HANA for cluster resource agents (SAP HANA 1.0 SPS12 and earlier)

Note: starting with SAP HANA 2.0 SPS0, the monitoring account is no longer needed.
A technical user with CATALOG READ and MONITOR ADMIN privileges must exist in SAP HANA so that the resource agents can run queries on the system replication status. The example below shows how to create such a user, assign it the correct permissions, and disable password expiration for the user.

monitoring user username: rhelhasync
monitoring user password: <MONITORING_USER_PASSWORD>

3.1. Creating monitoring user

When SAP HANA System Replication is active, only the primary system is able to access the database; attempts to access the secondary system will fail.

On the primary system run the following commands to create the monitoring user.

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "create user rhelhasync password \"<MONITORING_USER_PASSWORD>\""
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant CATALOG READ to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant MONITOR ADMIN to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "ALTER USER rhelhasync DISABLE PASSWORD LIFETIME"

3.2. Store monitoring user credentials on all nodes

The SAP HANA userkey allows the OS-level "root" user to access SAP HANA via the monitoring user without being prompted for a password. The resource agents need this so they can run queries on the HANA System Replication status.

[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore SET SAPHANARH2SR localhost:30215 rhelhasync "<MONITORING_USER_PASSWORD>"
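The port in the userkey (30215 above) is the SQL port of the instance; on a single-container system it follows the 3<instance number>15 pattern, so instance number 02 gives 30215. A tiny sketch of the derivation:

```shell
# Derive the SQL port from the instance number (3<nr>15 pattern,
# single-container systems).
INSTANCE=02
SQL_PORT="3${INSTANCE}15"
echo "$SQL_PORT"   # 30215 for instance number 02
```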

To verify that the userkey has been created correctly in root's userstore, run the hdbuserstore list command on each node and check that the monitoring account is present in the output, as shown below:

[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore list

DATA FILE      :  /root/.hdb/node1/SSFS_HDB.DAT
KEY FILE       :  /root/.hdb/node1/SSFS_HDB.KEY

KEY SAPHANARH2SR
  ENV : localhost:30215
  USER: rhelhasync

Please also verify that it is possible to run hdbsql commands as root using the SAPHANARH2SR userkey without being prompted for a password, by running the following command on the primary node of the SAP HANA SR setup:

[root]# /usr/sap/RH2/HDB02/exe/hdbsql -U SAPHANARH2SR -i 02 "select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION"
REPLICATION_STATUS
"ACTIVE"
1 row selected

If you get an error message about issues with the password, or if you are prompted for a password, verify with the hdbsql command or HANA Studio that the password for the user created with the hdbsql commands above is not configured to be changed on first login and that the password has not expired. You can use the command below.
(Note: be sure to use the name of the monitoring user in capital letters.)

[root]# /usr/sap/RH2/HDB02/exe/hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "select * from sys.users where USER_NAME='RHELHASYNC'"

USER_NAME,USER_ID,USER_MODE,EXTERNAL_IDENTITY,CREATOR,CREATE_TIME,VALID_FROM,VALID_UNTIL,LAST_SUCCESSFUL_CONNECT,LAST_INVALID_CONNECT_ATTEMPT,INVALID_CONNECT_A
TTEMPTS,ADMIN_GIVEN_PASSWORD,LAST_PASSWORD_CHANGE_TIME,PASSWORD_CHANGE_NEEDED,IS_PASSWORD_LIFETIME_CHECK_ENABLED,USER_DEACTIVATED,DEACTIVATION_TIME,IS_PASSWORD
_ENABLED,IS_KERBEROS_ENABLED,IS_SAML_ENABLED,IS_X509_ENABLED,IS_SAP_LOGON_TICKET_ENABLED,IS_SAP_ASSERTION_TICKET_ENABLED,IS_RESTRICTED,IS_CLIENT_CONNECT_ENABLE
D,HAS_REMOTE_USERS,PASSWORD_CHANGE_TIME
"RHELHASYNC",156529,"LOCAL",?,"SYSTEM","2017-05-12 15:10:49.971000000","2017-05-12 15:10:49.971000000",?,"2017-05-12 15:21:12.117000000",?,0,"TRUE","2017-05-12
 15:10:49.971000000","FALSE","FALSE","FALSE",?,"TRUE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE","FALSE",?
1 row selected

4. Configuring SAP HANA in a pacemaker cluster

Please refer to the following documentation to first set up a Pacemaker cluster. Note that the cluster must conform to the article Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH.

This guide assumes that the following prerequisites are met:

  • Pacemaker cluster is configured according to documentation and has proper and working fencing
  • SAP HANA startup on boot is disabled on all cluster nodes as the start and stop will be managed by the cluster
  • SAP HANA system replication and takeover using tools from SAP are working properly between cluster nodes
  • Both nodes are subscribed to the required channels:
    • RHEL 7: 'High-availability' and 'RHEL for SAP HANA' (https://access.redhat.com/solutions/2334521) channels
    • RHEL 8: 'High-availability' and 'RHEL for SAP Solutions' (https://access.redhat.com/solutions/4714781) channels
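The autostart prerequisite above can be checked against the HANA instance profile. The sketch below assumes the profile path of the example system (SID RH2, instance 02); adjust it to your installation:

```shell
# SAP HANA must not start on boot on cluster nodes; the instance
# profile should contain "Autostart = 0".
autostart_disabled() {
  grep -Eq '^Autostart[[:space:]]*=[[:space:]]*0' "$1"
}

# Profile path below is an example for SID RH2 / instance 02.
for p in /usr/sap/RH2/SYS/profile/RH2_HDB02_*; do
  [ -e "$p" ] || continue
  if autostart_disabled "$p"; then
    echo "$p: OK (Autostart = 0)"
  else
    echo "$p: WARNING - Autostart is not disabled"
  fi
done
```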

4.1. Install resource agents and other components required for managing SAP HANA Scale-Up System Replication using the RHEL HA Add-On

[root]# yum install resource-agents-sap-hana

Note: this will only install the resource agents and additional components required to set up this HA solution. The configuration steps documented in the following sections must still be carried out for a fully operable setup that is supported by Red Hat.

4.2. Enable the SAP HANA srConnectionChanged() hook

As documented in SAP's Implementing a HA/DR Provider, recent versions of SAP HANA provide so-called "hooks" that allow SAP HANA to send out notifications for certain events. The srConnectionChanged() hook improves the cluster's ability to detect when a change in the status of HANA System Replication occurs that requires the cluster to take action, and helps avoid data loss or corruption by preventing accidental takeovers in situations where a takeover should be avoided. When using SAP HANA 2.0 SPS0 or later together with a version of resource-agents-sap-hana that provides the components for supporting the srConnectionChanged() hook, the hook must be enabled before proceeding with the cluster setup.

4.2.1. Verify that a version of the resource-agents-sap-hana package is installed that provides the components to enable the srConnectionChanged() hook

Please verify that the correct version of the resource-agents-sap-hana package providing the components required to enable the srConnectionChanged() hook for your version of RHEL is installed as documented in the following article: Is the srConnectionChanged() hook supported with the Red Hat High Availability solution for SAP HANA Scale-up System Replication?

4.2.2. Activate the srConnectionChanged() hook on all SAP HANA instances

Note: the steps to activate the srConnectionChanged() hook need to be performed for each SAP HANA instance.

  1. Stop the cluster on both nodes and verify that the HANA instances are stopped completely.

    [root]# pcs cluster stop --all
    
  2. Install the hook script into the /hana/shared/myHooks directory for each HANA instance and make sure it has the correct ownership on all nodes (replace rh2adm with the username of the admin user of the HANA instances).

    [root]# mkdir -p /hana/shared/myHooks
    [root]# cp /usr/share/SAPHanaSR/srHook/SAPHanaSR.py /hana/shared/myHooks
    [root]# chown -R rh2adm:sapsys /hana/shared/myHooks
    
  3. Update the global.ini file on each node to enable use of the hook script by both HANA instances (e.g., in file /hana/shared/RH2/global/hdb/custom/config/global.ini):

    [ha_dr_provider_SAPHanaSR]
    provider = SAPHanaSR
    path = /hana/shared/myHooks
    execution_order = 1
    
    [trace]
    ha_dr_saphanasr = info
    
  4. On each cluster node create the file /etc/sudoers.d/20-saphana by running sudo visudo -f /etc/sudoers.d/20-saphana and add the contents below to allow the hook script to update the node attributes when the srConnectionChanged() hook is called.
    Replace rh2 with the lowercase SID of your HANA installation and replace DC1 and DC2 with your HANA site names.

    Cmnd_Alias DC1_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SFAIL -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR
    rh2adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
    Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty
    

    For further information on why the Defaults setting is needed see The srHook attribute is set to SFAIL in a Pacemaker cluster managing SAP HANA system replication, even though replication is in a healthy state.
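After creating the file, its syntax can be validated and the resulting permissions inspected. A sketch (both commands require root):

```shell
# Syntax-check a sudoers drop-in without activating anything.
check_sudoers() { visudo -c -f "$1"; }

# On each cluster node, e.g.:
#   check_sudoers /etc/sudoers.d/20-saphana
#   sudo -l -U rh2adm   # should list the four crm_attribute commands
```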

  5. Start both HANA instances manually without starting the cluster.

  6. Verify that the hook script is working as expected. Perform some action to trigger the hook, such as stopping a HANA instance. Then check whether the hook logged anything using a method such as the one below.

    [rh2adm]# cdtrace
    [rh2adm]# awk '/ha_dr_SAPHanaSR.*crm_attribute/ { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
    2018-05-04 12:34:04.476445 ha_dr_SAPHanaSR SFAIL
    2018-05-04 12:53:06.316973 ha_dr_SAPHanaSR SOK
    [rh2adm]# grep ha_dr_ *
    

    Note: For more information please check SAP doc Install and Configure a HA/DR Provider Script.

  7. When the functionality of the hook has been verified, the cluster can be started again.

    [root]# pcs cluster start --all
    

4.3. Configure general cluster properties

To avoid unnecessary failovers of the resources during initial testing and in production, set the following default values for the resource-stickiness and migration-threshold parameters. Note that the defaults do not apply to resources that override them with their own defined values.

[root]# pcs resource defaults resource-stickiness=1000
[root]# pcs resource defaults migration-threshold=5000

Warning: As of RHEL 8.4 (pcs-0.10.8-1.el8), the commands above are deprecated. Use the commands below:

[root]# pcs resource defaults update resource-stickiness=1000
[root]# pcs resource defaults update migration-threshold=5000

Notes:
1. It is sufficient to run the commands above on one node of the cluster.
2. Previous versions of this document recommended setting these defaults for the initial testing of the cluster setup, but removing them after production. Due to customer feedback and additional testing, it has been determined that it is beneficial to use these defaults for production cluster setups as well.
3. The setting resource-stickiness=1000 encourages the resource to stay running where it is, while migration-threshold=5000 causes the resource to move to a new node only after 5000 failures. 5000 is generally sufficient to prevent the resource from prematurely failing over to another node. It also ensures that the resource failover time stays within a controllable limit.

Previous versions of this guide recommended setting the no-quorum-policy to ignore, which is currently NOT supported. In the default configuration, the no-quorum-policy property of the cluster does not need to be modified. To achieve the behavior provided by this option, see Can I configure pacemaker to continue to manage resources after a loss of quorum in RHEL 6 or 7?

4.4. Create cloned SAPHanaTopology resource

The SAPHanaTopology resource gathers the status and configuration of SAP HANA System Replication on each node. In addition, it starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring the SAP HANA instances. It has the following attributes:

Attribute Name   Required?  Default value  Description
SID              yes        null           The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2
InstanceNumber   yes        null           The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02

Below is an example command to create the SAPHanaTopology cloned resource.

Note: the timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).

[root]# pcs resource create SAPHanaTopology_RH2_02 SAPHanaTopology SID=RH2 InstanceNumber=02 \
op start timeout=600 \
op stop timeout=300 \
op monitor interval=10 timeout=600 \
clone clone-max=2 clone-node-max=1 interleave=true

The resulting resource should look like the following.

[root]# pcs resource show SAPHanaTopology_RH2_02-clone

 Clone: SAPHanaTopology_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
   Attributes: SID=RH2 InstanceNumber=02
   Operations: start interval=0s timeout=600 (SAPHanaTopology_RH2_02-start-interval-0s)
               stop interval=0s timeout=300 (SAPHanaTopology_RH2_02-stop-interval-0s)
               monitor interval=10 timeout=600 (SAPHanaTopology_RH2_02-monitor-interval-10s)

Once the resource is started you will see the collected information stored in the form of node attributes that can be viewed with the command crm_mon -A1. Below is an example of what attributes can look like when only SAPHanaTopology is started.

[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
    + hana_rh2_remoteHost               : node2
    + hana_rh2_roles                    : 1:P:master1::worker:
    + hana_rh2_site                     : DC1
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node1
* Node node2:
    + hana_rh2_remoteHost               : node1
    + hana_rh2_roles                    : 1:S:master1::worker:
    + hana_rh2_site                     : DC2
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node2
...

4.5. Create Master/Slave SAPHana resource

The SAPHana resource agent manages two SAP HANA instances (databases) that are configured in HANA System Replication.

Attribute Name             Required?  Default value  Description
SID                        yes        null           The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2
InstanceNumber             yes        null           The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02
PREFER_SITE_TAKEOVER       no         null           Should the resource agent prefer to take over to the secondary instance instead of restarting the primary locally? true: prefer takeover to the secondary site; false: prefer restarting locally; never: under no circumstances take over to the other node
AUTOMATED_REGISTER         no         false          If a takeover event has occurred and the DUPLICATE_PRIMARY_TIMEOUT has expired, should the former primary instance be registered as secondary? ("false": no, manual intervention will be needed; "true": yes, the former primary will be registered as secondary by the resource agent) [1]
DUPLICATE_PRIMARY_TIMEOUT  no         7200           The time difference (in seconds) needed between two primary timestamps if a dual-primary situation occurs. If the time difference is less than this gap, the cluster holds one or both instances in a "WAITING" status, to give the system admin a chance to react to the takeover. After the time difference has passed, if AUTOMATED_REGISTER is set to true, the failed former primary is registered as secondary. After registration to the new primary, all data on the former primary is overwritten by the system replication.

[1] - As a good practice for tests and PoCs, we recommend leaving AUTOMATED_REGISTER at its default value (AUTOMATED_REGISTER="false") to prevent a failed primary instance from automatically registering as a secondary instance. After testing, if the failover scenarios work as expected, especially for a production environment, we recommend setting AUTOMATED_REGISTER="true", so that after a takeover the system replication resumes in a timely manner, avoiding disruption. When AUTOMATED_REGISTER="false", in case of a failure on the primary node, after investigation you will need to manually register it as the secondary HANA System Replication node.
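With AUTOMATED_REGISTER="false", the manual registration of a failed former primary can be sketched as below, using the parameters of the example setup and assuming node2 has become the primary after the takeover. Run it as the SAP HANA administrative user on the failed node, once it is safe for that node to rejoin replication:

```shell
# Sketch: turn the failed former primary (node1 here) into the new
# secondary. Wrapped in a function so nothing runs by pasting this.
reregister_former_primary() {
  HDB stop                                   # the instance must be down
  hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=02 \
      --replicationMode=syncmem --name=DC1
  # With the cluster running, let Pacemaker bring the instance back up
  # rather than starting it manually.
}
```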

Note:
- Please note the slight difference in the resource configuration between RHEL 7.x and RHEL 8.x.
- The timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).

4.5.1. RHEL 7.x

Below is an example command to create the SAPHana Master/Slave resource.

[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
master meta notify=true clone-max=2 clone-node-max=1 interleave=true

On RHEL 7.x, when running pcs-0.9.158-6.el7 or newer, use the command below to avoid a deprecation warning. More information about the change is explained in What are differences between master and --master option in pcs resource create command?.

[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
master notify=true clone-max=2 clone-node-max=1 interleave=true

The resulting resource should look like the following.

[root]# pcs resource show SAPHana_RH2_02-master
     Master: SAPHana_RH2_02-master
      Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true
      Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
       Attributes: AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH2
       Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
                   methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
                   monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
                   monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
                   promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
                   start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
                   stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)

Once the resource is started, it will add additional node attributes describing the current state of the SAP HANA databases on the nodes, as seen below.

[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
    + hana_rh2_clone_state              : PROMOTED
    + hana_rh2_op_mode                  : delta_datashipping
    + hana_rh2_remoteHost               : node2
    + hana_rh2_roles                    : 4:P:master1:master:worker:master
    + hana_rh2_site                     : DC1
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_sync_state               : PRIM
    + hana_rh2_vhost                    : node1
    + lpa_rh2_lpt                       : 1495204085
    + master-hana                       : 150
* Node node2:
    + hana_rh2_clone_state              : DEMOTED
    + hana_rh2_remoteHost               : node1
    + hana_rh2_roles                    : 4:S:master1:master:worker:master
    + hana_rh2_site                     : DC2
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_sync_state               : SOK
    + hana_rh2_vhost                    : node2
    + lpa_rh2_lpt                       : 30
    + master-hana                       : 100
...

4.5.2. RHEL 8.x

In RHEL 8.x, there is a slight change in the command used to create the SAPHana resource. The official documentation on configuring promotable clone resources in RHEL 8 can be found here.

[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=true \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
promotable notify=true clone-max=2 clone-node-max=1 interleave=true

The resulting resource should look like the following.

[root]# pcs resource config SAPHana_RH2_02
 Clone: SAPHana_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true promotable=true
  Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
   Attributes: AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH2
   Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
               methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
               monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
               monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
               promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
               start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
               stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)

4.6. Create Virtual IP address resource

The cluster will contain a virtual IP address in order to reach the Master instance of SAP HANA. Below is an example command to create an IPaddr2 resource with the IP 192.168.0.15.

[root]# pcs resource create vip_RH2_02 IPaddr2 ip="192.168.0.15"

The resulting resource should look like the one below.

[root]# pcs resource show vip_RH2_02

 Resource: vip_RH2_02 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.15
  Operations: start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
              stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
              monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)
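Once the cluster has started the resource, a quick way to verify that the address is actually configured on the expected node is to check the local interface list. The sketch below is an illustration only: it assumes the example address 192.168.0.15 from above and is not a substitute for checking pcs status.

```shell
# Check whether the example VIP from this article (192.168.0.15) is currently
# configured on this node. This only inspects the local interfaces.
if ip -4 -o addr show 2>/dev/null | grep -qF "192.168.0.15"; then
    echo "VIP is active on this node"
else
    echo "VIP is not active on this node"
fi
```

On the node where the cluster runs the vip_RH2_02 resource, the first message should be printed; on the other node, the second.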

4.7. Create constraints

For correct operation, we need to ensure that the SAPHanaTopology resources are started before the SAPHana resources, and also that the virtual IP address is present on the node where the Master resource of SAPHana is running. To achieve this, the following 2 constraints need to be created.

4.7.1 RHEL 7.x

4.7.1.1 constraint - start SAPHanaTopology before SAPHana

The example command below will create the constraint that mandates the start order of these resources. There are 2 things worth mentioning here:

  • The symmetrical=false attribute defines that we only care about the start order of the resources; they don't need to be stopped in reverse order.
  • Both resources (SAPHana and SAPHanaTopology) have the attribute interleave=true, which allows them to start in parallel on the nodes. Despite the ordering constraint, the cluster does not wait for SAPHanaTopology to be started on all nodes; the SAPHana resource can be started on any node as soon as SAPHanaTopology is running there.

Command for creating the constraint:

[root]# pcs constraint order SAPHanaTopology_RH2_02-clone then SAPHana_RH2_02-master symmetrical=false

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Ordering Constraints:
  start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-master (kind:Mandatory) (non-symmetrical)
...
4.7.1.2 constraint - colocate the IPaddr2 resource with Master of SAPHana resource

Below is an example command that will colocate the IPaddr2 resource with the SAPHana resource that was promoted as Master.

[root]# pcs constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-master 2000

Note that the constraint uses a score of 2000 instead of the default INFINITY. This allows the cluster to take the IPaddr2 resource down if no Master is promoted in the SAPHana resource, while still making it possible to use this address with tools like SAP Management Console or SAP LVM, which can use it to query status information about the SAP instance.

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Colocation Constraints:
  vip_RH2_02 with SAPHana_RH2_02-master (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...

4.7.2 RHEL 8.x

4.7.2.1 constraint - start SAPHanaTopology before SAPHana

The example command below will create the constraint that mandates the start order of these resources. There are 2 things worth mentioning here:

  • The symmetrical=false attribute defines that we only care about the start order of the resources; they don't need to be stopped in reverse order.
  • Both resources (SAPHana and SAPHanaTopology) have the attribute interleave=true, which allows them to start in parallel on the nodes. Despite the ordering constraint, the cluster does not wait for SAPHanaTopology to be started on all nodes; the SAPHana resource can be started on any node as soon as SAPHanaTopology is running there.

Command for creating the constraint:

[root]# pcs constraint order SAPHanaTopology_RH2_02-clone then SAPHana_RH2_02-clone symmetrical=false

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Ordering Constraints:
  start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-clone (kind:Mandatory) (non-symmetrical)
...
4.7.2.2 constraint - colocate the IPaddr2 resource with Master of SAPHana resource

Below is an example command that will colocate the IPaddr2 resource with the SAPHana resource that was promoted as Master.

[root]# pcs constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-clone 2000

Note that the constraint uses a score of 2000 instead of the default INFINITY. This allows the cluster to take the IPaddr2 resource down if no Master is promoted in the SAPHana resource, while still making it possible to use this address with tools like SAP Management Console or SAP LVM, which can use it to query status information about the SAP instance.

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Colocation Constraints:
  vip_RH2_02 with SAPHana_RH2_02-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...

4.8. Adding a secondary virtual IP address for an Active/Active (Read-Enabled) HANA System Replication setup

Starting with SAP HANA 2.0 SPS1, SAP enables 'Active/Active (Read Enabled)' setups for SAP HANA System Replication, where the secondary systems of SAP HANA system replication can be used actively for read-intensive workloads. To be able to support such setups, a second virtual IP address is required, which enables clients to access the secondary SAP HANA database. To ensure that the secondary replication site can still be accessed after a takeover has occurred, the cluster needs to move the virtual IP address around with the slave of the master/slave SAPHana resource.

Note that when establishing HSR for the read-enabled secondary configuration, the operationMode should be set to logreplay_readaccess.
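For illustration, the sketch below only assembles and prints the hdbnsutil registration command for such a read-enabled secondary. The hostname, instance number, replication mode, and site name are placeholders taken from the examples in this article, and the actual command would have to be run as the <sid>adm user on the secondary node.

```shell
# Assemble the hdbnsutil registration command for a read-enabled secondary.
# All values below are placeholders based on the examples in this article; only
# --operationMode=logreplay_readaccess is specific to the read-enabled setup.
REMOTE_HOST=node1         # current primary (placeholder)
INSTANCE_NR=02            # HANA instance number (placeholder)
REPLICATION_MODE=syncmem  # as used in the examples in this article
SITE_NAME=DC2             # name of the secondary site (placeholder)

CMD="hdbnsutil -sr_register --remoteHost=${REMOTE_HOST} \
--remoteInstance=${INSTANCE_NR} --replicationMode=${REPLICATION_MODE} \
--operationMode=logreplay_readaccess --name=${SITE_NAME}"

# Print the command instead of executing it:
echo "${CMD}"
```

Adjust all placeholder values to match your environment before running the printed command as the <sid>adm user.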

4.8.1. Creating the resource for managing the secondary virtual IP address

[root]# pcs resource create vip2_RH2_02 IPaddr2 ip="192.168.1.11"

Please use the appropriate resource agent for managing the IP address based on the platform on which the cluster is running.

4.8.2. Creating location constraints to ensure that the secondary virtual IP address is placed on the right cluster node

[root]# pcs constraint location vip2_RH2_02 rule score=INFINITY hana_rh2_sync_state eq SOK and hana_rh2_roles eq 4:S:master1:master:worker:master
[root]# pcs constraint location vip2_RH2_02 rule score=2000 hana_rh2_sync_state eq PRIM and hana_rh2_roles eq 4:P:master1:master:worker:master

These location constraints ensure that the second virtual IP resource will have the following behavior:

  • If both a Master/PRIMARY node and a Slave/SECONDARY node are available, with HANA System Replication in "SOK", the second virtual IP will run on the Slave/SECONDARY node.

  • If the Slave/SECONDARY node is not available or HANA System Replication is not "SOK", the second virtual IP will run on the Master/PRIMARY node. When the Slave/SECONDARY node becomes available and HANA System Replication is "SOK" again, the second virtual IP will move back to the Slave/SECONDARY node.

  • If the Master/PRIMARY node is not available or the HANA instance running there has a problem, then when the Slave/SECONDARY node takes over the Master/PRIMARY role, the second virtual IP will continue running on the same node until the other node takes over the Slave/SECONDARY role and HANA System Replication is "SOK" again.

This maximizes the time that the second virtual IP resource will be assigned to a node where a healthy SAP HANA instance is running.
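The decision logic encoded by the two location rules can be sketched as a small shell function. This is an illustration only; the attribute values match the hana_rh2_sync_state and hana_rh2_roles examples used throughout this article.

```shell
# Toy model of the two location rules for the secondary virtual IP: given a
# node's hana_rh2_sync_state and hana_rh2_roles attribute values, decide
# whether vip2 may run there. The scores mirror the constraints above.
place_vip2() {
    sync_state=$1
    roles=$2
    if [ "$sync_state" = "SOK" ] && [ "$roles" = "4:S:master1:master:worker:master" ]; then
        echo "eligible (rule score INFINITY: healthy secondary)"
    elif [ "$sync_state" = "PRIM" ] && [ "$roles" = "4:P:master1:master:worker:master" ]; then
        echo "eligible (rule score 2000: fallback to primary)"
    else
        echo "not eligible"
    fi
}

place_vip2 SOK   "4:S:master1:master:worker:master"
place_vip2 PRIM  "4:P:master1:master:worker:master"
place_vip2 SFAIL "4:S:master1:master:worker:master"
```

In the real cluster, Pacemaker evaluates these rules per node using the attributes maintained by the SAPHana resource agent, and places the resource on the eligible node with the highest score.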

4.9. Testing the manual move of SAPHana resource to another node (SAP Hana takeover by cluster)

Test moving the SAPHana resource from one node to another.

4.9.1. Moving SAPHana resource on RHEL 7

Use the command below on RHEL 7. Note that the option --master should not be used when running the below command due to the way the SAPHana resource works internally.

[root]# pcs resource move SAPHana_RH2_02-master

With each pcs resource move command invocation, the cluster creates location constraints to cause the resource to move. These constraints must be removed in order to allow automatic failover in the future. To remove the constraints created by the move, run the command below.

[root]# pcs resource clear SAPHana_RH2_02-master

4.9.2. Moving SAPHana resource on RHEL 8

On RHEL 8, the equivalent of the pcs command for RHEL 7 fails due to a change in pcs behavior. To perform the failover on RHEL 8, run the following command.

[root]# crm_resource --move --resource SAPHana_RH2_02-clone

With each crm_resource --move command invocation, the cluster creates location constraints to cause the resource to move. These constraints must be removed in order to allow automatic failover in the future. To remove the constraints created by the move, run the command below.

[root]# pcs resource clear SAPHana_RH2_02-clone

76 Comments

How are RHEL version requirements (a la RHEL7.3 ceiling) realized with RHUI repositories? Last I checked, a setting a release level was not permitted with RHUI.

Hello John, the standard CCSP certificate contains, for example, EUS repos - those have RHEL versions. So customers need to sync those repos into their RHUIs and then create a client package with repo paths pointing to concrete versions. It is possible to hand-modify a pregenerated client package to have paths containing $releasever instead of a hard-coded version - from RHUI version 3.0.5 (iirc) there is even a script rhui-set-release present in every RHUI-generated client package that enables clients to set a different releasever. I've written a KB article on how to "manually" create your client package: https://access.redhat.com/articles/4070201 which also notes $releasever.

Bottom line: it is possible to change versions, but you need to know what you are doing - e.g. changing the version to a non-existing repo will break your client.

Hey, the following command is wrong in the documentation:

pcs constraint colocation add vip2_SAPHana_RH2_02 with slave msl_rsc_SAPHana_RH2_02 2000

it should be:

pcs constraint colocation add vip2_SAPHana_RH2_02 with slave rsc_SAPHana_H13_HDB00-master 2000

I think this is part of an older documentation you used before.

Thomas

Hi Thomas, I am implementing an HA cluster for an Active-Active HANA scenario. I believe the command you are putting as correct is also wrong, maybe a typo. I guess it should be (as in the most up-to-date documentation):

[root]# pcs constraint colocation add vip2_RH2_02 with slave SAPHana_RH2_02-clone 2000

(so neither the "-master" resource nor the SAPHana Master/Slave resource)

Could you confirm ?

In section 2.1, both commands just take SYSTEMDB backups - not any tenants.

[rh2adm]# hdbsql -i 02 -u system -p -d SYSTEMDB "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

[rh2adm]# hdbsql -i 02 -u system -p -d SYSTEMDB "BACKUP DATA FOR RH2 USING FILE ('/tmp/foo-RH2')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

Manojkumar

Hi, in our setup we had to increase the start timeout of the HANA instance. Otherwise a move of the HANA database resource would fail, because the cluster seems to report the slave instance as running only when the database sync of all the individual tenants is active again. This is in my opinion not ideal. I think the sync runs in the background, whereas the database status of the slave instance is nevertheless reported as online. After all, what timeout should I choose? The sync time depends on the size of the database, network speed and so on. And a start timeout of hours does not make any sense.

We found out that a failover from primary to secondary moves the VIP immediately (pcs node standby NodeA) when the DB on the secondary is not ready yet (not restarted as read-write). So we created this extra dependency to avoid stale connections to the secondary.

Create and delete the VIP-to-database dependency:

pcs constraint order promote rsc_Hana__HDB00 then start rsc__HDB00 --force
pcs constraint order remove rsc_Hana__HDB00 then start rsc__HDB00 --force

If anyone is doing this on RHEL 8 with pacemaker 2.0 please see https://gist.github.com/tosmi/d9777811d1aede10e225e263bf55119c

What is the meaning of "Scale-Up" in the title? (Automated SAP HANA System Replication in Scale-Up in pacemaker cluster) I cannot find a similar article for a Scale-Out configuration so could I assume that this article is applicable for both cases? Thanks in advance

Scale-Up means a single machine with more resources, vs. Scale-Out, which means many machines with fewer resources.

Scale-Up: vertical scaling of resources, like adding memory & CPU to the same machine. Scale-Out: horizontal scaling of resources, like adding more nodes instead of adding more resources to the same machine.

Hey Vincenzo,

I've written the following article for SAP HANA Scale-Out Systems: https://access.redhat.com/sites/default/files/attachments/ha_solution_for_sap_hana_scale_out_system_replication_0.pdf

These are different configurations for each environment. I also developed the first versions of the Ansible Playbooks, which are now supported by a complete Red Hat team. They will prepare nearly everything you need for a Scale-Out deployment.

Ask me if you need more information about this.

Best Regards Thomas Bludau

There is a minor correction to the awk command mentioned under section 4.2.2.6. It should be as below:

awk '/ha_dr_SAPHanaSR.crm_attribute/{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_

Hi, Sreeram. I spoke with the dev team regarding this. The proposed command change makes the search a bit more restrictive and limits the HANA log files that are searched, but the existing one doesn't appear to be an error.

This also is only one of multiple options to check whether the hook is working on the HANA side, and it is not a requirement that this command must always be run successfully.

Are we missing any key piece of information here, from your perspective?

Hi, any recommended value for totem.token in implementing RHEL 8 HA for SAP HANA DB ?

The link in section 4.7.2.3 (October 14th of 2020) appears to be broken/missing. The SAP link showing how to enable HANA for Active/Read-enabled isn't loading for me. https://help.sap.com/viewer/de855a01ee2248dfb139088793f8802a/2.0.03/en-US

While enabling the SAP HANA System Replication - Choose the Operation mode as Logreplay - Read Access. You can refer to the below blogs for references: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.04/en-US/676844172c2442f0bf6c8b080db05ae7.html

https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.04/en-US/fe5fc53706a34048bf4a3a93a5d7c866.html

https://archive.sap.com/documents/docs/DOC-56044

@Alexander Mayberry: Thanks for the notification, and sorry for the late reply.

The broken link you mentioned has now been fixed, as well as the same broken one in section 4.7.1.3.

We've also changed all occurrences of 'Active/Read-Enabled' to 'Active/Active (Read Enabled)' to stay in sync with the official terminology used by SAP for these kinds of setups.

Sudo configuration (4.2.2-4) defines aliases like DC1_SOK and then assigns aliases like S1_SOK...

Hello Pavel,

thanks for pointing that out.

This has now been fixed.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

Could it be possible to add link/comment on reasoning why to use 'syncmem' instead of 'sync' as replicationMode? Our SAP guys seem to be unhappy with that...

Hello Pavel,

the hdbnsutil command documented in section 2.2 is only an example; it is also possible to use other options for the replicationMode (like 'sync') and other parameters required by the '-sr_register' option.

It is not a hard requirement to use 'syncmem' for the replicationMode when setting up SAP HANA System Replication in combination with the HA solution documented here.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

When I created the /etc/sudoers.d/20-saphana file, all people trying to do a "sudo" are failing with this message :

[sas-3333@Node1 ~]$ sudo -i -u root

/etc/sudoers.d/20-saphana: Alias "DCA_SOK" already defined near line 1 <<<

/etc/sudoers.d/20-saphana: Alias "DCA_SFAIL" already defined near line 2 <<<

/etc/sudoers.d/20-saphana: Alias "DCM_SOK" already defined near line 3 <<<

/etc/sudoers.d/20-saphana: Alias "DCM_SFAIL" already defined near line 4 <<<

sudo: parse error in /etc/sudoers.d/20-saphana near line 1

sudo: no valid sudoers sources found, quitting

sudo: unable to initialize policy plugin

Why is this happening ?

Hello,

without more information about your exact configuration it won't be possible to say what is causing this issue and how to fix it. Therefore it would be better if you opened a support case via the Red Hat Customer Portal for this issue, so that the support colleagues who are familiar with sudo can have a look at it and provide guidance on how to get it fixed.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

Are you really serious? This article is all about making the HA cluster aware of HANA replication, so the file which is required for the cluster is causing this issue. I guess this is clear enough; what exactly do you need as further information?

Hello,

yes, I'm serious. Because the comment section of this article is not the official way to get support.

If you have subscriptions for 'RHEL for SAP solutions' it shouldn't be a problem for you to open an official support case.

Since the error messages indicates issues with your /etc/sudoers.d/20-saphana file we would need to see the entire file, and posting it as a comment here might not be the best idea.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

What does the content of your file look like?

Here is an example of Ansible code to create this file:

- name: Create sudoers.d/20-saphana
  blockinfile:
    path: /etc/sudoers.d/20-saphana
    create: yes
    backup: yes
    block: |
      Cmnd_Alias SOK   = /usr/sbin/crm_attribute -n hana_{{ sid_uppercase }}_glob_srHook -v SOK -t crm_config -s SAPHanaSR
      Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_{{ sid_uppercase }}_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
      {{ sidadm }} ALL=(ALL) NOPASSWD: SOK, SFAIL

I fixed it thanks, actually the issue is : Always use visudo and not vi /etc/sudoers, because visudo will check the /etc/sudoers file for errors, the other won't.

Here is the post that helped : https://access.redhat.com/discussions/3364261#comment-2150041

Great, good to know you got the fix. Thanks for sharing...

Hello everyone! I have a question about when one of the HANA nodes fails. When node1 fails, node2 takes over and becomes the master; that's clear. But I don't understand when the VIP changes from node1 to node2. Can someone explain that in more detail please? Because if the VIP is created on node1, how is it supposed to switch to node2 if there is a connection failure?

I was able to make node 1 fail and have node 2 take over automatically, but I can't get the VIP to switch.

Thanks!

You are not supposed to move the VIP resource; it will follow the primary node on its own once the failover happens. There is a colocation constraint that you are supposed to create that will keep the VIP and the SAPHana resource attached to each other:

code : "constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-master 2000"

Maybe you are getting confused between Active-Passive and Active-Active scenario ?

Hmm, you are right, but I have another question. Must the VIP already exist, or does Pacemaker also create it?

You are supposed to create it; the IP has to be free (not used by any service/machine). So you assign it with the command (as in step 4.6 in this document):

pcs resource create vip_RH2_02 IPaddr2 ip="192.168.0.15"

Okay, and the last question. So if I understand everything, you must have created the IP on node1 and node2, right? For example, you have on both nodes: ehtest1: 192.168.0.15, right?

Thanks in advance!

Hello,

no, there should be no static entry for the VIP on any of the nodes (if this was required then it would have been described in this documentation). The VIP will be dynamically created by the cluster on the node where it is supposed to be running using the appropriate resource agent depending on the platform the cluster is running on (for example for on-premise setups on bare-metal the IPaddr2 resource agent can be used, or on AWS you could use the aws-vpc-move-ip resource agent), and the resource agent will also make sure that the IP address is reachable from the outside.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

Right, thanks to both of you! Now it's all crystal clear!

Hi all,

For a few days now, the "awk" command has been showing "SOK" even though there is no replication.

Whether I do a "pcs resource move", "HDB stop", or "HDB kill", the result is the same.

What could be the issue? The command was very accurate and worked for weeks, but now it has suddenly stopped.

Details :

awk command : awk '/ha_dr_SAPHanaSR.crm_attribute/ { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_

RHEL version : Red Hat Enterprise Linux 8.3

HANA version : HANA 2.0 SPS05 Rev 54 ( 2.00.054.00.1611906357 )

We had an issue with the awk command; this is what we do (note: older versions like SPS03 need pkill -9):

SPS04+ :
sudo -S su - $dbUser -c "hdbnsutil -reloadHADRProviders"
SPS03:
sudo -S su - $dbUser -c  "pkill -9 hdbnameserver"
sudo su - $dbUser -c "grep -i 'loading HA/DR Provider' /usr/sap/$sid_uppercase/HDB00/$host_name/trace/nameserver_*"

Thanks Viney.

I just looked again and discovered that the SFAIL/SOK entries have only shown up now (2 hours after the failover), so the awk command provides the right entries, but late, very late, which is not good at all.

any input ?

Good to know that it's working for you. I will talk to my database colleagues and get back to you.

A tiny note on the 20-saphana sudoers file:

  1. the Cmnd_Alias is all uppercase

  2. the "sid" between "hana_" and "_site" is all lowercase, like this:

Cmnd_Alias MYSITE1_SOK   = /usr/sbin/crm_attribute -n hana_sid_site_srHook_MYSITE1 -v SOK -t crm_config -s SAPHanaSR

Then, always chmod the file with 440 and chown it with root:root. When you run visudo -c to check for errors, it should report none.

(SAPHanaSR.py from resource-agents-sap-hana-4.1.1-61.el7_9.11.x86_64)

Hello Fulvio,

thanks for sharing this information. The instructions on how to create the 20-saphana sudoers file have now been updated to point out that visudo should be used to create the file, which will ensure that the ownership and permissions of the file are correct.

Also, some information has been added that the lowercase SID must be used for the 'hana_sid_site_sitename' parameter in the sudoers file.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

Hello! I have a question related to the VIP (again...). Once the VIP is created, how do I register it in the DNS? What I mean is the hostname to which I point the entry; let's say my VIP is 192.168.100.1.

The DNS entry would be:

192.168.100.1 mynode1hostname

The doubt is: if I add that entry, what will happen when there is a failure and node 2 is the primary one? Should I register the two nodes with the same IP in the DNS? Sorry if I ask silly questions, but it causes me a lot of confusion.

Because of course my SAP applications like S4 have to know where to reach the service, and those machines have to be able to see the VIP.

Thanks in advance!

Hello Aleksandrov,

The VIP is ideally a third IP in the same subnet, for example:

Node A : 192.168.1.10
Node B : 192.168.1.11
VIP :  192.168.1.12
Your application connects to the VIP, not to A or B.

You don't make a CNAME to A or B; just create the VIP as an independent DNS record.

Regards, Viney

Hello Alexandrov,

to clarify what Viney has mentioned: in DNS you will need to have at least 3 entries, one entry each for the hostname and IP address for each cluster node, and one entry for the hostname associated with the virtual IP address. For example:

192.168.1.10 nodea.example.com nodea
192.168.1.11 nodeb.example.com nodeb
192.168.1.12 vip.example.com vip

The clients will have to be configured to use vip.example.com to connect to HANA. This ensures that when a failure occurs that requires the cluster to perform a takeover of the HANA instances from the primary to the secondary site, the clients will still be able to connect to HANA, because the cluster will take care that the virtual IP is also moved from one node to the other.

BTW, as documented in https://access.redhat.com/solutions/81123, it is also recommended to add the mapping between hostnames and IP addresses not only in DNS but also in the local /etc/hosts file on all cluster nodes and on all clients that access the cluster nodes, to prevent outages in case there is an outage of the DNS system used in the environment where the cluster is running.

Regards,
Frank Danapfel
Senior Software Engineer
Red Hat SAP Alliance Technology Team

Good morning. Following this guide I have a problem: I think the nodes were disconnected while the cluster was on, and now I have both nodes in DEMOTED, and it tells me that the service is on: "Resource is promotable but has not been promoted on any node." Any thoughts?

Your best resource for assistance is almost always going to be a support ticket.

Just a cursory look at your scenario makes me think of resource constraints possibly not being cleaned up after a failure. I'm thinking specifically that the nodes may be marked as unclean, or have temporary colocation locks that need to be cleared. The resources themselves are in a state waiting to be promoted, but there are no viable locations (nodes) where it can be started. There's no logs here, nothing to show pcs status, etc, so I'm mostly just telling you my instincts here, vs. anything specific.

The docs have some steps on identifying and cleaning up locks from failed starts and such, this chapter is probably the best place to start. But again, I would suggest opening a support ticket if you are stuck. They will ask you for SOSreports and such that provide much greater detail about the status of your cluster, and resources, as well as logs that may provide context necessary to truly give you specific advice here.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_high_availability_clusters/assembly_managing-cluster-resources-configuring-and-managing-high-availability-clusters

Thank you for the reply. I found the problem but I'm not sure how to fix it. Looking at the logs, I realized that my hook status is:

2021-11-09 15:31:45.108688 ha_dr_SAPHanaSR SOK
2021-11-09 15:41:55.011339 ha_dr_SAPHanaSR SFAIL
2021-11-09 15:41:56.385763 ha_dr_SAPHanaSR SFAIL
2021-11-09 15:41:56.540530 ha_dr_SAPHanaSR SFAIL

It is SFAIL. How can I reset the status? I also think it is ignoring it: ha_dr_SAPHanaSR SAPHanaSR.py(00113) : SAPHanaSR ### Ignoring bad SR status because of is_in_sync=True ###

Because when it checks the sync status of the node it is fine, but the status is SFAIL.

Some ideas how to fix this?

Can you please provide the details of crm_mon -1ARf? I would like to see the LPA [Last Primary Active] number on both nodes; one should be higher than the other.

If you see WAITING4PRIM, that means the cluster needs you to intervene and tell the cluster which one is primary, to avoid data corruption.

-Viney

Cluster Summary:
  * Stack: corosync
  * Current DC: xx-xxx-ht02 (2) (version 2.0.3-5.el8_2.5-4b1f869f0f) - partition with quorum
  * Last updated: Wed Nov 10 08:49:44 2021
  * Last change:  Wed Nov 10 08:47:59 2021 by root via crm_attribute on xx-xxx-ht01
  * 2 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ xx-xxx-ht01 (1) xx-xxx-ht02 (2) ]

Active Resources:
  * Clone Set: SAPHanaTopology_HTD_00-clone [SAPHanaTopology_HTD_00]:
    * SAPHanaTopology_HTD_00    (ocf::heartbeat:SAPHanaTopology):       Started xx-xxx-ht01
    * SAPHanaTopology_HTD_00    (ocf::heartbeat:SAPHanaTopology):       Started xx-xxx-ht02
    * Started: [ xx-xxx-ht01 xx-xxx-ht02 ]
  * Clone Set: SAPHana_HTD_00-clone [SAPHana_HTD_00]:
    * SAPHana_HTD_00    (ocf::heartbeat:SAPHana):        FAILED xx-xxx-ht01
    * SAPHana_HTD_00    (ocf::heartbeat:SAPHana):       Started xx-xxx-ht02
    * Started: [ xx-xxx-ht02 ]

Node Attributes:
  * Node: xx-xxx-ht01 (1):
    * hana_htd_clone_state              : DEMOTED
    * hana_htd_op_mode                  : logreplay
    * hana_htd_remoteHost               : xx-xxx-ht02
    * hana_htd_roles                    : 4:P:master1:master:worker:master
    * hana_htd_site                     : NODE1
    * hana_htd_srmode                   : syncmem
    * hana_htd_version                  : 2.00.054.00.1611906357
    * hana_htd_vhost                    : xx-xxx-ht01
    * lpa_htd_lpt                       : 1636530479
    * master-SAPHana_HTD_00             : 150
  * Node: xx-xxx-ht02 (2):
    * hana_htd_clone_state              : DEMOTED
    * hana_htd_op_mode                  : logreplay
    * hana_htd_remoteHost               : xx-xxx-ht01
    * hana_htd_roles                    : 4:S:master1:master:worker:master
    * hana_htd_site                     : NODE2
    * hana_htd_srmode                   : syncmem
    * hana_htd_version                  : 2.00.054.00.1611906357
    * hana_htd_vhost                    : xx-xxx-ht02
    * lpa_htd_lpt                       : 30
    * master-SAPHana_HTD_00             : -INFINITY

Migration Summary:
  * Node: xx-xxx-ht01 (1):
    * SAPHana_HTD_00: migration-threshold=1000000 fail-count=2 last-failure=Tue Nov  9 17:16:48 2021:

Failed Resource Actions:
  * SAPHana_HTD_00_monitor_119000 on xx-xxx-ht01 'ok' (0): call=14, status='complete', exitreason='', last-rc-change='2021-11-09 17:16:48 +01:00', queued=0ms, exec=2223ms

Please open a support case so that we can more effectively help troubleshoot your issue.

The platform doesn't let me open a support case about this topic. I think my problem is that master-SAPHana_HTD_00 is -INFINITY on node 2,

and it should probably be a value less than the one on node 1 (primary), which is 150.

Does anyone know how to change that value?

It seems like the LPT / LPA state is fine on ht01 (the primary node); it's just that the DB is failing there, at least that's what the cluster says. Check that the mounts needed for the DB to start are available. Once you fix that, you can give it another try with:

Disclaimer: this will restart your DB if the cluster is out of maintenance mode.

pcs resource cleanup SAPHana_HTD_00 --node xx-xxx-ht01

If the DB is healthy outside the cluster, then it could be an attribute related to logreplay_readaccess etc.; you can check that with:

# pcs property show

More details can be found in /var/log/cluster/corosync.log.

Sync up with your DBA; the cluster config should match the DB config.

As suggested by Mr. Reid Wahl, the expert, please open a case or call the support number if you can open a case.

Hope I made some sense.

Regards, Viney

Hello Alexandrov,

can you provide some more details on what you mean by 'The platform dont let me open support case about this toppic.' ?

As mentioned at the top of this document in order to be able to use this solution you need valid subscriptions either for 'RHEL for SAP solutions' (if the setup is on-premise or if you are using 'Bring Your Own Subscription' (BYOS) via Cloud Access) or 'RHEL for SAP with HA and update Services' (if you use 'Pay As You Go' (PAYG) on a Cloud platform).

When using the 'RHEL for SAP Solutions' subscription you are entitled to open support tickets directly on the Red Hat Customer Portal.

When using the 'RHEL for SAP Solutions with HA and Update Services' subscriptions in the PAYG model, you should be able to open support tickets via the Cloud provider.

Regards, Frank Danapfel Senior Software Engineer Red Hat SAP Alliance Technology Team

Hi experts, I am trying to segregate the replication LAN for SAP HANA replication onto a different subnet. Once I change the hosts file and redirect replication to a different subnet using the same hostnames, my cluster starts having issues. Has anyone faced a similar issue? My cluster is built on 10.10.10.10 (host 1) and 10.10.10.11 (host 2), prd1 and prd2 respectively, with a domain suffix. In order to put HANA replication on a different subnet I need to create additional host entries and point prd1 and prd2 to 10.10.11.10 and 10.10.11.11. After the mapping, replication works, but the cluster starts showing errors.

Has anyone worked on a similar setup who can share how they are segregating the LAN for replication? Thanks in advance.

Hi experts, any idea how to promote the primary node? Both nodes are showing WAITING4LPA:

[root@pprdb01 ~]# crm_mon -1ARf
Cluster Summary:
  * Stack: corosync
  * Current DC: pprdb01 (1) (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Sun Dec  5 18:35:54 2021
  * Last change:  Sun Dec  5 18:25:45 2021 by root via crm_resource on pprdb01
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ pprdb01 (1) pprdb02 (2) ]

Active Resources:
  * Vmware_fence_device  (stonith:fence_vmware_soap):     Started hisnbtpprdb01
  * Clone Set: SAPHanaTopology_RH1_00-clone [SAPHanaTopology_RH1_00]:
    * SAPHanaTopology_RH1_00    (ocf::heartbeat:SAPHanaTopology):       Started pprdb01
    * SAPHanaTopology_RH1_00    (ocf::heartbeat:SAPHanaTopology):       Started pprdb02
  * Clone Set: SAPHana_RH1_00-clone [SAPHana_RH1_00] (promotable):
    * SAPHana_RH1_00    (ocf::heartbeat:SAPHana):       Slave pprdb01
    * SAPHana_RH1_00    (ocf::heartbeat:SAPHana):       Slave pprdb02
  * vip_RH1_00  (ocf::heartbeat:IPaddr2):       Started pprdb02

Node Attributes:
  * Node: pprdb01 (1):
    * hana_rh1_clone_state              : WAITING4LPA
    * hana_rh1_op_mode                  : logreplay
    * hana_rh1_remoteHost               : pprdb02
    * hana_rh1_roles                    : 1:P:master1::worker:
    * hana_rh1_site                     : PR01
    * hana_rh1_srmode                   : sync
    * hana_rh1_version                  : 2.00.055.00.1615413201
    * hana_rh1_vhost                    : pprdb01
    * lpa_rh1_lpt                       : 1638698256
    * master-SAPHana_RH1_00             : -9000
  * Node: pprdb02 (2):
    * hana_rh1_clone_state              : WAITING4LPA
    * hana_rh1_op_mode                  : logreplay
    * hana_rh1_remoteHost               : pprdb01
    * hana_rh1_roles                    : 1:P:master1::worker:
    * hana_rh1_site                     : SR01
    * hana_rh1_srmode                   : sync
    * hana_rh1_version                  : 2.00.055.00.1615413201
    * hana_rh1_vhost                    : pprdb02
    * lpa_rh1_lpt                       : 1638698242
    * master-SAPHana_RH1_00             : -9000

I think if you run crm_attribute -N pprdb02 -n lpa_rh1_lpt -v 20 -l forever, that will cause node 1 to be promoted and node 2 to be registered as secondary. It may be the other way around.
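Before forcing the change, it may help to query the current values on both nodes first; this is a sketch using crm_attribute's -G/--query option, which reads an attribute instead of setting it (run it on a cluster node, and adjust node names and SID to your environment):

```shell
# Read the last-primary timestamps currently stored for both nodes
crm_attribute -N pprdb01 -n lpa_rh1_lpt -G -l forever
crm_attribute -N pprdb02 -n lpa_rh1_lpt -G -l forever

# Mark pprdb02's copy as outdated (low value 20) so the resource
# agents will prefer pprdb01 as the primary
crm_attribute -N pprdb02 -n lpa_rh1_lpt -v 20 -l forever
```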

If you have a Red Hat support subscription, then I encourage you to open support cases for a faster and more reliable response when you have questions or concerns.

I tried crm_attribute -N pprdb02 -n lpa_rh1_lpt -v 20 -l forever. The command started HANA on one node and for some time showed it as promoted, but after a while both nodes show as DEMOTED and HANA is stopped on both cluster nodes. It is not a production cluster yet, so I can do some more troubleshooting. Could you please share what else I can check, and is there any troubleshooting guide I can follow for "Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On" setups?

Thanks

If the 'hana_<sid>_clone_state' node attribute shows WAITING4LPA on both nodes, it means that the HANA setup is in a state where the resource agents aren't able to determine which one of the HANA instances is supposed to be the primary and which is the secondary replication site.

'hana_<sid>_roles' shows the status of both HANA instances as '1:P:...', which means both HANA instances aren't running, but both currently 'think' they are the primary HANA site. And since the value of the lpa_rh1_lpt node attribute is almost identical on both cluster nodes, the resource agent won't automatically recover from this situation because it might cause data loss.
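The last-primary-timestamp (LPA) arbitration can be sketched roughly as follows. This is a simplified illustration of the idea, not the actual SAPHanaSR agent code, and the gap threshold is an assumption:

```python
def arbitrate_lpa(lpt_node1: int, lpt_node2: int, gap: int = 3600) -> str:
    """Simplified sketch of dual-primary arbitration.

    Each node stores the last time it was primary (lpa_<sid>_lpt).
    If one timestamp is clearly newer, that side can safely be
    promoted; if the two are too close together, the agent refuses
    to decide (WAITING4LPA) and a human must intervene.
    """
    if abs(lpt_node1 - lpt_node2) < gap:
        return "WAITING4LPA"  # ambiguous: manual intervention required
    return "node1" if lpt_node1 > lpt_node2 else "node2"

# The lpa_rh1_lpt values from the crm_mon output above differ by only
# 14 seconds, so the sketch reproduces the observed WAITING4LPA state:
print(arbitrate_lpa(1638698256, 1638698242))  # WAITING4LPA
```

This also shows why manually setting one side's lpt to a small value (e.g. 20) unblocks the agent: the difference then clearly exceeds the threshold.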

So in order to get out of this situation you should actually do the following:
1. Completely stop the cluster.
2. Determine which of the two HANA instances is the actual primary (the one where clients were last connecting to), and start that HANA instance manually.
3. Verify that the other HANA instance isn't running, and then register it as the secondary site.
4. Start the second HANA instance and verify that HANA System Replication is working as expected.
5. Start the cluster again.
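The steps above can be sketched as commands. This is only an outline, assuming SID RH1, instance number 00, site name SR01 for the secondary, and that pprdb01 is the true primary; adjust all of these to your environment:

```shell
# 1. Stop the cluster on all nodes
pcs cluster stop --all

# 2. On the real primary (pprdb01), start HANA manually as <sid>adm
su - rh1adm -c "HDB start"

# 3. On the other node, make sure HANA is down, then re-register it
#    as the secondary replication site
su - rh1adm -c "HDB stop"
su - rh1adm -c "hdbnsutil -sr_register --remoteHost=pprdb01 \
    --remoteInstance=00 --replicationMode=sync \
    --operationMode=logreplay --name=SR01"

# 4. Start the secondary, then check replication status on the primary
su - rh1adm -c "HDB start"
su - rh1adm -c "python /usr/sap/RH1/HDB00/exe/python_support/systemReplicationStatus.py"

# 5. Start the cluster again
pcs cluster start --all
```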

Hi FD, thanks for your response. I have done the same as you advised: stopped the cluster, set up the replication, and started the cluster. But after that, when I start the cluster, it actually seems to try to bring down the primary node and promote the secondary node to primary. Have you seen such an outcome?

Hello Ravi,

Once you have Node A to Node B replication working and you have fully verified that with:

python systemReplicationStatus.py 

To start with a clean slate, you can put the cluster in maintenance mode, stop and start the cluster, and then remove maintenance mode; this will clear all existing states. Wait for some time between each command, and paste the crm_mon -1ARf output for details:

/usr/sbin/pcs property set maintenance-mode=true
pcs cluster stop --all
pcs cluster start --all 
/usr/sbin/pcs property set maintenance-mode=false

If that doesn't help, you can tell the cluster which node is the primary with the following command:

crm_attribute --node NodeA  --name lpa_SID_lpt --update 20 

[the higher value has to be determined by looking at the cluster state]

Thanks Viney

Thanks Viney for your support. The system is back as intended. Since this is a customized solution from both Red Hat and SAP, is there any specific documentation on administration and troubleshooting for this solution?

Good to hear that your problem is solved. This page has most of the things you need; for Pacemaker in general, refer to ClusterLabs and the SAP blog.

Regards, Viney

visudo/sudoers expects the "DC1" alias identifier to be ALL CAPS. Replacing DC1 with something like "Site1" will cause visudo to fail with a syntax error.

So the definition of the site can be camel-case, but the alias must be CAPS. If the commands being issued have to match the site name's case, then it becomes a requirement that the site name MUST be all caps as well.

On each cluster node create the file /etc/sudoers.d/20-saphana by running sudo visudo -f /etc/sudoers.d/20-saphana and add the contents below to allow the hook script to update the node attributes when the srConnectionChanged() hook is called.
Replace rh2 with the lowercase SID of your HANA installation and replace DC1 and DC2 with your HANA site names.

Cmnd_Alias DC1_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR
rh2adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty

While it might be correct that for the command alias name the site name must be ALL CAPS, in the 'hana_sid_site_srHook_sitename' parameter given to the crm_attribute command, 'sitename' must be specified in exactly the same case as it was defined when setting up HANA System Replication; otherwise the node attribute that the crm_attribute command is trying to read/modify will not match the node attribute used by the SAPHana/SAPHanaTopology resource agents.

So if the site names have been specified as 'Site1' and 'Site2' when setting up HANA System Replication, then for the names of the command aliases 'SITE1_SOK', 'SITE1_SFAIL', 'SITE2_SOK' and 'SITE2_SFAIL' must be used. But for the node attribute given to the crm_attribute command, 'hana_sid_site_srHook_Site1' and 'hana_sid_site_srHook_Site2' must be used.
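Putting the two rules together, a /etc/sudoers.d/20-saphana fragment for site names 'Site1' and 'Site2' (keeping the SID rh2 from the example quoted above) would look like this sketch:

```
Cmnd_Alias SITE1_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_Site1 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SITE1_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_Site1 -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias SITE2_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_Site2 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SITE2_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_Site2 -v SFAIL -t crm_config -s SAPHanaSR
rh2adm ALL=(ALL) NOPASSWD: SITE1_SOK, SITE1_SFAIL, SITE2_SOK, SITE2_SFAIL
Defaults!SITE1_SOK, SITE1_SFAIL, SITE2_SOK, SITE2_SFAIL !requiretty
```

Note how the alias names are upper-case while the site name inside the attribute name keeps its original 'Site1'/'Site2' casing.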

Hi all,

Is there a way to integrate some SAP Applications ( specifically SAP Convergent Mediation -by DigitalRoute ) into Redhat HA (pacemaker) ? I have found some SAP Notes for CM High Availability, but that's a very high level note : https://launchpad.support.sap.com/#/notes/3079845 , the title is "Standard Practices for SAP CM High Availability", and what it basically says is : "Requirements and Standard practice for SAP CM High Availability for servers running on bare-metal and virtual servers (eg: vmware images):

  • HA Clustering software (eg: Redhat Pacemaker. Not part of CM product)

  • 2 servers - primary & secondary

  • 1 virtual ip (shared between primary & secondary)

  • external shared network storage (eg: NAS)

Install CM platform on primary server only using virtual ip. "

But with no further details on how to integrate, what files are to be edited, where to push the ha_dr_provider parameters, or where the hook script must be..

I know from other distros that SAP ASCS and SAP WebDispatcher can be integrated into pacemaker HA, so what about CM ?

Thanks !

It is also possible to use the RHEL HA Add-On to manage SAP (A)SCS, ERS, WebDispatcher and other SAP instance types, see https://access.redhat.com/articles/4079981#3-ha-solution-for-netweaver-or-s4-based-on-abap-platform-1709-or-older-18 and https://access.redhat.com/articles/4079981#2-ha-solution-for-s4hana-based-on-abap-platform-1809-or-newer-13 for more information.

I'm not familiar with this SAP Convergent Mediation product, but if it also uses the same mechanisms as the typical SAP instances (sapcontrol, sapstartsrv, ...) to manage its processes, then it should be possible to use the SAPInstance resource agent to manage SAP CM instances in a Pacemaker cluster on RHEL as well. If the SAP CM instances are managed differently, then it might be possible to use a shell script and an LSB resource or a systemd resource to manage them in a cluster, but this would then be a custom solution which would not be fully supported by Red Hat (see https://access.redhat.com/solutions/753443 for more information).

All other requirements mentioned in the SAP Note, like moving around a virtual IP address or switching file systems from one cluster node to another, are basic functionality covered by the RHEL HA Add-On.

Since it looks like SAP CM is developed by a partner of SAP (DigitalRoute), it might actually be best to reach out directly to them to see if they can provide any further details on what is needed to set up a Pacemaker cluster for SAP CM.
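In the sapstartsrv-managed case mentioned above, creating the resource would look roughly like the following sketch; the instance name, profile path and resource name are hypothetical placeholders, not taken from any CM documentation:

```shell
# Hypothetical SAPInstance resource for a CM instance managed by sapstartsrv
pcs resource create rsc_cm SAPInstance \
    InstanceName=CM1_J00_cmhost \
    START_PROFILE=/sapmnt/CM1/profile/CM1_J00_cmhost \
    op monitor interval=60s timeout=60s
```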

Thanks Frank, your answer is definitely helpful, and your remarks are accurate. I will ask the customer to reach the SAP CM people and get their feedback.

Hello experts, I was looking for documentation about how to operate Pacemaker and HANA in special cases. I don't know if you can provide me some links; I was looking, for example, for how to operate the cluster in situations like upgrading the HANA version.

KR, Petar.

Hello Aleksandrov,

I'm not aware of any documentation that specifically describes how to upgrade SAP HANA when it is managed by an HA solution like the one described above.

But SAP provides documentation on how to update HANA System Replication environments: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/de9f8420b74d41d5aaae535602f970a6.html

And there is also some documentation available from us on best practices for updating a cluster setup based on the RHEL HA Add-On: https://access.redhat.com/articles/2059253

Does anyone have any good links or docs for setting up a HANA/RHEL 8.x Pacemaker cluster? We are trying to do a POC, replacing HP Serviceguard.

Hello Mike,

You can try this for GCP: https://cloud.google.com/solutions/sap/docs/sap-hana-ha-config-rhel. But the fencing agent will change depending on the platform, e.g. bare metal with HPE iLO, VMware, or public cloud. The VIP resource and fencing agents are much simpler on bare metal and VMware.

-Vin

Here is the link from Redhat describing how to configure Redhat HA with Pacemaker : https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/pdf/configuring_and_managing_high_availability_clusters/red_hat_enterprise_linux-8-configuring_and_managing_high_availability_clusters-en-us.pdf

And also the page you are reading right now is very well documented, and it should be sufficient to configure a RHEL pacemaker Cluster.

By the way, I would say that Serviceguard is much simpler to configure than pacemaker, it is much easier to test the failover/failback, and much simpler to troubleshoot in case of issues, plus it supports 3-Tier replication nowadays.

But maybe its cost is what made you think about moving to Pacemaker anyway.

Hello Mike,

in addition to the documentation already shared by others I would also recommend having a look at the following documents:

- Support policies for Red Hat HA clusters: https://access.redhat.com/articles/2912891 (provides links to the various support policies for setting up clusters using the RHEL HA Add-On, which for example provide guidance on which fencing/STONITH mechanisms are supported on which platform)

- Overview of the Red Hat HA solutions for managing SAP HANA: https://access.redhat.com/articles/4079981#sap-hana (there you will also find links to the configuration guides for these HA solutions on platforms like AWS, Azure and GCP)

And since you are doing a POC it might be a good idea to reach out to Red Hat consulting (https://www.redhat.com/en/services/consulting) to see if you can get assistance from one of our consultants to get the setup working as expected.

Good morning,

I have implemented everything that appears in this thread; I just still need to set up fencing, and in this case I think I will go for SCSI fencing. My question: do I only need a shared disk that is independent of the two nodes? Is it enough that both nodes have access to it?

I have also read something about ring0 and ring1, but I don't know if it applies to this case?

thanks for your help

Hello Alexandrov,

sorry, using fence_scsi is not possible for these kinds of setups, because as documented in https://access.redhat.com/articles/3078811 fence_scsi can only be used if all managed applications are designed to interact with one or more shared storage devices, which is not the case for these types of HANA System Replication setups. You will have to figure out which is the correct fence mechanism for your platform that can ensure that a faulty cluster node can be completely powered off so that it is not accessible anymore until it is powered on again.

ring0 and ring1 are related to the "Redundant Ring Protocol" (RRP) which can be used to set up a redundant heartbeat network for the cluster communication. See https://access.redhat.com/solutions/61832 for more information. Since this is only related to the internal cluster communication, it is independent of the fencing mechanism or the cluster configuration used for managing HANA System Replication, so if you want to improve the resiliency of your cluster against failures of the network infrastructure used for the cluster heartbeat, you could still set up RRP in combination with this HA solution.