Configure SAP Netweaver ASCS/ERS ENSA1 on Amazon Web Services (AWS)



1. Overview

1.1. Introduction

SAP NetWeaver-based systems (such as SAP ERP and SAP CRM) play an important role in business processes, so it is critical for them to be highly available. The underlying idea of clustering is fairly simple: instead of a single large machine bearing all of the load and risk, one or more machines automatically drop in as an instant, full replacement for the service or the machine that has failed. In the best case, this replacement process causes no interruption to the systems' users.

1.2. Audience

This document is intended for SAP, Red Hat, and AWS certified or trained administrators and consultants who already have experience setting up highly available solutions using the RHEL HA Add-on or other clustering solutions. Access to both the SAP Service Marketplace and the Red Hat Customer Portal is required to be able to download software and additional documentation.

1.3. Concepts

This document describes how to set up a two-node cluster solution that conforms to the guidelines for high availability established by both SAP and Red Hat, specifically on AWS. It is based on SAP NetWeaver on top of Red Hat Enterprise Linux 7.5 or newer with the RHEL HA Add-on. The reference architecture described in this document is undergoing the SAP High Availability Interface certification tests (NW-HA-CLU 7.50). This document will be updated after the certification is obtained.

At the application tier, the lock table of the enqueue server is the most critical piece. To protect it, SAP has developed the “Enqueue Replication Server” (ERS), which maintains a backup copy of the lock table. While the (A)SCS is running on node1, the ERS always maintains a copy of the current enqueue table on node2. When the system needs to fail over the (A)SCS to node2, the (A)SCS first starts on node2 and then shuts down the ERS, taking over its shared memory segment and thereby acquiring an up-to-date enqueue table; the replicated enqueue table becomes the new primary enqueue table. The ERS then starts on node1 once that node becomes available again.

As required by the SAP HA-Interface certification, this document focuses on the high availability of the (A)SCS and ERS, both of which are controlled by the cluster software. In normal operation, the cluster ensures that the ERS and (A)SCS always run on different nodes. In the event of a failover, the (A)SCS needs to “follow” the ERS: it must switch to the node where the ERS is running. The cluster has to ensure that the ERS is already running when the (A)SCS starts up, because the (A)SCS needs to take over the replicated enqueue table from the ERS.

The concept described above is known as Standalone Enqueue Server 1 (ENSA1), available in ABAP 1709 or older. For Standalone Enqueue Server 2 (ENSA2) which is now the default installation in ABAP 1809 or newer, please check Configure SAP S/4 ASCS/ERS ENSA2 on Amazon Web Services (AWS).

1.4. Resources: Standalone vs. Master/Slave

There are two approaches to configuring (A)SCS and ERS resources in Pacemaker: Master/Slave and Standalone. The Master/Slave approach is supported in all RHEL 7 minor releases; the Standalone approach is supported in RHEL 7.5 and newer.

In any new deployment, Standalone is recommended for the following reasons:
- it meets the requirements of the current SAP HA Interface Certification
- it is compatible with the new Standalone Enqueue Server 2 (ENSA2) configuration
- (A)SCS/ERS instances can be started and stopped independently
- (A)SCS/ERS instance directories can be managed as part of the cluster

This article outlines the configuration procedure for the SAPInstance Standalone approach. For instructions on the SAPInstance Master/Slave configuration, please refer to the kbase article SAP Netweaver in Pacemaker cluster with Master/Slave SAPInstance resource.

1.5. Support Policies

See: Support Policies for RHEL High Availability Clusters - Management of SAP Netweaver in a Cluster

Note the parts specific to Amazon Web Services.

2. Requirements

2.1. Subscriptions

It’s important to keep the subscription, kernel, and patch level identical on both nodes.

There are two ways to consume RHEL For SAP Solutions subscription on AWS: Pay As You Go (PAYG), or Bring Your Own Subscription (BYOS).

2.1.1. Pay As You Go - RHEL for SAP with HA and US

You can start RHEL instances using the RHEL for SAP with High Availability and Update Services AMI from the AWS Marketplace.

2.1.2. Bring Your Own Subscription - Red Hat Enterprise Linux for SAP Solutions

To port your RHEL for SAP Solutions subscriptions from on-premise to AWS, you must be enrolled in the Red Hat Cloud Access program and have unused RHEL for SAP Solutions subscriptions.

Please follow this kbase article to subscribe your systems to the Update Service for RHEL for SAP Solutions.

Note: One unused subscription migrated to AWS can be used for two RHEL EC2 instances. If you have two unused subscriptions, you can create four EC2 instances, and so on.

2.2. Pacemaker Resource Agents

The ASCS/ERS Standalone approach is supported by resource-agents-sap-3.9.5-124.el7.x86_64 or newer, which contains the IS_ERS attribute for the SAPInstance resource agent. It is included in the following RHEL releases (a quick version check is shown after the list):

  • AWS Marketplace: RHEL for SAP with HA and US 7.5 or newer
  • Cloud Access: RHEL for SAP Solutions 7.5 or newer
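Once the resource-agents-sap package is installed (see section 5.4), a quick way to confirm that the installed version supports the Standalone approach is to check the package version and look for the IS_ERS parameter; this is just a verification sketch:

# rpm -q resource-agents-sap        # should report 3.9.5-124.el7 or newer
# pcs resource describe SAPInstance | grep IS_ERS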

2.3. SAP NetWeaver High-Availability Architecture

A typical setup for SAP NetWeaver High-Availability consists of 3 distinctive components:

  • the (A)SCS and ERS instances, protected by the cluster
  • the Application Server instances (Primary and Additional Application Servers)
  • the database instance

This article focuses on the configuration of SAP Netweaver ASCS and ERS in a Pacemaker cluster. As a best practice, we recommend installing the Application Servers and the Database on separate nodes outside of the two-node cluster designated for (A)SCS and ERS.

Below is the architecture diagram of the example installation:

Architecture Diagram

2.4. SAPInstance resource agent

SAPInstance is a pacemaker resource agent used for both ASCS and ERS resources. All operations of the SAPInstance resource agent are done by using the SAP start service framework sapstartsrv.
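For illustration, the same sapstartsrv framework that the resource agent talks to can also be queried manually with sapcontrol (here assuming the RH2 system and the ASCS instance number 20 used later in this document):

[root]# su - rh2adm -c "sapcontrol -nr 20 -function GetProcessList"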

2.5. Storage requirements

Directories created for the Netweaver installation should be placed on shared storage, according to the following rules:

2.5.1. Instance Specific Directory

The instance-specific directories for 'ASCS' and 'ERS' must be present on the corresponding nodes. These directories must be available before the cluster is started.

  • ASCS node: /usr/sap/SID/ASCS<Ins#>
  • ERS node: /usr/sap/SID/ERS<Ins#>

For Application Servers, the following directory should be made available on the corresponding node designated for the Application Server instance:

  • App Server D<Ins#>: /usr/sap/SID/D<Ins#>

2.5.2. Shared Directories

The following mount points must be available on the ASCS, ERS, and Application Server nodes.

/sapmnt
/usr/sap/trans
/usr/sap/SID/SYS

2.5.3. Shared Directories on HANA

The following mount point(s) must be available on the HANA node.

/sapmnt

These mount points must be either managed by the cluster or mounted before the cluster is started.

2.5.4. Amazon EFS as Shared Storage

Amazon Elastic File System (Amazon EFS) provides a simple and scalable shared file storage service for Amazon EC2 instances. The configuration in this document uses Amazon EFS as the shared storage, mounted via NFS.

3. Required AWS Configurations

3.1. Initial AWS Setup

For instructions on the initial setup of the AWS environment, please refer to Installing and Configuring a Red Hat Enterprise Linux 7.4 (and later) High-Availability Cluster on Amazon Web Services.

As a summary, you should have configured the following components:

  • An AWS account
  • A Key Pair
  • An IAM user with permissions to: modify routing tables and create security groups, create IAM policies and roles
  • A VPC
  • 3 subnets: one public subnet and two private subnets in two different availability zones (recommended to minimize service disruption caused by zone-wide failures; a single availability zone is also acceptable)
  • NAT Gateway
  • Security Group for the jump server
  • Security Group for the Netweaver instances
  • Remote Desktop Protocol (RDP) for GUI Access
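If you prefer to script parts of the above setup, a minimal AWS CLI sketch for the VPC, subnets, and a security group might look like the following; all CIDR blocks, IDs, and names are placeholders for illustration only:

# aws ec2 create-vpc --cidr-block 192.168.0.0/16
# aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 192.168.1.0/24 --availability-zone us-east-1a
# aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 192.168.2.0/24 --availability-zone us-east-1b
# aws ec2 create-security-group --group-name nwha-sg --description "Netweaver cluster nodes" --vpc-id vpc-xxxxxxxx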

3.2. Choose the Supported SAP NW Instance Type and Storage

Based on the sizing requirements, choose the appropriate instance types for the ASCS/ERS nodes and the application servers.

  • It's recommended to create the two nodes for ASCS and ERS in different availability zones.
  • To enable access, copy your private key to the jump server and to each of the Netweaver nodes.

In Services -> EC2 -> IMAGES -> AMIs choose one of the following AMIs:

  • RHEL 7.5 for Bring Your Own Subscription via Cloud Access
  • RHEL for SAP with HA and US 7.5 for Pay As You Go

AMI and Instance Type of Jump Server:

  • You can select an instance type from the free tier, e.g. t2.micro with 30GiB EBS storage volume.
  • RHEL 7.5 is sufficient; there is no need to use RHEL for SAP Solutions or RHEL for SAP with HA and US.

After the Netweaver nodes are created, note the following information for use in the next section:

  • Region: e.g. us-east-1
  • Account ID of the AWS user account
  • Instance ID of the two ASCS/ERS nodes
  • AWS Access Key ID
  • AWS Secret Access Key
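Most of these values can be collected from a running instance, for example (assuming the AWS CLI is configured and the instance metadata service is reachable):

# curl -s http://169.254.169.254/latest/meta-data/instance-id
# curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone
# aws sts get-caller-identity --query Account --output text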

In the example, the following EC2 instances are created:

node1:     ASCS/ERS cluster node 1
node2:     ASCS/ERS cluster node 2
nwhana1:   HANA node
nwpas:     Primary Application Server node
nwaas:     Additional Application Server node

3.3. Create Policies

For the IAM user, you need to create three policies:

Services -> IAM -> Policies -> Create DataProvider Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "EC2:DescribeInstances",
                "EC2:DescribeVolumes"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "cloudwatch:GetMetricStatistics",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::aws-data-provider/config.properties"
        }
    ]
}

Services -> IAM -> Policies -> Create OverlayIPAgent Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1424870324000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeTags",
                "ec2:DescribeRouteTables"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1424860166260",
            "Action": [
                "ec2:CreateRoute",
                "ec2:DeleteRoute",
                "ec2:ReplaceRoute"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:<region>:<account-id>:route-table/<ClusterRouteTableID>"
        }
    ]
}

In the last Resource clause, replace the following parameters with the real values:

  • region: e.g. us-east-1
  • account-id: the account ID of the user account
  • ClusterRouteTableID: the route table ID of the existing cluster VPC route table, in the format rtb-XXXXX.

Services -> IAM -> Policies -> Create STONITH Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1424870324000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeTags"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1424870324001",
            "Effect": "Allow",
            "Action": [
                "ec2:ModifyInstanceAttribute",
                "ec2:RebootInstances",
                "ec2:StartInstances",
                "ec2:StopInstances"

            ],
            "Resource": [
                "arn:aws:ec2:<region>:<account-id>:instance/<instance-id-node1>",
                "arn:aws:ec2:<region>:<account-id>:instance/<instance-id-node2>"
            ]
        }
    ]
}

In the last Resource clause, replace the following parameters with the real values:

  • region: e.g. us-east-1
  • account-id: the account ID of the user account
  • instance-id-node1, instance-id-node2: instance ID of the two SAP ASCS/ERS instances
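Alternatively, the three policies above can be created with the AWS CLI instead of the console. The JSON documents would first be saved to local files; the file names below are examples only:

# aws iam create-policy --policy-name DataProvider --policy-document file://DataProvider.json
# aws iam create-policy --policy-name OverlayIPAgent --policy-document file://OverlayIPAgent.json
# aws iam create-policy --policy-name STONITH --policy-document file://STONITH.json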

3.4. Create an IAM Role

Create an IAM Role, attach the 3 policies created in the previous step, and assign the role to the ASCS/ERS and HANA instances.

  • In Services -> IAM -> Roles -> Create a new role, e.g. PacemakerRole, and attach the 3 policies to it: DataProvider, OverlayIPAgent, and STONITH
  • Assign the role to the ASCS/ERS and HANA instances. Perform this for ALL nodes: in the AWS EC2 console, right-click the node -> Instance Settings -> Attach/Replace IAM Role -> select PacemakerRole, then click Apply.
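For reference, the same role creation and assignment can also be scripted with the AWS CLI. This is only a sketch: it assumes a standard EC2 trust policy saved as ec2-trust.json, and the real account and instance IDs in place of the placeholders:

# aws iam create-role --role-name PacemakerRole --assume-role-policy-document file://ec2-trust.json
# aws iam attach-role-policy --role-name PacemakerRole --policy-arn arn:aws:iam::<account-id>:policy/DataProvider
# aws iam attach-role-policy --role-name PacemakerRole --policy-arn arn:aws:iam::<account-id>:policy/OverlayIPAgent
# aws iam attach-role-policy --role-name PacemakerRole --policy-arn arn:aws:iam::<account-id>:policy/STONITH
# aws iam create-instance-profile --instance-profile-name PacemakerRole
# aws iam add-role-to-instance-profile --instance-profile-name PacemakerRole --role-name PacemakerRole
# aws ec2 associate-iam-instance-profile --instance-id <instance-id> --iam-instance-profile Name=PacemakerRole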

3.5. Install AWS CLI

Follow the Install the AWS CLI section to install and verify the AWS CLI configuration on the ASCS/ERS and HANA nodes.
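A quick way to verify the CLI configuration on each node is to set the default region and run a simple read-only query; the describe-instances call also confirms that the attached IAM role grants the expected permissions:

# aws configure
# aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' --output text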

3.6. Configure EFS file system

3.6.1. Create EFS file system

In the AWS console -> EFS -> Create File System -> Select the VPC of the Netweaver installation, and the availability zones where the Netweaver and HANA nodes are running, so the nodes can access the EFS file system:

EFS Creation

In the example, an EFS file system has been created: fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/
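The file system and its mount targets can also be created from the CLI; one mount target is needed per subnet/availability zone used by the Netweaver and HANA nodes (all IDs below are placeholders):

# aws efs create-file-system --creation-token nwha-efs --region us-east-1
# aws efs create-mount-target --file-system-id fs-xxxxxxxx --subnet-id subnet-xxxxxxxx --security-groups sg-xxxxxxxx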

3.6.2. Mount EFS as NFS

Mount the root directory of the EFS volume:

# mkdir /mnt/efs
# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/ /mnt/efs

3.6.3. Create sub-directories

On the EFS volume, create sub-directories to be mounted as various shared file systems used by SAP Netweaver HA installation:

[root@node1 ~]# cd /mnt/efs
[root@node1 efs]# mkdir RH2 trans sapmnt
[root@node1 efs]# chmod 777 *
[root@node1 efs]# cd RH2
[root@node1 RH2]# mkdir SYS ASCS20 D21 D22 ERS29
[root@node1 RH2]# chmod 777 *

3.7. Create Overlay IP addresses

On AWS, the virtual IP for IP failover is implemented with Overlay IP addresses. For details, please refer to IP Failover with Overlay IP Addresses.

3.7.1. Determine the Virtual IP

In the example, the following instances need a virtual IP. Make sure the IP addresses are not in use by other applications.

ASCS virtual hostname and IP:   rhascs    192.168.200.101
ERS  virtual hostname and IP:   rhers     192.168.200.102
HANA virtual hostname and IP:   rhdb      192.168.200.103

3.7.2. Disable Source/Dest. Check

Disable Source/Dest. Check on the 3 instances: node1, node2, and nwhana1.

In AWS console, right click the respective instance -> Networking -> Change Source/Dest. Check -> In the pop up window, click “Yes, Disable”.

Disable Source/Dest. Check
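The same setting can also be applied with the AWS CLI, for example:

# aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check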

3.7.3. Map Overlay IP on respective instance

For the Netweaver installation, the virtual hostname and virtual IP's are mapped to the following instances, respectively:

node1     rhascs    192.168.200.101
node2     rhers     192.168.200.102
nwhana1   rhdb      192.168.200.103

On node1, run the following commands. You must have the AWS CLI installed. (The mapping can also be added to the routing table through the AWS console.)

# echo $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
i-0b1c0612341288612
# aws ec2 create-route --route-table-id rtb-9dd99ee2 --destination-cidr-block 192.168.200.101/32 --instance-id i-0b1c0612341288612

Repeat the step on node2 and nwhana1.

Check the AWS console; you should now see the mapping in the routing table rtb-9dd99ee2:

Overlay IP mapping

3.7.4. Add the Virt IP address on respective instance

Add the IP address on each corresponding instance:

[root@node1 ~]# ip address add 192.168.200.101 dev eth0
[root@node2 ~]# ip address add 192.168.200.102 dev eth0
[root@nwhana1 ~]# ip address add 192.168.200.103 dev eth0

3.7.5. Add Virt IP addresses to /etc/hosts on all instances

On every instance of the Netweaver installation, add the virtual IP and Hostname mapping to /etc/hosts file:

[root]# cat /etc/hosts
192.168.200.101 rhascs
192.168.200.102 rhers
192.168.200.103 rhdb

3.7.6. Test the Virt IP addresses

Make sure the virtual IP addresses can be reached and the virtual hostnames can be resolved. Also try to ssh to each node using the virtual IP or hostname.
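For example (assuming ICMP and SSH are allowed by the security groups):

# ping -c 3 rhascs
# getent hosts rhascs
# ssh rhascs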

3.8. Optional - Configure Route 53 Agent

In order to route traffic to an Amazon EC2 instance via DNS, the Route 53 agent is needed. Please follow the corresponding documentation to configure it.

4. Install SAP Netweaver

4.1. Configuration options used in this document

Below are configuration options that will be used for instances in this document:

Two nodes will be running the ASCS/ERS instances in pacemaker:

1st node hostname:      node1
2nd node hostname:      node2

SID:                    RH2

ASCS Instance number:   20
ASCS virtual hostname:  rhascs

ERS Instance number:    29
ERS virtual hostname:   rhers

Outside the two-node cluster:

PAS Instance number:    21
AAS Instance number:    22

HANA database:

SID:                    RH0
HANA Instance number:   00
HANA virtual hostname:  rhdb

4.2. Prepare hosts

Before starting the installation, ensure the following:

  • Install RHEL 7.5 or newer
    • When using RHEL for SAP with HA and US AMI:
      • Create instance using the RHEL for SAP with HA and US AMI, 7.5 or newer
    • When using RHEL for SAP Solutions via Cloud Access:
      • Install RHEL for SAP Solutions 7.5 or newer
      • Register the system to RHN or Satellite, and enable the RHEL for SAP Applications channel or the Update Services (E4S) channel
      • Enable the High Availability Add-on channel (see the example after this list)
  • Shared storage and filesystems are present at correct mount points
  • Virtual IP addresses used by instances are present and reachable
  • Hostnames that will be used by instances can be resolved to IP addresses and back
  • Installation media are available
  • System is configured according to the recommendation for running SAP Netweaver
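For the BYOS case above, registration and repository enablement on RHEL 7 could look like the following; the E4S repository IDs shown are examples and may differ depending on release and architecture:

# subscription-manager register
# subscription-manager repos --enable=rhel-7-server-e4s-rpms \
    --enable=rhel-sap-for-rhel-7-server-e4s-rpms \
    --enable=rhel-ha-for-rhel-7-server-e4s-rpms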

4.3. Install Netweaver

Using Software Provisioning Manager (SWPM), install the instances in the following order:

  • ASCS instance
  • ERS instance
  • DB instance
  • PAS instance
  • AAS instances

4.3.1. Install ASCS on node1

The following file systems should be mounted on node1, where ASCS will be installed:

[root@node1 ~]# mkdir /sapmnt
[root@node1 ~]# mkdir -p /usr/sap/trans /usr/sap/RH2/SYS /usr/sap/RH2/ASCS20
[root@node1 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/sapmnt /sapmnt
[root@node1 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/trans /usr/sap/trans
[root@node1 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/SYS /usr/sap/RH2/SYS
[root@node1 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/ASCS20 /usr/sap/RH2/ASCS20

Virtual IP for rhascs should be enabled on node1.

Run the installer:

[root@node1]# ./sapinst SAPINST_USE_HOSTNAME=rhascs

Select High-Availability System option.

ASCS Installation

4.3.2. Install ERS on node2

The following file systems should be mounted on node2, where ERS will be installed:

[root@node2 ~]# mkdir /sapmnt
[root@node2 ~]# mkdir -p /usr/sap/trans /usr/sap/RH2/SYS /usr/sap/RH2/ERS29
[root@node2 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/sapmnt /sapmnt
[root@node2 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/trans /usr/sap/trans
[root@node2 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/SYS /usr/sap/RH2/SYS
[root@node2 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/ERS29 /usr/sap/RH2/ERS29

Virtual IP for rhers should be enabled on node2.

Run the installer:

[root@node2]# ./sapinst SAPINST_USE_HOSTNAME=rhers

Select High-Availability System option.

ERS Installation

4.3.3. SAP HANA

In the example, SAP HANA will use the following configuration. You can also use another supported database.

SAP HANA SID:                    RH0
SAP HANA Instance number:        00

SAP HANA should be installed on a separate host. Optionally, automated HANA System Replication can be set up in another Pacemaker cluster by following the document Configure SAP HANA System Replication in Pacemaker on Amazon Web Services.

Run the installer on the HANA host:

[root]# ./sapinst SAPINST_USE_HOSTNAME=rhdb

4.3.4. Install Application Servers

The following file systems should be mounted on the host that will run the Application Server instance. If you have multiple application servers, install each one on its corresponding host:

/usr/sap/RH2/D<Ins#>
/usr/sap/RH2/SYS
/usr/sap/trans
/sapmnt

Run the installer:

[root]# ./sapinst

Select High-Availability System option.

4.4. Post Installation

4.4.1. (A)SCS profile modification

The (A)SCS instance requires the following modification in its profile to prevent the automatic restart of the enqueue server, as it will be managed by the cluster. To apply the change, run the following command against your ASCS profile /sapmnt/RH2/profile/RH2_ASCS20_rhascs:

[root]# sed -i -e 's/Restart_Program_01/Start_Program_01/' /sapmnt/RH2/profile/RH2_ASCS20_rhascs

4.4.2. ERS profile modification

The ERS instance requires the following modification in its profile to prevent an automatic restart, as it will be managed by the cluster. To apply the change, run the following command against your ERS profile /sapmnt/RH2/profile/RH2_ERS29_rhers:

[root]# sed -i -e 's/Restart_Program_00/Start_Program_00/' /sapmnt/RH2/profile/RH2_ERS29_rhers
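To verify both changes, check that the enqueue server entry (Program_01 in the ASCS profile) and the ERS entry (Program_00 in the ERS profile) now begin with Start_Program instead of Restart_Program:

[root]# grep Program_01 /sapmnt/RH2/profile/RH2_ASCS20_rhascs
[root]# grep Program_00 /sapmnt/RH2/profile/RH2_ERS29_rhers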

4.4.3. Update the /usr/sap/sapservices file

On both node1 and node2, make sure the following two lines are commented out in the /usr/sap/sapservices file:

#LD_LIBRARY_PATH=/usr/sap/RH2/ASCS20/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/RH2/ASCS20/exe/sapstartsrv pf=/usr/sap/RH2/SYS/profile/RH2_ASCS20_rhascs -D -u rh2adm
#LD_LIBRARY_PATH=/usr/sap/RH2/ERS29/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/RH2/ERS29/exe/sapstartsrv pf=/usr/sap/RH2/ERS29/profile/RH2_ERS29_rhers -D -u rh2adm

4.4.4. Create mount points for ASCS and ERS on the failover node

Respectively:

[root@node1 ~]# mkdir /usr/sap/RH2/ERS29
[root@node1 ~]# chown rh2adm:sapsys /usr/sap/RH2/ERS29

[root@node2 ~]# mkdir /usr/sap/RH2/ASCS20
[root@node2 ~]# chown rh2adm:sapsys /usr/sap/RH2/ASCS20

4.4.5. Manual test starting instance on other node

Stop the ASCS and ERS instances. Move the instance-specific directories to the other node:

[root@node1 ~]# umount /usr/sap/RH2/ASCS20
[root@node2 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/ASCS20 /usr/sap/RH2/ASCS20

[root@node2 ~]# umount /usr/sap/RH2/ERS29
[root@node1 ~]# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-xxxxxxxx:/RH2/ERS29 /usr/sap/RH2/ERS29

Move the Overlay IP of ASCS and ERS to the other node, respectively:

[root@node1 ~]# ip address del 192.168.200.101 dev eth0
[root@node2 ~]# ip address del 192.168.200.102 dev eth0

[root@node2 ~]# ip address add 192.168.200.101 dev eth0
[root@node1 ~]# ip address add 192.168.200.102 dev eth0

Manually start the ASCS and ERS instances on their new cluster nodes, then manually stop them.
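One way to do this manually is with sapcontrol as the rh2adm user, using the instance numbers defined in section 4.1; this is only a sketch of the manual check:

[root@node2 ~]# su - rh2adm -c "sapcontrol -nr 20 -function StartService RH2"
[root@node2 ~]# su - rh2adm -c "sapcontrol -nr 20 -function Start"
[root@node2 ~]# su - rh2adm -c "sapcontrol -nr 20 -function GetProcessList"
[root@node2 ~]# su - rh2adm -c "sapcontrol -nr 20 -function Stop"
[root@node1 ~]# su - rh2adm -c "sapcontrol -nr 29 -function StartService RH2"
[root@node1 ~]# su - rh2adm -c "sapcontrol -nr 29 -function Start"
[root@node1 ~]# su - rh2adm -c "sapcontrol -nr 29 -function Stop"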

4.4.6. Check SAP HostAgent on all nodes

On all nodes check if SAP HostAgent has the same version and meets the minimum version requirement:

[root]# /usr/sap/hostctrl/exe/saphostexec -version

To upgrade/install SAP HostAgent, follow SAP note 1031096.

4.4.7. Install permanent SAP license keys

SAP hardware key determination in the high-availability scenario has been improved. It might be necessary to install several SAP license keys based on the hardware key of each cluster node. Please see SAP Note 1178686 - Linux: Alternative method to generate a SAP hardware key for more information.

5. Install Pacemaker

Follow Pacemaker documentation: HA Add-On Reference - RHEL 7.

Below is a sample procedure to install pacemaker. It's recommended to work with a Red Hat consultant to install and configure Pacemaker in your environment.

5.1. Install Pacemaker rpm's

# yum -y install pcs pacemaker fence-agents-aws
# passwd hacluster
[provide a password]
# mkdir -p /var/log/pcsd
# systemctl enable pcsd.service; systemctl start pcsd.service

5.2. Create a Cluster

Create a cluster named nwha consisting of node1 and node2, and start the cluster. Please note that at this point the cluster is not yet configured to auto-start after reboot.

# mkdir -p /var/log/cluster
# pcs cluster auth node1 node2
# pcs cluster setup --name nwha node1 node2
# pcs cluster start --all

5.2.1. Define General Cluster Properties

Set the resource stickiness:

# pcs resource defaults resource-stickiness=1
# pcs resource defaults migration-threshold=3

5.3. Configure STONITH

5.3.1. Look up the Instance ID

Take note of the Instance ID of node1 and node2, for use in the next step:

[root@node1]# echo $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
i-0b1c0612341288612
[root@node2]# echo $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
i-0e4e940fbbcb87337

5.3.2. Create STONITH

# pcs stonith create stonith-nwha fence_aws region=us-east-1 \
pcmk_host_map="node1:i-0b1c0612341288612;node2:i-0e4e940fbbcb87337" \
power_timeout=240 pcmk_reboot_timeout=600 pcmk_reboot_retries=4 \
pcmk_max_delay=45 op start timeout=600 op stop timeout=600 op monitor interval=180

# pcs config
...
Stonith Devices:
 Resource: stonith-nwha (class=stonith type=fence_aws)
  Attributes: pcmk_host_map=node1:i-0b1c0612341288612;node2:i-0e4e940fbbcb87337 pcmk_max_delay=45 pcmk_reboot_retries=4 pcmk_reboot_timeout=600 power_timeout=240 region=us-east-1
  Operations: monitor interval=180 (stonith-nwha-monitor-interval-180)
Fencing Levels:
...

5.3.3. Test fencing

After configuring STONITH, test fencing node2 from node1:

[root@node1]# pcs stonith fence node2

node2 should be properly fenced. After fencing, start the cluster on node2 using the following command, because the cluster has not yet been enabled to auto-start. Auto-start will be enabled after the initial tests show that the cluster is properly configured.

[root@node2 ~]# pcs cluster start

5.4. Install resource-agents-sap on both node1 and node2

[root]# yum install resource-agents-sap

5.5. Configure cluster resources for shared filesystems

Configure the shared filesystems to provide the following mount points on all cluster nodes:

/sapmnt
/usr/sap/trans
/usr/sap/RH2/SYS

5.5.1. Configure shared filesystems managed by the cluster

The cloned Filesystem cluster resource can be used to mount the shares from the external NFS server on all cluster nodes, as shown below.
NOTE: The '--clone' option works in RHEL 7 but does not work in RHEL 8. For RHEL 8, create the resource without '--clone' and then run 'pcs resource clone' to create the clone.

For RHEL 7

# pcs resource create rh2_fs_sapmnt Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/sapmnt' directory='/sapmnt' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport' --clone interleave=true
# pcs resource create rh2_fs_sap_trans Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/trans' directory='/usr/sap/trans' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport' --clone interleave=true
# pcs resource create rh2_fs_sys Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/RH2/SYS' directory='/usr/sap/RH2/SYS' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport' --clone interleave=true

For RHEL 8

# pcs resource create rh2_fs_sapmnt Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/sapmnt' directory='/sapmnt' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport'
# pcs resource clone rh2_fs_sapmnt interleave=true
# pcs resource create rh2_fs_sap_trans Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/trans' directory='/usr/sap/trans' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport'
# pcs resource clone rh2_fs_sap_trans interleave=true
# pcs resource create rh2_fs_sys Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/RH2/SYS' directory='/usr/sap/RH2/SYS' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport'
# pcs resource clone rh2_fs_sys interleave=true

After creating the Filesystem resources, verify that they have started properly on all nodes:

[root]# pcs status
... 
Clone Set: rh2_fs_sapmnt-clone [rh2_fs_sapmnt]
    Started: [ node1 node2 ]
Clone Set: rh2_fs_sap_trans-clone [rh2_fs_sap_trans]
    Started: [ node1 node2 ]
Clone Set: rh2_fs_sys-clone [rh2_fs_sys]
    Started: [ node1 node2 ]
...

5.5.2. Configure shared filesystems managed outside of cluster

If the shared filesystems will NOT be managed by the cluster, ensure that they are available before the pacemaker service is started.

In RHEL 7, due to systemd parallelization, you must ensure that the shared filesystems are started in the resource-agents-deps target. More details can be found in documentation section 9.6. Configuring Startup Order for Resource Dependencies not Managed by Pacemaker (Red Hat Enterprise Linux 7.4 and later).
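A minimal sketch of such a configuration, assuming the three shared filesystems are mounted through /etc/fstab and therefore have the corresponding systemd mount units:

# mkdir -p /etc/systemd/system/resource-agents-deps.target.d
# cat > /etc/systemd/system/resource-agents-deps.target.d/sap-filesystems.conf <<EOF
[Unit]
Requires=sapmnt.mount usr-sap-trans.mount usr-sap-RH2-SYS.mount
After=sapmnt.mount usr-sap-trans.mount usr-sap-RH2-SYS.mount
EOF
# systemctl daemon-reload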

5.6. Configure ASCS resource group

5.6.1. Create resource for virtual IP address

# pcs resource create rh2_vip_ascs20 aws-vpc-move-ip ip=192.168.200.101 interface=eth0 routing_table=rtb-9dd99ee2 --group rh2_ASCS20_group

Note: Earlier versions of this document included the monapi=true option in the command above. This was a workaround for a bug in the probe operation that has since been fixed. However, setting monapi=true can result in unnecessary failovers due to external factors such as API throttling. For this reason, Red Hat and Amazon do not recommend setting monapi=true. Please ensure that the latest available version of the resource-agents package for your OS minor release is installed, so that the bug fix is included.

5.6.2. Create resource for ASCS filesystem

# pcs resource create rh2_fs_ascs20 Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/RH2/ASCS20' directory='/usr/sap/RH2/ASCS20' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport' force_unmount=safe --group rh2_ASCS20_group

5.6.3. Create resource for ASCS instance

# pcs resource create rh2_ascs20 SAPInstance InstanceName="RH2_ASCS20_rhascs" \
  START_PROFILE=/sapmnt/RH2/profile/RH2_ASCS20_rhascs \
  AUTOMATIC_RECOVER=false \
  meta resource-stickiness=5000 migration-threshold=1 failure-timeout=60 \
  --group rh2_ASCS20_group \
  op monitor interval=20 on-fail=restart timeout=60 \
  op start interval=0 timeout=600 \
  op stop interval=0 timeout=600

Note: meta resource-stickiness=5000 is here to balance out the failover constraint with ERS, so the resource stays on the node where it started and does not migrate around the cluster uncontrollably.

Add a resource stickiness to the group to ensure that the ASCS will stay on a node if possible:

# pcs resource meta rh2_ASCS20_group resource-stickiness=3000

5.7. Configure ERS resource group

5.7.1. Create resource for virtual IP address

# pcs resource create rh2_vip_ers29 aws-vpc-move-ip ip=192.168.200.102 interface=eth0 routing_table=rtb-9dd99ee2 --group rh2_ERS29_group

Note: Earlier versions of this document included the monapi=true option in the command above. This was a workaround for a bug in the probe operation that has since been fixed. However, setting monapi=true can result in unnecessary failovers due to external factors such as API throttling. For this reason, Red Hat and Amazon do not recommend setting monapi=true. Please ensure that the latest available version of the resource-agents package for your OS minor release is installed, so that the bug fix is included.

5.7.2. Create resource for ERS filesystem

# pcs resource create rh2_fs_ers29 Filesystem device='fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/RH2/ERS29' directory='/usr/sap/RH2/ERS29' fstype='nfs4' options='nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport' force_unmount=safe --group rh2_ERS29_group

5.7.3. Create resource for ERS instance

Create the ERS instance cluster resource. Note: IS_ERS=true is required for ENSA1 deployments such as this one; in ENSA2 deployments the IS_ERS attribute is optional. To learn more about IS_ERS, additional information can be found in How does the IS_ERS attribute work on a SAP NetWeaver cluster with Standalone Enqueue Server (ENSA1 and ENSA2)?.

# pcs resource create rh2_ers29 SAPInstance InstanceName="RH2_ERS29_rhers" \
  START_PROFILE=/sapmnt/RH2/profile/RH2_ERS29_rhers \
  AUTOMATIC_RECOVER=false \
  IS_ERS=true \
  --group rh2_ERS29_group \
  op monitor interval=20 on-fail=restart timeout=60 \
  op start interval=0 timeout=600 \
  op stop interval=0 timeout=600

5.8. Create constraints

5.8.1. Create colocation constraint for ASCS and ERS resource groups

Resource groups rh2_ASCS20_group and rh2_ERS29_group should try to avoid running on the same node. The order of the groups matters:

# pcs constraint colocation add rh2_ERS29_group with rh2_ASCS20_group -5000

5.8.2. Create location constraint for ASCS resource

The ASCS20 instance rh2_ascs20 prefers to run on the node where the ERS was running before the failover occurred:

# pcs constraint location rh2_ascs20 rule score=2000 runs_ers_RH2 eq 1

5.8.3. Create order constraint for ASCS and ERS resource groups

Optionally, ensure that rh2_ASCS20_group is started before rh2_ERS29_group is stopped:

# pcs constraint order start rh2_ASCS20_group then stop rh2_ERS29_group symmetrical=false kind=Optional

6. Test the cluster configuration

6.1. Check the constraints

# pcs constraint
Location Constraints:
  Resource: rh2_ascs20
    Constraint: location-rh2_ascs20
      Rule: score=2000
        Expression: runs_ers_RH2 eq 1
Ordering Constraints:
  start rh2_ASCS20_group then stop rh2_ERS29_group (kind:Optional) (non-symmetrical)
Colocation Constraints:
  rh2_ERS29_group with rh2_ASCS20_group (score:-5000)
Ticket Constraints:

6.2. Failover ASCS due to node crash

Before the crash, ASCS is running on node1 while ERS is running on node2.

# pcs status
... 
Resource Group: rh2_ASCS20_group
    rh2_vip_ascs20  (ocf::heartbeat:aws-vpc-move-ip):   Started node1
    rh2_fs_ascs20   (ocf::heartbeat:Filesystem):    Started node1
    rh2_ascs20  (ocf::heartbeat:SAPInstance):   Started node1
Resource Group: rh2_ERS29_group
    rh2_vip_ers29   (ocf::heartbeat:aws-vpc-move-ip):   Started node2
    rh2_fs_ers29    (ocf::heartbeat:Filesystem):    Started node2
    rh2_ers29   (ocf::heartbeat:SAPInstance):   Started node2
...

On node2, run the following command to monitor the status changes in the cluster:

[root@node2 ~]# crm_mon -Arf

Crash node1 by running the following command. Please note that the connection to node1 will be lost after the command.

[root@node1 ~]# echo c > /proc/sysrq-trigger

On node2, monitor the failover process. After the failover, the cluster should be in the following state, with ASCS and ERS both running on node2:

[root@node2 ~]# pcs status
...
Resource Group: rh2_ASCS20_group
    rh2_fs_ascs20   (ocf::heartbeat:Filesystem):    Started node2
    rh2_ascs20  (ocf::heartbeat:SAPInstance):   Started node2
    rh2_vip_ascs20  (ocf::heartbeat:aws-vpc-move-ip):   Started node2
Resource Group: rh2_ERS29_group
    rh2_fs_ers29    (ocf::heartbeat:Filesystem):    Started node2
    rh2_vip_ers29   (ocf::heartbeat:aws-vpc-move-ip):   Started node2
    rh2_ers29   (ocf::heartbeat:SAPInstance):   Started node2
...

6.3. ERS moves to the previously failed node

Bring node1 back online, and start the cluster on node1:

[root@node1 ~]# pcs cluster start

ERS should move to node1 while ASCS remains on node2. Wait for ERS to finish the migration; at the end, the cluster should be in the following state:

[root@node1 ~]# pcs status
...
Resource Group: rh2_ASCS20_group
    rh2_fs_ascs20   (ocf::heartbeat:Filesystem):    Started node2
    rh2_ascs20  (ocf::heartbeat:SAPInstance):   Started node2
    rh2_vip_ascs20  (ocf::heartbeat:aws-vpc-move-ip):   Started node2
Resource Group: rh2_ERS29_group
    rh2_fs_ers29    (ocf::heartbeat:Filesystem):    Started node1
    rh2_vip_ers29   (ocf::heartbeat:aws-vpc-move-ip):   Started node1
    rh2_ers29   (ocf::heartbeat:SAPInstance):   Started node1
...

7. Enable cluster to auto-start after reboot

The cluster is not yet enabled to auto-start after reboot. The system administrator needs to manually start the cluster after a node is fenced and rebooted.

After the tests in the previous section have completed successfully, enable the cluster to auto-start after reboot:

# pcs cluster enable --all

Note: in some situations it can be beneficial not to have the cluster auto-start after a node has been rebooted. For example, if there is an issue with a filesystem that is required by a cluster resource, and the filesystem needs to be repaired first before it can be used again, having the cluster auto-start but then fail because the filesystem doesn't work can cause even more trouble.
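If you later decide to return to manual startup, auto-start can be turned off again with:

# pcs cluster disable --all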

Now rerun the tests in the previous section to make sure that the cluster still works properly. Please note that in section 6.3 there is no need to run pcs cluster start after a node is rebooted; the cluster should start automatically after the reboot.
