Red Hat Enterprise Linux HA Solution for SAP HANA Scale Out and System Replication

Solution Verified - Updated -

Resolution

As SAP HANA takes on a central function as the primary database platform for SAP landscapes, requirements for stability and reliability increase dramatically. Red Hat Enterprise Linux (RHEL) for SAP Solutions meets those requirements by enhancing native SAP HANA replication and failover technology to automate the takeover process. Without this automation, during a failover in a scale-out SAP HANA system replication deployment, a system administrator must manually instruct the application to perform a takeover to the secondary environment if there is an issue in the primary environment.

This solution is for experienced Linux Administrators and SAP Certified Technology Associates. The solution contains planning and deployment information for SAP HANA scale out with system replication, as well as information on Pacemaker integration with RHEL 7.

For more information, see Red Hat Enterprise Linux HA Solution for SAP HANA Scale Out and System Replication.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

8 Comments

Some feedback for this documentation:

  • It would be nice to have a link to this from the article Where I can find documentation for SAP products on RHEL and other Red Hat products?
  • Is there an HTML version of the attached PDF?
  • I consider it quite bad practice to recommend, at this time, creating pacemaker resources as primitives just to convert them in the next step to Clone or Master/Slave. Why? Because people forget, and because it can be done in a single command that either succeeds or fails.
  • Would it be possible (and maybe better) to use a quorum network device instead of a 'majoritymaker' node that in the end is not running any resources? (Using a quorum network device would remove the need for constraints that keep HANA resources off that node, as a quorum device just adds a vote and doesn't run any cluster resources.)
  • Page 64 (top): cdtrace is most probably missing a space and should be cd trace, right?
  • In general it is a bit hard to tell in the PDF what is 'command' and what is 'command output'. I would suggest using something like root@dc1hana01# cd trace, for example, so it is clear 1. which user, 2. which system, 3. what command should be used. Now commands and output are just mixed together. In fact this was used in some places, where I can see things like rh1adm@dc2hana01:/usr/sap/RH1/HDB10>. But consistency throughout the document would be better than just "some places".
  • On a consistency note: it looks like the commands are mostly in blue but not always (pages 62 and 64, for example). It is confusing which is which (command vs output).
  • Mentioning all 'op xxx' options for resource agents makes the commands quite daunting to rewrite, while pcs itself will use the defaults unless others are specified. Removing them from the examples would make the commands shorter and easier to read. Instead of

        pcs resource create rsc_SAPHana_RH1_HDB10 SAPHanaController \
            SID=RH1 \
            InstanceNumber=10 \
            PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 \
            AUTOMATED_REGISTER=true \
            op start interval=0 timeout=3600 \
            op stop interval=0 timeout=3600 \
            op promote interval=0 timeout=3600 \
            op monitor interval=60 role="Master" timeout=700 \
            op monitor interval=61 role="Slave" timeout=700

    you can then get to something like

        pcs resource create rsc_SAPHana_RH1_HDB10 SAPHanaController \
            SID=RH1 \
            InstanceNumber=10 \
            PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 \
            AUTOMATED_REGISTER=true

Overall, thank you for the documentation, and please consider the above as feedback for improvement to make it even better.

Hi Ondrej,

Thank you for your feedback; it is very much appreciated! I'm working with the SMEs on a response for each item. We'll add it to this thread as soon as it is ready.

Kind regards, Sharon.

Hi Ondrej,

I've provided a response for each point you raised. If you've any questions, etc., please let us know. Thanks to Markus Moster for providing the technical responses to your feedback.

Kind regards, Sharon.

  • It would be nice to have a link to this from the article Where I can find documentation for SAP products on RHEL and other Red Hat products?

    • Response: The article you referenced has been updated with a link to this Solution.
  • Is there an HTML version of the attached PDF?

    • Response: While it is not available right now, there is a plan to move SAP-related content to the Product Documentation area of the Customer Portal, which supports HTML and PDF.
  • I consider it quite bad practice to recommend, at this time, creating pacemaker resources as primitives just to convert them in the next step to Clone or Master/Slave. Why? Because people forget, and because it can be done in a single command that either succeeds or fails.

    • Response: Both steps are necessary. This is a valid option for setting up such resources. More details in https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-multistateresource-haar
  • Would it be possible (and maybe better) to use a quorum network device instead of a 'majoritymaker' node that in the end is not running any resources? (Using a quorum network device would remove the need for constraints that keep HANA resources off that node, as a quorum device just adds a vote and doesn't run any cluster resources.)

    • Response: Using quorum devices does not make configuration any easier (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-quorumdev-haar). However, additional options may become available in the future, and Red Hat is trying to further simplify configuration.
  • Page 64 (top): cdtrace is most probably missing a space and should be cd trace, right?

    • Response: No, 'cdtrace' is an alias defined for the adm user; see also https://www.hanatutorials.com/p/hana-log-and-trace-files-path.html
  • In general it is a bit hard to tell in the PDF what is 'command' and what is 'command output'. I would suggest using something like root@dc1hana01# cd trace, for example, so it is clear 1. which user, 2. which system, 3. what command should be used. Now commands and output are just mixed together. In fact this was used in some places, where I can see things like rh1adm@dc2hana01:/usr/sap/RH1/HDB10>. But consistency throughout the document would be better than just "some places".

    • Response: Thanks for pointing this out. We'll take a look at this.
  • On a consistency note: it looks like the commands are mostly in blue but not always (pages 62 and 64, for example). It is confusing which is which (command vs output).

    • Response: Thanks for pointing this out. We'll take a look at this.
  • Mentioning all 'op xxx' options for resource agents makes the commands quite daunting to rewrite, while pcs itself will use the defaults unless others are specified. Removing them from the examples would make the commands shorter and easier to read. Instead of

        pcs resource create rsc_SAPHana_RH1_HDB10 SAPHanaController \
            SID=RH1 \
            InstanceNumber=10 \
            PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 \
            AUTOMATED_REGISTER=true \
            op start interval=0 timeout=3600 \
            op stop interval=0 timeout=3600 \
            op promote interval=0 timeout=3600 \
            op monitor interval=60 role="Master" timeout=700 \
            op monitor interval=61 role="Slave" timeout=700

    you can then get to something like

        pcs resource create rsc_SAPHana_RH1_HDB10 SAPHanaController \
            SID=RH1 \
            InstanceNumber=10 \
            PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 \
            AUTOMATED_REGISTER=true

    • Response: The operation timeouts are included in the command on purpose, because customers should explicitly use the recommended timeouts given in the command (or even higher ones, depending on the environment) rather than the defaults defined in the resource agents, since the default timeouts are far too low for most environments.
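
As a rough illustration of the quorum device item above, the two approaches could look as follows on RHEL 7. This is a dry-run sketch that only prints the commands; the `run` helper, the host name `qdevice-host`, the node name `majoritymaker`, and the choice of the `ffsplit` algorithm are assumptions for illustration and are not taken from the PDF.

```shell
# Dry-run helper (hypothetical): print each command instead of executing it,
# so the sketch can be read on a machine without pcs or a cluster.
run() { echo "$*"; }

# Approach in the document: an extra 'majoritymaker' node supplies the vote,
# so a location constraint keeps the HANA resource off that node
# (one such constraint would be needed per HANA resource).
majoritymaker_setup() {
  run pcs constraint location rsc_SAPHana_RH1_HDB10 avoids majoritymaker
}

# Suggested alternative: a quorum device adds the vote and runs no resources.
# The quorum device host (assumed name: qdevice-host) would be prepared first
# with 'pcs qdevice setup model net --enable --start', run on that host.
qdevice_setup() {
  run pcs quorum device add model net host=qdevice-host algorithm=ffsplit
}

majoritymaker_setup
qdevice_setup
```

Whether the second variant is simpler in practice depends on the environment; the response above states that it does not make configuration easier.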

Hi Sharon,

Thank you and Markus Moster for the reply and changes. Please find additional clarifications/suggestions below.

  • The link added on the page (Where I can find documentation for SAP products on RHEL and other Red Hat products?) points to this page plus a particular comment, https://access.redhat.com/solutions/4386601#comment-1698701; changing the link to https://access.redhat.com/solutions/4386601/ might give a better result, I think.

  • I consider it quite bad practice to recommend, at this time, creating pacemaker resources as primitives just to convert them in the next step to Clone or Master/Slave. Why? Because people forget, and because it can be done in a single command that either succeeds or fails. - Response: Both steps are necessary. This is a valid option for setting up such resources. More details in https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-multistateresource-haar

    • Update: I agree that if you decide to use the 'alternative approach' (as mentioned by the documentation page for pacemaker multi-state resources) you have to do both 'create resource' and 'convert it to master/slave resource'. My point here is: why take the alternative approach when there is an option to 'create master/slave resource' in one step? - pcs resource create resource_id standard:provider:type|type [resource options] master [master_options]. I have seen this in the past, and in my experience the choice between a one-step procedure and a two-step procedure is clear: I prefer the one-step option, as a two-step procedure is prone to errors such as "forgetting to do the second step and assuming it might not be needed as there is already some resource".
  • Mentioning all 'op xxx' options for resource agents makes the commands quite daunting to rewrite, while pcs itself will use the defaults unless others are specified. Removing them from the examples would make the commands shorter and easier to read. Instead of pcs resource ... (shortened for readability) - Response: The operation timeouts are included in the command on purpose, because customers should explicitly use the recommended timeouts given in the command (or even higher ones, depending on the environment) rather than the defaults defined in the resource agents, since the default timeouts are far too low for most environments.

    • Update: I wonder why the resource agents for HANA scale-out, which I have so far seen mentioned only in this documentation, don't have "sane defaults", or at least defaults that can be used in example documentation like this one. What is the point of having defaults if you cannot use them in any example? :) In other words: would it make sense to adjust the defaults in the RHEL-shipped version of the HANA scale-out resource agents to match sane defaults recommended, for example, by this document, or tested by Red Hat to be a "good default" for some concrete example on RHEL? If you can provide "sane defaults" and then explain to users/customers how to adjust them later (pcs resource update ...), I think it would make more sense than the current long line, which just fixes the current defaults into the configuration, so users get no benefit when a new version of the resource agents with potentially changed/improved defaults is provided.
  • Page 68 - "Procedure: Starting SAPHanaTopology before SAPHana" - this implies in my mind that we are in SAP HANA scale-up rather than scale-out. Probably this was intended as "Procedure: Starting SAPHanaTopologyScaleOut before SAPHanaController".

Thanks, Ondrej, for highlighting that issue with the link. I wasn't aware. I've updated the link. Kind regards, Sharon.

Hi Ondrej, Apologies for the delay. Responses to your other questions are below.

Your question: I agree that if you decide to use the 'alternative approach' (as mentioned by the documentation page for pacemaker multi-state resources) you have to do both 'create resource' and 'convert it to master/slave resource'. My point here is: why take the alternative approach when there is an option to 'create master/slave resource' in one step? - pcs resource create resource_id standard:provider:type|type [resource options] master [master_options]. I have seen this in the past, and in my experience the choice between a one-step procedure and a two-step procedure is clear: I prefer the one-step option, as a two-step procedure is prone to errors such as "forgetting to do the second step and assuming it might not be needed as there is already some resource".

Response: The documented way is tested and supported. Your approach makes sense, and we would like to check whether we can use it for future documentation. With RHEL 8 the pcs syntax is changing a little, so you will find an updated version starting with RHEL 8.
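
For readers comparing the two procedures discussed here, the following dry-run sketch prints both variants using the RHEL 7 syntax quoted in the question. The `run` helper, the `msl_` prefix for the master/slave resource name, and the `master-max=1` option are assumptions chosen for illustration; operation settings are left out to keep the lines short.

```shell
# Dry-run helper (hypothetical): print each command instead of executing it.
run() { echo "$*"; }

# Resource name and options as quoted earlier in this thread.
RSC=rsc_SAPHana_RH1_HDB10
OPTS="SID=RH1 InstanceNumber=10 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=true"

# Two-step procedure: create the primitive, then convert it to master/slave.
# $OPTS is intentionally unquoted so that it splits into separate arguments.
two_step() {
  run pcs resource create "$RSC" SAPHanaController $OPTS
  run pcs resource master "msl_$RSC" "$RSC" master-max=1
}

# One-step procedure: the trailing 'master' keyword creates the master/slave
# resource in the same command, which either succeeds or fails as a whole.
one_step() {
  run pcs resource create "$RSC" SAPHanaController $OPTS master master-max=1
}

two_step
one_step
```

In the two-step form, a run that stops after the first command leaves a plain primitive behind, which is the failure mode described in the question.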

Your question: I wonder why the resource agents for HANA scale-out, which I have so far seen mentioned only in this documentation, don't have "sane defaults", or at least defaults that can be used in example documentation like this one. What is the point of having defaults if you cannot use them in any example? :) In other words: would it make sense to adjust the defaults in the RHEL-shipped version of the HANA scale-out resource agents to match sane defaults recommended, for example, by this document, or tested by Red Hat to be a "good default" for some concrete example on RHEL? If you can provide "sane defaults" and then explain to users/customers how to adjust them later (pcs resource update ...), I think it would make more sense than the current long line, which just fixes the current defaults into the configuration, so users get no benefit when a new version of the resource agents with potentially changed/improved defaults is provided.

Response: The SAPHANA Resource Agent is part of an upstream project. In general, all information related to SAP software should use the adm user. In the environment of the adm user, the variable ${DIR_INSTANCE} is set. This is also used by the alias cdtrace: cdtrace='cd $DIR_INSTANCE/$VTHOSTNAME/trace'. There are no defaults for all projects, but work is ongoing to find the right defaults that might be used; this requires adding additional features to simplify and support the pacemaker configuration.
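
To show what the cdtrace alias mentioned above resolves to, here is a small sketch that can be run anywhere. It uses a throwaway directory tree and made-up values for DIR_INSTANCE and VTHOSTNAME (modelled on the rh1adm@dc2hana01:/usr/sap/RH1/HDB10 prompt quoted earlier), and a shell function as a stand-in for the alias; on a real system these variables come from the adm user's environment.

```shell
# Stand-in values (assumptions): a temporary tree instead of a real instance
# directory such as /usr/sap/RH1/HDB10.
demo_root=$(mktemp -d)
DIR_INSTANCE="$demo_root/usr/sap/RH1/HDB10"
VTHOSTNAME="dc2hana01"
mkdir -p "$DIR_INSTANCE/$VTHOSTNAME/trace"

# Function equivalent of: alias cdtrace='cd $DIR_INSTANCE/$VTHOSTNAME/trace'
# (a function is used here because aliases are not expanded in every
# non-interactive shell).
cdtrace() { cd "$DIR_INSTANCE/$VTHOSTNAME/trace"; }

cdtrace
pwd   # prints the trace directory below $demo_root
```

This is why cdtrace is a single word in the PDF: it changes into the instance's trace directory rather than into a directory literally named "trace" below the current one.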

Your question: Page 68 - "Procedure: Starting SAPHanaTopology before SAPHana" - this implies in my mind that we are in SAP HANA scale-up rather than scale-out. Probably this was intended as "Procedure: Starting SAPHanaTopologyScaleOut before SAPHanaController".

Response: The resource agents for SAP HANA scale-up and SAP HANA scale-out are very similar, but they are not the same, even if they look the same.

Hi Sharon, please see notes below to your answers and don't worry about delays, I know that this particular technology takes time to deal with.

Your question: I wonder why the resource agents for HANA scale-out, which I have so far seen mentioned only in this documentation, don't have "sane defaults", or at least defaults that can be used in example documentation like this one. What is the point of having defaults if you cannot use them in any example? :) In other words: would it make sense to adjust the defaults in the RHEL-shipped version of the HANA scale-out resource agents to match sane defaults recommended, for example, by this document, or tested by Red Hat to be a "good default" for some concrete example on RHEL? If you can provide "sane defaults" and then explain to users/customers how to adjust them later (pcs resource update ...), I think it would make more sense than the current long line, which just fixes the current defaults into the configuration, so users get no benefit when a new version of the resource agents with potentially changed/improved defaults is provided.

Response: The SAPHANA Resource Agent is part of an upstream project. In general, all information related to SAP software should use the adm user. In the environment of the adm user, the variable ${DIR_INSTANCE} is set. This is also used by the alias cdtrace: cdtrace='cd $DIR_INSTANCE/$VTHOSTNAME/trace'. There are no defaults for all projects, but work is ongoing to find the right defaults that might be used; this requires adding additional features to simplify and support the pacemaker configuration.

I think you have missed my point here. This is an example of the default operation values from the upstream resource agent that pcs will automatically use (unlike on non-pcs OSes, where these need to be specified manually, as far as I remember): https://github.com/SUSE/SAPHanaSR-ScaleOut/blob/eb17b51188c4e300b514257f337c9d5f428e7c7e/SAPHana/ra/SAPHanaController#L271-L283 Specifying these by hand when not needed just feels like a missed opportunity to present something simply. However, I am leaving this one up to you. I just wanted to say, as my first comment suggested, that running that monster-long pcs resource create rsc_SAPHana_RH1_HDB10 ... with or without the op start ... part will result in the same configuration when checked with pcs resource show rsc_SAPHana_RH1_HDB10.
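
One way to check this on a test cluster, and to adjust a timeout afterwards as suggested earlier with pcs resource update, is sketched below. It is a dry run that only prints the commands; the `run`, `show_config`, and `set_op_timeout` helpers are hypothetical, and the timeout values are the ones from the long command quoted in this thread.

```shell
# Dry-run helper (hypothetical): print each command instead of executing it.
run() { echo "$*"; }

RSC=rsc_SAPHana_RH1_HDB10

# Print the resource configuration, including its operation settings.
show_config() {
  run pcs resource show "$RSC"
}

# Change the timeout of one operation on the existing resource.
# $1 = operation name (for example start), $2 = timeout in seconds.
set_op_timeout() {
  run pcs resource update "$RSC" op "$1" interval=0 timeout="$2"
}

show_config                 # inspect what the resource was created with
set_op_timeout start 3600   # raise individual timeouts only where needed
set_op_timeout stop 3600
show_config                 # confirm the change
```

Comparing the first show_config output for a resource created with and without the op arguments would confirm or refute the claim for a given resource agent version.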

Your question: Page 68 - "Procedure: Starting SAPHanaTopology before SAPHana" - this implies in my mind that we are in SAP HANA scale-up rather than scale-out. Probably this was intended as "Procedure: Starting SAPHanaTopologyScaleOut before SAPHanaController".

Response: The resource agents for SAP HANA scale-up and SAP HANA scale-out are very similar, but they are not the same, even if they look the same.

Let me rephrase it a bit: I think that the thing I have pointed out is a typo - sed 's/Starting SAPHanaTopology before SAPHana/Starting SAPHanaTopologyScaleOut before SAPHanaController/'.

There are scenarios in a SAP HANA Scale-Out configuration where each active node has a Virtual IP. Is this supported by Pacemaker for SAP HANA Scale-Out solution?