SAP HANA false-positive monitoring operation result

Solution In Progress - Updated -

Issue

In some corner cases, the default 60-second timeout for the monitoring operation of the SAP HANA resource is not optimal and can lead to false-positive results. The monitoring operation relies on the output of systemReplicationStatus.py, which is called with a shell timeout of 60 seconds.

However, when the cluster nodes could not communicate with each other (the HANA System Replication network was cut by a network issue, or by a firewall when simulating one), the Python script waited for 128 seconds and still returned a result, even though the shell timeout was shorter than 128 seconds. In this situation the replication status could not be checked correctly. It was reported as SOK, so failover was allowed, although the replica was already inconsistent and the result should have been SFAIL.
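The failure mode described above comes down to a status check that answers later than its caller expects, with the late answer still treated as valid. The following is a minimal sketch of the safer pattern, not the resource agent's actual code: a check that does not finish before its deadline is reported as SFAIL and never as SOK. The function name `replication_status`, the `deadline_s` parameter, and the `ok_code` exit code are hypothetical names chosen for illustration. The real exit codes of systemReplicationStatus.py are not assumed here.

```python
import subprocess
import sys


def replication_status(cmd, deadline_s, ok_code=0):
    """Run a status command and map its outcome to "SOK" or "SFAIL".

    Hypothetical helper: "SOK" is returned only if the command exits with
    ok_code before deadline_s seconds have passed. ok_code is an assumed
    placeholder, not a documented exit code of systemReplicationStatus.py.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=deadline_s,          # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # No answer within the deadline: the status is unknown, so it must
        # not be reported as in sync.
        return "SFAIL"
    return "SOK" if result.returncode == ok_code else "SFAIL"


if __name__ == "__main__":
    # Stand-in for a status script that hangs on a cut replication network.
    slow_check = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(replication_status(slow_check, deadline_s=0.5))
```

The design choice is that the deadline is enforced by the caller, so a check that blocks for longer than expected cannot deliver a stale result after the monitoring window has closed.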

Environment

  • Red Hat Enterprise Linux 7.5 and earlier
  • Pacemaker cluster with SAP HANA resource
