How do heartbeat communication and heartbeat failure detection work in Red Hat Cluster Suite?


The Red Hat Cluster / High Availability support group is often asked to explain how the cluster heartbeat works. These questions usually come up after an incident in which a failure on the heartbeat network caused a hang or caused a node to be rebooted by fencing.

The following Knowledgebase Solution is in "Work in Progress" state:

The purpose of this discussion is to gather input from the user community on that solution. My aim is to make the information in it as easy to consume as possible. We can use this thread to clear up questions and then push improvements to the solution. Please reply here if:

  • The information is not clear or is confusing.
  • You would like it to answer a question it does not already address.
  • You would like a keyword or phrase added to the "Issue" section so that this solution shows up when you search for it on access.redhat.com.

Regards,

-Trap
