How can I prevent my RHEL High Availability cluster from repeatedly failing to fence a node while the fence device is not accessible?

Issue

  • When a cluster node and its fence device lose power at the same time, cluster services do not fail over and/or GFS file systems remain locked.
  • When the network goes down, fencing of the other node fails and cluster operations block:
Nov 19 12:55:50 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:26:16 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:56:42 node1 fenced[2080]: fencing node node2 still retrying
  • How can I ensure I have enough redundancy in my fence configuration to avoid the cluster blocking if there is a network problem?
  • What is the optimal network configuration for fence devices?
  • How do I configure a backup fencing method?

Environment

  • Red Hat Cluster Suite (RHCS) 4
  • Red Hat Enterprise Linux (RHEL) 5, 6, or 7 with the High Availability Add-On
  • One or more fence/stonith devices using an agent which communicates with the device over the network
    • fence_scsi, fence_kdump, and fence_virt are examples of agents that do not use the network
