How can I prevent my RHEL High Availability cluster from repeatedly failing to fence a node while the fence device is not accessible?
Issue
- When a cluster node and its fence device lose power at the same time, cluster services do not fail over and/or GFS filesystems remain locked.
- When the network goes down, fencing of the other node fails and everything locks up:
Nov 19 12:55:50 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:26:16 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:56:42 node1 fenced[2080]: fencing node node2 still retrying
- How can I ensure I have enough redundancy in my fence configuration to avoid the cluster blocking if there is a network problem?
- What is the optimal network configuration for fence devices?
- How do I configure a backup fencing method?
Environment
- Red Hat Cluster Suite (RHCS) 4
- Red Hat Enterprise Linux (RHEL) 5, 6, 7, or 8 with the High Availability Add-On
- One or more fence/stonith devices using an agent which communicates with the device over the network
  - fence_scsi, fence_kdump, and fence_virt are examples of agents that do not use the network