How can I prevent my RHEL High Availability cluster from repeatedly failing to fence a node while the fence device is not accessible?
Issue
- When a cluster node and its fence device lose power at the same time, cluster services do not fail over and/or GFS file systems are locked.
- When the network goes down, fencing of the other node fails and everything locks up:
Nov 19 12:55:50 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:26:16 node1 fenced[2080]: fencing node node2 still retrying
Nov 19 13:56:42 node1 fenced[2080]: fencing node node2 still retrying
- How can I ensure I have enough redundancy in my fence configuration to avoid the cluster blocking if there is a network problem?
- What is the optimal network configuration for fence devices?
- How do I configure a backup fencing method?
Environment
- Red Hat Cluster Suite (RHCS) 4
- Red Hat Enterprise Linux (RHEL) 5, 6, or 7 with the High Availability Add-On
- One or more fence/stonith devices using an agent which communicates with the device over the network
fence_scsi, fence_kdump, and fence_virt are examples of agents that do not use the network
