High Availability cluster node using RRP is fenced after ring1 interface fails
Issue
- The interface hosting the RRP (Redundant Ring Protocol) ring1_addr on a node fails, and the node is fenced a few minutes later because the stop operation of a stonith device fails.
- The fenced node logs that the stop operation completed successfully, while the node that requests the fencing logs that the stop operation timed out.
Apr 22 11:17:05 node1 crmd[3226]: warning: Timer popped (timeout=20000, abort_level=1000000, complete=false)
Apr 22 11:17:05 node1 crmd[3226]: error: [Action 5]: In-flight rsc op fence_node2_ipmi_stop_0 on node2 (priority: 0, waiting: none)
Apr 22 11:17:05 node1 crmd[3226]: warning: rsc_op 5: fence_node2_ipmi_stop_0 on node2 timed out
...
Apr 22 11:17:35 node1 crmd[3226]: error: Update 73 FAILED: Timer expired
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of fence_node1_ipmi on node2: unknown error
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of fence_node2_ipmi on node2: unknown error
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of bond1-monitor:1 on node2: unknown error
...
Apr 22 11:17:55 node1 pengine[3225]: warning: Processing failed monitor of fence_node1_ipmi on node2: unknown error
Apr 22 11:17:55 node1 pengine[3225]: warning: Processing failed stop of fence_node2_ipmi on node2: unknown error
Apr 22 11:17:55 node1 pengine[3225]: warning: Cluster node node2 will be fenced: fence_node2_ipmi failed there
Apr 22 11:14:38 node2 ethmonitor(bond1-monitor)[1143]: WARNING: Monitoring of bond1-monitor failed, 2 retries left.
Apr 22 11:14:38 node2 crmd[64315]: error: Result of monitor operation for fence_node2_ipmi on node2: Timed Out
Apr 22 11:15:08 node2 crmd[64315]: warning: Resource update 45 failed: (rc=-62) Timer expired
Apr 22 11:15:14 node2 crmd[64315]: error: Result of monitor operation for fence_node1_ipmi on node2: Timed Out
Apr 22 11:15:44 node2 crmd[64315]: warning: Resource update 46 failed: (rc=-62) Timer expired
...
Apr 22 11:16:51 node2 crmd[64315]: notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
...
Apr 22 11:17:21 node2 crmd[64315]: warning: Resource update 48 failed: (rc=-62) Timer expired
Apr 22 11:17:21 node2 crmd[64315]: warning: Resource update 49 failed: (rc=-62) Timer expired
...
Apr 22 11:17:39 node2 corosync[64302]: [TOTEM ] Marking ringid 1 interface 192.168.23.22 FAULTY
...
Apr 22 11:17:48 node2 crmd[64315]: notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
Apr 22 11:17:54 node2 crmd[64315]: notice: Result of stop operation for fence_node1_ipmi on node2: 0 (ok)
...
Apr 22 11:18:25 node2 systemd-logind: Power key pressed.
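The excerpts above come from two nodes, so the ordering of events is easier to follow once they are merged into one timeline. The following is a minimal sketch of one way to do that, not a Red Hat tool: the function names `parse_line` and `merge_logs` are hypothetical, the year is an assumption (syslog timestamps do not carry one), and both nodes' clocks are assumed to be synchronized. The sample lines are copied from the logs above.

```python
from datetime import datetime

# Assumption: syslog lines carry no year, so one is supplied for parsing only.
ASSUMED_YEAR = 2019


def parse_line(line, year=ASSUMED_YEAR):
    """Split a syslog line into (timestamp, host, message)."""
    month, day, clock, host, message = line.split(None, 4)
    stamp = datetime.strptime(
        f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )
    return stamp, host, message


def merge_logs(*logs):
    """Merge several per-node log excerpts into one chronological list."""
    entries = []
    for log in logs:
        for line in log:
            line = line.strip()
            # Skip blank lines and the "..." elision markers.
            if not line or line == "...":
                continue
            entries.append(parse_line(line))
    # Sort on the timestamp only; the sort is stable for equal timestamps.
    return sorted(entries, key=lambda entry: entry[0])


NODE1_LOG = [
    "Apr 22 11:17:05 node1 crmd[3226]: warning: rsc_op 5: fence_node2_ipmi_stop_0 on node2 timed out",
]
NODE2_LOG = [
    "Apr 22 11:16:51 node2 crmd[64315]: notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)",
    "...",
    "Apr 22 11:17:39 node2 corosync[64302]: [TOTEM ] Marking ringid 1 interface 192.168.23.22 FAULTY",
]

merged = merge_logs(NODE1_LOG, NODE2_LOG)
for stamp, host, message in merged:
    print(stamp.strftime("%H:%M:%S"), host, message)
```

In the merged view, node2 records a successful stop of fence_node2_ipmi at 11:16:51, before node1 logs the timeout of the same operation at 11:17:05.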
Environment
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Redundant Ring Protocol (RRP)