High Availability cluster node using RRP is fenced after ring1 interface fails

Solution In Progress

Issue

  • The interface hosting the RRP (Redundant Ring Protocol) ring1_addr on a node fails, and a few minutes later the node is fenced because the stop operation of a STONITH device is recorded as failed.
  • The fenced node logs that the stop operation completed successfully, while the node that requests fencing logs that the same stop operation timed out:
Apr 22 11:17:05 node1 crmd[3226]: warning: Timer popped (timeout=20000, abort_level=1000000, complete=false)
Apr 22 11:17:05 node1 crmd[3226]:   error: [Action    5]: In-flight rsc op fence_node2_ipmi_stop_0           on node2 (priority: 0, waiting: none)
Apr 22 11:17:05 node1 crmd[3226]: warning: rsc_op 5: fence_node2_ipmi_stop_0 on node2 timed out
...
Apr 22 11:17:35 node1 crmd[3226]:   error: Update 73 FAILED: Timer expired
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of fence_node1_ipmi on node2: unknown error
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of fence_node2_ipmi on node2: unknown error
Apr 22 11:17:35 node1 pengine[3225]: warning: Processing failed monitor of bond1-monitor:1 on node2: unknown error
...
Apr 22 11:17:55 node1 pengine[3225]: warning: Processing failed monitor of fence_node1_ipmi on node2: unknown error
Apr 22 11:17:55 node1 pengine[3225]: warning: Processing failed stop of fence_node2_ipmi on node2: unknown error
Apr 22 11:17:55 node1 pengine[3225]: warning: Cluster node node2 will be fenced: fence_node2_ipmi failed there
Apr 22 11:14:38 node2 ethmonitor(bond1-monitor)[1143]: WARNING: Monitoring of bond1-monitor failed, 2 retries left.
Apr 22 11:14:38 node2 crmd[64315]:   error: Result of monitor operation for fence_node2_ipmi on node2: Timed Out
Apr 22 11:15:08 node2 crmd[64315]: warning: Resource update 45 failed: (rc=-62) Timer expired
Apr 22 11:15:14 node2 crmd[64315]:   error: Result of monitor operation for fence_node1_ipmi on node2: Timed Out
Apr 22 11:15:44 node2 crmd[64315]: warning: Resource update 46 failed: (rc=-62) Timer expired
...
Apr 22 11:16:51 node2 crmd[64315]:  notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
...
Apr 22 11:17:21 node2 crmd[64315]: warning: Resource update 48 failed: (rc=-62) Timer expired
Apr 22 11:17:21 node2 crmd[64315]: warning: Resource update 49 failed: (rc=-62) Timer expired
...
Apr 22 11:17:39 node2 corosync[64302]:  [TOTEM ] Marking ringid 1 interface 192.168.23.22 FAULTY
...
Apr 22 11:17:48 node2 crmd[64315]:  notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
Apr 22 11:17:54 node2 crmd[64315]:  notice: Result of stop operation for fence_node1_ipmi on node2: 0 (ok)
...
Apr 22 11:18:25 node2 systemd-logind: Power key pressed.
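To line up the two nodes' views of the same stop operation, the relevant log lines can be filtered mechanically. The following Python 3.8+ sketch is a hypothetical helper, not a Red Hat tool. Its regular expressions assume the syslog and pacemaker message formats shown in the excerpt above (other pacemaker versions may word these messages differently), and the event labels (`stop-ok`, `stop-timeout`, and so on) are made up for illustration. It reports resources whose stop one node logged as successful while a different node logged it as timed out.

```python
import re
from collections import defaultdict

# Subset of the log excerpt above. In practice you would concatenate the
# logs from each node (e.g. /var/log/messages, an assumed location).
SAMPLE_LOG = """\
Apr 22 11:17:05 node1 crmd[3226]: warning: rsc_op 5: fence_node2_ipmi_stop_0 on node2 timed out
Apr 22 11:16:51 node2 crmd[64315]:  notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
Apr 22 11:17:21 node2 crmd[64315]: warning: Resource update 48 failed: (rc=-62) Timer expired
Apr 22 11:17:39 node2 corosync[64302]:  [TOTEM ] Marking ringid 1 interface 192.168.23.22 FAULTY
Apr 22 11:17:48 node2 crmd[64315]:  notice: Result of stop operation for fence_node2_ipmi on node2: 0 (ok)
Apr 22 11:17:54 node2 crmd[64315]:  notice: Result of stop operation for fence_node1_ipmi on node2: 0 (ok)
"""

# Syslog prefix: "Mon DD HH:MM:SS host daemon[pid]: message"
PREFIX = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) (\S+?)\[\d+\]:\s*(.*)$")
# Message patterns taken from the excerpt (assumed stable for this version).
LOCAL_STOP_OK = re.compile(r"Result of stop operation for (\S+) on (\S+): 0 \(ok\)")
REMOTE_TIMEOUT = re.compile(r"rsc_op \d+: (\S+)_stop_0 on (\S+) timed out")
UPDATE_TIMER = re.compile(r"Resource update \d+ failed: \(rc=-62\) Timer expired")
RING_FAULTY = re.compile(r"Marking ringid (\d+) interface (\S+) FAULTY")


def summarize(text):
    """Group relevant events by the host that logged them."""
    events = defaultdict(list)
    for line in text.splitlines():
        m = PREFIX.match(line)
        if not m:
            continue  # lines without a "daemon[pid]:" prefix are ignored
        ts, host, _daemon, msg = m.groups()
        if (s := LOCAL_STOP_OK.search(msg)):
            events[host].append((ts, "stop-ok", s.group(1)))
        elif (s := REMOTE_TIMEOUT.search(msg)):
            events[host].append((ts, "stop-timeout", s.group(1)))
        elif UPDATE_TIMER.search(msg):
            events[host].append((ts, "resource-update-timer-expired", None))
        elif (s := RING_FAULTY.search(msg)):
            events[host].append((ts, f"ring{s.group(1)}-faulty", s.group(2)))
    return dict(events)


def conflicting_stops(events):
    """Resources one host logged as stopped OK while another logged a timeout."""
    ok = {(h, r) for h, evs in events.items() for _, k, r in evs if k == "stop-ok"}
    timed_out = {(h, r) for h, evs in events.items() for _, k, r in evs if k == "stop-timeout"}
    return sorted({r for h1, r in ok for h2, r2 in timed_out if r == r2 and h1 != h2})


if __name__ == "__main__":
    ev = summarize(SAMPLE_LOG)
    for host in sorted(ev):
        for ts, kind, detail in ev[host]:
            print(host, ts, kind, detail or "")
    print("conflicting stop results:", conflicting_stops(ev))
```

On the sample above, the last line printed is `conflicting stop results: ['fence_node2_ipmi']`, matching the mismatch described in the Issue: node2 reports the stop as `0 (ok)` while node1 reports it as timed out.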

Environment

  • Red Hat Enterprise Linux 7 (with the High Availability Add-on)
  • Redundant Ring Protocol (RRP)
