Clustered IP resource fails to recover after active node is fenced using a storage method in RHEL

Solution Verified - Updated -

Issue

  • When the node running a cluster service is evicted and fenced after losing access to storage, and the cluster is configured with a storage-based fence method (fence_brocade, fence_scsi, etc.), the IP resource in the service fails to recover on another node because of an IP address collision:
Aug 27 11:40:49 node1 clurgmgrd: [5052]: <info> Adding IPv4 address 192.168.143.8/24 to bond2
Aug 27 11:40:49 node1 clurgmgrd: [5052]: <debug> Pinging addr 192.168.143.8 from dev bond2
Aug 27 11:40:49 node1 clurgmgrd: [5052]: <err> IPv4 address collision 192.168.143.8
Aug 27 11:40:49 node1 clurgmgrd[5052]: <notice> start on ip "192.168.143.8" returned 1 (generic error)
Aug 27 11:40:49 node1 clurgmgrd[5052]: <warning> #68: Failed to start service:test; return value: 1
Nov 11 18:19:27 node1 rgmanager[8429]: [ip] Adding IPv4 address 192.168.2.88/24 to bond0
Nov 11 18:19:27 node1 rgmanager[8475]: [ip] IPv4 address collision 192.168.2.88
Nov 11 18:19:27 node1 rgmanager[6412]: start on ip "192.168.2.88" returned 1 (generic error)
Nov 11 18:19:27 node1 rgmanager[6412]: #68: Failed to start service:myservice; return value: 1
  • Cluster resource operations (start, stop, status) may block inside the resource agent while waiting for an operation (such as I/O or a network operation) to complete. If the node is evicted and removed from the cluster during this time, it will not perform an emergency shutdown of its services until the stuck operation has completed, yet the other node in the cluster will attempt to recover the service as soon as fencing succeeds. If the service contains an IP resource, that resource fails to start because the first node still has the IP address up.
  • Node 2 lost its connection to node 1 and attempted to fence it using fence_ILO. Node 2 believed the fence operation had succeeded, but it had not, and node 1 was still up. Node 2 then attempted to bring the service online, but the start failed.
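
To confirm that a failed recovery matches this issue, it can help to pull the collision events out of the system log. The following is a minimal Python sketch, not a Red Hat tool. It assumes the message wording is exactly as in the excerpts above, and the function name find_ip_collisions and the result layout are hypothetical names chosen for illustration.

```python
import re

# Patterns taken from the clurgmgrd/rgmanager messages quoted in this article.
# The exact wording may differ between package versions (assumption).
ADD_RE = re.compile(r"Adding IPv4 address (\d+\.\d+\.\d+\.\d+)/\d+ to (\S+)")
COLLISION_RE = re.compile(r"IPv4 address collision (\d+\.\d+\.\d+\.\d+)")
FAILED_RE = re.compile(r"#68: Failed to start (\S+); return value: \d+")


def find_ip_collisions(lines):
    """Return one dict per 'IPv4 address collision' message in the log lines.

    Each dict holds the colliding address, the interface it was most recently
    added to (if that message was seen), and the service whose start failed
    next (if that message was seen).
    """
    device_for_ip = {}  # address -> interface from the latest "Adding" line
    events = []
    for line in lines:
        match = ADD_RE.search(line)
        if match:
            device_for_ip[match.group(1)] = match.group(2)
            continue
        match = COLLISION_RE.search(line)
        if match:
            ip = match.group(1)
            events.append(
                {"ip": ip, "device": device_for_ip.get(ip), "service": None}
            )
            continue
        match = FAILED_RE.search(line)
        if match and events and events[-1]["service"] is None:
            # Attach the failed service to the most recent collision.
            events[-1]["service"] = match.group(1)
    return events


if __name__ == "__main__":
    import sys

    # Usage: python find_collisions.py < /var/log/messages
    for event in find_ip_collisions(sys.stdin):
        print(event["ip"], event["device"], event["service"])
```

Each reported address is one that was still answering on the network when the recovering node tried to start the IP resource, which is consistent with the original node still holding it.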

Environment

  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
  • Storage-level fencing configured (fence_scsi, fence_brocade, or a similar method)
  • One or more services containing an ip resource
