Clustered IP resource fails to start with 'IPv4 address collision' when using <dlm enable_fencing="0"/> in RHEL 6
Issue
- I have a service running on Node2 of the cluster. On rebooting Node2, the service switches over to Node1 (which is expected), but once Node2 comes back online, Node1 gets fenced and reboots, and the service hangs.
- When a node boots up, it does not see the other node and proceeds to post-join fence it. While waiting for fencing to complete, it begins starting services and gets an IP collision:
Sep 27 14:32:06 node2 rgmanager[4457]: Starting stopped service service:IP
Sep 27 14:32:06 node2 rgmanager[5356]: [ip] Adding IPv4 address 10.1.2.3/24 to eth0
Sep 27 14:32:07 node2 rgmanager[5402]: [ip] IPv4 address collision 10.1.2.3
Sep 27 14:32:07 node2 rgmanager[4457]: start on ip "10.1.2.3/24" returned 1 (generic error)
Sep 27 14:32:07 node2 rgmanager[4457]: #68: Failed to start service:IP; return value: 1
- When a node gets fenced, the other node recovers the service before fencing completes, resulting in an IP collision.
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
- <dlm enable_fencing="0"/> set in /etc/cluster/cluster.conf
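For reference, this setting is a top-level element inside <cluster> in /etc/cluster/cluster.conf. The fragment below is a minimal illustrative sketch of a two-node configuration with the option set; the cluster name, node names, and fence device names are placeholders, not taken from the affected configuration, while the service and IP address match the log excerpt above:

    <cluster name="example" config_version="1">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1" nodeid="1">
          <fence>
            <method name="1">
              <device name="fence-node1"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="node2" nodeid="2">
          <fence>
            <method name="1">
              <device name="fence-node2"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <!-- Disables DLM-initiated fencing; this is the setting associated
           with the collision described in the Issue section -->
      <dlm enable_fencing="0"/>
      <fencedevices>
        <fencedevice agent="fence_ipmilan" name="fence-node1" ipaddr="..." login="..." passwd="..."/>
        <fencedevice agent="fence_ipmilan" name="fence-node2" ipaddr="..." login="..." passwd="..."/>
      </fencedevices>
      <rm>
        <service autostart="1" name="IP" recovery="relocate">
          <ip address="10.1.2.3/24" monitor_link="on"/>
        </service>
      </rm>
    </cluster>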
