Clustered IP resource fails to start with 'IPv4 address collision' when using <dlm enable_fencing="0"/> in RHEL 6
Issue
- A service is running on Node2 of the cluster. On rebooting Node2, the service switches over to Node1 (which is expected), but once Node2 comes back online, Node1 gets fenced and reboots, and the service hangs.
- When a node boots up, it does not see the other node and proceeds to fence it post-join. While waiting for fencing to complete, it begins starting services and hits an IP collision:
Sep 27 14:32:06 node2 rgmanager[4457]: Starting stopped service service:IP
Sep 27 14:32:06 node2 rgmanager[5356]: [ip] Adding IPv4 address 10.1.2.3/24 to eth0
Sep 27 14:32:07 node2 rgmanager[5402]: [ip] IPv4 address collision 10.1.2.3
Sep 27 14:32:07 node2 rgmanager[4457]: start on ip "10.1.2.3/24" returned 1 (generic error)
Sep 27 14:32:07 node2 rgmanager[4457]: #68: Failed to start service:IP; return value: 1
- When a node gets fenced, the other node recovers the service before fencing completes, resulting in an IP collision.
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
<dlm enable_fencing="0"/>
set in /etc/cluster/cluster.conf
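For reference, a minimal sketch of how this setting might appear in cluster.conf. The surrounding elements (cluster name, node names, config_version) are illustrative placeholders, not taken from the affected environment:

```xml
<?xml version="1.0"?>
<cluster name="example_cluster" config_version="1">
  <!-- Disabling DLM fencing lets lockspace operations proceed
       before fencing of a failed node has completed -->
  <dlm enable_fencing="0"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>
```

With `enable_fencing="0"`, rgmanager can acquire locks and start recovering services while the fencing of the other node is still pending, which is how both nodes can end up holding the same service IP at once.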