ip resource failed with "Link for <iface>: Not detected" in a RHEL 5 or 6 High Availability cluster with rgmanager
Issue
- A partial network outage caused the cluster services to go down.
- "kernel: e1000e: eth0 NIC Link is Down" messages appear on the nodes, after which the cluster service fails because its IP resource is no longer available.
- My service failed after error messages from a network driver and from our ip resource:
Mar 24 23:44:11 nodeA kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Down
Mar 24 23:44:15 nodeA kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Down
Mar 24 23:44:15 nodeA kernel: bonding: bond0: now running without any active interface !
Mar 24 23:44:20 nodeA clurgmgrd: [7851]: <warning> Link for bond0: Not detected
Mar 24 23:44:20 nodeA clurgmgrd: [7851]: <warning> No link on bond0...
Mar 24 23:44:20 nodeA clurgmgrd[7851]: <notice> Stopping service cluster_service
Mar 24 23:44:20 nodeA clurgmgrd: [7851]: <info> Executing /path/to/service stop
Mar 24 23:44:21 nodeA clurgmgrd: [7851]: <info> Removing IPv4 address XXX.XX.XX.XX from bond0
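To check whether other nodes or other incidents show the same sequence (the driver reports link loss, bonding loses its last active interface, then rgmanager reports no link and stops the service), the relevant messages can be pulled out of a syslog file with a short script. This is a minimal Python 3 sketch, not a Red Hat tool: the helper name summarize() is hypothetical, and the regular expressions are copied from the excerpt above, so they are assumptions that may need adjusting for other NIC drivers or interface naming schemes.

```python
import re

# Patterns copied from the log excerpt above (assumptions, not an exhaustive list).
# Kernel driver wording differs between NIC drivers, and interface names such as
# em1 or enp0s3 will not match "eth\d+", so adapt these for your hardware.
NIC_DOWN = re.compile(r"kernel: .*?\b(eth\d+)\b.*NIC Link is Down")
BOND_NO_SLAVE = re.compile(r"bonding: (\S+): now running without any active interface")
RGMANAGER_NO_LINK = re.compile(r"Link for (\S+): Not detected")
SERVICE_STOP = re.compile(r"Stopping service (\S+)")


def summarize(lines):
    """Return the interfaces and services named in link-loss messages, in log order."""
    summary = {"nic_down": [], "bond_no_slave": [], "no_link": [], "stopped": []}
    checks = [
        ("nic_down", NIC_DOWN),
        ("bond_no_slave", BOND_NO_SLAVE),
        ("no_link", RGMANAGER_NO_LINK),
        ("stopped", SERVICE_STOP),
    ]
    for line in lines:
        for key, pattern in checks:
            match = pattern.search(line)
            if match:
                summary[key].append(match.group(1))
                break  # each line is expected to match at most one pattern
    return summary


# Lines taken from the excerpt above.
SAMPLE = """\
Mar 24 23:44:11 nodeA kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Down
Mar 24 23:44:15 nodeA kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Down
Mar 24 23:44:15 nodeA kernel: bonding: bond0: now running without any active interface !
Mar 24 23:44:20 nodeA clurgmgrd: [7851]: <warning> Link for bond0: Not detected
Mar 24 23:44:20 nodeA clurgmgrd[7851]: <notice> Stopping service cluster_service
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

Against the sample it reports eth2 and eth0 losing link, bond0 running without an active interface, rgmanager finding no link on bond0, and cluster_service being stopped. To scan a real node, pass it the lines of a copy of /var/log/messages.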
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On (rgmanager)
- One or more <ip/> resources configured in /etc/cluster/cluster.conf
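To see which <ip/> resources could be affected, their addresses can be listed from cluster.conf. The sketch below is Python 3 with a hypothetical helper ip_resources() and a made-up minimal configuration using documentation addresses; a real cluster.conf also contains nodes, fencing and other resources. It also reports each resource's monitor_link attribute. To our understanding of the rgmanager ip agent, monitor_link defaults to 1 and makes the resource's status check fail when no link is detected on its interface, which is the condition behind the "Link for <iface>: Not detected" warning; treat that default as an assumption to verify for your release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal cluster.conf for illustration only.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <rm>
    <resources>
      <ip address="192.0.2.10" monitor_link="1"/>
    </resources>
    <service name="cluster_service">
      <ip address="192.0.2.20" monitor_link="0"/>
    </service>
  </rm>
</cluster>
"""


def ip_resources(conf_text):
    """List (address, monitor_link) for every <ip/> element that defines an address."""
    root = ET.fromstring(conf_text)
    found = []
    for ip in root.iter("ip"):  # document order, at any nesting depth
        address = ip.get("address")
        if address is None:
            continue  # e.g. <ip ref="..."/> pointing at a resource defined elsewhere
        # Assumption: a missing monitor_link is treated as "1" by the ip agent.
        found.append((address, ip.get("monitor_link", "1")))
    return found


if __name__ == "__main__":
    for address, monitor_link in ip_resources(SAMPLE_CONF):
        print(address, "monitor_link=" + monitor_link)
```

To inspect a real node, read the contents of /etc/cluster/cluster.conf and pass them to ip_resources() instead of SAMPLE_CONF.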