Why can't I ping the IP address for my cluster service even though it shows as started?

Solution Verified

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
  • rgmanager
  • One or more global resources in the <resources/> section of /etc/cluster/cluster.conf that are referenced from multiple services, or that do not match the references specified in the service section
  • Conga

Issue

  • A service containing only an <ip ref="xxx.xxx.xxx.xxx"/> entry that does not point to a backing global ip resource starts successfully without binding the specified IP address to an interface.
  • cman_tool version -r validates this configuration without errors.
  • When attempting to manipulate the service via luci, the following error is received:

ERROR 500 We're sorry but we weren't able to process this request.

Resolution

  • Define a global IP resource for your cluster whose address matches the <ip ref="xxx.xxx.xxx.xxx"/> entry defined in your service
    OR
  • Define a local IP resource directly in the service to replace the <ip ref="xxx.xxx.xxx.xxx"/> entry defined in your service
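
The two options can be sketched as the following <rm> fragments. They are illustrative only: the address 10.13.212.250, the sleeptime value, and the service name ip_test are borrowed from the examples in Diagnostic Steps, so substitute the values from your own cluster.

Option 1, a global IP resource whose address matches the reference in the service:

```xml
<rm>
        <resources>
                <!-- Global IP resource; the address must match the ref below exactly -->
                <ip address="10.13.212.250" sleeptime="10"/>
        </resources>
        <service autostart="0" name="ip_test" recovery="relocate">
                <!-- Reference to the global resource defined above -->
                <ip ref="10.13.212.250"/>
        </service>
</rm>
```

Option 2, a local IP resource defined directly in the service, with no reference involved:

```xml
<rm>
        <resources/>
        <service autostart="0" name="ip_test" recovery="relocate">
                <!-- Local IP resource; replaces the <ip ref="..."/> entry -->
                <ip address="10.13.212.250" sleeptime="10"/>
        </service>
</rm>
```

After editing cluster.conf, the config_version attribute on the <cluster> element needs to be incremented before the change is propagated to the other nodes.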

Root Cause

It is possible to specify an IP reference (<ip ref="xxx.xxx.xxx.xxx"/>) in a service without creating the global resource it refers to. The <ip ref="xxx.xxx.xxx.xxx"/> entry is meant to point at a globally defined IP resource, but rgmanager does not report an error when that resource does not exist. The service therefore starts even though no IP address is bound to an interface.

If you attempt to manipulate the service via luci, you may receive an error when luci cannot find the global resource:

ERROR 500
We're sorry but we weren't able to process this request.
  • This issue is related to bug #1128877

Diagnostic Steps

  • Check the cluster.conf for an ip reference that does not have a matching global resource:
<?xml version="1.0"?>
<cluster config_version="12" name="cluster">
        <clusternodes>
                <clusternode name="node1" nodeid="1"/>
                <clusternode name="node2" nodeid="2"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <resources/>
                <service autostart="0" name="ip_test" recovery="relocate">
                        <ip ref="10.13.212.250"/>
                </service>
        </rm>
</cluster>
  • Check the cluster.conf for an ip reference or global resource that may have a typographical error preventing it from matching:
<?xml version="1.0"?>
<cluster config_version="12" name="cluster">
        <clusternodes>
                <clusternode name="node1" nodeid="1"/>
                <clusternode name="node2" nodeid="2"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <resources>
                        <ip address="10.13.212.252" sleeptime="10"/>
                </resources>
                <service autostart="0" name="ip_test" recovery="relocate">
                        <ip ref="10.13.212.250"/>
                </service>
        </rm>
</cluster>
