Cluster resources fail over normally, but in one particular scenario they do not


Dear friends.

 

I was testing my cluster and it was working fine. When I shut down my primary server from the command line, the resources failed over correctly. I was happy, because it is my first cluster. A few minutes later a wilder test came to mind; maybe the same idea occurs to every curious techie.

 

I suddenly unplugged the power cable from the server. It went down immediately, but the resources did not fail over. :-( Have you ever tested this? Just pull the power suddenly, so the server goes down at once: the resources do not relocate.

 

I have tried a lot and am still trying to solve this scenario. Please advise.

 

 

 

 

rgds,

Jack

Responses

What sort of fencing have you configured?

Just to add to my previous update:

 

When the active cluster node fails, the standby node will try to fence it before taking over the cluster service.
If fencing fails, the service will not be relocated, because the standby node has no way of knowing the actual status of the active node.

It is important to know what sort of fencing you have configured on the box.

For example, when Dell DRAC is used: https://access.redhat.com/knowledge/node/42518
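
While the primary node is unplugged, it is also worth looking at what the surviving node thinks is happening. On RHEL 6 with cman, the stock cluster utilities show this (nothing below is specific to your setup):

  # clustat
  # cman_tool nodes
  # fence_tool ls

clustat reports node and service status as rgmanager sees it, cman_tool nodes shows cluster membership, and fence_tool ls lists the fence domain, including any node that is still waiting to be fenced.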

Dear Ranjith,

 

Thanks for your reply. I am using HP iLO for fencing.

 

Please read below.

 

"I have a 2 node cluster. My cluster is working fine. The resources are shifting to my secondary node in graceful shutdown process. It is cool.

One day, before going to live. I was testing fail-over policies. During testing something comes in my mind. What I did, I just unplugged the power from my primary node and found resources are not shifting. It was sudden power failure. Resources are not shifting."
 

Yes, I agree with you, but I do not think my scenario is about fencing. My resources do not fail over if I unplug the power cable directly from the primary server. Fencing should only prevent a failed node from accessing the shared LUN, yet here the resources are not relocating at all. If I do a graceful shutdown, the resources move to the other node just fine.

 

Thank you for your reply and for the helpful link.

Rahman,

 

The reason the resources are not shifting/relocating is that fencing fails when you unplug the power cable from the active node.

 

HP iLO is a hardware-based management board; when you pull the server's power cords, the iLO typically loses power along with the server, so it cannot answer the fence request.

 

Please read the "Note" section in the below URL

 

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/HPiLO_Configuration.html

 

The behavior you are observing is expected when you use a hardware-based management board as the fence device.

 

When you do a graceful shutdown, relocation is initiated by the active node and fencing is not required.
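
For comparison, a graceful relocation can also be triggered by hand from the active node using rgmanager's clusvcadm (the service and node names below are placeholders for whatever is defined in your cluster.conf):

  # clusvcadm -r <service-name> -m <target-node>

That path never needs fencing, because the active node stops the service cleanly before the target node starts it.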

 

Let me know if you have any further questions.

 

 

There is a KB article that describes this particular scenario:

 https://access.redhat.com/knowledge/solutions/42518

Dear Ranjith,

 

Thanks, voted up. I am getting a clearer idea from your explanation. I have already configured HP iLO as suggested in the Red Hat Cluster documentation. Thank you, sir. Would you please suggest the proper configuration I should use for fencing nodes on HP servers?

For iLO 3 you should use fence_ipmilan, and ensure you have power_wait="4" (or more) set:

 

  How do I configure a cluster fence device for the HP ILO 3 in RHEL 5 or 6?

  https://access.redhat.com/knowledge/solutions/54453

 

For iLO 1 and 2, you can use fence_ipmilan or fence_ilo.
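
Purely as an illustration (the address and credentials below are placeholders, not your values), a fencedevice entry for an iLO 3 using fence_ipmilan would look something like this:

        <fencedevice agent="fence_ipmilan" name="node1-ipmi" ipaddr="node1-ilo" login="admin" passwd="xxxx" lanplus="1" power_wait="4"/>

The lanplus="1" attribute is needed because iLO 3 speaks IPMI 2.0; the solution linked above covers the details.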

 

If your fencing is indeed failing, you should see messages saying so in /var/log/messages, and possibly more verbose error messages.  Do you see any? What do they say?
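
For example, from the surviving node you can pull out the fence-related log lines with something like the following (the exact wording of the messages varies between releases):

  # grep -i fence /var/log/messages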

What does your configuration in /etc/cluster/cluster.conf look like (feel free to strip out any passwords or sensitive information before posting)?

 

Thanks,

John Ruemker, RHCA
Software Maintenance Engineer

Global Support Services

Red Hat, Inc.

Dear John,

 

Below is the sample. Note that this is iLO 4.

========================================

<?xml version="1.0"?>
<cluster config_version="12" name="cdbcluster">
        <clusternodes>
                <clusternode name="dcdb-ibs-clusnode1-pv" nodeid="1"/>
                <clusternode name="dcdb-ibs-clusnode2-pv" nodeid="2"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode1-ilo" login="****" name="cdbilo1" passwd="****1234" power_wait="4"/>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode2-ilo" login="****" name="cdbilo2" passwd="****1234" power_wait="4"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="dcdb-failover-nodes" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="dcdb-ibs-clusnode1-pv" priority="1"/>
                                <failoverdomainnode name="dcdb-ibs-clusnode2-pv" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.224.35/24" monitor_link="on" sleeptime="10"/>
                        <fs device="/dev/mapper/data--vg1-data--vg1--lv0" force_fsck="on" fsid="42023" fstype="ext4" mountpoint="/oradata" name="dcdb-shared-disk"/>
                </resources>
                <service domain="dcdb-failover-nodes" max_restarts="3" name="dcdb-ora-clus-srv" recovery="restart" restart_expire_time="60">
                        <ip ref="192.168.224.35/24"/>
                </service>
        </rm>
</cluster>

======================================

The problem is that even though you've created fencedevice definitions:

 

        <fencedevices>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode1-ilo" login="****" name="cdbilo1" passwd="****1234" power_wait="4"/>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode2-ilo" login="****" name="cdbilo2" passwd="****1234" power_wait="4"/>
        </fencedevices>

 

You have not assigned them to the nodes:

 

        <clusternodes>
                <clusternode name="dcdb-ibs-clusnode1-pv" nodeid="1"/>
                <clusternode name="dcdb-ibs-clusnode2-pv" nodeid="2"/>
        </clusternodes>

 

Normally it would look something like this:

 

        <clusternodes>
                <clusternode name="dcdb-ibs-clusnode1-pv" nodeid="1">
                        <fence>
                                <method name="1">
                                        <device name="cdbilo1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="dcdb-ibs-clusnode2-pv" nodeid="2">
                        <fence>
                                <method name="1">
                                        <device name="cdbilo2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>

 

You can see more specific instructions for assigning fence devices to cluster members here (RHEL 6):

 

  https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Cluster_Administration/index.html#s1-config-member-conga-CA

 

Or here (RHEL 5):

 

  https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/Cluster_Administration/index.html#s1-config-fence-devices-conga-CA
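
If you end up editing /etc/cluster/cluster.conf by hand rather than through Conga, the usual workflow on RHEL 6 (assuming the standard cluster packages) is to increment config_version in the file, validate it, and then push it to the other node:

  # ccs_config_validate
  # cman_tool version -r

ccs_config_validate checks the file against the cluster schema, and cman_tool version -r propagates the new version to the rest of the cluster.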

 

Hope this helps.

 

Thanks,

John Ruemker, RHCA

Senior Software Maintenance Engineer

Global Support Services

Red Hat, Inc.

Dear John,

 

Below is my cluster.conf. Please guide me if I need to change any parameter or add any method to make the configuration standard and handle a sudden node failure. Thanks in advance.

 

+++++++++++++++++++++++++++++++++++++++++++++

<?xml version="1.0"?>
<cluster config_version="23" name="cdbcluster">
        <clusternodes>
                <clusternode name="dcdb-ibs-clusnode1-pv" nodeid="1">
                        <fence>
                                <method name="dcdb-ibs-clusnode1-fnode">
                                        <device name="cdbilo1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="dcdb-ibs-clusnode2-pv" nodeid="2">
                        <fence>
                                <method name="dcdb-ibs-clusnode2-fnode">
                                        <device name="cdbilo2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode1-ilo" login="****" name="cdbilo1" passwd="********" power_wait="4"/>
                <fencedevice agent="fence_ilo_mp" ipaddr="clusnode2-ilo" login="****" name="cdbilo2" passwd="********" power_wait="4"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="dcdb-failover-nodes" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="dcdb-ibs-clusnode1-pv" priority="1"/>
                                <failoverdomainnode name="dcdb-ibs-clusnode2-pv" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <fs device="/dev/mapper/data--vg1-data--vg1--lv0" force_fsck="on" fsid="42023" fstype="ext4" mountpoint="/oradata" name="dcdb-shared-disk"/>
                        <ip address="192.168.224.35/27" monitor_link="on" sleeptime="10"/>
                </resources>
                <service domain="dcdb-failover-nodes" name="dcdb-ora-clus-srv" recovery="disable">
                        <ip ref="*******/27">
                                <fs ref="dcdb-shared-disk"/>
                        </ip>
                </service>
        </rm>
</cluster>
 

+++++++++++++++++++++++++++++++++++++++++++++

Your configuration looks ok to me, assuming that fence_ilo_mp is the correct fence agent for your specific hardware.  Are you sure that's the one you need?  If you're using traditional iLOs, you'll just need fence_ilo (iLO 1 or 2) or fence_ipmilan (iLO 1, 2, or 3).  If you're sure fence_ilo_mp is the right agent, then everything else looks fine.

 

I definitely recommend testing your fencing configuration though.  From dcdb-ibs-clusnode1-pv, you can fence dcdb-ibs-clusnode2-pv with:

 

  # fence_node dcdb-ibs-clusnode2-pv

 

And you should see that node power cycle.  When it comes back up, you can test fencing dcdb-ibs-clusnode1-pv from dcdb-ibs-clusnode2-pv:

 

  # fence_node dcdb-ibs-clusnode1-pv

 

If either node did not power cycle during its test, then there is a problem with your configuration that you should correct before deploying the cluster into production.
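
It can also help to exercise the fence agent directly, outside the cluster, to confirm that the iLO address and credentials work at all. For example, with fence_ipmilan the check would look roughly like this (address and credentials are placeholders; the exact agent and options depend on which one you settle on, and iLO 3/4 need lanplus):

  # fence_ipmilan -a clusnode2-ilo -l <login> -p <password> -P -o status

A successful run should report the node's power status without rebooting it.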

 

Regards,

John

Dear John,

 

Thank you very much for your guide.

 

I am using HP servers. During implementation I found two options: Fence iLO Device and Fence iLO MP. I chose Fence iLO MP, and it is the latest iLO 4.

 

I will definitely test as you suggest. Thank you once again, sir.

 

Regards,

Shyfur
