RHEL 6 Cluster Configuration problem

Hi All,

We have a 2-node cluster running RHEL 6, but we are encountering a cluster configuration error.

Error:

Jun 25 02:34:48 host2 fenced[4031]: fence host2 dev 0.0 agent fence_bladecenter result: error from agent
Jun 25 02:34:48 host2 fenced[4031]: fence host2-host failed

If I add the blade number to the fence device line, the configuration fails to validate:

[root@host2 cluster]# cman_tool version -r
Relax-NG validity error : Extra element fence in interleave
tempfile:5: element fence: Relax-NG validity error : Element clusternode failed to validate content
tempfile:4: element clusternode: Relax-NG validity error : Element clusternodes has extra content: clusternode
Configuration fails to validate
cman_tool: Not reloading, configuration is not valid

cluster.conf:

<?xml version="1.0"?>
<cluster config_version="73" name="prod_cluster">
        <clusternodes>
                <clusternode name="host1" nodeid="1">
                        <fence>
                                <method name="Flex_CMM">
                                        <device name="Flex_CMM" secure="on"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="host2" nodeid="2">
                        <fence>
                                <method name="Flex_CMM">
                                        <device name="Flex_CMM" secure="on"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_bladecenter" ipaddr="192.168.1.211" login="USERID" name="Flex_CMM" passwd="PASSW0RD" power_wait="3"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="host1" nofailback="1" ordered="0" restricted="0">
                                <failoverdomainnode name="host1"/>
                                <failoverdomainnode name="host2"/>
                        </failoverdomain>
                        <failoverdomain name="host2" nofailback="1" ordered="0" restricted="0">
                                <failoverdomainnode name="host1"/>
                                <failoverdomainnode name="host2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.1.1.1/24" sleeptime="10"/>
                        <ip address="10.1.1.2/24" sleeptime="10"/>
                        <lvm name="lvm_vg_proddb_arch" vg_name="vg_proddb_arch"/>
                        <lvm name="lvm_vg_proddb" vg_name="vg_proddb"/>
                        <lvm name="lvm_vg_prodapp" vg_name="vg_prodapp"/>
                        <fs device="/dev/vg_proddb/lv_dbprd_tech_st" fsid="23166" fstype="ext4" mountpoint="/oracle/prd/db/tech_st" name="fs_dbprd_tech_st"/>
                        <fs device="/dev/vg_proddb/lv_dbprd_data" fsid="23924" fstype="ext4" mountpoint="/oracle/prd/db/apps_st/data" name="fs_dbprd_data"/>
                        <fs device="/dev/vg_proddb_arch/lv_dbprd_arch" fsid="35933" fstype="ext4" mountpoint="/oracle/prd/db/apps_st/arch" name="fs_dbprd_arch"/>
                        <fs device="/dev/vg_prodapp/lv_appprd" fsid="43364" fstype="ext4" mountpoint="/oracle/prd/apps" name="fs_appprd"/>
                        <fs device="/dev/vg_proddb/lv_dbprd" fsid="16033" fstype="ext4" mountpoint="/oracle/prd/db" name="fs_dbprd"/>
                        <script file="/script/db.script" name="db_script"/>
                        <script file="/script/app.script" name="app_script"/>
                </resources>
                <service domain="domain" name="service_proddb" recovery="relocate">
                        <ip ref="10.1.1.2/24"/>
                        <lvm ref="lvm_vg_proddb"/>
                        <lvm ref="lvm_vg_proddb_arch"/>
                        <fs ref="fs_dbprd"/>
                        <fs ref="fs_dbprd_tech_st"/>
                        <fs ref="fs_dbprd_data"/>
                        <fs ref="fs_dbprd_arch"/>
                        <script ref="db_script"/>
                </service>
                <service domain="domain" name="service_prodapp" recovery="relocate">
                        <ip ref="10.1.1.1/24"/>
                </service>
        </rm>
</cluster>

I would appreciate any help in solving this problem, since we can't specify the blade number and therefore can't fence a failed node.
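
For reference, this is roughly the device line I have been trying to add, based on my reading of the fence_bladecenter man page (the port value here is only an example, not our real blade bay):

                <clusternode name="host2" nodeid="2">
                        <fence>
                                <method name="Flex_CMM">
                                        <!-- port should select the blade bay to power-cycle -->
                                        <device name="Flex_CMM" port="2" secure="on"/>
                                </method>
                        </fence>
                </clusternode>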

Regards,
Thomas

Responses

Hi Thomas - first off, these forums are typically populated by an army of folks volunteering their time; Red Hat folks monitor and respond on occasion as well. So if this is truly urgent, I recommend opening a case with Red Hat.

Was this working and has now failed, or is this a new(er) setup? (Judging by the config_version of 73, I'd guess it's not a new setup. ;-)

Does your BladeCenter have a single management interface, with the fencing agent smart enough to know which blade to manage? (In my own environment, each blade has its own IPMI IP, which I test with ipmitool.) The reason I ask: my cluster config references two fence devices, one per node, and I don't see anything in your config that indicates which blade should be fenced. (This might just be my lack of understanding of how the BladeCenter works.)
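
You might also try driving the agent by hand to see whether the CMM responds at all. Assuming the standard RHEL 6 fence agent options, something like the following should report one blade's power state (blade number 2 is just a placeholder):

    # query blade 2's power state through the CMM; -x uses SSH, matching secure="on"
    fence_bladecenter -a 192.168.1.211 -l USERID -p PASSW0RD -x -n 2 -o status

If that works by hand but fencing still fails, I would suspect cluster.conf simply never tells the agent which blade to act on.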

Here is what I have in my cluster config:

<?xml version="1.0"?>
<cluster alias="dbagfs" config_version="6" name="dbagfs">
    <fence_daemon clean_start="0" post_fail_delay="3" post_join_delay="90"/>
    <clusternodes>
        <clusternode name="dba04-PRI.corp.company.com" nodeid="1" votes="1">
                <multicast addr="239.253.136.6"/>
            <fence>
                <method name="1">
                    <device name="dba04.mgmt.company.com"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="dba03-PRI.corp.company.com" nodeid="2" votes="1">
                <multicast addr="239.253.136.6"/>
            <fence>
                <method name="1">
                    <device name="dba03.mgmt.company.com"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1" broadcast="yes">
        <multicast addr="239.253.136.6"/>
    </cman>
    <totem token="33000"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" lanplus="1" power_wait="5" ipaddr="1.1.1.187" login="root" name="dba03.mgmt.company.com" passwd="redacted" privlvl="ADMINISTRATOR"/>
        <fencedevice agent="fence_ipmilan" lanplus="1" power_wait="5" ipaddr="1.1.1.184" login="root" name="dba04.mgmt.company.com" passwd="redacted" privlvl="ADMINISTRATOR"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>

NOTE: this cluster exists simply to provide cluster services for a GFS mount.
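
One more thing that may help while you iterate on cluster.conf: if I remember right, RHEL 6 ships a standalone validator, so you can check your edits against the schema without asking cman to reload anything:

    # validate the running /etc/cluster/cluster.conf against the cluster schema
    ccs_config_validate

    # or point it at a copy you are still editing (-f names the file, if I recall the flag correctly)
    ccs_config_validate -f /root/cluster.conf.new

It should surface the same Relax-NG complaint you are seeing, but without touching the live configuration.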
