Fencing problem on Cluster Suite 3.0.12: fenced throws agent error when invoking fence_xvm
I am setting up a highly available cluster of KVM guests. I have set up fence_virtd on the hosts, and I was able to fence correctly by running fence_xvm and fence_node manually. The problem is that fencing fails when triggered automatically by rgmanager and fenced. Rgmanager detects when a node is down and triggers fenced, but fenced throws an ambiguous "agent error". This probably means that fence_xvm returned an error status, but so far I cannot get fence_xvm to emit its debug output anywhere. All I know is that fence_virtd, which listens for fence_xvm multicast requests, does not report any fencing requests even while running in very verbose debug mode.
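For reference, this is roughly how I run the agent manually when it succeeds (the -o, -H and repeatable -d flags are from the fence_xvm man page):

# list the domains fence_virtd knows about, with verbose debug output
fence_xvm -o list -ddd
# fence a specific guest by its libvirt domain name
fence_xvm -o off -H prhin01-vm01 -ddd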
My cluster.conf is here: http://pastebin.com/5vY3kNqB
I have configured debug logging for fenced in cluster.conf and in /etc/sysconfig/cman.
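The logging stanza I added to cluster.conf looks roughly like this (syntax as described in cluster.conf(5)):

<logging debug="on">
    <logging_daemon name="fenced" debug="on"/>
</logging>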
Fenced log below:
May 06 11:22:40 fenced cluster node 1 removed seq 356
May 06 11:22:40 fenced fenced:daemon conf 1 0 1 memb 2 join left 1
May 06 11:22:40 fenced fenced:daemon ring 2:356 1 memb 2
May 06 11:22:40 fenced fenced:default conf 1 0 1 memb 2 join left 1
May 06 11:22:40 fenced add_change cg 4 remove nodeid 1 reason 3
May 06 11:22:40 fenced add_change cg 4 m 1 j 0 r 1 f 1
May 06 11:22:40 fenced add_victims node 1
May 06 11:22:40 fenced check_ringid cluster 356 cpg 1:352
May 06 11:22:40 fenced fenced:default ring 2:356 1 memb 2
May 06 11:22:40 fenced check_ringid done cluster 356 cpg 2:356
May 06 11:22:40 fenced check_quorum done
May 06 11:22:40 fenced send_start 2:4 flags 2 started 3 m 1 j 0 r 1 f 1
May 06 11:22:40 fenced receive_start 2:4 len 152
May 06 11:22:40 fenced match_change 2:4 matches cg 4
May 06 11:22:40 fenced wait_messages cg 4 got all 1
May 06 11:22:40 fenced set_master from 2 to complete node 2
May 06 11:22:40 fenced prhin01-vm01 not a cluster member after 0 sec post_fail_delay
May 06 11:22:40 fenced fencing node prhin01-vm01
May 06 11:22:40 fenced fence prhin01-vm01 dev 0.0 agent fence_xvm result: error from agent
May 06 11:22:40 fenced fence prhin01-vm01 failed
May 06 11:22:43 fenced fencing node prhin01-vm01
May 06 11:22:43 fenced fence prhin01-vm01 dev 0.0 agent fence_xvm result: error from agent
May 06 11:22:43 fenced fence prhin01-vm01 failed
May 06 11:22:46 fenced fencing node prhin01-vm01
May 06 11:22:46 fenced fence prhin01-vm01 dev 0.0 agent fence_xvm result: error from agent
May 06 11:22:46 fenced fence prhin01-vm01 failed
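One way to check whether the fencing requests ever reach the wire is to watch the host's bridge for the agent's multicast traffic while fenced retries (225.0.0.12 and port 1229 are the fence_virt defaults; br0 is an assumed bridge name):

tcpdump -i br0 udp port 1229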
Responses
Hi Janet,
You are right that the error you are seeing is caused by the agent itself returning a non-zero status, but it's unclear to me from that log output why it is failing. It's good that fence_xvm and fence_node both work when run manually, as that indicates your cluster.conf configuration is correct.
The way you are triggering the node failure may be responsible for the issue you are seeing. How are you testing this scenario? Do you manually destroy the VM (for instance, with 'virsh destroy')? Are you powering down the host it's running on? Or are you using some other method?
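For what it's worth, one common way to simulate a genuine node failure is to crash the kernel from inside the guest, which exercises the same path as a real fault (sysrq must be enabled first):

# run inside the guest that should get fenced
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger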
Regards,
John Ruemker, RHCA
Red Hat Technical Account Manager
Online User Groups Moderator
Hello Janet,
I need your help!
I am also setting up an HA environment on KVM guests. I have two Dell servers, each running RHEL 6.2 as the base OS. On each base OS I have installed a separate guest OS (RHEL 6.2) using KVM. I would like to create a cluster from these KVM guests, and for that I am trying to set up fencing of the cluster nodes. I have gone through the Red Hat docs for fence_xvm and fence_virt, but I still need help understanding the basic approach to accomplish this.
Thanks.
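For reference, the usual outline on RHEL 6 is: generate a shared key and distribute it to the hosts and guests, configure and start fence_virtd on each host, then point cluster.conf at the fence_xvm agent (the paths and names below are the stock defaults; adjust to your environment):

# on each host: create the shared key, then copy it to /etc/cluster/ on every host and guest
mkdir -p /etc/cluster
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
# interactively configure the multicast listener, then start it
fence_virtd -c
service fence_virtd start
# in cluster.conf on the guests, declare the device and attach it to each node, e.g.:
#   <fencedevice agent="fence_xvm" name="xvm"/>
#   <device name="xvm" domain="guest-domain-name"/>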
