fence_virt example with RHEL 7 HA add-on?
Hi...
I'm trying to get a handle on RHEL 7 HA configuration. To that end, I have created two RHEL 7 RC2 VMs on a RHEL 7 RC2 host.
I followed the (skimpy) documentation in the RHEL 7 HA Admin Guide and set up a cluster.
[root@rhel7-n2 log]# pcs status
Cluster name: my_cluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Thu May 15 14:45:57 2014
Last change: Thu May 15 14:45:51 2014 via cibadmin on rhel7-n2.wtec
Stack: corosync
Current DC: rhel7-n1.wtec (2) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
0 Resources configured
Online: [ rhel7-n1.wtec rhel7-n2.wtec ]
Full list of resources:
PCSD Status:
rhel7-n2.wtec: Online
rhel7-n1.wtec: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
So far, so good, but I have been stuck trying to create a fence device with fence_virt.
This appears to need configuration on the host, from upstream pacemaker docs:
http://clusterlabs.org/wiki/Guest_Fencing#For_Guests_Running_on_a_Single_Host
My host can 'see' the guests, and the key has been propagated to each of them:
[root@bl460g8-tux cluster]# fence_xvm -o list
rhel7-n1 01014c50-b680-4f92-86ac-4e40366abae4 on
rhel7-n2 0a9febda-1597-4926-99b8-d948ed7899d2 on
However I configure fence_virt, it always fails:
[root@rhel7-n2 ~]# pcs stonith create killme fence_virt
[root@rhel7-n2 ~]# pcs status
Cluster name: my_cluster
Last updated: Thu May 15 14:48:36 2014
Last change: Thu May 15 14:48:06 2014 via cibadmin on rhel7-n2.wtec
Stack: corosync
Current DC: rhel7-n1.wtec (2) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
1 Resources configured
Online: [ rhel7-n1.wtec rhel7-n2.wtec ]
Full list of resources:
killme (stonith:fence_virt): Stopped
Failed actions:
killme_start_0 on rhel7-n1.wtec 'unknown error' (1): call=41, status=Error, last-rc-change='Thu May 15 14:48:06 2014', queued=7014ms, exec=0ms
killme_start_0 on rhel7-n2.wtec 'unknown error' (1): call=41, status=Error, last-rc-change='Thu May 15 14:48:14 2014', queued=7012ms, exec=0ms
PCSD Status:
rhel7-n2.wtec: Online
rhel7-n1.wtec: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
I've tried both fence_virt and fence_xvm.
Neither works for me.
Any thoughts? Is this supposed to work in RHEL 7 RC2?
Thanks,
Rick
Responses
From one of our ClusterHA developers:
From what I can see, it looks like the nodes were configured with pcs using rhel7-n2.wtec & rhel7-n1.wtec, while fence_xvm/fence_virt sees the nodes as rhel7-n2 & rhel7-n1. Because of this mismatch, fence_xvm may not be able to fence rhel7-n2.wtec, since it thinks that is a different machine than rhel7-n2.
He should also be able to look in /var/log/messages: searching for "killme" (the name of the fence device) should turn up an error message detailing exactly what is causing it to fail to start. Those messages would show up around 14:48:06 & 14:48:14 on May 15th.
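The log search suggested above could look something like the sketch below. The helper name fence_log_lines is made up for illustration; the device name and the default log path come from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or pacemaker): print every log line
# that mentions a given fence device name.
#   $1 = fence device name, e.g. killme
#   $2 = log file to search (defaults to /var/log/messages)
fence_log_lines() {
    grep -F -- "$1" "${2:-/var/log/messages}"
}

# Example use on a cluster node, as root:
#   fence_log_lines killme
```

Run on each node, the lines stamped around the failed start times should name the actual error.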
Is it possible for him to re-create and re-test the cluster with pcs using rhel7-n1/rhel7-n2 instead of rhel7-n1.wtec/rhel7-n2.wtec?
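If re-creating the cluster is inconvenient, another option may be to tell Pacemaker how cluster node names map to libvirt domain names through the pcmk_host_map fencing attribute. This is only a sketch and was not suggested in the thread: it assumes each domain name is the node name with the DNS suffix removed (which matches the fence_xvm -o list output above), and both the device name xvmfence and the helper build_host_map are made up for illustration. The script only prints the pcs command so it can be reviewed before running.

```shell
#!/bin/sh
# Build a pcmk_host_map value of the form "node:domain;node:domain".
# Assumption: the libvirt domain name is the node name up to the first dot.
build_host_map() {
    map=""
    for node in "$@"; do
        domain="${node%%.*}"              # rhel7-n1.wtec -> rhel7-n1
        map="${map:+$map;}$node:$domain"  # join entries with ';'
    done
    printf '%s\n' "$map"
}

host_map=$(build_host_map rhel7-n1.wtec rhel7-n2.wtec)

# Print the command rather than executing it; "xvmfence" is a hypothetical name.
echo "pcs stonith create xvmfence fence_xvm pcmk_host_map=\"$host_map\""
```

Whether this alone clears the start failure depends on what /var/log/messages reports; a missing key file or a multicast problem would produce the same 'unknown error'.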
Rick,
I have another question: you mention RHEL 7 RC2; how did you get it?
About your issue: as Andrius states, keeping your node names consistent is very important.
Switching between short hostnames and FQDNs causes HA configurations to break.
This issue was already known in RH Cluster Suite 4.
Kind regards,
Jan Gerrit Kootstra
Richard,
Might want to check out these two KBASEs as well:
