High Availability of RHEV-M
Do we have any solution for failover of RHEV Manager, i.e. HA for the management GUI interface? If yes, kindly share some documentation regarding the same.
Also, if the RHEV Manager goes down, how will the virtual machines be failed over or load balanced in case of a hypervisor failure? As per my understanding, the Manager has a built-in PostgreSQL database which contains the information about all the VMs, but if the Manager is down, how will the hypervisors have the information about VM status and the load-balancing policy?
Kindly explain or share some documentation regarding how to achieve HA for RHEV-M.
Thanks,
Ashish
Responses
There are two ways to achieve high availability of RHEV-M.
Short term:
1 - Use RHCS to run a highly available virtual machine on top of two physical RHEL 6 hosts using KVM.
See http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-virt_machine_resources-ccs-CA.html
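As a rough illustration of option 1 (this is not the official procedure; host names, the VM name, and paths below are assumptions), the RHEV-M guest could be defined as an rgmanager `<vm>` resource in /etc/cluster/cluster.conf on both RHEL 6 hosts:

```xml
<rm>
  <failoverdomains>
    <failoverdomain name="rhevm_fd" ordered="1" restricted="1">
      <failoverdomainnode name="kvm1.example.com" priority="1"/>
      <failoverdomainnode name="kvm2.example.com" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <!-- the libvirt domain XML must exist in /etc/libvirt/qemu on both hosts,
       and the guest's disk must be on shared storage -->
  <vm name="rhevm" domain="rhevm_fd" path="/etc/libvirt/qemu"
      use_virsh="1" migrate="live" recovery="restart"/>
</rm>
```

With this, rgmanager restarts or live-migrates the RHEV-M guest to the surviving host when its current host fails; see the Cluster Administration guide linked above for the supported attributes.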
2 - Use two physical machines and make the services used by RHEV-M cluster-aware for failover using RHCS. A tech brief with instructions on this will be published soon.
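A hedged sketch of what option 2 might look like in cluster.conf (the floating IP, device, and mount point are assumptions; the RHEV-M 3.0 stack runs JBoss and PostgreSQL, started here via their init scripts):

```xml
<rm>
  <service name="rhevm" autostart="1" recovery="relocate">
    <!-- floating IP that the admin portal and hypervisors connect to -->
    <ip address="192.168.0.50" monitor_link="1"/>
    <!-- shared LUN holding the PostgreSQL data directory -->
    <fs name="pgdata" device="/dev/mapper/rhevm-lun"
        mountpoint="/var/lib/pgsql" fstype="ext4"/>
    <script name="postgresql" file="/etc/init.d/postgresql"/>
    <script name="jbossas" file="/etc/init.d/jbossas"/>
  </service>
</rm>
```

The resources nest, so rgmanager brings up the IP and filesystem before the database and application server, and relocates the whole service to the other node on failure.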
Long term:
An RFE has been filed to allow the RHEV-M machine to run on one of the hypervisors itself. If the hypervisor that runs RHEV-M goes down, the RHEV-M VM will fail over to another hypervisor first.
If RHEV-M goes down, failover of highly available VMs will not work, and neither will the load-balancing policies.
Hi! Is there any roadmap for when this feature - running the RHEV-M machine on one of the hypervisors itself, with the RHEV-M VM failing over to another hypervisor if its host goes down - will be available?
Thank you.
Hi Dmitry,
What you are asking about is not a trivial thing to do, because of the way RHEV was initially designed: a set of hypervisors managed by a central host. This means that a lot of the HA functionality for VMs resides in RHEV-M. This is why we don't support a self-hosted RHEV-M right now - it would be a chicken-and-egg situation.
We are looking into options of moving the HA functionality into the hosts, but like I said, it's not trivial, and will require some serious design and code changes.
For now, we offer multiple methods of making your RHEV-M highly available, by clustering RHEV-M as a service or as a VM containing the services in an HA cluster, so that if your RHEV-M goes down, another instance kicks in instead.
For RHEV 3.0, we have a tech brief called "Setting up a RHEV-M on a Highly Available Cluster":
https://access.redhat.com/knowledge/articles/216973
With RHEV 3.1 out, however, I'd recommend waiting for the RHEV 3.1 version of that document to come out. I'm going to be starting on that tech brief next week and hope to finish it the week after. I'll make a comment here when it is available.
Can we provide our input into the RFE?
I have two potential scenarios which would alleviate, if not totally negate, the requirement for a RHEV-M that manages the infrastructure it sits on:
1) Have two RHEV clusters in separate Data Centers, each hosting the other's RHEV-M instance (e.g. we are thinking of putting together dev and prod environments, with the prod RHEV-M on the dev cluster and the dev RHEV-M on the prod cluster). If you are in a DR scenario where everything is down, you still need option (2).
2) Have a well-documented and supported way to fire up the RHEV-M VM that sits on top of the virtual infrastructure it manages. I had a look at
Lost connection to RHEV-M. How do I manage my Virtual Machines?
https://access.redhat.com/knowledge/solutions/168593
which does not give nearly enough information. Expanding it to include examples of obtaining the VM info and generating a command line would make it actually usable.
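To illustrate the kind of detail that KB article could include, here is a rough sketch (not an official procedure) of checking that the RHEV-M guest is not already running somewhere before starting it by hand. It assumes read-only libvirt access on the hosts, a libvirt new enough for `virsh list --name`, and illustrative host/VM names; on RHEV hypervisors, vdsm guards libvirt, so credentials and exact commands may differ.

```shell
#!/bin/sh
# Sketch only: verify a VM is not running elsewhere before a manual start.

# Succeeds if the VM name appears as a whole line on stdin,
# e.g. in the output of 'virsh -r list --name'.
vm_in_list() {
    grep -Fqx "$1"
}

# Query one host read-only; 'virsh -r' avoids needing write credentials.
vm_running_on() {
    virsh -r -c "qemu+ssh://root@$1/system" list --name | vm_in_list "$2"
}

# Example use (commented out so the sketch stays side-effect free):
#   for h in kvm1.example.com kvm2.example.com; do
#       vm_running_on "$h" rhevm && { echo "rhevm is on $h"; exit 1; }
#   done
#   virsh define /root/rhevm.xml   # XML dumped in advance with 'virsh dumpxml',
#   virsh start rhevm              # since RHEV keeps VM configs in its database
```

The per-host check matters precisely because, as noted below, RHEV has no VM-level lock to stop you from starting the same guest twice.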
Some thoughts on how the competition handles this - Sadique, you may want to take them as input into the RFE. VMware is able to support a virtualised and HA vCenter Server (VCS) sitting on top of the hypervisor because of the following architectural decisions (in brackets, why the RHEV architectural decision works against getting this right):
a) A thick client, which can connect to a standalone hypervisor, discover its resources and available VMs, and start the VCS. (The web interface is down when RHEV-M is down; RHEV-M is the only way to start VMs.)
b) The whole VM configuration is encapsulated in a config file (RHEV-M stores the VM config, along with other settings, in a database). Almost like RHEV is being out-Unixed by VMware, as per the "everything is a file" mantra.
c) A way to lock a VM as being in use on a host. If you start a VM on node 1 while directly connected to it with the vSphere client, you will get an error message when trying to start it (or edit its config) on node 2. (On RHEV, as per the above KB article, you have to make 100% sure that your VM is not running on another node before starting it up, which gets more cumbersome the more nodes you have in your cluster.)
This approach ends up taking HA RHEV-M from a "chicken and egg" situation to a "bootstrap" problem. I assume there are legacy architectural reasons why RHEV-M is in this situation, but I would like to believe they are solvable (unless, of course, the long-term plan is to replace or complement it with another product which does not suffer from these limitations).
For our team, having to build a dedicated (albeit small) HA cluster, plus a third server for DR, is an extremely unattractive proposition. One option we are looking at is having our RHEV-M hosted on top of the Windows team's VMware environment, which some would consider embarrassing, but hopefully it is only a short- to medium-term workaround until Red Hat can give us a properly architected HA and DR RHEV-M which does not demand such a premium in dedicated hardware.
If you use RHEV-M on RHCS, how does this work if you are attempting to host across two Data Centers? In the event you lost one Data Centre, where should your Luci server be located? And how does your storage need to be setup in this scenario? Or does this Clustering option only work within a Data Center?
Would a better solution be to host your RHEV-M as a KVM guest running on a single host, with its own FC-attached storage? You then replicate this at the second Data Center (server + storage), with the storage snapped regularly between the two storage systems. Can you quiesce RHEV-M, or even the VM itself, to create the snapshot? If a Data Center failure occurred, you could start up the KVM guest running RHEV-M on the dormant KVM host at the second Data Center. It (the snapshot) may be a few hours out of date though, so I don't know the impact on the hypervisors or the VMs running on them.
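That DR bring-up could be sketched roughly as follows, assuming the replicated LUN is presented to the dormant host and the RHEV-M domain XML was saved in advance; all names and paths are illustrative assumptions, not a supported procedure:

```shell
#!/bin/sh
# Illustrative DR bring-up sketch for a replicated RHEV-M KVM guest.

dr_start_rhevm() {
    # Rescan the FC HBAs so the replicated LUN shows up on this host
    for h in /sys/class/scsi_host/host*; do
        echo "- - -" > "$h/scan"
    done

    # Register the guest from the XML saved before the disaster, then
    # start it; the RHEV-M data will lag by the snapshot interval, and
    # the hypervisors should reconnect once the manager is back up
    virsh define /root/rhevm-dr.xml
    virsh start rhevm
}

# Invoke by hand only once you are sure the primary site is really down:
#   dr_start_rhevm
```

Starting from a stale snapshot means RHEV-M's view of VM placement may disagree with reality, which is exactly the impact on hypervisors and VMs the question raises.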
