RHEV HA
Hi,
I'm looking at doing a pilot of RHEV at my company, Pitney Bowes. I've had it set up in my lab for a few weeks and really like it. We are a global VMware site, but in general we use VMs for engineering... not a lot of customer-facing, mission-critical services. That said, we do use HA, and as bulletproof as VMware says it is (because it relies on elected nodes rather than on the vCenter server), I have had problems with it restarting VMs.
So, from what I have read, it sounds like RHEV is reliant on the RHEV manager to restart VMs. So it is a single point of failure, and if you ran RHEV manager as a VM on your cluster and the node it was on failed, it wouldn't get restarted. Just wondering if this is still the case? And if so are there plans to make the HA process more resilient?
As I said I have been reading, but not sure if what I am reading is totally up to date.
Thanks for your input!
Bill
Responses
Hi Bill - you have touched on a hotly debated topic of comparing RHEV to "other" solutions ;-) I'm not currently as fluent with RHEV's capabilities as I once was... but I can still talk to these topics (I think).
Q: Is there DRS for RHEV? Correction: SRM for RHEV... not DRS. (it's been a while and I'm getting old I guess)
A: Sort of. Here is a white-paper discussing the approach for doing a site-to-site replication (and recovery) of RHEV
https://access.redhat.com/sites/default/files/attachments/2012_rhev_3_dr_0.pdf
Q: Can you make the RHEV-Manager Highly-Available?
A: Absolutely - You will find numerous posts about RHEV and clustering (which is not possible with VMware - I believe)
https://access.redhat.com/articles/216973
Q: Can VMs be managed while RHEV Manager is not available?
A: Certainly not as easily as using a native client (like another solution), but I believe it is possible. Some folks had asked whether RHEV-Manager could be set up as a guest on one of the Hypervisors (which leads me to believe that it is possible)
https://access.redhat.com/discussions/680473
I was previously much better at comparing the two products and I am likely not doing RHEV justice, as they have made significant improvements since 3.0/3.1. If your shop is engineering focused and relies on process, I believe you could match the capabilities of the products fairly well (i.e. you will need time to test DR/incident strategies and develop an SOP for them - as opposed to setting up a product and relying on it to respond for you).
So - a few bits of advice:
* be sure that when you present RHEV, that you accurately convey what it can do. I have been at a shop where it was sold as something it wasn't - then when it came time to actually use some features that were promised (by our own folks) and RHEV could not deliver, the reputation of RHEV was a bit tarnished (and quite unfairly).
* I would try to avoid the discussion "product X can do this, what can RHEV do?" - even as part of the sales cycle, the discussion was generally useless. I instead tried to focus on what the customer needed to see whether RHEV could meet that need.
RHEV is a great product and can "hold its own" - and I'm glad that it competes with the other vendors in that space; it makes for a better product. It's not always the right solution for every customer/situation.
EDIT: I wasn't implying that you would do any of the things I had mentioned. It's tough to respond in a hypothetical way without using "you" or "me/I" ;-)
Hello William
Q: Is there DRS for RHEV?
If you are looking for a full failover solution for your RHEV environment, things are a little simpler now than this document suggests. We have successfully deployed RHEV on replicated FC-based storage using IBM Metromirror and designed it such that we can "failover" to our standby datacentre in the event of a catastrophic failure in our primary. We had the help of RH services to design and test our environment and to gain support approval for it.
There were a couple of changes required upstream to account for the change of LUN WWPN (which are now in mainline 3.4), our RHEV Managers sit on physical tin as opposed to VMs, and it's an "all or nothing failover", but it fits our requirement.
I'd be happy to pass on any detail.
Hi all
This is a very interesting discussion.
I'm in the middle of setting 18 servers up based on RHEV 3.4 and using mirrored IBM SAN (IBM SVC). Some of them also with local storage.
Not without a few moments of frustration though. But it seems I now have it set up so that it's acceptable (not perfect) HA-wise. I learned a lot, which is great.
Using FC is, in my understanding, a bit different from iSCSI HA-wise. It seems that iSCSI works right out of the box, while FC needs some manual work if you lose all FC on one host. You have to account for all kinds of scenarios, right? So that's what I've been testing today.
This is why I'm also very interested in knowing if Red Hat is working on improving the HA feature when using FC.
I'm not using a primary site and a secondary standby site. For the FC setup I have 2 active sites. Is this what you have been testing?
From what I have seen so far using FC, I wouldn't run a self-hosted RHEV-M. So our RHEV-M is running on VMware. Also, we're using RHEV-H, and I remember I once read that self-hosted RHEV-M is not possible on such.
Hi Richard
Ok, thanks for info.
Figured out that losing the SAN in our environment caused the VMs to go 'paused' to protect themselves from data corruption. That was a bit of a bummer...or what do you call it :-)
However, I did a lot of tests and figured out I can manually shut down the paused VM and migrate it afterwards.
So not as great as with iSCSI, where failover is completely automatic, but for us still an acceptable solution.
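The manual recovery described above can be sketched as a small model. This is only an illustration in plain Python: the `Vm` class, the `VmState` values and the `recover_paused_vms` function are hypothetical names and do not correspond to any RHEV API or tool, and "migrate afterwards" is interpreted here as starting the VM again on another host.

```python
from dataclasses import dataclass
from enum import Enum


class VmState(Enum):
    UP = "up"
    PAUSED = "paused"
    DOWN = "down"


@dataclass
class Vm:
    name: str
    host: str
    state: VmState


def recover_paused_vms(vms, failed_host, healthy_host):
    """Model of the manual procedure: shut down each VM that went
    'paused' on the host that lost its FC paths, then bring it up
    on a host that still has storage access.

    Hypothetical helper for illustration only; returns the names
    of the VMs that were moved.
    """
    recovered = []
    for vm in vms:
        if vm.host == failed_host and vm.state is VmState.PAUSED:
            vm.state = VmState.DOWN   # step 1: manual shutdown of the paused VM
            vm.host = healthy_host    # step 2: move it to a host with working storage
            vm.state = VmState.UP     # step 3: start it again
            recovered.append(vm.name)
    return recovered


if __name__ == "__main__":
    vms = [
        Vm("build01", "host-a", VmState.PAUSED),
        Vm("build02", "host-a", VmState.UP),
        Vm("test01", "host-b", VmState.UP),
    ]
    print(recover_paused_vms(vms, failed_host="host-a", healthy_host="host-b"))
```

Only VMs that are actually paused on the affected host are touched; running VMs and VMs on other hosts are left alone, which matches the "manual work" trade-off compared with automatic failover.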
Looking forward very much to seeing what improvements we will get in future releases.
But thanks for the inspiration regarding the rhev-m.
