RHEV datacenter down after SAN export domain failure
Hi,
We run a RHEV datacenter setup with 4 hypervisors and about 50 VMs for an IP telephony test plant.
Today I was working on taking our old RHEV 2.2 setup down, and by accident I set the old rhev2
export domain volume offline on the SAN. I was not aware that it was still attached to my running
RHEV 3.0 setup from when I imported the VMs from 2.2.
The result was a complete outage of the whole RHEV 3.0 datacenter for 2 hours.
When I set the export domain online again, the datacenter came back up after a while,
but nearly all VMs were down or stuck in a migrating state that they never came out of.
After some time I restarted the 4 hosts one by one and started the VMs manually.
I know this was my fault, but I don't understand why a failure on the export domain
can cause so much trouble. I was not exporting or importing anything at the time.
I am mostly writing this to share my experience; maybe it can help Red Hat make a more hardened product.
I opened support case 00739729 on this issue.
Have any of you seen something like this?
thanks,
Peter Calum
Responses
If this is true, a bug needs to be filed to get it fixed. You are right that an entire DC should not go down just because a LUN used for an attached export domain is not reachable.
We will investigate this through the case you opened and take appropriate action to reproduce it and file a Bugzilla with Engineering.
Hi Peter,
Thanks for following up with the solution, I'm glad Sadique was able to help you resolve this. Just for future reference - I'd encourage you to communicate the solution "in your own words" if you'd like to share it with the Groups, rather than copy/pasting directly from correspondence in a support case. Thanks! :)
