RHEV datacenter down after SAN export domain failure
Hi,
We run a RHEV datacenter with 4 hypervisors and about 50 VMs for an IP telephony test plant.
Today I was working on taking our old RHEV 2.2 setup down, and by accident I set the old rhev2
export domain volume offline on the SAN. I was not aware that it was attached to my running
RHEV 3.0 setup from when I imported the VMs from 2.2.
The result was a complete outage of the whole RHEV 3.0 datacenter for 2 hours.
When I set the export domain online again, the datacenter came back up after a while,
but nearly all VMs were down or stuck in a migrating state that they never came out of.
After some time I restarted the 4 hosts one by one and started the VMs manually.
I know this was my fault, but I don't understand why a failure of the export domain
can cause so much trouble. I was not exporting or importing anything at the time.
I am mostly writing this to share my experience; maybe it can help Red Hat make a more hardened product.
I opened support case 00739729 on this issue.
Has any of you seen something like this?
Thanks,
Peter Calum