Failed to connect Host to Storage Pool default

I'm having a huge problem after removing a storage domain

 

Basically, I moved all of my virtual machines from one storage domain to another. During this process, one machine was stuck in the "Image Locked" state for days. I eventually gave up on saving this machine and removed and destroyed the original storage domain. After doing this, my events log erupted in a storm of errors. My production datacenter is now down, my storage domains are down, and I can't start any VMs that were not previously running. However, all VMs that were running are still up on a single host.

 

Here are the steps I took to try to resolve this, and the corresponding errors:

 

Activating storage domain:

Failed to activate Storage Domain (Data Center Default) by admin

Wrong Master domain or its version

 

Activating Host:

Failed to connect Host to Storage Pool default

 

 

This is repeating in /var/log/rhevm/rhevm.log on my RHEV-M server:

 

Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc

mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         324

mMessage                      Wrong Master domain or its version: 'SD=ddce1a5d-2bf9-4caa-841e-154675e1b198, pool=a43b0cf4-3af2-11e1-b9f5-001ec947b583'

 

I've also tried rebooting my RHEV-M server and one hypervisor, to no avail. The SpmStatus is also "none" for both hosts.

 

Any help would be great!

 


Responses

Hi Tyler, 

 

Since you mentioned this is in production, please open a support case for the issue. Such cases require uploading full log sets and system information for examination; without that, it is hardly possible to suggest a solution.

 

We try to help as much as possible here at the User Groups, but production outages should definitely be handled by support, where you will have an SLA and proper attention from all parties.

 

 

A brief look at the log snippet suggests that the RHEV-M database holds a certain metadata version and points to a master SD that no longer matches what is on the physical storage. This can happen when the "Destroy" option is not used carefully: it simply removes a domain from the database without touching the actual storage. It should only be used when the LUN was removed manually before the domain was removed from RHEV, leaving RHEV with no means of seeing the domain any longer.
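If it helps while gathering information for the case, here is a minimal sketch that pulls the storage domain and pool IDs out of the repeated "Wrong Master domain or its version" messages and counts them. It assumes the message format is exactly as in the snippet posted above. The function name `find_master_mismatches` is made up for illustration and is not part of RHEV. It only reads text and is no substitute for the full log collector.

```python
import re

# Matches the error text exactly as it appears in the posted log snippet.
# Assumption: both IDs are 36-character lowercase hex UUIDs.
MASTER_ERR = re.compile(
    r"Wrong Master domain or its version: "
    r"'SD=(?P<sd>[0-9a-f-]{36}), pool=(?P<pool>[0-9a-f-]{36})'"
)


def find_master_mismatches(lines):
    """Count occurrences of each (storage domain ID, pool ID) pair."""
    counts = {}
    for line in lines:
        match = MASTER_ERR.search(line)
        if match:
            key = (match.group("sd"), match.group("pool"))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Sample input: the mMessage line from the post, repeated, plus an unrelated line.
sample = [
    "mMessage                      Wrong Master domain or its version: "
    "'SD=ddce1a5d-2bf9-4caa-841e-154675e1b198, "
    "pool=a43b0cf4-3af2-11e1-b9f5-001ec947b583'",
    "mCode                         324",
    "mMessage                      Wrong Master domain or its version: "
    "'SD=ddce1a5d-2bf9-4caa-841e-154675e1b198, "
    "pool=a43b0cf4-3af2-11e1-b9f5-001ec947b583'",
]

for (sd, pool), count in find_master_mismatches(sample).items():
    print(f"SD={sd} pool={pool} seen {count} time(s)")
```

To run it against the real log, pass an open file object for /var/log/rhevm/rhevm.log instead of the sample list. The SD ID it reports is the one named in the error message, which can be noted in the case alongside the full logs.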

 

When you open the support case, please provide a full log collector report, including the hosts and the database dump. That should be enough to suggest a course of action for recovery.

For the future, if you have a VM stuck in "Image Locked" for an unreasonable amount of time, please open a case before taking any further drastic steps. This sort of situation should not happen, and should be resolved on the spot.

 

 

Hi Tyler,

 

Is there a case open for this with Red Hat support? If not, please open one to have your logs analyzed. If you have already done so, please provide the case number.

Thank you

 

I've opened Case #00595038 and attached the requested logs.

https://access.redhat.com/support/cases/00595038