VM Stuck in an Invalid State
Hello everyone,
I have encountered a problem that I can't fix, and being an NFR user I can't open a support case. One of my VMs went into a non-responding state during a shutdown, and since then it can't be stopped or started.
- Putting the whole cluster into maintenance doesn't change anything.
- Using the REST API to forcibly shut down (as in Force-Remove) the VM fails because the VM is still in the Running state.
- I can't (and don't want to) simply destroy my cluster or datacenter for various reasons, the technical one being that RHEV-M still thinks one VM is running in the cluster - the faulty one.
Does this problem ring any bells? I could use some help ;).
Responses
Hello Fabrice,
This can happen when the host the VM is running on cannot be checked by RHEV-Manager (it lost network, vdsmd stopped working, the host crashed, etc.).
The best thing you can do at this point is make sure all your hosts are "Up", and if that's not the case, see which one was running this VM and either put it into maintenance or fence it.
If the host is "Up" and operational but the VM is still stuck in this state, it should be easy enough to reset the state, after making sure that the VM is not actually running anywhere (we don't want to manually create a split-brain).
So step 1 - check hosts.
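For anyone who would rather script the host check than click through the admin portal, here is a minimal sketch. It only parses a host listing and reports anything that is not "up". The sample XML is made up, and the element layout (host/name, host/status/state) is an assumption about what the REST API's host collection returns, so compare it with the real output of your version first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a host listing; the element layout
# (host/name and host/status/state) is assumed and may differ by version.
SAMPLE_HOSTS_XML = """
<hosts>
  <host id="h1"><name>hv01</name><status><state>up</state></status></host>
  <host id="h2"><name>hv02</name><status><state>non_responsive</state></status></host>
</hosts>
"""


def hosts_not_up(hosts_xml):
    """Return a list of (name, state) for every host whose state is not 'up'."""
    root = ET.fromstring(hosts_xml.strip())
    problems = []
    for host in root.findall("host"):
        name = host.findtext("name", default="?")
        state = host.findtext("status/state", default="unknown")
        if state.lower() != "up":
            problems.append((name, state))
    return problems


if __name__ == "__main__":
    for name, state in hosts_not_up(SAMPLE_HOSTS_XML):
        print(f"{name}: {state}")
```

Any host this reports is a candidate for the maintenance-or-fence step described above.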
Let me know how that goes
Dan
OK, as I understand from the screenshot, the VM has no host showing up, just a faulty status.
This used to be a bug sometime around the beta release and should be resolved now. Can you please check what versions of rhev-* you have running? If they are not current, you will need to update to resolve the issue and prevent it from happening in the future.
As an interim solution, try restarting the jbossas service. If this doesn't help, we'll need to change the VM state manually in the database (usually done by scripts provided by senior support techs).
The only thing this helps with is finding the VM UUID without looking for it in the API or the database :)
What we need to do is issue something like
UPDATE vm_dynamic SET status=0 WHERE vm_id='a55532fa-066b-4329-a551-07b1bce6d577'
If I had your database dump I'd be able to provide the exact script. The above is from memory, so I might have the field or table names wrong.
Steps:
ssh to RHEV-M host as root
#service jbossas stop
#psql -U rhevm
rhevm=# update vm_dynamic set status = 0 where vm_guid = 'a55532fa-066b-4329-a551-07b1bce6d577' ;
rhevm=# \q
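Since a mistyped id in a hand-written UPDATE is the main risk here, a small helper like the one below can generate the statements to paste into psql. This is only a sketch: the table and column names (vm_dynamic, vm_guid, status) are copied from the steps above and may differ on your version, and the function name is made up for illustration.

```python
import uuid


def build_reset_sql(vm_id):
    """Return psql statements to inspect and then reset one VM's status.

    Table and column names (vm_dynamic, vm_guid, status) are taken from the
    steps in this thread and are not verified against any schema.
    """
    # uuid.UUID raises ValueError on malformed input, so a truncated or
    # mis-pasted value is rejected before it ever reaches psql.
    guid = str(uuid.UUID(vm_id))
    return [
        "BEGIN;",
        f"SELECT vm_guid, status FROM vm_dynamic WHERE vm_guid = '{guid}';",
        f"UPDATE vm_dynamic SET status = 0 WHERE vm_guid = '{guid}';",
        "COMMIT;",
    ]


if __name__ == "__main__":
    for statement in build_reset_sql("a55532fa-066b-4329-a551-07b1bce6d577"):
        print(statement)
```

Running the SELECT first shows the current status, and if the UPDATE reports anything other than one affected row you can type ROLLBACK; instead of COMMIT;. The steps above stop jbossas but don't show starting it again; presumably a matching service jbossas start follows once you have quit psql.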
That's great to hear :) Please don't try such things without a support ticket in the future - this was pure hacking, and you should not have ended up in this state. I'll take your situation up to engineering, to see why the old beta bug can still be encountered.
Can you maybe elaborate on the events that led to this state of things, so we can recreate it internally?
Thanks for that, when you are ready, please let me know, and I'll provide some facility to upload the logs to, or will wget from your location - whatever is more convenient.
As for the solution itself - this sort of thing is rather dangerous, unless you're absolutely sure you know what you're doing in the right context, so again - when you do have access to support tickets, it would be much better to go through support in such cases. Sorry to nag, but I know I would myself start poking around a new and interesting system, being a techie :)
Cheers,
Dan
Just get a log-collector output, and use https://access.redhat.com/knowledge/solutions/61026 to upload it. Then provide the file name here.
Thanks,
Dan
I've seen this happen before, for example when a hypervisor on early 2.2 experienced a kernel panic, and more recently on 5.8 with a recent KVM bug.
There is one relatively easy fix for when a VM gets stuck in an unknown or invalid state after a hypervisor outage, although it hasn't worked every time. After ensuring that the VM is not running anywhere else, select the option "Confirm host has been rebooted" on the affected host. Any VMs which were on it will then be marked as "down", allowing you to start them again.
Kaerka
After putting the Host that the VM was running on in Maintenance mode, give it a reboot, wait for it to come online again and Activate it. Then from RHEV-M, right-click on the Host and choose "Confirm Host has been rebooted." This can also be done via the API by calling the Manual Fencing action.
This reassures the Manager that the Host indeed had been rebooted, thereby making it realise that the VM could no longer have been active and changing its status to powered off.
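For the API route, a rough sketch of what the manual fencing call might look like is below. It only builds the request and does not send it. The endpoint path (hosts/<id>/fence) and the fence_type value "manual" are my assumptions about the REST API, and the base URL and host id are placeholders, so check your version's API guide before using it. Authentication is left out on purpose.

```python
import urllib.request


def build_manual_fence_request(base_url, host_id):
    """Build (but do not send) a POST for a host's manual fence action."""
    # Assumed endpoint and body: POST <base>/hosts/<id>/fence with
    # fence_type "manual". Verify both against your version's API guide.
    body = b"<action><fence_type>manual</fence_type></action>"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/hosts/{host_id}/fence",
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder base URL and host id, for illustration only.
    req = build_manual_fence_request("https://rhevm.example.com/api", "HOST-ID")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

As with the GUI option, only send this after the Host really has been rebooted, otherwise you risk the same VM running twice.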
