Time of VM migration when hypervisor fails


Good day.

 

Can someone help me with this test scenario?

 

I have two hypervisor hosts in one cluster, both with power management via ipmilan, and I'm using NFS storage. On this cluster I've started one virtual machine, which is currently running on the first hypervisor.

 

Then, for testing purposes, I crashed the first hypervisor like this:

ssh root@hypervisor1

cat /dev/zero > /dev/mem

 

(Its power management module, however, is still working.)

 

After ~3 minutes, RHEV Manager reported: "Invalid status on Data Center Default. Setting Data Center status to Non-Responsive (On host hypervisor1, Error: Network error during communication with the Host.)."

 

After roughly another 6 minutes, RHEV Manager reported "Host hypervisor1 is non-responsive" and finally started migrating my VM to the other hypervisor.

 

So it takes about 10 minutes before my VM comes back online!
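To replace rough estimates like "~3 minutes" and "~6 minutes" with exact numbers, the outage can be timed from a third machine while the test runs. Below is a minimal Bash sketch. It assumes a third host that can reach the VM. The function name `measure_downtime` and the hostname `myvm.example.com` are hypothetical, and `ping` is only one possible check: any command that exits 0 while the VM is up will work.

```shell
#!/usr/bin/env bash
# Sketch: measure how long a target stays unreachable during a failover test.
# Start it on a third machine BEFORE crashing the hypervisor.
#
# measure_downtime INTERVAL CHECK_CMD [ARGS...]
#   Runs CHECK_CMD every INTERVAL seconds. After the first failed check it
#   waits for the next successful one and prints the outage length in whole
#   seconds (resolution is limited by INTERVAL and by `date +%s`).
measure_downtime() {
  local interval=$1
  shift
  local down_since=""
  while true; do
    if "$@" >/dev/null 2>&1; then
      # Check succeeded: if an outage was already seen, it is over now.
      if [[ -n $down_since ]]; then
        echo $(( $(date +%s) - down_since ))
        return 0
      fi
    elif [[ -z $down_since ]]; then
      # First failed check: remember when the outage started.
      down_since=$(date +%s)
    fi
    sleep "$interval"
  done
}

# Example (hypothetical VM hostname; -W is the iputils ping reply timeout):
# measure_downtime 1 ping -c 1 -W 1 myvm.example.com
```

Because the function keeps polling until it has seen the VM go down and come back, it can be started well before the crash command. This makes it easy to compare the downtime of runs before and after changing any timeout settings.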

 

My question is: is it possible to change some configuration files so that the system reacts faster in this situation? Can I configure RHEV Manager to migrate the VM within 20-30 seconds in this case?

 

Thank you.

 

P.S. Sorry for my bad English :(

Responses

Hi Ivan,

 

Yes, the rhevm-config tool provides several options that can shorten these timeouts. However, before you change these settings, you need to test them under your specific loads. We don't want to fence a host that is perfectly fine just because a switch had a hard reset and the host suddenly became unreachable, so the longer timeouts make sure we do not kill hosts without a good reason.

 

BTW, I don't see any bad English :)

 

 

Cheers,

Dan

Thank you for the fast reply. In my real system all LAN connections, including the LAN switches, are redundant (doubled). I'll try your advice as soon as possible.