RHEV Guests high availability

Can someone explain how RHEV-M detects that a guest has gone offline and, as a result, has to apply the HA policy?


As far as I understand, RHEV treats every guest as a process: if it crashes, the guest should be restarted.


If I simulate a guest kernel panic (echo c > /proc/sysrq-trigger), nothing happens. But something should happen, since the guest stops responding. What is the scope of guest events that RHEV reacts to?


Please clarify.


Vladimir Berezovski



To the host, a VM is just a qemu-kvm process. If that process crashes, RHEV will detect it and, for a VM marked HA, try to restart the guest.

If a host where the VMs have been running goes down, or loses a critical cluster resource, it will get fenced, and the VMs will be restarted on other hosts.

If the guest's process on the host keeps running, but the guest OS is in a faulty state, RHEV will not necessarily detect that. Otherwise, we would have to initiate HA for guests every time a guest reboots, or gets cleanly shut down.
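To make the distinction concrete, here is a rough sketch of the cases described in this thread as a small decision function. It is only a toy model of the reasoning above, not RHEV code: the event names and the function `ha_restart_expected` are made up for illustration, and treating a guest kernel panic as "never detected" is a simplification of "not necessarily detected".

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event names, chosen only for this illustration."""
    QEMU_PROCESS_CRASHED = auto()   # the guest's process on the host died
    HOST_FENCED = auto()            # host went down or lost a critical resource
    GUEST_OS_PANIC = auto()         # guest kernel panicked, process still runs
    GUEST_REBOOT = auto()           # guest rebooted itself
    CLEAN_SHUTDOWN = auto()         # guest was shut down cleanly


# Failures the virtualization platform can see from the outside.
PLATFORM_VISIBLE_FAILURES = {Event.QEMU_PROCESS_CRASHED, Event.HOST_FENCED}


def ha_restart_expected(vm_is_ha: bool, event: Event) -> bool:
    """Return True if, per the explanation above, an HA restart is expected."""
    return vm_is_ha and event in PLATFORM_VISIBLE_FAILURES
```

With this model, `ha_restart_expected(True, Event.GUEST_OS_PANIC)` is false, which matches what was observed with the sysrq test: the qemu-kvm process keeps running, so from the platform's point of view nothing has failed.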


So in short, the virtualization management platform monitors everything required to provide a platform for VMs, and if a VM is marked HA, the HA policy is applied once the VM loses a critical resource. However, the virtualization platform cannot monitor the guest OS; that is a job for other tools.
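As an illustration of what such an "other tool" might look like at its simplest, here is a minimal sketch of an external liveness probe that checks whether a service inside the guest still accepts TCP connections. This is an assumption about one possible approach, not something RHEV provides: the function name `guest_responds`, the host name, and the port are placeholders, and a real monitoring setup would need retries and a decision about what to do on failure.

```python
import socket


def guest_responds(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A panicked or hung guest OS typically stops accepting connections,
    even though its qemu-kvm process is still running on the host.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `guest_responds("guest.example.com", 22)` would report whether the SSH port of a (hypothetical) guest is still reachable. What to do when the probe fails is a separate decision and is outside what the platform's HA policy covers.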