Why do we need fault tolerance in KVM?
Just like in traditional server fault-tolerance scenarios, RHEL/KVM virtualization needs to be fault tolerant so that virtual machines are not dropped. Virtualization in a data-center environment demands fault tolerance. Ideally, an application should be fault tolerant and protect its own data in the event of a failure (as opposed to, for example, relying on a redundant 1TB server that replicates the data for it). In reality, however, there are many legacy applications that we have to protect that are not fault tolerant. Our recommendation for new application development would be to write fault-tolerant applications.
What are we considering?
We are evaluating Kemari Fault Tolerance in RHEL/KVM to create fault-tolerant virtualized environments.
How does it work?
The goal of Kemari is to provide a fault-tolerant platform for virtualization environments, so that in the event of a hardware failure a virtual machine fails over from the compromised physical machine to a properly operating one in a way that is completely transparent to the guest operating system. In contrast to hardware-based fault-tolerant servers and HA servers, Kemari abstracts the hardware through virtualization, so it can run on off-the-shelf hardware with no application modifications.
Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. One interesting result of this is that during normal operation, only the active node is actually executing code.
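The active-passive scheme described above can be illustrated with a minimal sketch. This is not the actual Kemari/QEMU implementation (which is written in C inside the hypervisor); the class and method names here are hypothetical, chosen only to show the idea: only the active node executes, its dirtied state (dirty pages plus virtual device state) is continuously copied to the passive node, and on failure the passive node resumes from the last replicated state.

```python
class Node:
    """One physical machine hosting a replica of the VM's state."""
    def __init__(self):
        self.pages = {}          # page number -> page contents
        self.device_state = {}   # virtual device name -> device state
        self.executing = False   # only the active node executes code


class ActivePassivePair:
    """Hypothetical sketch of Kemari-style whole-system replication."""
    def __init__(self):
        self.active = Node()
        self.passive = Node()
        self.active.executing = True

    def run_epoch(self, dirty_pages, device_state):
        """The active node executes and dirties state; the deltas are
        then copied to the passive node (whole-system replication)."""
        assert self.active.executing and not self.passive.executing
        self.active.pages.update(dirty_pages)
        self.active.device_state.update(device_state)
        # Replicate only what changed, not the entire memory image.
        self.passive.pages.update(dirty_pages)
        self.passive.device_state.update(device_state)

    def fail_over(self):
        """On hardware failure, the passive node takes over from the
        last replicated state, transparently to the guest."""
        self.active.executing = False
        self.passive.executing = True
        self.active, self.passive = self.passive, self.active
```

Because replication happens every epoch, the passive node always holds a consistent recent copy of the system state, which is what makes the failover transparent to the guest.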
Do you consider virtualization fault-tolerance to be necessary, or do you believe applications need to be fault tolerant?