Virtualization lags and hypervisor overcommitment

Disclaimer

This article is written in a very general way, avoiding technicalities, so that it is easily digestible by any reader. It may therefore contain statements that are not strictly technically correct but still hold to some metaphorical extent. Implementations of virtualization technologies can vary widely; this article discusses the implications of host overcommitment where the virtualization implementation uses collections of processes or threads on the host to represent a virtual machine. As such, this article applies to KVM and may be applicable to other virtualization technologies such as VMware and Hyper-V, but not to others such as PowerPC. If this level of information is unknown for the target host, please consult the host's respective documentation.

Virtual machine from kernel perspective

When a virtual machine [VM] runs, from the perspective of the host's kernel it's just a bunch of processes. Most of the virtualization context is handled by the hypervisor software.

Since all work is eventually being done by the processor, it is ultimately the job of the VM's CPUs (virtual CPUs [vCPUs]) to process everything the VM does.
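On Linux with KVM, this process view is directly observable: each vCPU of a QEMU/KVM guest shows up as a thread of the QEMU process under /proc/&lt;pid&gt;/task. A minimal sketch (demonstrated on the current process, since a real guest PID depends on your host):

```python
import os

def list_threads(pid):
    """Return the thread IDs (TIDs) of a process by reading /proc/<pid>/task.

    For a QEMU/KVM guest, the vCPU threads appear here alongside the main
    emulator thread (their names, e.g. "CPU 0/KVM", are visible in
    /proc/<pid>/task/<tid>/comm).
    """
    return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))

# Every process has at least one thread -- itself:
print(list_threads(os.getpid()))
```

Pointing this at a running QEMU process would list its vCPU threads, which the host scheduler treats like any other threads.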

How virtual CPUs work

Usually, each vCPU is represented by a process (or thread) on the host machine. Hence, the vCPU is processing its work (executing) only, and exactly, when the respective vCPU process is running on the host's CPU (physical CPU [pCPU]).

When the vCPU process is scheduled to run, the pCPU replaces the contents of its registers and switches context to the vCPU's last point of execution (its saved state). Then, when the vCPU needs to be rescheduled to let another process run on the host machine, the vCPU's execution is interrupted; its state, context and registers are saved, and the original pCPU registers and context are restored.
You can imagine it as just a more complex context switch (compared to the context switch the kernel performs when switching between userspace tasks).

This way, the processor hardware simply keeps operating the same, without differentiating between virtual and physical CPU (simply said: the execution continues the same, just the numbers have changed).
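As a toy illustration of the save/restore idea (a pure simulation, not real KVM mechanics): two vCPU states time-sharing one pCPU, each resuming exactly where its saved state left off:

```python
from dataclasses import dataclass, field

@dataclass
class VCpuState:
    # Toy stand-in for the saved execution state (registers, program counter, ...)
    pc: int = 0
    regs: dict = field(default_factory=dict)

def run_slice(state, steps):
    """'Execute' a vCPU for a few steps: the pCPU loads the saved state,
    advances it, then saves it back on the context switch out."""
    state.pc += steps          # execution continues from the saved point
    return state

# Two vCPUs time-sharing one pCPU: each resumes exactly where it left off.
a, b = VCpuState(), VCpuState()
for _ in range(3):
    run_slice(a, 10)
    run_slice(b, 5)
print(a.pc, b.pc)  # 30 15
```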

What is a virtual CPU lag

Since the vCPU is a process, and a process can be rescheduled to let another process run, this creates downtime for the vCPU's execution. Hence anything and everything the vCPU is doing gets virtually paused. Note, though, that from the perspective of the VM the vCPU runs without pauses, i.e. it doesn't know that it has been paused (rescheduled).

The vCPU process rescheduling creates a timeframe during which the vCPU thinks it's running (from the VM's perspective) but is in reality paused. During this timeframe, all work specific to that particular virtual CPU (including, but not limited to, timekeeping) is also paused and starts to lag behind. Therefore, when the vCPU process is again scheduled to run, the virtual CPU experiences a time-jump and needs to catch up, namely by triggering all the now-expired timers.

This event of a vCPU being rescheduled for some time and then catching up after being scheduled to run again is a prime example of a virtualization lag.
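The catch-up step can be sketched as follows (a simplified model, not actual guest-kernel code): everything that expired during the pause must fire in a burst as soon as the vCPU runs again:

```python
def catch_up(timers, last_seen, now):
    """Return the timers that expired while the vCPU was descheduled.

    `timers` maps timer name -> absolute expiry time. Everything that
    expired between `last_seen` (when the vCPU was paused) and `now`
    (when it runs again) must fire immediately -- the catch-up burst.
    """
    return sorted(name for name, expiry in timers.items()
                  if last_seen < expiry <= now)

# vCPU descheduled at t=100, scheduled again at t=160: a 60-unit time jump.
timers = {"tick": 101, "watchdog": 150, "future": 500}
print(catch_up(timers, last_seen=100, now=160))  # ['tick', 'watchdog']
```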

Consequences of vCPU lags

Generally, there is no harm to the vCPU after a rescheduling time-jump: the vCPU continues to operate normally and the workload progresses (after it handles all the now-expired timers and other potential catch-up events, of course). However, such lags can be noticeable in statistics; for example, a single read operation suddenly takes much longer than other such reads for no obvious reason (its time increased by the time the vCPU process was not running).
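In monitoring data, such a lag shows up as an isolated outlier among otherwise uniform timings. A small sketch (the numbers are made up for illustration) that flags operations far slower than the median:

```python
import statistics

def find_lag_suspects(latencies_ms, factor=5.0):
    """Flag operations that took far longer than the median -- the pattern
    typical of a vCPU lag inflating an otherwise ordinary operation."""
    med = statistics.median(latencies_ms)
    return [(i, ms) for i, ms in enumerate(latencies_ms) if ms > factor * med]

# Nine ordinary ~2 ms reads and one 250 ms outlier (the vCPU was paused mid-read):
reads = [2.1, 1.9, 2.0, 2.2, 1.8, 250.0, 2.0, 2.1, 1.9, 2.0]
print(find_lag_suspects(reads))  # [(5, 250.0)]
```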

Small lags commonly go unnoticed (though it's good to keep them in mind as a trade-off of running in a virtualized environment). Longer lags, however, can create noticeable disruptions resulting in degraded performance.

Further implications depend on the actual applications (including the kernel itself) running in the VM, and on whether and how they are sensitive to, or can handle, such time-jumps.

Causes of vCPU lags

So a vCPU lag is caused by the vCPU process being rescheduled or otherwise interrupted - that is, not being the actively executing process on the particular pCPU. Therefore, anything that can stall a process from executing can cause a vCPU lag.

A large category of causes, commonly referred to as hypervisor overcommitment, is a quite common source of vCPU lags. These are situations in which the hypervisor starts to have problems satisfying the resource demands (namely memory and CPU time) of the active VMs running on it.

Generally, you can imagine it as work overload - the host machine (or specifically its CPUs) having more work to do than it can efficiently handle. While server machines don't tend to have much trouble processing occasional load spikes, virtualized environments can suffer hugely from such overloads.
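A quick way to reason about CPU overload is the overcommit ratio - the sum of all configured vCPUs against the host's physical CPUs. A small sketch with hypothetical numbers:

```python
def cpu_overcommit_ratio(vcpus_per_vm, host_pcpus):
    """Ratio of all configured vCPUs to physical CPUs; a value above 1.0
    means the host cannot run every vCPU at once, so under load some vCPU
    processes must wait -- i.e. lags become likely."""
    return sum(vcpus_per_vm) / host_pcpus

# Hypothetical host with 16 pCPUs running four VMs (8+8+4+4 = 24 vCPUs):
print(cpu_overcommit_ratio([8, 8, 4, 4], host_pcpus=16))  # 1.5
```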

Good practices to avoid vCPU lags

Hypervisor software tends to have algorithms to mitigate and try to resolve such overcommit situations (for example, memory ballooning); however, they are not able to completely prevent them. Also note that different hypervisors (KVM / VMware ESXi / Hyper-V / ...) may implement such mitigations, as well as other algorithms, differently.

The best way to avoid overcommit situations is to mindfully design and provision your virtual environments, particularly with regard to CPU and memory resources:

  • The number of CPUs of all running VMs (summed) should not be greater than the number of available CPUs on the host machine (see also Appendix: Hyperthreading) (*).
  • Configuring the vCPUs to run on dedicated pCPUs (i.e. pinning) greatly reduces the chance of a lag (*).
  • Consider how much memory all running VMs have configured - if the sum is greater than the total available memory on the host, you are risking an overcommit situation (**).
  • If you have multiple NUMA nodes, mind the VMs' configuration:
    • Either a VM should fit into a single NUMA node (not have more memory than a single node);
    • Or configure a proper NUMA setup (preferably 1:1) also for the VM, so that the VM's kernel is aware of the topology and can manage the memory adequately.
  • Leaving some CPUs (e.g. 1 per NUMA node) for the host to handle non-VM-related processes is also good practice.

(*) Allowing more vCPUs than total pCPUs effectively creates a situation in which two or more vCPUs need to share a single pCPU's time. While this is a feature of virtualization in itself, it is somewhat inconvenient in that the vCPUs are unaware of the sharing and simply experience the vCPU-lag time jumps, making in-VM scheduling and time management less efficient. Hence, it depends on the needs of the environment whether CPU resource overcommitment is desirable.
(**) Note that some options (memory overcommit and vCPU pinning) should be considered trade-offs, because they remove certain attributes of virtualization - overcommitting memory can be a useful feature, and allowing the vCPU processes to migrate between pCPUs may increase the throughput and flexibility of the environment.
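The CPU and memory guidelines above can be turned into a simple sanity check when planning a layout. A sketch with hypothetical numbers (`vms` is a list of planned (vCPUs, memory-GB) pairs):

```python
def check_layout(host_cpus, host_mem_gb, vms, reserve_cpus=0):
    """Check a planned VM layout against the provisioning guidelines:
    vCPU sum within (host CPUs minus a host reserve), memory sum within
    host RAM. Returns a list of warnings; empty means the plan fits."""
    warnings = []
    if sum(v for v, _ in vms) > host_cpus - reserve_cpus:
        warnings.append("CPU overcommit: vCPU sum exceeds available pCPUs")
    if sum(m for _, m in vms) > host_mem_gb:
        warnings.append("Memory overcommit: configured guest memory exceeds host RAM")
    return warnings

# Hypothetical host: 16 pCPUs (2 reserved for the host), 64 GB RAM.
print(check_layout(host_cpus=16, host_mem_gb=64,
                   vms=[(8, 32), (8, 32)], reserve_cpus=2))
```

Here the memory fits exactly, but the 16 planned vCPUs exceed the 14 pCPUs left after the host reserve, so the plan triggers the CPU warning.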

Ultimately, everything depends on the actual use case and workload you are building the virtual environment for.

Appendix

Hyperthreading

Note that logical CPUs that are hyperthreads of physical processor cores are (in terms of performance) not well suited for running guest VMs' vCPU processes. This is due to how hyperthreading works: the hyperthreads share the same processor core's computational resources. Therefore, in a situation where a single processor core runs vCPU processes on each of its hyperthreads, the performance of those virtual CPUs can be considerably degraded.
While this isn't particularly related to vCPU lags, it is a noteworthy point to consider when designing virtual environments.
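On Linux, which logical CPUs are hyperthread siblings of the same core can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. A sketch that parses the CPU-list format used there (the example sibling strings are hypothetical):

```python
def parse_cpu_list(s):
    """Parse the sysfs CPU-list format (e.g. "0-2,4" or "0,8") used by
    files such as .../topology/thread_siblings_list."""
    cpus = []
    for part in s.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.extend(range(lo, hi + 1))
        else:
            cpus.append(int(part))
    return cpus

# CPUs 0 and 8 as hyperthreads of the same core on a hypothetical host --
# pinning two busy vCPUs there makes them compete for one core's resources.
print(parse_cpu_list("0,8"))   # [0, 8]
print(parse_cpu_list("0-3"))   # [0, 1, 2, 3]
```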

Common issues attributed to vCPU lag
