Chapter 10. Optimizing virtual machine performance

Virtual machines (VMs) always experience some degree of performance deterioration in comparison to the host. The following sections explain the reasons for this deterioration and provide instructions on how to minimize the performance impact of virtualization in RHEL 8, so that your hardware infrastructure resources can be used as efficiently as possible.

10.1. What influences virtual machine performance

VMs are run as user-space processes on the host. The hypervisor therefore needs to convert the host’s system resources so that the VMs can use them. As a consequence, a portion of the resources is consumed by the conversion, and the VM therefore cannot achieve the same performance efficiency as the host.

The impact of virtualization on system performance

More specific reasons for VM performance loss include:

  • Virtual CPUs (vCPUs) are implemented as threads on the host, handled by the Linux scheduler.
  • VMs do not automatically inherit optimization features, such as NUMA or huge pages, from the host kernel.
  • Disk and network I/O settings of the host might have a significant performance impact on the VM.
  • Network traffic typically travels to a VM through a software-based bridge.
  • Depending on the host devices and their models, there might be significant overhead due to emulation of particular hardware.

The severity of the virtualization impact on VM performance is influenced by a variety of factors, which include:

  • The number of concurrently running VMs.
  • The number of virtual devices used by each VM.
  • The device types used by the VMs.

Reducing VM performance loss

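RHEL 8 provides a number of features you can use to reduce the negative performance effects of virtualization. Notably, you can use the tuned service, adjust VM memory, and optimize the I/O, CPU, and network performance of your VMs, as described in the following sections.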

Important

Tuning VM performance can have adverse effects on other virtualization functions. For example, it can make migrating the modified VM more difficult.

10.2. Optimizing virtual machine performance using tuned

The tuned utility is a tuning profile delivery mechanism that adapts RHEL for certain workload characteristics, such as requirements for CPU-intensive tasks or storage-network throughput responsiveness. It provides a number of tuning profiles that are pre-configured to enhance performance and reduce power consumption in a number of specific use cases. You can edit these profiles or create new profiles to create performance solutions tailored to your environment, including virtualized environments.

Red Hat recommends using the following profiles when using virtualization in RHEL 8:

  • For RHEL 8 virtual machines, use the virtual-guest profile. It is based on the generally applicable throughput-performance profile, but also decreases the swappiness of virtual memory.
  • For RHEL 8 virtualization hosts, use the virtual-host profile. This enables more aggressive writeback of dirty memory pages, which benefits the host performance.

Procedure

To enable a specific tuned profile:

  1. List the available tuned profiles.

    # tuned-adm list
    
    Available profiles:
    - balanced             - General non-specialized tuned profile
    - desktop              - Optimize for the desktop use-case
    [...]
    - virtual-guest        - Optimize for running inside a virtual guest
    - virtual-host         - Optimize for running KVM guests
    Current active profile: balanced
  2. Optional: Create a new tuned profile or edit an existing tuned profile.

    For more information, see Customizing tuned profiles.

  3. Activate a tuned profile.

    # tuned-adm profile selected-profile
    • To optimize a virtualization host, use the virtual-host profile.

      # tuned-adm profile virtual-host
    • On a RHEL guest operating system, use the virtual-guest profile.

      # tuned-adm profile virtual-guest
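
The choice between the two profiles can be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes that the systemd-detect-virt utility is available and that any machine on which no virtualization is detected is meant to act as a virtualization host. The pick_tuned_profile function name is hypothetical.

```shell
# Map the output of `systemd-detect-virt` to a tuned profile name.
# Assumption: "none" (no virtualization detected) means the machine is a
# virtualization host; any other non-empty value means it is a guest.
pick_tuned_profile() {
    case "$1" in
        "")   echo "no virtualization type given" >&2; return 1 ;;
        none) echo "virtual-host" ;;
        *)    echo "virtual-guest" ;;
    esac
}

# Example use (commented out because it changes system settings):
#   virt_type=$(systemd-detect-virt || true)   # exits non-zero when it prints "none"
#   tuned-adm profile "$(pick_tuned_profile "$virt_type")"
#   tuned-adm active                           # show the profile now in effect
```

Keeping the decision in a small function makes it possible to review the chosen profile before tuned-adm applies it.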

10.3. Configuring virtual machine memory

To improve the performance of a virtual machine (VM), you can assign additional host RAM to the VM. Similarly, you can decrease the amount of memory allocated to a VM so the host memory can be allocated to other VMs or tasks.

To perform these actions, you can use the web console or the command-line interface.

10.3.1. Adding and removing virtual machine memory using the web console

To improve the performance of a virtual machine (VM) or to free up the host resources it is using, you can use the web console to adjust the amount of memory allocated to the VM.

Prerequisites

  • The guest OS must be running the memory balloon drivers. To verify this is the case:

    1. Ensure the VM’s configuration includes the memballoon device:

      # virsh dumpxml testguest | grep memballoon
      <memballoon model='virtio'>
          </memballoon>

      If this command displays any output and the model is not set to none, the memballoon device is present.

    2. Ensure the balloon drivers are running in the guest OS.

  • Optional: Obtain the information about the maximum memory and currently used memory for a VM. This will serve as a baseline for your changes, and also for verification.

    # virsh dominfo testguest
    Max memory:     2097152 KiB
    Used memory:    2097152 KiB
  • To use the web console to manage VMs, install the web console VM plug-in.

Procedure

  1. In the Virtual Machines interface, click the row with the name of the VM for which you want to view and adjust the allocated memory.

    The row expands to reveal the Overview pane with basic information about the selected VM.

  2. Click the value of the Memory line in the Overview pane.

    The Memory Adjustment dialog appears.

  3. Configure the memory allocation for the selected VM.

    • Maximum allocation - Sets the maximum amount of host memory that the VM can use for its processes. Increasing this value improves the performance potential of the VM, and reducing the value lowers the performance footprint the VM has on your host.

      Adjusting maximum memory allocation is only possible on a shut-off VM.

    • Current allocation - Sets a memory limit until the next VM reboot, up to the maximum allocation. You can use this to temporarily regulate the memory load that the VM has on the host, without changing the maximum VM allocation.
  4. Click Save.

    The memory allocation of the VM is adjusted.

10.3.2. Adding and removing virtual machine memory using the command-line interface

To improve the performance of a virtual machine (VM) or to free up the host resources it is using, you can use the CLI to adjust the amount of memory allocated to the VM.

Prerequisites

  • The guest OS must be running the memory balloon drivers. To verify this is the case:

    1. Ensure the VM’s configuration includes the memballoon device:

      # virsh dumpxml testguest | grep memballoon
      <memballoon model='virtio'>
          </memballoon>

      If this command displays any output and the model is not set to none, the memballoon device is present.

    2. Ensure the balloon drivers are running in the guest OS.

  • Optional: Obtain the information about the maximum memory and currently used memory for a VM. This will serve as a baseline for your changes, and also for verification.

    # virsh dominfo testguest
    Max memory:     2097152 KiB
    Used memory:    2097152 KiB

Procedure

  1. Adjust the maximum memory allocated to a VM. Increasing this value improves the performance potential of the VM, and reducing the value lowers the performance footprint the VM has on your host. Note that this change can only be performed on a shut-off VM, so adjusting a running VM requires a reboot to take effect.

    For example, to change the maximum memory that the testguest VM can use to 4096 MiB:

    # virt-xml testguest --edit --memory 4096
    Domain 'testguest' defined successfully.
    Changes will take effect after the domain is fully powered off.
  2. Optional: You can also adjust the memory currently used by the VM, up to the maximum allocation. This regulates the memory load that the VM has on the host until the next reboot, without changing the maximum VM allocation.

    # virsh setmem testguest --current 2048M

Verification

  1. Confirm that the memory used by the VM has been updated:

    # virsh dominfo testguest
    Max memory:     4194304 KiB
    Used memory:    2097152 KiB
  2. Optional: If you adjusted the current VM memory, you can obtain the memory balloon statistics of the VM to evaluate how effectively it regulates its memory use.

    # virsh domstats --balloon testguest
    Domain: 'testguest'
      balloon.current=365624
      balloon.maximum=4194304
      balloon.swap_in=0
      balloon.swap_out=0
      balloon.major_fault=306
      balloon.minor_fault=156117
      balloon.unused=3834448
      balloon.available=4035008
      balloon.usable=3746340
      balloon.last-update=1587971682
      balloon.disk_caches=75444
      balloon.hugetlb_pgalloc=0
      balloon.hugetlb_pgfail=0
      balloon.rss=1005456
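
Because virt-xml takes the memory size in MiB while virsh dominfo reports KiB, comparing the two requires a unit conversion. The following sketch shows one way to do the comparison. The mib_to_kib and dominfo_kib helper names are hypothetical, and the parsing assumes the "Field: value KiB" layout shown in the dominfo output above.

```shell
# Convert MiB to KiB, the unit that `virsh dominfo` reports.
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

# Read `virsh dominfo` output on stdin and print the numeric value (in KiB)
# of the field named in $1, for example "Max memory" or "Used memory".
dominfo_kib() {
    awk -F': *' -v key="$1" '$1 == key { split($2, v, " "); print v[1] }'
}

# Example use against a real VM (commented out, requires libvirt):
#   expected=$(mib_to_kib 4096)
#   actual=$(virsh dominfo testguest | dominfo_kib "Max memory")
#   [ "$expected" = "$actual" ] && echo "maximum memory matches"
```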

10.3.3. Additional resources

  • To increase the maximum memory of a running VM, you can attach a memory device to the VM. This is also referred to as memory hot plug. For details, see Attaching devices to virtual machines.

    Note that removing a memory device from a VM, also known as memory hot unplug, is not supported in RHEL 8, and Red Hat highly discourages its use.

10.4. Optimizing virtual machine I/O performance

The input and output (I/O) capabilities of a virtual machine (VM) can significantly limit the VM’s overall efficiency. To address this, you can optimize a VM’s I/O by configuring block I/O parameters.

10.4.1. Tuning block I/O in virtual machines

When multiple block devices are being used by one or more VMs, it might be important to adjust the I/O priority of specific virtual devices by modifying their I/O weights.

Increasing the I/O weight of a device increases its priority for I/O bandwidth, and therefore provides it with more host resources. Similarly, reducing a device’s weight makes it consume less host resources.

Note

Each device’s weight value must be within the 100 to 1000 range. Alternatively, the value can be 0, which removes that device from per-device listings.

Procedure

To display and set a VM’s block I/O parameters:

  1. Display the current <blkiotune> parameters for a VM:

    # virsh dumpxml VM-name

    <domain>
      [...]
      <blkiotune>
        <weight>800</weight>
        <device>
          <path>/dev/sda</path>
          <weight>1000</weight>
        </device>
        <device>
          <path>/dev/sdb</path>
          <weight>500</weight>
        </device>
      </blkiotune>
      [...]
    </domain>
  2. Edit the I/O weight of a specified device:

    # virsh blkiotune VM-name --device-weights device,I/O-weight

    For example, the following changes the weight of the /dev/sda device in the liftbrul VM to 500.

    # virsh blkiotune liftbrul --device-weights /dev/sda,500
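
Because a weight outside the permitted range is easy to type by mistake, it can help to validate the value before passing it to virsh. The following is a minimal sketch under that assumption. The valid_blkio_weight and blkiotune_cmd function names are hypothetical, and blkiotune_cmd only prints the command instead of running it.

```shell
# Return success if $1 is an acceptable per-device I/O weight:
# an integer from 100 to 1000, or 0 to remove the device from per-device listings.
valid_blkio_weight() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -eq 0 ] || { [ "$1" -ge 100 ] && [ "$1" -le 1000 ]; }
}

# Print (rather than run) the blkiotune command if the weight is valid.
# $1 = VM name, $2 = device path, $3 = weight; all supplied by the caller.
blkiotune_cmd() {
    valid_blkio_weight "$3" || { echo "invalid weight: $3" >&2; return 1; }
    echo "virsh blkiotune $1 --device-weights $2,$3"
}
```

For example, blkiotune_cmd liftbrul /dev/sda 500 prints the command for review, and the same call with a weight of 50 fails with an error message.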

10.4.2. Disk I/O throttling in virtual machines

When several VMs are running simultaneously, they can interfere with system performance by using excessive disk I/O. Disk I/O throttling in KVM virtualization provides the ability to set a limit on disk I/O requests sent from the VMs to the host machine. This can prevent a VM from over-utilizing shared resources and impacting the performance of other VMs.

To enable disk I/O throttling, set a limit on disk I/O requests sent from each block device attached to VMs to the host machine.

Procedure

  1. Use the virsh domblklist command to list the names of all the disk devices on a specified VM.

    # virsh domblklist rollin-coal
    Target     Source
    ------------------------------------------------
    vda        /var/lib/libvirt/images/rollin-coal.qcow2
    sda        -
    sdb        /home/horridly-demanding-processes.iso
  2. Set I/O limits for a block device attached to a VM using the virsh blkdeviotune command:

    # virsh blkdeviotune VM-name device --parameter limit

    For example, to throttle the sdb device on the rollin-coal VM to 1000 I/O operations per second and 50 MiB (52428800 bytes) per second throughput:

    # virsh blkdeviotune rollin-coal sdb --total-iops-sec 1000 --total-bytes-sec 52428800
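
The --total-bytes-sec value is given in bytes, so a throughput limit expressed in MiB has to be converted first. The following sketch shows the arithmetic behind the value used above. The mib_to_bytes helper name is hypothetical.

```shell
# Convert MiB to bytes for options such as --total-bytes-sec.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example use (commented out, requires libvirt):
#   virsh blkdeviotune rollin-coal sdb --total-iops-sec 1000 \
#       --total-bytes-sec "$(mib_to_bytes 50)"
#   virsh blkdeviotune rollin-coal sdb    # with no limit options, shows the current limits
```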

Additional information

  • Disk I/O throttling can be useful in various situations, for example when VMs belonging to different customers are running on the same host, or when quality of service guarantees are given for different VMs. Disk I/O throttling can also be used to simulate slower disks.
  • I/O throttling can be applied independently to each block device attached to a VM and supports limits on throughput and I/O operations.

10.4.3. Enabling multi-queue virtio-scsi

When using virtio-scsi storage devices in your virtual machines (VMs), the multi-queue virtio-scsi feature provides improved storage performance and scalability. It enables each virtual CPU (vCPU) to have a separate queue and interrupt to use without affecting other vCPUs.

Procedure

  • To enable multi-queue virtio-scsi support for a specific VM, add the following to the VM’s XML configuration, where N is the total number of vCPU queues:

    <controller type='scsi' index='0' model='virtio-scsi'>
       <driver queues='N' />
    </controller>

10.5. Optimizing virtual machine CPU performance

Much like physical CPUs in host machines, vCPUs are critical to virtual machine (VM) performance. As a result, optimizing vCPUs can have a significant impact on the resource efficiency of your VMs. To optimize your vCPU:

  1. Adjust how many host CPUs are assigned to the VM. You can do this using the CLI or the web console.
  2. Ensure that the vCPU model is aligned with the CPU model of the host. For example, to set the testguest1 VM to use the CPU model of the host:

    # virt-xml testguest1 --edit --cpu host-model
  3. If your host machine uses Non-Uniform Memory Access (NUMA), you can also configure NUMA for its VMs. This maps the host’s CPU and memory processes onto the CPU and memory processes of the VM as closely as possible. In effect, NUMA tuning provides the vCPU with a more streamlined access to the system memory allocated to the VM, which can improve the vCPU processing effectiveness.

    For details, see Section 10.5.3, “Configuring NUMA in a virtual machine” and Section 10.5.4, “Sample vCPU performance tuning scenario”.

10.5.1. Adding and removing virtual CPUs using the command-line interface

To increase or optimize the CPU performance of a virtual machine (VM), you can add or remove virtual CPUs (vCPUs) assigned to the VM.

When performed on a running VM, this is also referred to as vCPU hot plugging and hot unplugging. However, note that vCPU hot unplug is not supported in RHEL 8, and Red Hat highly discourages its use.

Prerequisites

  • Optional: View the current state of the vCPUs in the targeted VM. For example, to display the number of vCPUs on the testguest VM:

    # virsh vcpucount testguest
    maximum      config         4
    maximum      live           2
    current      config         2
    current      live           1

    This output indicates that testguest is currently using 1 vCPU, and 1 more vCPU can be hot plugged to it to increase the VM’s performance. However, after reboot, the number of vCPUs testguest uses will change to 2, and it will be possible to hot plug 2 more vCPUs.

Procedure

  1. Adjust the maximum number of vCPUs that can be attached to a VM, which takes effect on the VM’s next boot.

    For example, to increase the maximum vCPU count for the testguest VM to 8:

    # virsh setvcpus testguest 8 --maximum --config

    Note that the maximum may be limited by the CPU topology, host hardware, the hypervisor, and other factors.

  2. Adjust the current number of vCPUs attached to a VM, up to the maximum configured in the previous step. For example:

    • To increase the number of vCPUs attached to the running testguest VM to 4:

      # virsh setvcpus testguest 4 --live

      This increases the VM’s performance and host load footprint of testguest until the VM’s next boot.

    • To permanently decrease the number of vCPUs attached to the testguest VM to 1:

      # virsh setvcpus testguest 1 --config

      This decreases the VM’s performance and host load footprint of testguest after the VM’s next boot. However, if needed, additional vCPUs can be hot plugged to the VM to temporarily increase its performance.

Verification

  • Confirm that the current state of vCPU for the VM reflects your changes.

    # virsh vcpucount testguest
    maximum      config         8
    maximum      live           4
    current      config         1
    current      live           4
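
The number of vCPUs that can still be hot plugged is the difference between the "maximum live" and "current live" rows of the vcpucount output. The following sketch computes that difference. The hotplug_headroom function name is hypothetical, and the parsing assumes the three-column layout shown above.

```shell
# Read `virsh vcpucount` output on stdin and print how many vCPUs can still
# be hot plugged: "maximum live" minus "current live".
# If the VM is shut off and no "live" rows are present, this prints 0.
hotplug_headroom() {
    awk '$1 == "maximum" && $2 == "live" { max = $3 }
         $1 == "current" && $2 == "live" { cur = $3 }
         END { print max - cur }'
}

# Example use (commented out, requires libvirt):
#   virsh vcpucount testguest | hotplug_headroom
```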

10.5.2. Managing virtual CPUs using the web console

Using the RHEL 8 web console, you can review and configure virtual CPUs used by virtual machines (VMs) to which the web console is connected.

Procedure

  1. In the Virtual Machines interface, click the row with the name of the VM for which you want to view and configure virtual CPU parameters.

    The row expands to reveal the Overview pane with basic information about the selected VM, including the number of virtual CPUs, and controls for shutting down and deleting the VM.

  2. Click the number of vCPUs in the Overview pane.

    The vCPU details dialog appears.

    Note

    The warning in the vCPU details dialog only appears after the virtual CPU settings are changed.

  3. Configure the virtual CPUs for the selected VM.

    • vCPU Count - The number of vCPUs currently in use.

      Note

      The vCPU count cannot be greater than the vCPU Maximum.

    • vCPU Maximum - The maximum number of virtual CPUs that can be configured for the VM. If this value is higher than the vCPU Count, additional vCPUs can be attached to the VM.
    • Sockets - The number of sockets to expose to the VM.
    • Cores per socket - The number of cores for each socket to expose to the VM.
    • Threads per core - The number of threads for each core to expose to the VM.

      Note that the Sockets, Cores per socket, and Threads per core options adjust the CPU topology of the VM. This may be beneficial for vCPU performance and may impact the functionality of certain software in the guest OS. If a different setting is not required by your deployment, Red Hat recommends keeping the default values.

  4. Click Apply.

    The virtual CPUs for the VM are configured.

    Note

    Changes to virtual CPU settings only take effect after the VM is restarted.

10.5.3. Configuring NUMA in a virtual machine

The following methods can be used to configure Non-Uniform Memory Access (NUMA) settings of a virtual machine (VM) on a RHEL 8 host.

Prerequisites

  • The host must be a NUMA-compatible machine. To detect whether this is the case, use the virsh nodeinfo command and see the NUMA cell(s) line:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              48
    CPU frequency:       1200 MHz
    CPU socket(s):       1
    Core(s) per socket:  12
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         67012964 KiB

    If the value of the line is 2 or greater, the host is NUMA-compatible.

Procedure

For ease of use, you can set up a VM’s NUMA configuration using automated utilities and services. However, manual NUMA setup is more likely to yield a significant performance improvement.

Automatic methods

  • Set the VM’s NUMA policy to Preferred. For example, to do so for the testguest5 VM:

    # virt-xml testguest5 --edit --vcpus placement=auto
    # virt-xml testguest5 --edit --numatune mode=preferred
  • Enable automatic NUMA balancing on the host:

    # echo 1 > /proc/sys/kernel/numa_balancing
  • Use the numad command to automatically align the VM CPU with memory resources.

    # numad

Manual methods

  1. Pin specific vCPU threads to a specific host CPU or range of CPUs. This is also possible on non-NUMA hosts and VMs, and is recommended as a safe method of vCPU performance improvement.

    For example, the following commands pin vCPU threads 0 to 5 of the testguest6 VM to host CPUs 1, 3, 5, 7, 9, and 11, respectively:

    # virsh vcpupin testguest6 0 1
    # virsh vcpupin testguest6 1 3
    # virsh vcpupin testguest6 2 5
    # virsh vcpupin testguest6 3 7
    # virsh vcpupin testguest6 4 9
    # virsh vcpupin testguest6 5 11

    Afterwards, you can verify whether this was successful:

    # virsh vcpupin testguest6
    VCPU   CPU Affinity
    ----------------------
    0      1
    1      3
    2      5
    3      7
    4      9
    5      11
  2. After pinning vCPU threads, you can also pin QEMU process threads associated with a specified VM to a specific host CPU or range of CPUs. For example, the following commands pin the QEMU process thread of testguest6 to CPUs 13 and 15, and verify this was successful:

    # virsh emulatorpin testguest6 13,15
    # virsh emulatorpin testguest6
    emulator: CPU Affinity
    ----------------------------------
           *: 13,15
  3. Finally, you can also specify which host NUMA nodes will be assigned specifically to a certain VM. This can improve the host memory usage by the VM’s vCPU. For example, the following commands set testguest6 to use host NUMA nodes 3 to 5, and verify this was successful:

    # virsh numatune testguest6 --nodeset 3-5
    # virsh numatune testguest6
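
The vCPU pinning in the first manual step follows a regular pattern: vCPU i is pinned to host CPU 2*i + 1. When the pattern is that regular, the commands can be generated in a loop. The following is a sketch that only prints the commands so that they can be reviewed first. The print_vcpupin_cmds function name is hypothetical, and the odd-numbered host CPU layout is taken from the example, not from any general rule.

```shell
# Print the commands that pin vCPU i of a VM to host CPU 2*i + 1.
# $1 = VM name, $2 = number of vCPUs to pin.
print_vcpupin_cmds() {
    i=0
    while [ "$i" -lt "$2" ]; do
        echo "virsh vcpupin $1 $i $(( 2 * i + 1 ))"
        i=$(( i + 1 ))
    done
}

# Example use: review the output, then pipe it to a shell to apply it.
#   print_vcpupin_cmds testguest6 6
#   print_vcpupin_cmds testguest6 6 | sh
```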

10.5.4. Sample vCPU performance tuning scenario

To obtain the best vCPU performance possible, Red Hat recommends using manual vcpupin, emulatorpin, and numatune settings together, as in the following scenario.

Starting scenario

  • Your host has the following hardware specifics:

    • 2 NUMA nodes
    • 3 CPU cores on each node
    • 2 threads on each core

    The output of virsh nodeinfo of such a machine would look similar to:

    # virsh nodeinfo
    CPU model:           x86_64
    CPU(s):              12
    CPU frequency:       3661 MHz
    CPU socket(s):       2
    Core(s) per socket:  3
    Thread(s) per core:  2
    NUMA cell(s):        2
    Memory size:         31248692 KiB
  • You intend to modify an existing VM to have 8 vCPUs, which means that it will not fit in a single NUMA node.

    Therefore, you should distribute 4 vCPUs on each NUMA node and make the vCPU topology resemble the host topology as closely as possible. This means that vCPUs that run as sibling threads of a given physical CPU should be pinned to host threads on the same core. For details, see the Solution below:

Solution

  1. Obtain the information on the host topology:

    # virsh capabilities

    The output should include a section that looks similar to the following:

    <topology>
      <cells num="2">
        <cell id="0">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="10" />
            <sibling id="1" value="21" />
          </distances>
          <cpus num="6">
            <cpu id="0" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="1" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="2" socket_id="0" core_id="2" siblings="2,5" />
            <cpu id="3" socket_id="0" core_id="0" siblings="0,3" />
            <cpu id="4" socket_id="0" core_id="1" siblings="1,4" />
            <cpu id="5" socket_id="0" core_id="2" siblings="2,5" />
          </cpus>
        </cell>
        <cell id="1">
          <memory unit="KiB">15624346</memory>
          <pages unit="KiB" size="4">3906086</pages>
          <pages unit="KiB" size="2048">0</pages>
          <pages unit="KiB" size="1048576">0</pages>
          <distances>
            <sibling id="0" value="21" />
            <sibling id="1" value="10" />
          </distances>
          <cpus num="6">
            <cpu id="6" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="7" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="8" socket_id="1" core_id="5" siblings="8,11" />
            <cpu id="9" socket_id="1" core_id="3" siblings="6,9" />
            <cpu id="10" socket_id="1" core_id="4" siblings="7,10" />
            <cpu id="11" socket_id="1" core_id="5" siblings="8,11" />
          </cpus>
        </cell>
      </cells>
    </topology>
  2. Optional: Test the performance of the VM using the applicable tools and utilities.
  3. Set up and mount 1 GiB huge pages on the host:

    1. Add the following line to the host’s kernel command line:

      default_hugepagesz=1G hugepagesz=1G
    2. Create the /etc/systemd/system/hugetlb-gigantic-pages.service file with the following content:

      [Unit]
      Description=HugeTLB Gigantic Pages Reservation
      DefaultDependencies=no
      Before=dev-hugepages.mount
      ConditionPathExists=/sys/devices/system/node
      ConditionKernelCommandLine=hugepagesz=1G
      
      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/systemd/hugetlb-reserve-pages.sh
      
      [Install]
      WantedBy=sysinit.target
    3. Create the /etc/systemd/hugetlb-reserve-pages.sh file with the following content:

      #!/bin/sh
      
      nodes_path=/sys/devices/system/node/
      if [ ! -d $nodes_path ]; then
      	echo "ERROR: $nodes_path does not exist"
      	exit 1
      fi
      
      reserve_pages()
      {
      	echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
      }
      
      reserve_pages 4 node0
      reserve_pages 4 node1

      This reserves four 1 GiB huge pages on node0 and four 1 GiB huge pages on node1, which correspond to NUMA cells 0 and 1 in the virsh capabilities output above.

    4. Make the script created in the previous step executable:

      # chmod +x /etc/systemd/hugetlb-reserve-pages.sh
    5. Enable huge page reservation on boot:

      # systemctl enable hugetlb-gigantic-pages
  4. Use the virsh edit command to edit the XML configuration of the VM you wish to optimize, in this example super-vm:

    # virsh edit super-vm
  5. Adjust the XML configuration of the VM in the following way:

    1. Set the VM to use 8 static vCPUs. Use the <vcpu/> element to do this.
    2. Pin each of the vCPU threads to the corresponding host CPU threads that it mirrors in the topology. To do so, use the <vcpupin/> elements in the <cputune> section.

      Note that, as shown by the virsh capabilities utility above, host CPU threads are not ordered sequentially in their respective cores. In addition, the vCPU threads should be pinned to the highest available set of host cores on the same NUMA node. For a table illustration, see the Additional Resources section below.

      The XML configuration for steps a. and b. can look similar to:

      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
    3. Set the VM to use 1 GiB huge pages:

      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
    4. Configure the VM’s NUMA nodes to use memory from the corresponding NUMA nodes on the host. To do so, use the <memnode/> elements in the <numatune/> section:

      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
    5. Ensure the CPU mode is set to host-passthrough, and that the CPU uses cache in passthrough mode:

      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
      </cpu>
  6. The resulting XML configuration of the VM should include a section similar to the following:

    [...]
      <memoryBacking>
        <hugepages>
          <page size='1' unit='GiB'/>
        </hugepages>
      </memoryBacking>
      <vcpu placement='static'>8</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='4'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='5'/>
        <vcpupin vcpu='4' cpuset='7'/>
        <vcpupin vcpu='5' cpuset='10'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='11'/>
        <emulatorpin cpuset='6,9'/>
      </cputune>
      <numatune>
        <memory mode="preferred" nodeset="1"/>
        <memnode cellid="0" mode="strict" nodeset="0"/>
        <memnode cellid="1" mode="strict" nodeset="1"/>
      </numatune>
      <cpu mode="host-passthrough">
        <topology sockets="2" cores="2" threads="2"/>
        <cache mode="passthrough"/>
        <numa>
          <cell id="0" cpus="0-3" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="10"/>
              <sibling id="1" value="21"/>
            </distances>
          </cell>
          <cell id="1" cpus="4-7" memory="2" unit="GiB">
            <distances>
              <sibling id="0" value="21"/>
              <sibling id="1" value="10"/>
            </distances>
          </cell>
        </numa>
      </cpu>
    </domain>
  7. Optional: Test the performance of the VM using the applicable tools and utilities to evaluate the impact of the VM’s optimization.
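
After the host reboots with the huge page reservation service enabled, it can be useful to confirm how many 1 GiB pages each NUMA node actually holds. The following sketch reads the same sysfs files that the reservation script writes to. The show_gigantic_pages function name is hypothetical, and the optional directory argument exists only so that the function can be tried against a copy of the directory tree.

```shell
# Print the number of reserved 1 GiB huge pages for every NUMA node.
# $1 = sysfs node directory; defaults to /sys/devices/system/node.
show_gigantic_pages() {
    base=${1:-/sys/devices/system/node}
    for f in "$base"/node*/hugepages/hugepages-1048576kB/nr_hugepages; do
        [ -r "$f" ] || continue        # skip nodes without 1 GiB page support
        node=${f#"$base"/}             # strip the leading directory
        node=${node%%/*}               # keep only the nodeN component
        echo "$node: $(cat "$f")"
    done
}
```

Calling show_gigantic_pages without arguments on the host prints one line per node that exposes a counter for 1 GiB pages.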

Additional resources

  • The following tables illustrate the connections between the vCPUs and the host CPUs they should be pinned to:

    Table 10.1. Host topology

    CPU threads   0   3   1   4   2   5   6   9   7   10  8   11
    Cores         0       1       2       3       4       5
    Sockets       0                       1
    NUMA nodes    0                       1

    Table 10.2. VM topology

    vCPU threads  0   1   2   3   4   5   6   7
    Cores         0       1       2       3
    Sockets       0               1
    NUMA nodes    0               1

    Table 10.3. Combined host and VM topology

    vCPU threads      -   -   0   1   2   3   -   -   4   5   6   7
    Host CPU threads  0   3   1   4   2   5   6   9   7   10  8   11
    Cores             0       1       2       3       4       5
    Sockets           0                       1
    NUMA nodes        0                       1

    In this scenario, there are 2 NUMA nodes and 8 vCPUs. Therefore, 4 vCPU threads should be pinned to each node.

    In addition, Red Hat recommends leaving at least a single CPU thread available on each node for host system operations.

    Because in this example, each NUMA node houses 3 cores, each with 2 host CPU threads, the set for node 0 translates as follows:

    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='5'/>
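
The same mapping can be written out for both nodes with a small helper that takes the first vCPU index and the list of host CPU threads to use. The following is a sketch for illustration only. The vcpupin_xml function name is hypothetical, and the host thread lists are the ones used in this scenario.

```shell
# Print <vcpupin/> elements. $1 = first vCPU index; the remaining arguments
# are the host CPU threads to pin consecutive vCPUs to.
vcpupin_xml() {
    vcpu=$1
    shift
    for host_cpu in "$@"; do
        printf "<vcpupin vcpu='%s' cpuset='%s'/>\n" "$vcpu" "$host_cpu"
        vcpu=$(( vcpu + 1 ))
    done
}

# Node 0: vCPUs 0-3 on host threads 1, 4, 2, 5 (cores 1 and 2).
vcpupin_xml 0 1 4 2 5
# Node 1: vCPUs 4-7 on host threads 7, 10, 8, 11 (cores 4 and 5).
vcpupin_xml 4 7 10 8 11
```

Sibling threads of the same host core are listed next to each other, so consecutive vCPUs end up on the same core, as the scenario requires.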

10.6. Optimizing virtual machine network performance

Due to the virtual nature of a VM’s network interface card (NIC), the VM loses a portion of its allocated host network bandwidth, which can reduce the overall workload efficiency of the VM. The following tips can minimize the negative impact of virtualization on the virtual NIC (vNIC) throughput.

Procedure

Use any of the following methods and observe if it has a beneficial effect on your VM network performance:

Enable the vhost_net module

On the host, ensure the vhost_net kernel feature is enabled:

# lsmod | grep vhost
vhost_net              32768  1
vhost                  53248  1 vhost_net
tap                    24576  1 vhost_net
tun                    57344  6 vhost_net

If the output of this command is blank, enable the vhost_net kernel module:

# modprobe vhost_net
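
The check and the conditional load can be combined. The following sketch matches the module name exactly in the first column of the lsmod output, so that a line such as "vhost ... vhost_net" is not mistaken for the module itself. The module_listed function name is hypothetical.

```shell
# Read `lsmod` output on stdin and succeed if module $1 is listed
# in the first column.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Example use on the host (commented out because it loads a kernel module):
#   lsmod | module_listed vhost_net || modprobe vhost_net
```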

Set up multi-queue virtio-net

To set up the multi-queue virtio-net feature for a VM, use the virsh edit command to edit the XML configuration of the VM. In the XML, add the following to the <devices> section, and replace N with the number of vCPUs in the VM, up to 16:

<interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <driver name='vhost' queues='N'/>
</interface>

If the VM is running, restart it for the changes to take effect.
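
As an illustration of how N is chosen, the following Python sketch builds the same <interface> element with the queue count capped at 16. The function name multiqueue_interface_xml and the MAX_QUEUES constant are hypothetical, and the sketch only generates the XML text. It does not apply the XML to a VM.

```python
import xml.etree.ElementTree as ET

MAX_QUEUES = 16  # upper bound stated in the text above


def multiqueue_interface_xml(vcpus, network="default"):
    """Build an <interface> element with one queue per vCPU, capped at MAX_QUEUES."""
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    queues = min(vcpus, MAX_QUEUES)
    iface = ET.Element("interface", type="network")
    ET.SubElement(iface, "source", network=network)
    ET.SubElement(iface, "model", type="virtio")
    ET.SubElement(iface, "driver", name="vhost", queues=str(queues))
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    # A VM with 4 vCPUs gets 4 queues; a VM with 32 vCPUs is capped at 16.
    print(multiqueue_interface_xml(4))
    print(multiqueue_interface_xml(32))
```

The generated element can then be pasted into the <devices> section during virsh edit.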

Set up vhost zero-copy transmit

If using a network with large packet size, enable the vhost zero-copy transmit feature.

Note that this feature only improves the performance when transmitting large packets between a guest network and an external network. It does not affect performance for guest-to-guest and guest-to-host workloads. In addition, it is likely to have a negative impact on the performance of small packet workloads.

Also, enabling zero-copy transmit can cause head-of-line blocking of packets, which can pose a security risk.

To enable vhost zero-copy transmit:

  1. On the host, disable the vhost_net kernel module:

    # modprobe -r vhost_net
  2. Re-enable the vhost_net module with the zero-copy parameter turned on:

    # modprobe vhost_net experimental_zcopytx=1
  3. Check whether zero-copy transmit was enabled successfully:

    # cat /sys/module/vhost_net/parameters/experimental_zcopytx
    1

Batching network packets

In Linux VM configurations with a long transmission path, batching packets before submitting them to the kernel may improve cache utilization. To set up packet batching, use the following command on the host, and replace tap0 with the name of the network interface that the VMs use:

# ethtool -C tap0 rx-frames 128

SR-IOV

If your host NIC supports SR-IOV, use SR-IOV device assignment for your vNICs. For more information, see Managing SR-IOV devices.

10.7. Virtual machine performance monitoring tools

To identify what consumes the most VM resources and which aspect of VM performance needs optimization, you can use both general and VM-specific performance diagnostic tools.

Default OS performance monitoring tools

For standard performance evaluation, you can use the utilities provided by default by your host and guest operating systems:

  • On your RHEL 8 host, as root, use the top utility or the system monitor application, and look for qemu and virt in the output. This shows the amount of host system resources that your VMs are consuming.

    • If the monitoring tool displays that any of the qemu or virt processes consume a large portion of the host CPU or memory capacity, use the perf utility to investigate. For details, see below.
    • In addition, if a vhost_net thread process, named for example vhost_net-1234, is displayed as consuming an excessive amount of host CPU capacity, consider using virtual network optimization features, such as multi-queue virtio-net.
  • On the guest operating system, use performance utilities and applications available on the system to evaluate which processes consume the most system resources.

    • On Linux systems, you can use the top utility.
    • On Windows systems, you can use the Task Manager application.

perf kvm

You can use the perf utility to collect and analyze virtualization-specific statistics about the performance of your RHEL 8 host. To do so:

  1. On the host, install the perf package:

    # yum install perf
  2. Use the perf kvm stat command to display perf statistics for your virtualization host:

    • For real-time monitoring of your hypervisor, use the perf kvm stat live command.
    • To log the perf data of your hypervisor over a period of time, activate the logging using the perf kvm stat record command. After the command is canceled or interrupted, the data is saved in the perf.data.guest file, which can be analyzed using the perf kvm stat report command.
  3. Analyze the perf output for types of VM-EXIT events and their distribution. For example, the PAUSE_INSTRUCTION events should be infrequent, but in the following output, the high occurrence of this event suggests that the host CPUs are not handling the running vCPUs well. In such a scenario, consider shutting down some of your active VMs, removing vCPUs from these VMs, or tuning the performance of the vCPUs.

    # perf kvm stat report
    
    Analyze events for all VMs, all VCPUs:
    
    
                 VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
    
      EXTERNAL_INTERRUPT     365634    31.59%    18.04%      0.42us  58780.59us    204.08us ( +-   0.99% )
               MSR_WRITE     293428    25.35%     0.13%      0.59us  17873.02us      1.80us ( +-   4.63% )
        PREEMPTION_TIMER     276162    23.86%     0.23%      0.51us  21396.03us      3.38us ( +-   5.19% )
       PAUSE_INSTRUCTION     189375    16.36%    11.75%      0.72us  29655.25us    256.77us ( +-   0.70% )
                     HLT      20440     1.77%    69.83%      0.62us  79319.41us  14134.56us ( +-   0.79% )
                  VMCALL      12426     1.07%     0.03%      1.02us   5416.25us      8.77us ( +-   7.36% )
           EXCEPTION_NMI         27     0.00%     0.00%      0.69us      1.34us      0.98us ( +-   3.50% )
           EPT_MISCONFIG          5     0.00%     0.00%      5.15us     10.85us      7.88us ( +-  11.67% )
    
    Total Samples:1157497, Total events handled time:413728274.66us.

    Other event types that can signal problems in the output of perf kvm stat include:

For more information on using perf to monitor virtualization performance, see the perf-kvm man page.
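
To compare VM-EXIT distributions across several recordings, the report can also be parsed with a short script. The following Python sketch assumes the column layout shown in the example output above. The helper names and the 5% threshold are hypothetical illustrations and are not values recommended by perf or by Red Hat.

```python
import re

# A few rows copied from the report above; a real run would read the full
# output of "perf kvm stat report" instead.
SAMPLE_REPORT = """\
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

  EXTERNAL_INTERRUPT     365634    31.59%    18.04%      0.42us  58780.59us    204.08us ( +-   0.99% )
   PAUSE_INSTRUCTION     189375    16.36%    11.75%      0.72us  29655.25us    256.77us ( +-   0.70% )
                 HLT      20440     1.77%    69.83%      0.62us  79319.41us  14134.56us ( +-   0.79% )

Total Samples:1157497, Total events handled time:413728274.66us.
"""

# Event name, sample count, Samples%, Time%. Header and total lines do not match.
ROW = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s+(\d+)\s+([\d.]+)%\s+([\d.]+)%")


def parse_vm_exits(report):
    """Map each VM-EXIT reason to (samples, samples_pct, time_pct)."""
    exits = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            name, samples, samples_pct, time_pct = match.groups()
            exits[name] = (int(samples), float(samples_pct), float(time_pct))
    return exits


def is_frequent(exits, reason, threshold_pct=5.0):
    """Return True if the given exit reason exceeds the (assumed) share of samples."""
    return exits.get(reason, (0, 0.0, 0.0))[1] > threshold_pct


if __name__ == "__main__":
    exits = parse_vm_exits(SAMPLE_REPORT)
    print(exits["PAUSE_INSTRUCTION"], is_frequent(exits, "PAUSE_INSTRUCTION"))
```

Choose the threshold to suit your workload; the sketch only makes the share of each event type easy to track over time.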

numastat

To see the current NUMA configuration of your system, you can use the numastat utility, which is provided by the numactl package.

The following shows a host with 4 running VMs, each obtaining memory from multiple NUMA nodes. This is not optimal for vCPU performance, and warrants adjusting:

# numastat -c qemu-kvm

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm)     68     16    357   6936      2      3    147    598  8128
51747 (qemu-kvm)    245     11      5     18   5172   2532      1     92  8076
53736 (qemu-kvm)     62    432   1661    506   4851    136     22    445  8116
53773 (qemu-kvm)   1393      3      1      2     12      0      0   6702  8114
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
Total              1769    463   2024   7462  10037   2672    169   7837 32434

In contrast, the following shows memory being provided to each VM by a single node, which is significantly more efficient:

# numastat -c qemu-kvm

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
51747 (qemu-kvm)      0      0      7      0   8072      0      1      0  8080
53736 (qemu-kvm)      0      0      7      0      0      0   8113      0  8120
53773 (qemu-kvm)      0      0      7      0      0      0      1   8110  8118
59065 (qemu-kvm)      0      0   8050      0      0      0      0      0  8051
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
Total                 0      0   8072      0   8072      0   8114   8110 32368
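
To quantify how well each VM's memory is localized, the numastat output can be reduced to a single locality figure per process. The following Python sketch assumes the numastat -c layout shown above. The function name memory_locality is hypothetical, and the sample data is copied from the first example.

```python
import re

SAMPLE = """\
PID              Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm)     68     16    357   6936      2      3    147    598  8128
51747 (qemu-kvm)    245     11      5     18   5172   2532      1     92  8076
53736 (qemu-kvm)     62    432   1661    506   4851    136     22    445  8116
53773 (qemu-kvm)   1393      3      1      2     12      0      0   6702  8114
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
Total              1769    463   2024   7462  10037   2672    169   7837 32434
"""

# Only per-process rows start with a numeric PID followed by "(name)".
ROW = re.compile(r"^(\d+) \(\S+\)\s+(.+)$")


def memory_locality(output):
    """Map each PID to (dominant_node, fraction of its memory on that node)."""
    result = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        *per_node, total = (int(value) for value in match.group(2).split())
        if total == 0:
            continue
        dominant = max(range(len(per_node)), key=per_node.__getitem__)
        result[int(match.group(1))] = (dominant, per_node[dominant] / total)
    return result


if __name__ == "__main__":
    for pid, (node, fraction) in memory_locality(SAMPLE).items():
        print(f"PID {pid}: {fraction:.0%} of memory on node {node}")
```

A fraction close to 1 corresponds to the second example, where each VM obtains its memory from a single node. Lower values indicate memory spread across nodes, as in the first example.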